Deep Dive

In-depth technical articles on AI, GPU inference, and developer tools

AI Architecture

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...

2026-03-22

AI Architecture

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Introduction: I built a local-first RAG research tool that runs entirely on a single GPU....

2026-03-22

AI Architecture

AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures

2026-03-22

AI Architecture

AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...

2026-03-21

AI Architecture

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

After watching the GTC 2026 keynote, what stayed with me wasn't Vera Rubin's "35x" number. It was the...

2026-03-21

AI Architecture

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

Shogi — Japanese chess — has a thriving computer engine scene that most Western developers have never...

2026-03-21

AI Architecture

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

No piece of software has divided the Linux community more bitterly than systemd. Depending on who you...

2026-03-21

AI Architecture

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

Here's a pattern that will sound familiar: you carefully write a README with architecture decisions,...

2026-03-21

AI Architecture

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

If you've ever written WHERE column LIKE '%keyword%' on a table with more than a million rows, you...

2026-03-21

AI Architecture

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

SoyLM started as a simple idea: build a RAG (Retrieval-Augmented Generation) tool that runs entirely...

2026-03-21

AI Architecture

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

PatentLLM started as a free patent search tool. Making it a paid product meant answering one question...

2026-03-21

AI Architecture

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

I built a patent analysis pipeline that combines SQLite FTS5 search with Gemini's analytical...

2026-03-21

AI Architecture

Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

I added a Flutter Web app to my phone's home screen, expecting a glorified bookmark. Instead, it...

2026-03-21

AI Architecture

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

Every developer I know who tries Tailscale has the same reaction: "Wait, that's it? It just......

2026-03-21

AI Architecture

Building a 5-in-1 Local LLM App with Flutter Web and Flask

The idea started simple: build five small apps that demonstrate what a local LLM can do with private...

2026-03-21

AI Architecture

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

There's a particular kind of friction in database-driven development that most of us have learned to...

2026-03-21

AI Architecture

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

I recently scraped every unagi (eel) restaurant in Japan using the Google Places Text Search API. The...

2026-03-20

AI Architecture

vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

Why This Comparison Exists: I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090...

2026-03-14

AI Architecture

What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

Introduction: As a practical testing ground for verifying reasoning optimization and model...

2026-03-14

AI Architecture

Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit

Introduction: Taking Back Control of the AI "Brain". For modern engineers, LLMs (Large...

2026-03-14

AI Architecture

I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions

The Time I Posted a Patent Search Engine to Reddit r/LocalLLaMA and Received 65 Upvotes and...

2026-03-14

AI Architecture

Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

Why Combine Them? When designing AI workloads, it is not easy to simultaneously satisfy...

2026-03-08

AI Architecture

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

What is Context Caching? Google Gemini's Context Caching is a feature that caches context...

2026-03-08

AI Architecture

Building a Free Research Agent with DuckDuckGo Search + Local LLM

Why DuckDuckGo + Local LLM? When conducting research, using paid APIs (such as Brave...

2026-03-08

AI Architecture

Building a 5-in-1 App with Local LLM and Flutter

Introduction: "I want to leverage AI without sending data to the cloud." The biggest...

2026-03-08

AI Architecture

LoRA and FT Are Unnecessary: How to Approach Distilled Models

Introduction: Fine-tuning (FT) a distilled model is either ineffective or leads to...

2026-03-08

AI Architecture

Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

NVIDIA Nemotron-Nano-9B-v2-Japanese: This is a 9B-parameter LLM specialized for Japanese,...

2026-03-08

AI Architecture

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

What We Want to Achieve: Traditional Minecraft bots primarily relied on command-based...

2026-03-08

AI Architecture

Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data

Using Local LLMs as a "Batch Processing Engine" — Designing Automated Artifact Generation...

2026-03-08

AI Architecture

Fast Searching 4 Million Patent Records with FTS5

Introduction: The Limitations of LIKE Search. When searching for "battery" in PatentLLM's...

2026-03-08