Deep Dive
In-depth technical articles on AI, GPU inference, and developer tools
Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research
Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...
AI Architecture
Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling
Introduction: I built a local-first RAG research tool that runs entirely on a single GPU....
AI Architecture
AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures
AI Architecture
AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring
Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...
AI Architecture
The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"
After watching the GTC 2026 keynote, what stayed with me wasn't Vera Rubin's "35x" number. It was the...
AI Architecture
Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate
Shogi — Japanese chess — has a thriving computer engine scene that most Western developers have never...
AI Architecture
Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History
No piece of software has divided the Linux community more bitterly than systemd. Depending on who you...
AI Architecture
The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)
Here's a pattern that will sound familiar: you carefully write a README with architecture decisions,...
AI Architecture
From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records
If you've ever written WHERE column LIKE '%keyword%' on a table with more than a million rows, you...
AI Architecture
SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File
SoyLM started as a simple idea: build a RAG (Retrieval-Augmented Generation) tool that runs entirely...
AI Architecture
Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan
PatentLLM started as a free patent search tool. Making it a paid product meant answering one question...
AI Architecture
When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline
I built a patent analysis pipeline that combines SQLite FTS5 search with Gemini's analytical...
AI Architecture
Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App
I added a Flutter Web app to my phone's home screen, expecting a glorified bookmark. Instead, it...
AI Architecture
Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs
Every developer I know who tries Tailscale has the same reaction: "Wait, that's it? It just......
AI Architecture
Building a 5-in-1 Local LLM App with Flutter Web and Flask
The idea started simple: build five small apps that demonstrate what a local LLM can do with private...
AI Architecture
Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation
There's a particular kind of friction in database-driven development that most of us have learned to...
AI Architecture
How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't
I recently scraped every unagi (eel) restaurant in Japan using the Google Places Text Search API. The...
AI Architecture
vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090
Why This Comparison Exists: I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090...
AI Architecture
What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models
Introduction: As a practical testing ground for verifying reasoning optimization and model...
AI Architecture
Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit
Introduction: Taking Back Control of the AI "Brain". For modern engineers, LLMs (Large...
AI Architecture
I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions
The Time I Posted a Patent Search Engine to Reddit r/LocalLLaMA and Received 65 Upvotes and...
AI Architecture
Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM
Why Combine Them? When designing AI workloads, it is not easy to simultaneously satisfy...
AI Architecture
Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching
What is Context Caching? Google Gemini's Context Caching is a feature that caches context...
AI Architecture
Building a Free Research Agent with DuckDuckGo Search + Local LLM
Why DuckDuckGo + Local LLM? When conducting research, using paid APIs (such as Brave...
AI Architecture
Building a 5-in-1 App with Local LLM and Flutter
Introduction: "I want to leverage AI without sending data to the cloud." The biggest...
AI Architecture
LoRA and FT Are Unnecessary: How to Approach Distilled Models
Introduction: Fine-tuning (FT) a distilled model is either ineffective or leads to...
AI Architecture
Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support
NVIDIA Nemotron-Nano-9B-v2-Japanese: This is a 9B-parameter LLM specialized for Japanese,...
AI Architecture
Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes
What We Want to Achieve: Traditional Minecraft bots primarily relied on command-based...
AI Architecture
Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data
Using Local LLMs as a "Batch Processing Engine" — Designing Automated Artifact Generation...
AI Architecture
Fast Searching 4 Million Patent Records with FTS5
Introduction: The Limitations of LIKE Search. When searching for "battery" in PatentLLM's...