Deep Dive

In-depth technical articles on AI, GPU inference, and developer tools

AI Architecture

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...

AI Architecture

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Introduction: I built a local-first RAG research tool that runs entirely on a single GPU....

AI Architecture

AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures


AI Architecture

AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...

AI Architecture

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

After watching the GTC 2026 keynote, what stayed with me wasn't Vera Rubin's "35x" number. It was the...

AI Architecture

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

Shogi — Japanese chess — has a thriving computer engine scene that most Western developers have never...

AI Architecture

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

No piece of software has divided the Linux community more bitterly than systemd. Depending on who you...

AI Architecture

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

Here's a pattern that will sound familiar: you carefully write a README with architecture decisions,...

AI Architecture

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

If you've ever written WHERE column LIKE '%keyword%' on a table with more than a million rows, you...

AI Architecture

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

SoyLM started as a simple idea: build a RAG (Retrieval-Augmented Generation) tool that runs entirely...

AI Architecture

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

PatentLLM started as a free patent search tool. Making it a paid product meant answering one question...

AI Architecture

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

I built a patent analysis pipeline that combines SQLite FTS5 search with Gemini's analytical...

AI Architecture

Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

I added a Flutter Web app to my phone's home screen, expecting a glorified bookmark. Instead, it...

AI Architecture

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

Every developer I know who tries Tailscale has the same reaction: "Wait, that's it? It just......

AI Architecture

Building a 5-in-1 Local LLM App with Flutter Web and Flask

The idea started simple: build five small apps that demonstrate what a local LLM can do with private...

AI Architecture

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

There's a particular kind of friction in database-driven development that most of us have learned to...

AI Architecture

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

I recently scraped every unagi (eel) restaurant in Japan using the Google Places Text Search API. The...

AI Architecture

vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

Why This Comparison Exists: I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090...

AI Architecture

What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

Introduction: As a practical testing ground for verifying reasoning optimization and model...

AI Architecture

Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit

Introduction: Taking Back Control of the AI "Brain". For modern engineers, LLMs (Large...

AI Architecture

I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions

The Time I Posted a Patent Search Engine to Reddit r/LocalLLaMA and Received 65 Upvotes and...

AI Architecture

Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

Why Combine Them? When designing AI workloads, it is not easy to simultaneously satisfy...

AI Architecture

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

What is Context Caching? Google Gemini's Context Caching is a feature that caches context...

AI Architecture

Building a Free Research Agent with DuckDuckGo Search + Local LLM

Why DuckDuckGo + Local LLM? When conducting research, using paid APIs (such as Brave...

AI Architecture

Building a 5-in-1 App with Local LLM and Flutter

Introduction: "I want to leverage AI without sending data to the cloud." The biggest...

AI Architecture

LoRA and FT Are Unnecessary: How to Approach Distilled Models

Introduction: Fine-tuning (FT) a distilled model is either ineffective or leads to...

AI Architecture

Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

NVIDIA Nemotron-Nano-9B-v2-Japanese: This is a 9B-parameter LLM specialized for Japanese,...

AI Architecture

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

What We Want to Achieve: Traditional Minecraft bots primarily relied on command-based...

AI Architecture

Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data

Using Local LLMs as a "Batch Processing Engine" — Designing Automated Artifact Generation...

AI Architecture

Fast Searching 4 Million Patent Records with FTS5

Introduction: The Limitations of LIKE Search. When searching for "battery" in PatentLLM's...