Deep Dive
In-depth technical articles on AI, GPU inference, and developer tools
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs
A single-file FastAPI gateway that auto-starts vLLM on demand and stops it after idle, freeing VRAM for other GPU worklo...
Database | Databases Are the New AI Moat: Why DB-First Architecture Changes Everything
Stop feeding raw data to LLMs. The real moat in AI is your proprietary database — not the model.
Developer Tools | I Built a SQLite Editor in 180 Lines, Then Rebuilt It in 730 for the Browser
Background: I work with a lot of SQLite databases: patent data, court rulings, government...
OSS | Canonical Eyes IPO: Ubuntu Proves the Revival of Linux and OSS
Introduction: In late 2025, Canonical founder Mark Shuttleworth quietly but clearly...
AI Architecture | Running Karpathy's autoresearch with a Local LLM: Zero-API-Cost Autonomous AI Research
Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...
AI Architecture | Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling
Introduction: I built a local-first RAG research tool that runs entirely on a single GPU....
AI Architecture | AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures
Developer Tools | Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of the Astral Acquisition
GPU & Inference | Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and the Latest NVIDIA GTC Trends
Web & Infrastructure | AI Questions the Future of the Internet: Freedom, Quality Decline, and Content Reliability
Today's Highlights: The rapid evolution of AI is bringing about significant transformations...
Developer Tools | Data Preparation and Security to Accelerate AI Development: Leveraging Open Source Tools
Today's Highlights: The field of AI development is evolving day by day. Especially for...
LLM | Next-Gen LLMs: Deep Dive into Compact, High-Speed Models and Temporal Reasoning (Gemini 3.1 Flash-Lite, GPT-5.4 mini/nano)
Today's Highlights: Hello, fellow individual developers and AI researchers! In today's tech...
AI Architecture | AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring
Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...
GPU & Inference | 2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX
Today's Highlights: In 2026, AI's evolution is remarkable, with accelerated adoption in...
Web & Infrastructure | New Synergy of Cloud and AI: Cloudflare Workers AI, Google's OSS Security, and AI Integration in WordPress
Today's Highlights: This is soy-tuber, an individual developer, especially an AI...
GPU & Inference | RTX 40 Series Makes LLMs Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...
Hello everyone! I'm soy-tuber, an AI researcher and individual developer. I usually push my RTX 5090...
AI Architecture | The Real Inflection Point GTC 2026 Quietly Announced: Why NVIDIA Bet on "Open"
After watching the GTC 2026 keynote, what stayed with me wasn't Vera Rubin's "35x" number. It was the...
AI Architecture | Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate
Shogi — Japanese chess — has a thriving computer engine scene that most Western developers have never...
AI Architecture | Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History
No piece of software has divided the Linux community more bitterly than systemd. Depending on who you...
AI Architecture | The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)
Here's a pattern that will sound familiar: you carefully write a README with architecture decisions,...
AI Architecture | From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records
If you've ever written WHERE column LIKE '%keyword%' on a table with more than a million rows, you...
AI Architecture | SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File
SoyLM started as a simple idea: build a RAG (Retrieval-Augmented Generation) tool that runs entirely...
AI Architecture | Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan
PatentLLM started as a free patent search tool. Making it a paid product meant answering one question...
AI Architecture | When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline
I built a patent analysis pipeline that combines SQLite FTS5 search with Gemini's analytical...
AI Architecture | Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App
I added a Flutter Web app to my phone's home screen, expecting a glorified bookmark. Instead, it...
AI Architecture | Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs
Every developer I know who tries Tailscale has the same reaction: "Wait, that's it? It just...
AI Architecture | Building a 5-in-1 Local LLM App with Flutter Web and Flask
The idea started simple: build five small apps that demonstrate what a local LLM can do with private...
AI Architecture | Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation
There's a particular kind of friction in database-driven development that most of us have learned to...
AI Architecture | How Google Finds Every Restaurant in Japan, and Why Your Full-Text Search Can't
I recently scraped every unagi (eel) restaurant in Japan using the Google Places Text Search API. The...
Developer Tools | OpenAI Acquires Astral (uv / Ruff): What It Really Means
Introduction: OpenAI has acquired Astral, the company behind Python's blazing-fast package...
GPU & Inference | The Technical Debt Local AI Must Fix Before It's Too Late: What NemoClaw Says About NVIDIA's Philosophy
If you've been following my recent posts, you might have seen my repository and the issue I opened on...
GPU & Inference | Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090
Disclaimer: This is an experimental build, not a production setup. NemoClaw is early-stage, the...
Developer Tools | Why Google Wasn't Indexing My FastAPI Site: The HEAD Request Trap
The Symptom: 93 Pages Invisible to Google. I run a technical blog on FastAPI behind...
AI Architecture | vLLM vs TensorRT-LLM vs Ollama vs llama.cpp: Choosing the Right Inference Engine on RTX 5090
Why This Comparison Exists: I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090...
Developer Tools | Using Python to Load Google Docs into AI: Drive API Minimal Permission Setup
Introduction: The Challenge of AI Not Being Able to Directly Read Google Documents. "Please...
GPU & Inference | Hardware Selection for Local LLMs: Overcoming the VRAM Wall with Practical GPU, CPU, and Memory Configurations
Introduction: Gemini Flash Equivalent Locally? The Despair of a Slow Development...
AI Architecture | What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models
Introduction: As a practical testing ground for verifying reasoning optimization and model...
AI Architecture | Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit
Introduction: Taking Back Control of the AI "Brain". For modern engineers, LLMs (Large...
Web & Infrastructure | Automating Video Generation with Remotion and VOICEVOX: From Environment Setup to Performance Optimization
Introduction: An Approach to Automating Video Generation. When attempting to generate...
Web & Infrastructure | Cloudflare Tunnel Practical Guide: Securely Exposing a Home AI Server Without Port Forwarding
Introduction: Utilizing the Computational Resources of the RTX 5090. For AI developers, a...
Web & Infrastructure | Automated Google Drive Backup with Rclone: Headless OAuth Authentication and systemd Configuration
Introduction: Data Management in AI Development and the Rclone Barrier. In AI and LLM...
Developer Tools | Claude Code Practical Guide: Debugging, Test Automation, and CUDA Environment Setup with Opus 4.6
Introduction: Claude Code is a CLI (command-line interface) tool provided by Anthropic that...
AI Architecture | I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions
The Time I Posted a Patent Search Engine to Reddit r/LocalLLaMA and Received 65 Upvotes and...
Developer Tools | Coders at Work: Index of All 15 Programmer Interviews
What Is Coders at Work? Written by Peter Seibel (2009). A collection of long-form interviews with...
GPU & Inference | RTX 5090 + Nemotron 9B on vLLM: Benchmarks & TRT-LLM Comparison
I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090 with vLLM 0.15.1 and wanted to share...
Other | Talent Blooms When You Stop Relying on "Motivation": 7 Insights on the "Spring Mind" Left by Genius Mathematician Kiyoshi Oka
Why Are We Betrayed by the Mirage of "Motivation"? Living in the modern era, we are caught...
Other | Three Months of Code: What a Patent Lawyer Built from Zero
I built a multi-engine shogi AI, deployed it to rated games on Floodgate, and watched it lose to...
Developer Tools | I Built a Free Patent Search Engine with 3.5M US Patents: No Login, Powered by SQLite FTS5
I'm a patent lawyer who started coding in December 2025. Today I'm launching a free patent search...
Developer Tools | Operational Techniques for Automatically Starting vLLM, Flask, and cron with systemd Services in WSL2
WSL2 systemd Support: To enable systemd in WSL2, configure /etc/wsl.conf. # Add to...
Web & Infrastructure | Achieving Bidirectional Integration of a Streamlit Backend and Flutter Frontend in a WSL2 Environment
Solution for CORS Issues: When making Streamlit accessible externally in a WSL2...
Developer Tools | A Regulatory Analysis Dashboard for Fast Searching of NITE CHRIP Data Using FTS5
NITE CHRIP Data Conversion: Regulatory data provided by the Chemical Substance Risk...
Developer Tools | Searching Case Law PDFs with RAG: A Legal AI Search System Using Gemini + SQLite FTS5
Challenges in Case Law Search: Traditional court databases (e.g., courts.go.jp) have search...
Developer Tools | google-generativeai to google-genai Migration Guide
What Happened: The google.generativeai package has been deprecated. Migration to the new...
AI Architecture | Gemini 2.5 Flash x Nemotron 9B: Optimal Division of Roles for Cloud LLM and Local LLM
Why Combine Them? When designing AI workloads, it is not easy to simultaneously satisfy...
AI Architecture | Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching
What is Context Caching? Google Gemini's Context Caching is a feature that caches context...
Developer Tools | Skit: The Man Obsessed with Claude Code
Comedy Sketch: The Man Possessed by Claude Code. Characters: Niiyama: The...
AI Architecture | Building a Free Research Agent with DuckDuckGo Search + Local LLM
Why DuckDuckGo + Local LLM? When conducting research, using paid APIs (such as Brave...
Developer Tools | A Daily Report System to Automatically Aggregate Claude Code + Gemini CLI Usage History Every Morning with Cron
Why Automate Daily Reports: Manually recording daily AI tool usage is a waste of time....
Developer Tools | Reducing Token Consumption in Claude Code: FTS5 Knowledge DB + Tiered Index Design
Problem: If all coding conventions, test commands, and documentation for the entire project...
Web & Infrastructure | Implementing Stripe Checkout Billing in PatentLLM
Introduction: To commercialize PatentLLM, we implemented a billing system using Stripe...
AI Architecture | Building a 5-in-1 App with Local LLM and Flutter
Introduction: "I want to leverage AI without sending data to the cloud." The biggest...
Developer Tools | Leveraging Claude Code's MCP Server
Introduction: The Context Switching Problem in DB Operations. SQLite is an excellent...
AI Architecture | LoRA and FT Are Unnecessary: How to Approach Distilled Models
Introduction: Fine-tuning (FT) a distilled model is either ineffective or leads to...
Developer Tools | Lineage of OSS Supporting the AI Development Stack: Its Origins and Creators
Local AI development environments are built upon numerous open-source technologies. This article...
AI Architecture | Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support
NVIDIA Nemotron-Nano-9B-v2-Japanese: This is a 9B-parameter LLM specialized for Japanese,...
Developer Tools | Strategic Data Organization Techniques Using SQLite, JSONL, XML, and TSV: Lessons
Introduction: PatentLLM (patent search AI) and HanreiLLM (case law search AI) are both...
GPU & Inference | Shogi AI with RTX 5090: Record of TensorRT FP8 Quantization and Floodgate Practical Games
What is dlshogi? dlshogi is a Shogi engine incorporating deep learning, consisting of a...
GPU & Inference | Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an Open...
Introduction: Recently, an article on Qiita titled "Running Nemotron-Nano-9B-v2-Japanese...
Developer Tools | Python Environment Management with uv: Introduction and Practical Use of a High-Speed Package Manager Replacing pip/venv
What is uv? uv is a Rust-based Python package manager developed by Astral (Charlie Marsh)....
Developer Tools | Automatically Prevent Port Conflicts and Dangerous Commands Proactively with Claude Code's Hooks Feature
What are Claude Code hooks? Claude Code's hooks feature enables event-driven automation...
AI Architecture | Giving a 'Brain' to Minecraft NPCs with a Local LLM: Nemotron + Mineflayer Implementation Notes
What We Want to Achieve: Traditional Minecraft bots primarily relied on command-based...
Web & Infrastructure | Exposing Multiple Web Applications from a Home Server with Cloudflare Tunnel + Caddy
Introduction: When publishing multiple web applications on a home server, obtaining a...
GPU & Inference | Personal AI Development Environment Built with RTX 5090 + WSL2: A Practical Setup Fully Utilizing 32GB GPU
Why RTX 5090 + WSL2? The 32GB VRAM of the RTX 5090 is a practical choice for local...
GPU & Inference | Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090
13 Project List: The portfolio consists of the following categories: Legal...
AI Architecture | Using Local LLMs as a "Batch Processing Engine": A Design for Automatically Generating Artifacts from Your Own Data
Using Local LLMs as a "Batch Processing Engine" — Designing Automated Artifact Generation...
AI Architecture | Fast Searching 4 Million Patent Records with FTS5
Introduction: The Limitations of LIKE Search. When searching for "battery" in PatentLLM's...