Deep Dive

In-depth technical articles on AI, GPU inference, and developer tools

GPU & Inference

vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

A single-file FastAPI gateway that auto-starts vLLM on demand and stops it after idle, freeing VRAM for other GPU worklo...

Database

Databases Are the New AI Moat: Why DB-First Architecture Changes Everything

Stop feeding raw data to LLMs. The real moat in AI is your proprietary database — not the model.

Developer Tools

I Built a SQLite Editor in 180 Lines, Then Rebuilt It in 730 for the Browser

Background: I work with a lot of SQLite databases, including patent data, court rulings, government...

oss

Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

Introduction: In late 2025, Canonical founder Mark Shuttleworth quietly but clearly...

AI Architecture

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...

AI Architecture

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Introduction: I built a local-first RAG research tool that runs entirely on a single GPU...

AI Architecture

AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures

Developer Tools

Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of Astral Acquisition

GPU & Inference

Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC Latest Trends

Web & Infrastructure

AI Questions the Future of the Internet: Freedom, Quality Decline, and Content Reliability

Today's Highlights: The rapid evolution of AI is bringing about significant transformations...

Developer Tools

Data Preparation and Security to Accelerate AI Development: Leveraging Open Source Tools

Today's Highlights: The field of AI development is evolving day by day. Especially for...

LLM

Next-Gen LLMs: Deep Dive into Compact, High-Speed Models and Temporal Reasoning – Gemini 3.1 Flash-Lite, GPT-5.4 mini/nano

Today's Highlights: Hello, fellow indie developers and AI researchers! In today's tech...

AI Architecture

AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...

GPU & Inference

2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX

Today's Highlights: In 2026, AI's evolution is remarkable, with accelerated adoption in...

Web & Infrastructure

New Synergy of Cloud and AI: Cloudflare Workers AI, Google's OSS Security, and AI Integration in WordPress

Today's Highlights: This is soy-tuber, an individual developer, especially an AI...

GPU & Inference

RTX 40 Series Makes LLMs Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...

Hello everyone! I'm soy-tuber, an AI researcher and individual developer. I usually push my RTX 5090...

AI Architecture

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

After watching the GTC 2026 keynote, what stayed with me wasn't Vera Rubin's "35x" number. It was the...

AI Architecture

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

Shogi — Japanese chess — has a thriving computer engine scene that most Western developers have never...

AI Architecture

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

No piece of software has divided the Linux community more bitterly than systemd. Depending on who you...

AI Architecture

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

Here's a pattern that will sound familiar: you carefully write a README with architecture decisions,...

AI Architecture

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

If you've ever written WHERE column LIKE '%keyword%' on a table with more than a million rows, you...

AI Architecture

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

SoyLM started as a simple idea: build a RAG (Retrieval-Augmented Generation) tool that runs entirely...

AI Architecture

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

PatentLLM started as a free patent search tool. Making it a paid product meant answering one question...

AI Architecture

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

I built a patent analysis pipeline that combines SQLite FTS5 search with Gemini's analytical...

AI Architecture

Flutter Web + PWA: Why "Add to Home Screen" Gives You a Full-Screen App

I added a Flutter Web app to my phone's home screen, expecting a glorified bookmark. Instead, it...

AI Architecture

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

Every developer I know who tries Tailscale has the same reaction: "Wait, that's it? It just......

AI Architecture

Building a 5-in-1 Local LLM App with Flutter Web and Flask

The idea started simple: build five small apps that demonstrate what a local LLM can do with private...

AI Architecture

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

There's a particular kind of friction in database-driven development that most of us have learned to...

AI Architecture

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

I recently scraped every unagi (eel) restaurant in Japan using the Google Places Text Search API. The...

Developer Tools

OpenAI Acquires Astral (uv / Ruff) — What It Really Means

Introduction OpenAI has acquired Astral — the company behind Python's blazing-fast package...

GPU & Inference

The Technical Debt Local AI Must Fix Before It's Too Late — What NemoClaw Says About NVIDIA's Philosophy

If you've been following my recent posts, you might have seen my repository and the issue I opened on...

GPU & Inference

Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

Disclaimer: This is an experimental build, not a production setup. NemoClaw is early-stage, the...

Developer Tools

Why Google Wasn't Indexing My FastAPI Site — The HEAD Request Trap

The Symptom: 93 Pages Invisible to Google. I run a technical blog on FastAPI behind...

AI Architecture

vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

Why This Comparison Exists. I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090...

Developer Tools

Using Python to Load Google Docs into AI — Drive API Minimal Permission Setup

Introduction: The Challenge of AI Not Being Able to Read Google Documents Directly. "Please...

GPU & Inference

Hardware Selection for Local LLMs: Overcoming the VRAM Wall with Practical GPU, CPU, and Memory Configurations

Introduction: Gemini Flash Equivalent Locally? The Despair of a Slow Development...

AI Architecture

What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

Introduction: As a practical testing ground for verifying reasoning optimization and model...

AI Architecture

Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit

Introduction: Taking Back Control of the AI "Brain". For modern engineers, LLMs (Large...

Web & Infrastructure

Automating Video Generation with Remotion and VOICEVOX: From Environment Setup to Performance Optimization

Introduction: An Approach to Automating Video Generation. When attempting to generate...

Web & Infrastructure

Cloudflare Tunnel Practical Guide: Securely Exposing a Home AI Server Without Port Forwarding

Introduction: Utilizing the Computational Resources of the RTX 5090. For AI developers, a...

Web & Infrastructure

Automated Google Drive Backup with Rclone: Headless OAuth Authentication and systemd Configuration

Introduction: Data Management in AI Development and the Rclone Barrier. In AI and LLM...

Developer Tools

Claude Code Practical Guide: Debugging, Test Automation, and CUDA Environment Setup with Opus 4.6

Introduction: Claude Code is a command-line interface (CLI) tool from Anthropic that...

AI Architecture

I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions

Developer Tools

Coders at Work — Index of All 15 Programmer Interviews

What Is Coders at Work? Written by Peter Seibel (2009), it is a collection of long-form interviews with...

GPU & Inference

RTX 5090 + Nemotron 9B on vLLM — Benchmarks & TRT-LLM Comparison

I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090 with vLLM 0.15.1 and wanted to share...

other

Talent Blooms When You Stop Relying on "Motivation": 7 Insights on the "Spring Mind" Left by Genius Mathematician Kiyoshi Oka

Why Are We Betrayed by the Mirage of "Motivation"? Living in the modern era, we are caught...

other

Three Months of Code: What a Patent Lawyer Built from Zero

I built a multi-engine shogi AI, deployed it to rated games on Floodgate, and watched it lose to...

Developer Tools

I Built a Free Patent Search Engine with 3.5M US Patents — No Login, Powered by SQLite FTS5

I'm a patent lawyer who started coding in December 2025. Today I'm launching a free patent search...

Developer Tools

Operational Techniques for Automatically Starting vLLM, Flask, and cron with systemd Services in WSL2

WSL2 systemd Support. To enable systemd in WSL2, configure /etc/wsl.conf. # Add to...

Web & Infrastructure

Achieving Bidirectional Integration of a Streamlit Backend and a Flutter Frontend in a WSL2 Environment

Solution for CORS Issues. When making Streamlit accessible externally in a WSL2...

Developer Tools

A Regulatory Analysis Dashboard for Fast Search of NITE CHRIP Data Using FTS5

NITE CHRIP Data Conversion. Regulatory data provided by the Chemical Substance Risk...

Developer Tools

Searching Case Law PDFs with RAG — A Legal AI Search System using Gemini + SQLite FTS5

Challenges in Case Law Search. Traditional court databases (e.g., courts.go.jp) have search...

Developer Tools

google-generativeai to google-genai Migration Guide

What Happened: The google.generativeai package has been deprecated. Migration to the new...

AI Architecture

Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

Why Combine Them? When designing AI workloads, it is not easy to simultaneously satisfy...

AI Architecture

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

What is Context Caching? Google Gemini's Context Caching is a feature that caches context...

Developer Tools

Skit: The Man Obsessed with Claude Code

Comedy Sketch: The Man Possessed by Claude Code. Characters: Niiyama: The...

AI Architecture

Building a Free Research Agent with DuckDuckGo Search + Local LLM

Why DuckDuckGo + Local LLM? When conducting research, using paid APIs (such as Brave...

Developer Tools

A Daily Report System to Automatically Aggregate Claude Code + Gemini CLI Usage History Every Morning with Cron

Why Automate Daily Reports? Manually recording daily AI tool usage is a waste of time...

Developer Tools

Reducing Token Consumption in Claude Code — FTS5 Knowledge DB + Tiered Index Design

Problem: If all coding conventions, test commands, and documentation for the entire project...

Web & Infrastructure

Implementing Stripe Checkout Billing in PatentLLM

Introduction: To commercialize PatentLLM, we implemented a billing system using Stripe...

AI Architecture

Building a 5-in-1 App with Local LLM and Flutter

Introduction: "I want to leverage AI without sending data to the cloud." The biggest...

Developer Tools

Leveraging Claude Code's MCP Server

Introduction: The Context Switching Problem in DB Operations. SQLite is an excellent...

AI Architecture

LoRA and FT Are Unnecessary: How to Approach Distilled Models

Introduction: Fine-tuning (FT) a distilled model is either ineffective or leads to...

Developer Tools

Lineage of OSS Supporting the AI Development Stack: Its Origins and Creators

Local AI development environments are built upon numerous open-source technologies. This article...

AI Architecture

Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

NVIDIA Nemotron-Nano-9B-v2-Japanese: This is a 9B-parameter LLM specialized for Japanese,...

Developer Tools

Strategic Data Organization Techniques Using SQLite, JSONL, XML, and TSV: Lessons

Introduction PatentLLM (patent search AI) and HanreiLLM (case law search AI) are both...

GPU & Inference

Shogi AI with RTX 5090 — Record of TensorRT FP8 Quantization and Floodgate Practical Games

What is dlshogi? dlshogi is a Shogi engine incorporating deep learning, consisting of a...

GPU & Inference

Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an Open...

Introduction: Recently, an article on Qiita titled "Running Nemotron-Nano-9B-v2-Japanese...

Developer Tools

Python Environment Management with uv: Introduction and Practical Use of a High-Speed Package Manager Replacing pip/venv

What is uv? uv is a Rust-based Python package manager developed by Astral (Charlie Marsh)....

Developer Tools

Automatically Prevent Port Conflicts and Dangerous Commands Proactively with Claude Code's Hooks Feature

What are Claude Code hooks? Claude Code's hooks feature enables event-driven automation...

AI Architecture

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

What We Want to Achieve. Traditional Minecraft bots primarily relied on command-based...

Web & Infrastructure

Exposing Multiple Web Applications from a Home Server with Cloudflare Tunnel + Caddy

Introduction: When publishing multiple web applications on a home server, obtaining a...

GPU & Inference

Personal AI Development Environment Built with RTX 5090 + WSL2 — A Practical Setup Fully Utilizing 32GB GPU

Why RTX 5090 + WSL2? The 32GB VRAM of the RTX 5090 is a practical choice for local...

GPU & Inference

Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090

List of 13 Projects. The portfolio consists of the following categories: Legal...

AI Architecture

Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data

AI Architecture

Fast Searching 4 Million Patent Records with FTS5

Introduction: The Limitations of LIKE Search. When searching for "battery" in PatentLLM's...