Deep Dive

In-depth technical articles on AI, GPU inference, and developer tools

My RAG's Model Had Already Read the Books - Per-Book Verdicts, a Fake Regression, and Catching a 9B Leaking Prior Knowledge

I run a local RAG over all seven Harry Potter novels with a 9B model that knows the franchise cold. Re-running my eval o...

2026-07-25 GPU & Inference

Solving 7×7 Killall-Go Opening JA on a Single RTX 5090

We reproduce the NeurIPS 2023 online fine-tuning Killall-Go solver natively on a single RTX 5090 and prove the 7x7 openi...

2026-07-25 GPU & Inference

Solving Cho Chikun Life-and-Death Problems on a Single RTX 5090

A 2x2 cross-generation benchmark (hardware x algorithm) of the Relevance-Zone life-and-death solver on Cho Chikun's prob...

2026-07-25 GPU & Inference

The Same RTX 5090, but the GPU Sat Idle — a CPU-Bound Go Solver and the Case for L2 Cache

Second in the RTX 5090 series — but the GPU sits ~23% idle. This Go life-and-death solver is CPU-bound, and the prime su...

2026-07-18 GPU & Inference

One RTX 5090 vs a 12-GPU Cluster — Benchmarking a Decade of GPUs on the Same Go Proof

A 2023 paper solved a tiny 7x7 Go position on a 12-GPU cluster. I re-ran the exact same solver on a single RTX 5090 — id...

2026-07-18 GPU & Inference

INT8 Q/DQ Calibration on Blackwell: 1.8× the TRT 10 + FP16 Baseline

A practical walkthrough of doing INT8 post-training quantization the right way on RTX 5090 + TensorRT 11. 1,500 stratifi...

2026-06-10 Web & Infrastructure

Cloud Is a Luxury Car — Two Philosophies of Building Data Apps in 2026

There are two coherent ways to build data applications in 2026. One pays a vendor to skip the assembly. The other assemb...

2026-05-19 Web & Infrastructure

Cloudflare Tunnel as the Indie Developer's Public IP

For most of the internet's history, exposing a service on your own machine to the public web has been a small nightmare....

2026-05-19 AI Architecture

The Insight-Free Property of Vendor RAGs — A Feature, Not a Bug

Ask a vendor-run documentation RAG to compare its product to a competitor and you will get back the politest non-answer ...

2026-05-19 Developer Tools

The `uv` Era — Disposable Python Environments and What OpenAI's Astral Acquisition Means

uv made venv obsolete. PEP 723 made requirements.txt optional. And in March 2026, OpenAI bought the whole stack. Here is...

2026-05-19 SQLite & Databases

Building a Hybrid RAG in 200 Lines — SQLite + FTS5 + sqlite-vec + RRF

A complete walkthrough of a production-shaped hybrid retrieval system in a single Python file. BM25 keyword search via F...

2026-05-19 SQLite & Databases

Cortex Search vs Hybrid SQLite RAG — A Cost and Latency Teardown

Snowflake's Cortex Search is one of the cleanest enterprise RAG offerings on the market. A laptop running SQLite + FTS5 ...

2026-05-19 Web & Infrastructure

Inside Streamlit's Re-Run Model — Why Hot Reload Feels Instant

Streamlit's whole architecture is built on one slightly insulting idea — just re-run the script from top to bottom every...

2026-05-19 AI Architecture

Why Snowflake's Bet on Streamlit Just Works — And Where Solo Builders Still Win

An 18-hour build later, here's why Snowflake's strategy with Streamlit is the cleanest enterprise RAG play on the market...

2026-05-19 Open Source

How AI Quietly Revived Open Source — A Closing Note on the People Who Made the Pieces

This series has been a long argument about how to build data applications in 2026. The argument only works because the u...

2026-05-19 SQLite & Databases

Maybe SQLite Is Still Better Than DuckDB for My Workloads

Why SQLite still wins for incremental workloads, when DuckDB earns its keep, and why VectorDBs are a black box you might...

2026-05-06 GPU & Inference

vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

A single-file FastAPI gateway that auto-starts vLLM on demand and stops it after idle, freeing VRAM for other GPU worklo...

2026-03-26 SQLite & Databases

Databases Are the New AI Moat: Why DB-First Architecture Changes Everything

Stop feeding raw data to LLMs. The real moat in AI is your proprietary database — not the model.

2026-03-26 Developer Tools

I Built a SQLite Editor in 180 Lines, Then Rebuilt It in 730 for the Browser

Background: I work with a lot of SQLite databases: patent data, court rulings, government...

2026-03-24 Open Source

Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

Introduction: In late 2025, Canonical founder Mark Shuttleworth quietly but clearly...

2026-03-23 AI Architecture

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Introduction: Andrej Karpathy (OpenAI co-founder) released autoresearch, an experiment...

2026-03-22 AI Architecture

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Introduction: Built a local-first RAG research tool that runs entirely on a single GPU....

2026-03-22 AI Architecture

AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures

AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures ...

2026-03-22 Developer Tools

Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of Astral Acquisition

Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of Astral...

2026-03-22 GPU & Inference

Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC Latest Trends

Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC...

2026-03-22 Web & Infrastructure

AI Questions the Future of the Internet: Freedom, Quality Decline, and Content Reliability

Today's Highlights: The rapid evolution of AI is bringing about significant transformations...

2026-03-22 Developer Tools

Data Preparation and Security to Accelerate AI Development: Leveraging Open Source Tools

Today's Highlights: The field of AI development is evolving day by day. Especially for...

2026-03-22 LLM

Next-Gen LLMs: Deep Dive into Compact, High-Speed Models and Temporal Reasoning – Gemini 3.1 Flash-Lite, GPT-5.4 mini/nano

Today's Highlights: Hello, fellow personal developers and AI researchers! In today's tech...

2026-03-22 AI Architecture

AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

Hello everyone! I'm soy-tuber, a solo developer and AI researcher. As I immerse myself daily in AI...

2026-03-21 GPU & Inference

2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX

Today's Highlights: In 2026, AI's evolution is remarkable, with accelerated adoption in...

2026-03-21 Web & Infrastructure

New Synergy of Cloud and AI: Cloudflare Workers AI, Google's OSS Security, and AI Integration in WordPress

Today's Highlights: This is soy-tuber, an individual developer, especially an AI...

2026-03-21 GPU & Inference

RTX 40 Series Makes LLM Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...

Hello everyone! I'm soy-tuber, an AI researcher and individual developer. I usually push my RTX 5090...

2026-03-21 AI Architecture

Talent Blooms When You Stop Relying on "Motivation": 7 Insights on the "Spring Mind" Left by Genius Mathematician Kiyoshi Oka

Why Are We Betrayed by the Mirage of "Motivation"? Living in the modern era, we are caught...

2026-03-14 Other

Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an Open...

Introduction: Recently, an article on Qiita titled "Running Nemotron-Nano-9B-v2-Japanese...

2026-03-08 Developer Tools

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

Building a 5-in-1 Local LLM App with Flutter Web and Flask

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

OpenAI Acquires Astral (uv / Ruff) — What It Really Means

The Technical Debt Local AI Must Fix Before It's Too Late — What NemoClaw Says About NVIDIA's Philosophy

Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

Why Google Wasn't Indexing My FastAPI Site — The HEAD Request Trap

vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

Using Python to Load Google Docs into AI — Drive API Minimal Permission Setup

Hardware Selection for Local LLMs: Overcoming the VRAM Wall with Practical GPU, CPU, and Memory Configurations

What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit

Automating Video Generation with Remotion and VOICEVOX: From Environment Setup to Performance Optimization

Cloudflare Tunnel Practical Guide: Securely Exposing a Home AI Server Without Port Forwarding

Automated Google Drive Backup with Rclone: Headless OAuth Authentication and systemd Configuration

Claude Code Practical Guide: Debugging, Test Automation, and CUDA Environment Setup with Opus 4.6

I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions

Coders at Work — Index of All 15 Programmer Interviews

RTX 5090 + Nemotron 9B on vLLM — Benchmarks & TRT-LLM Comparison

Three Months of Code: What a Patent Lawyer Built from Zero

I Built a Free Patent Search Engine with 3.5M US Patents — No Login, Powered by SQLite FTS5

Operational Techniques for Automatically Starting vLLM, Flask, and cron with systemd Services in WSL2

Achieving Bidirectional Integration of a Streamlit Backend and a Flutter Frontend in a WSL2 Environment

A Regulatory Analysis Dashboard for Fast Search of NITE CHRIP Data Using FTS5

Searching Case Law PDFs with RAG — A Legal AI Search System using Gemini + SQLite FTS5

google-generativeai to google-genai Migration Guide

Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

Skit: The Man Obsessed with Claude Code

Building a Free Research Agent with DuckDuckGo Search + Local LLM

A Daily Report System to Automatically Aggregate Claude Code + Gemini CLI Usage History Every Morning with Cron

Reducing Token Consumption in Claude Code — FTS5 Knowledge DB + Tiered Index Design

Implementing Stripe Checkout Billing in PatentLLM

Building a 5-in-1 App with Local LLM and Flutter

Leveraging Claude Code's MCP Server

LoRA and FT Are Unnecessary: How to Approach Distilled Models