PatentLLM Tech Blog

AI

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Built a local-first RAG research tool that runs entirely on a single GPU. Nemotron Nano 9B v2 on vLLM + FastAPI + SQLite FTS5 with a two-step Extract → Execute flow. Tool calling +...

AI

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

A deep dive into the fork that replaces Claude Code with Qwen 3.5 9B + ollama in Karpathy's autoresearch framework. Run fully autonomous ML research on a single GPU with zero API c...

OSS

Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

Canonical, the company behind Ubuntu, is targeting an IPO with $292M revenue and 88% gross margins. If they go public, it will symbolize the new era of Linux/OSS business. We trace...

AI

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

Using the Google Places Text Search API, I scraped 1,914 unagi restaurants across all 47 Japanese prefectures with under 1.6% noise. This article dissects why BM25/FTS5 can never r...

AI

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

How I replaced slow LIKE queries with SQLite FTS5 full-text search on a 1.73-million-row patent database, achieving a 100x+ speedup with BM25 ranking and boolean query support....

AI

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

How to set up Anthropic's official SQLite MCP server with Claude Code to run queries, inspect schemas, and manage databases directly from your AI coding assistant....

AI

Building a 5-in-1 Local LLM App with Flutter Web and Flask

How I rebuilt five separate HTML prototypes into a single Flutter Web app backed by a Flask API, using 874MB of Claude Code session history as the data source for local LLM analysi...

AI

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

How I debugged a patent analysis pipeline where Gemini generated plausible-but-fake patent numbers because the FTS5 queries returned zero results, and the three fixes that made it ...

AI

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

Practical walkthrough of integrating Stripe Checkout into a Python SaaS targeting US patent law firms, including graceful degradation, local subscription caching, and the decision ...

AI

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

AI coding assistants like Claude Code don't automatically read your README before making changes. Here are three strategies that enforce documentation-first workflows....

AI

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

A deep dive into converting a PyTorch shogi (Japanese chess) model to ONNX for TensorRT inference, and what MCTS parameter tuning taught me about why raw model size isn't everythin...

AI

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

How I built SoyLM, a single-file RAG tool using FastAPI, SQLite FTS5, and a local LLM, and what I learned about documentation-driven development when Reddit pointed out my README w...

AI

Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

A practical explanation of how Flutter Web apps become installable PWAs, the difference between Flutter's native compilation and its web target, and why Google built Flutter this w...

AI

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

A technical deep dive into Tailscale's architecture including WireGuard foundations, DERP relay servers, NAT traversal, and why the mesh-network approach is replacing traditional h...

AI

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

The story of systemd from Lennart Poettering's frustration with SysVinit to the most heated technical debate in Linux history, the Devuan fork, and why systemd won despite the cont...

AI

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

Analyzing NVIDIA's open-source strategy revealed at GTC 2026. From NemoClaw to Vera Rubin, Physical AI, and cuDF/cuVS — why NVIDIA bet on open, viewed through the lens of Linux his...

GPU Inference

RTX 40 Series Makes LLM Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edition]

For individual developers with RTX 40 Series GPUs, soy-tuber gives a practical explanation of how to run LLMs cheaply and quickly using the latest OSS inference engi...

Dev Tools


Today

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

2026-03-21

How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

Building a 5-in-1 Local LLM App with Flutter Web and Flask

When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

RTX 40 Series Makes LLM Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edition]

2026-03-19

OpenAI Acquires Astral (uv / Ruff) — What It Really Means

2026-03-18

Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

The Technical Debt Local AI Must Fix Before It's Too Late — What NemoClaw Says About NVIDIA's Philosophy

2026-03-17

Why Google Wasn't Indexing My FastAPI Site — The HEAD Request Trap

2026-03-14

vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

2026-03-13

Three Months of Code: What a Patent Lawyer Built from Zero

2026-03-08

SQLite vs JSONL vs XML vs TSV — Data Wrangling for AI Projects