PatentLLM Tech Blog

Dev Tools

Strategic Data Organization Techniques Using SQLite, JSONL, XML, and TSV: Lessons from Implementing PatentLLM and Hanrei-DB

This article explains the data organization know-how gained from developing the patent search app PatentLLM and the case law search app Hanrei-DB. It covers the practical use cases...

AI Architecture

Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

This article details the steps for locally running 'Nemotron-Nano-9B-v2-Japanese', a Japanese-specific 9B-parameter LLM released by NVIDIA. It features the Mamba SSM architecture and ...

Dev Tools

Python Environment Management with uv: Introduction and Practical Use of a High-Speed Package Manager Replacing pip/venv

This article explains how to introduce and practically use uv, a Rust-based Python package manager developed by Astral. It covers its speed (10 to 100 times faster than pip), venv-...

Dev Tools

Lineage of OSS Supporting the AI Development Stack: Its Origins and Creators

This article organizes the origins and design philosophies of the major OSS components that constitute the AI development stack, following their historical development. We will pro...

AI Architecture

Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data with Nemotron

This article outlines a design for running NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM to analyze and structure development environment data using batch processing. It discusse...

Dev Tools

Skit: The Man Obsessed with Claude Code

A manzai comedy skit between Engineer Takechi, whose brain has been completely taken over by Claude Code, and his junior colleague Niiyama, who desperately tries to pull him back t...

AI Architecture

An Era Where LoRA and FT Are Unnecessary: How to Approach Distilled Models

We apply insights gained from distillation experiments with Shogi AI to LLMs. Fine-tuning (FT) a distilled model is either meaningless or harmful, and LoRA can be replaced by promp...

AI Architecture

Fast Search of 1.73 Million Patent Records with FTS5

Searching 1.73 million patent records was impractical with SQLite's LIKE queries; FTS5 full-text search solves the problem. This article explains the implementation steps for inv...

Dev Tools

Leveraging Claude Code's MCP Server

This article explains how to complete SQLite database operations within an AI assistant by utilizing Claude Code's MCP (Model Context Protocol) server functionality. We will cover ...

AI Architecture

Building a 5-in-1 App with Local LLM and Flutter

This article explains the process of building an AI development support app that integrates five functions into one app using Flutter Web. It details how to run a local LLM with vL...

Web / Infra

Implementing Stripe Checkout Billing in PatentLLM

A practical record of implementing Stripe Checkout billing for 'PatentLLM', a SaaS for US IP law firms. This article explains a design that does not store card information on the s...

GPU Inference

Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an OpenAI-Compatible API

This article explains how to launch NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM and integrate it into your custom application as an OpenAI-compatible API. It eliminates the nee...

Web / Infra

Exposing Multiple Web Applications from a Home Server with Cloudflare Tunnel + Caddy

This article explains how to securely expose multiple web applications from a WSL2 environment by combining Cloudflare Tunnel and Caddy reverse proxy. This configuration eliminates...

Dev Tools

Automatically Preventing Port Conflicts and Dangerous Commands with Claude Code's Hooks Feature

This article explains a method for automatically preventing port conflicts and the execution of dangerous commands (e.g., rm -rf, git push --force) by leveraging Cl...

Dev Tools

Reducing Token Consumption in Claude Code — FTS5 Knowledge DB + Tiered Index Design

This article introduces a method to solve the problem of excessive token consumption when all information is written in CLAUDE.md, by using a two-layer structure: a Tier 1 index (l...

Dev Tools

A Daily Report System to Automatically Aggregate Claude Code + Gemini CLI Usage History Every Morning with Cron

This article explains how to build a daily report system that automatically aggregates Claude Code and Gemini CLI usage history every morning with a cron job, visualizing token con...

AI Architecture

Building a Free Research Agent with DuckDuckGo Search + Local LLM

This article explains how to build a free research agent that doesn't require an API key, by combining the ddgs library and a local LLM (Nemotron). It also includes an implementati...

AI Architecture

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

This article explains methods to reduce API costs and shorten processing time for large-scale data analysis by leveraging Google Gemini's Context Caching feature. Specific examples...

AI Architecture

Gemini 2.5 Flash × Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

This article introduces an implementation pattern that balances cost, quality, and privacy by combining Gemini 2.5 Flash and Nemotron 9B. It also explains the design of a common in...

Dev Tools

google-generativeai → google-genai Migration Guide

Due to the deprecation of the google.generativeai package, this guide explains the specific steps to migrate to the google-genai SDK. We will show examples of import changes, Gener...

Dev Tools

Searching Case Law PDFs with RAG — A Legal AI Search System using Gemini + SQLite FTS5

This article explains how to build a legal AI search system that converts court case law PDFs into text, achieves full-text search with SQLite FTS5, and automates issue extraction ...

AI Architecture

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

This article explains an implementation method for giving Minecraft NPCs natural language-based situational judgment and response capabilities by running a local LLM (Nemotron 9B) ...

AI Architecture

2-Stage Pipeline: Local LLM Generation + Cloud LLM Refinement — Nemotron × Gemini 2.5 Flash

This article explains the design and implementation of a 2-stage pipeline that generates content with Nemotron 9B and refines and fact-checks it with Gemini 2.5 Flash. We also intr...

Dev Tools

A Regulatory Analysis Dashboard for Fast Search of NITE CHRIP Data Using FTS5

This article explains how to build a dashboard that indexes data from NITE's Chemical Risk Information Platform (CHRIP) with SQLite FTS5 and performs regulatory s...

GPU Inference

Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090

This article explains the common infrastructure design and resource management strategy for operating 13 projects, including Shogi AI, LLM applications, and legal systems, on a sin...

GPU Inference

Personal AI Development Environment Built with RTX 5090 + WSL2 — A Practical Setup Fully Utilizing 32GB GPU

This article explains how to build an AI development environment that maximizes the utilization of RTX 5090's 32GB VRAM in a WSL2 environment, allowing vLLM, TensorRT, Shogi AI, an...

GPU Inference

Shogi AI with RTX 5090 — Record of TensorRT FP8 Quantization and Floodgate Practical Games

This is a record of operating the dlshogi Shogi engine on an RTX 5090 with TensorRT FP8 quantization. We explain the structure of the Fuka40B model, the effects of quantization, Fl...

Web / Infra

Achieving Bidirectional Integration of Streamlit Backend × Flutter Frontend in a WSL2 Environment

This article explains an implementation method for integrating Streamlit and Flutter apps on WSL2 using Nginx proxy and WebSocket. It also covers solving CORS issues and introduces...

Dev Tools

Operational Techniques for Automatically Starting vLLM, Flask, and cron with systemd Services in WSL2

This article explains how to enable systemd in WSL2 and manage vLLM server, Flask API, and periodic tasks as systemd services. It also covers startup order dependencies and log mon...

Dev Tools

I Built a Free Patent Search Engine with 3.5M US Patents — No Login, Powered by SQLite FTS5

A free patent search engine covering 3.5M US patents (2016-2025). Built with SQLite FTS5, BM25 ranking, CPC classification filtering, and Nemotron 9B for tech tag classification....

Dev Tools

Coders at Work: A Guide to All 15 Interviews with Programming Legends

A chapter-by-chapter guide to Peter Seibel's 'Coders at Work', introducing all 15 programmers interviewed — from UNIX creator Ken Thompson to Erlang inventor Joe Armstrong, JavaScr...