PatentLLM Tech Blog

AI

Run Nemotron 9B Japanese Locally — Mamba SSM & Thinking Mode

This article details the steps for running 'Nemotron-Nano-9B-v2-Japanese', NVIDIA's Japanese-specific 9B-parameter LLM, on a local machine. It features the Mamba SSM architecture and ...

Dev Tools

uv Guide: The Fast Python Package Manager Replacing pip/venv

This article explains how to adopt uv, a Rust-based Python package manager developed by Astral, and use it in practice. It covers its speed (10 to 100 times faster than pip), venv-...

Dev Tools

The OSS Lineage Behind AI Dev Stacks — Origins & Creators

This article traces the origins and design philosophies of the major OSS components that make up the AI development stack, in the order they developed historically. We will pro...

AI

Local LLM as Batch Engine — Auto-Generate Outputs with Nemotron

This article outlines a design for running NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM to analyze and structure development environment data using batch processing. It discusse...

Dev Tools

Comedy Skit: The Man Possessed by Claude Code

A manzai comedy skit between Engineer Takechi, whose brain has been completely taken over by Claude Code, and his junior colleague Niiyama, who desperately tries to pull him back t...

AI

No LoRA, No Fine-Tuning Needed — Working with Distilled Models

We apply insights gained from distillation experiments with Shogi AI to LLMs. Fine-tuning (FT) a distilled model is either meaningless or harmful, and LoRA can be replaced by promp...

AI

Search 3.5M Patents at Speed with SQLite FTS5

Searching 3.54 million patent records was impractical with SQLite's LIKE queries; FTS5 full-text search solves the problem. This article explains the implementation steps for inv...

Dev Tools

Claude Code MCP Server — Practical Tips & Setup

This article explains how to carry out SQLite database operations entirely from within an AI assistant using Claude Code's MCP (Model Context Protocol) server functionality. We will cover ...

AI

Build a 5-in-1 App with Local LLM + Flutter

This article explains the process of building an AI development support app that integrates five functions into one app using Flutter Web. It details how to run a local LLM with vL...

Web / Infra

Implementing Stripe Checkout for PatentLLM

A practical record of implementing Stripe Checkout billing for 'PatentLLM', a SaaS for US IP law firms. This article explains a design that does not store card information on the s...

GPU Inference

Run Nemotron 9B on vLLM with OpenAI-Compatible API

This article explains how to launch NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM and integrate it into your custom application as an OpenAI-compatible API. It eliminates the nee...

Web / Infra

Cloudflare Tunnel + Caddy — Serve Multiple Apps from Home

This article explains how to securely expose multiple web applications from a WSL2 environment by combining Cloudflare Tunnel and Caddy reverse proxy. This configuration eliminates...

Dev Tools

Claude Code Hooks — Auto-Prevent Port Conflicts & Dangerous Commands

This article explains a method for automatically preventing port conflicts and blocking dangerous commands (e.g., rm -rf, git push --force) before they run, by leveraging Cl...

Dev Tools

Reduce Claude Code Token Usage — FTS5 Knowledge DB & Tier Index

Writing all information into CLAUDE.md consumes an excessive number of tokens. This article introduces a method that solves the problem with a two-layer structure: a Tier 1 index (l...

Dev Tools

Auto-Generate Daily Reports from Claude Code & Gemini CLI Usage

This article explains how to build a daily report system that automatically aggregates Claude Code and Gemini CLI usage history every morning with a cron job, visualizing token con...

AI

Build a Free Research Agent with DuckDuckGo + Local LLM

This article explains how to build a free research agent that doesn't require an API key, by combining the ddgs library and a local LLM (Nemotron). It also includes an implementati...

AI

Cut API Costs with Gemini Context Caching for Large Documents

This article explains methods to reduce API costs and shorten processing time for large-scale data analysis by leveraging Google Gemini's Context Caching feature. Specific examples...

AI

Gemini Flash × Nemotron 9B — Optimal Cloud + Local LLM Roles

This article introduces an implementation pattern that balances cost, quality, and privacy by combining Gemini 2.5 Flash and Nemotron 9B. It also explains the design of a common in...

Dev Tools

google-generativeai to google-genai Migration Guide

Because the google.generativeai package has been deprecated, this guide explains the specific steps to migrate to the google-genai SDK. We will show examples of import changes, Gener...

Dev Tools

Search Case Law PDFs with RAG — Gemini + SQLite FTS5 Legal AI

This article explains how to build a legal AI search system that converts court case law PDFs into text, achieves full-text search with SQLite FTS5, and automates issue extraction ...

AI

Give Minecraft NPCs a Brain with Local LLM — Nemotron + Mineflayer

This article explains how to give Minecraft NPCs natural-language situational judgment and response capabilities by running a local LLM (Nemotron 9B) ...

AI

Local + Cloud LLM Pipeline — Nemotron Generation × Gemini Refinement

This article explains the design and implementation of a 2-stage pipeline that generates content with Nemotron 9B and refines and fact-checks it with Gemini 2.5 Flash. We also intr...

Dev Tools

Search NITE CHRIP Data with FTS5 — Chemical Regulation Dashboard

This article explains how to build a dashboard that indexes data from the Chemical Substance Risk Information Platform (CHRIP) with SQLite FTS5 and performs regulatory s...

GPU Inference

13 Projects on One RTX 5090 — Solo Dev Portfolio Strategy

This article explains the common infrastructure design and resource management strategy for operating 13 projects, including Shogi AI, LLM applications, and legal systems, on a sin...


2026-03-08
