PatentLLM Blog

AI Architecture

Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

This article details the steps for running 'Nemotron-Nano-9B-v2-Japanese', NVIDIA's Japanese-specialized 9B-parameter LLM, locally. It features the Mamba SSM architecture and ...

AI Architecture

Using Local LLMs as a "Batch Processing Engine" — A Design for Automatically Generating Artifacts from Your Own Data with Nemotron

This article outlines a design for running NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM to analyze and structure development environment data using batch processing. It discusse...

AI Architecture

An Era Where LoRA and FT Are Unnecessary: How to Approach Distilled Models

We apply insights gained from distillation experiments with Shogi AI to LLMs: fine-tuning (FT) a distilled model is either pointless or harmful, and LoRA can be replaced by promp...

AI Architecture

Fast Search over 1.73 Million Patent Records with FTS5

Searching 1.73 million patent records was impractical with SQLite's LIKE operator; FTS5 full-text search solves it. This article explains the implementation steps for inv...
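As a minimal sketch of the LIKE-to-FTS5 switch the teaser describes (table and column names here are hypothetical, and an in-memory database stands in for the 1.73M-record one):

```python
import sqlite3

# In-memory DB for illustration; a real patent DB would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE patents USING fts5(title, abstract)")
conn.executemany(
    "INSERT INTO patents (title, abstract) VALUES (?, ?)",
    [
        ("Battery electrode material", "A lithium-ion electrode using silicon."),
        ("Display driver circuit", "A low-power driver for OLED panels."),
    ],
)

# MATCH queries the FTS5 inverted index instead of scanning every row
# the way LIKE '%electrode%' would; ORDER BY rank sorts by relevance.
rows = conn.execute(
    "SELECT title FROM patents WHERE patents MATCH ? ORDER BY rank",
    ("electrode",),
).fetchall()
print(rows)  # → [('Battery electrode material',)]
```

Because FTS5 ships with the standard `sqlite3` module in most Python builds, no extra dependency is needed.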

AI Architecture

Building a 5-in-1 App with Local LLM and Flutter

This article explains the process of building an AI development support app that integrates five functions into one app using Flutter Web. It details how to run a local LLM with vL...

AI Architecture

Building a Free Research Agent with DuckDuckGo Search + Local LLM

This article explains how to build a free research agent that doesn't require an API key, by combining the ddgs library and a local LLM (Nemotron). It also includes an implementati...

AI Architecture

Reduce API Costs for Large-Scale Document Analysis with Gemini Context Caching

This article explains methods to reduce API costs and shorten processing time for large-scale data analysis by leveraging Google Gemini's Context Caching feature. Specific examples...

AI Architecture

Gemini 2.5 Flash x Nemotron 9B — Optimal Division of Roles for Cloud LLM and Local LLM

This article introduces an implementation pattern that balances cost, quality, and privacy by combining Gemini 2.5 Flash and Nemotron 9B. It also explains the design of a common in...
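One way to picture a common interface over the two backends is a routing sketch like the following; the class and function names are hypothetical stand-ins (canned strings replace the real vLLM and Gemini calls), not the article's actual design:

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Common interface: both cloud and local backends expose complete()."""

    def complete(self, prompt: str) -> str: ...


class LocalNemotron:
    # Stand-in for a vLLM-served Nemotron 9B; returns a canned reply here.
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class GeminiFlash:
    # Stand-in for a Gemini 2.5 Flash API call.
    def complete(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


def pick_backend(contains_private_data: bool) -> ChatBackend:
    # One possible division of roles: privacy-sensitive work stays local,
    # quality-critical public work goes to the cloud model.
    return LocalNemotron() if contains_private_data else GeminiFlash()


reply = pick_backend(contains_private_data=True).complete("Summarize this patent")
print(reply)  # → [local] Summarize this patent
```

Because callers only see `ChatBackend`, swapping or re-balancing the two models does not ripple through the rest of the code.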

AI Architecture

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

This article explains an implementation method for giving Minecraft NPCs natural language-based situational judgment and response capabilities by running a local LLM (Nemotron 9B) ...

AI Architecture

2-Stage Pipeline: Local LLM Generation + Cloud LLM Refinement — Nemotron × Gemini 2.5 Flash

This article explains the design and implementation of a 2-stage pipeline that generates content with Nemotron 9B and refines and fact-checks it with Gemini 2.5 Flash. We also intr...
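The shape of such a 2-stage pipeline can be sketched as two composed functions; the bodies below are placeholder stubs (the article's versions would call Nemotron 9B and Gemini 2.5 Flash, respectively):

```python
def generate_local(topic: str) -> str:
    # Stage 1 stand-in: in the article, Nemotron 9B drafts the content.
    return f"Draft about {topic}"


def refine_cloud(draft: str) -> str:
    # Stage 2 stand-in: in the article, Gemini 2.5 Flash refines and fact-checks.
    return draft + " (refined)"


def pipeline(topic: str) -> str:
    # Cheap local generation first, then a single cloud call per item,
    # which keeps cloud API usage proportional to outputs, not drafts.
    return refine_cloud(generate_local(topic))


result = pipeline("Mamba SSM")
print(result)  # → Draft about Mamba SSM (refined)
```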