This article details the steps for running NVIDIA's Japanese-specialized 9B-parameter LLM, 'Nemotron-Nano-9B-v2-Japanese', locally. It features the Mamba SSM architecture and ...
This article outlines a design for running NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM to analyze and structure development environment data using batch processing. It discusse...
We apply insights gained from distillation experiments with Shogi AI to LLMs. Fine-tuning (FT) a distilled model is either meaningless or harmful, and LoRA can be replaced by promp...
Searching 1.73 million patent records, impractical with SQLite's LIKE operator, becomes feasible with FTS5 full-text search. This article explains the implementation steps for inv...
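The LIKE-vs-FTS5 contrast can be sketched with Python's built-in sqlite3 module. The table name, columns, and sample rows below are illustrative assumptions, not the article's actual patent schema:

```python
import sqlite3

# In-memory DB for illustration; the article's real schema and data paths are unknown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE patents USING fts5(title, abstract)")
conn.executemany(
    "INSERT INTO patents (title, abstract) VALUES (?, ?)",
    [
        ("Battery electrode structure", "A lithium-ion electrode with improved density."),
        ("Image compression method", "A wavelet-based codec for still images."),
        ("Electrode manufacturing line", "Roll-to-roll coating for battery cells."),
    ],
)

# MATCH consults the FTS5 inverted index instead of scanning every row
# the way LIKE '%electrode%' would; ORDER BY rank sorts by BM25 relevance.
rows = conn.execute(
    "SELECT title FROM patents WHERE patents MATCH ? ORDER BY rank",
    ("electrode",),
).fetchall()
titles = [r[0] for r in rows]
```

On millions of rows the index lookup is what makes the difference; LIKE with a leading wildcard forces a full table scan.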
This article explains the process of building an AI development support app that integrates five functions into a single Flutter Web application. It details how to run a local LLM with vL...
This article explains how to build a free research agent that doesn't require an API key, by combining the ddgs library and a local LLM (Nemotron). It also includes an implementati...
This article explains methods to reduce API costs and shorten processing time for large-scale data analysis by leveraging Google Gemini's Context Caching feature. Specific examples...
This article introduces an implementation pattern that balances cost, quality, and privacy by combining Gemini 2.5 Flash and Nemotron 9B. It also explains the design of a common in...
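One way such a common interface can look is a single-method protocol with provider details hidden behind it. The class and routing names below are hypothetical, and the backends are stubbed rather than making real API calls:

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """Minimal common interface: callers see one method, never the provider."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class GeminiBackend:
    def complete(self, prompt: str) -> str:
        # Real code would call the Gemini API here; stubbed for illustration.
        return f"[gemini] {prompt}"


@dataclass
class NemotronBackend:
    def complete(self, prompt: str) -> str:
        # Real code would hit a local vLLM / OpenAI-compatible endpoint.
        return f"[nemotron] {prompt}"


def route(sensitive: bool) -> ChatBackend:
    """Privacy-aware routing: sensitive data stays on the local model."""
    return NemotronBackend() if sensitive else GeminiBackend()


reply = route(sensitive=True).complete("summarize this internal log")
```

Because both backends satisfy the same protocol, swapping cost/quality trade-offs is a one-line routing change rather than a rewrite of the call sites.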
This article explains an implementation method for giving Minecraft NPCs natural language-based situational judgment and response capabilities by running a local LLM (Nemotron 9B) ...
This article explains the design and implementation of a 2-stage pipeline that generates content with Nemotron 9B and refines and fact-checks it with Gemini 2.5 Flash. We also intr...
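The shape of such a 2-stage pipeline can be sketched as a generate step followed by a refine/fact-check step. The function and field names are assumptions for illustration, and stubs stand in for the actual model calls:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    text: str
    checked: bool = False


def two_stage(prompt: str,
              generate: Callable[[str], str],
              refine: Callable[[str], str]) -> Draft:
    """Stage 1 drafts content (e.g. a local Nemotron 9B endpoint);
    stage 2 refines and fact-checks it (e.g. Gemini 2.5 Flash)."""
    draft = generate(prompt)
    final = refine(draft)
    return Draft(text=final, checked=True)


# Lambdas stand in for the real model calls.
result = two_stage(
    "explain FTS5",
    generate=lambda p: f"draft: {p}",
    refine=lambda d: d.replace("draft", "checked"),
)
```

Keeping the stages as injected callables makes it easy to test the pipeline logic without either model available.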