PatentLLM Blog

GPU Inference

Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an OpenAI-Compatible API

This article explains how to launch NVIDIA's Nemotron-Nano-9B-v2-Japanese with vLLM and integrate it into your custom application as an OpenAI-compatible API. It eliminates the nee...
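The pattern this article describes can be sketched as follows. This is a minimal, hedged example of building an OpenAI-compatible chat request for a locally served vLLM instance; the endpoint URL, port, and exact model ID are my assumptions, not details confirmed by the article:

```python
import json

# Assumed local vLLM endpoint and model ID -- adjust to match your deployment.
BASE_URL = "http://localhost:8000/v1"  # hypothetical default vLLM port
MODEL = "nvidia/Nemotron-Nano-9B-v2-Japanese"  # hypothetical model ID

def build_chat_request(user_message: str, max_tokens: int = 256) -> dict:
    """Build a payload for the OpenAI-compatible /chat/completions route."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

# The same payload works with the official OpenAI SDK by pointing
# base_url at the vLLM server instead of api.openai.com.
payload = build_chat_request("特許請求の範囲を要約してください。")
print(json.dumps(payload, ensure_ascii=False))
```

Because vLLM exposes the same request/response shapes as the OpenAI API, an application written against the OpenAI SDK typically only needs its `base_url` (and a dummy API key) changed to switch to the local server.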

GPU Inference

Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090

This article explains the common infrastructure design and resource management strategy for operating 13 projects, including Shogi AI, LLM applications, and legal systems, on a sin...

GPU Inference

Personal AI Development Environment Built with RTX 5090 + WSL2 — A Practical Setup Fully Utilizing 32GB GPU

This article explains how to build an AI development environment that fully utilizes the RTX 5090's 32GB of VRAM under WSL2, allowing vLLM, TensorRT, Shogi AI, an...

GPU Inference

Shogi AI with RTX 5090 — Record of TensorRT FP8 Quantization and Floodgate Practical Games

This is a record of operating the dlshogi Shogi engine on an RTX 5090 with TensorRT FP8 quantization. We explain the structure of the Fuka40B model, the effects of quantization, Fl...