Deep Dive

In-depth technical articles on AI, GPU inference, and developer tools

GPU & Inference

Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC Latest Trends

GPU & Inference

2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX

Today's Highlights: In 2026, AI is evolving rapidly, with accelerated adoption in...

GPU & Inference

RTX 40 Series Makes LLMs Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...

Hello everyone! I'm soy-tuber, an AI researcher and individual developer. I usually push my RTX 5090...

GPU & Inference

The Technical Debt Local AI Must Fix Before It's Too Late — What NemoClaw Says About NVIDIA's Philosophy

If you've been following my recent posts, you might have seen my repository and the issue I opened on...

GPU & Inference

Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

Disclaimer: This is an experimental build, not a production setup. NemoClaw is early-stage, the...

GPU & Inference

Hardware Selection for Local LLMs: Overcoming the VRAM Wall with Practical GPU, CPU, and Memory Configurations

Introduction: Gemini Flash Equivalent Locally? The Despair of a Slow Development...

GPU & Inference

RTX 5090 + Nemotron 9B on vLLM — Benchmarks & TRT-LLM Comparison

I've been running Nemotron Nano 9B v2 Japanese on an RTX 5090 with vLLM 0.15.1 and wanted to share...

GPU & Inference

Shogi AI with RTX 5090 — Record of TensorRT FP8 Quantization and Floodgate Practical Games

What is dlshogi? dlshogi is a Shogi engine incorporating deep learning, consisting of a...

GPU & Inference

Practical Guide to Running Nemotron-Nano-9B-v2-Japanese with vLLM and Integrating it into Your Custom Application via an Open...

Introduction Recently, an article on Qiita titled "Running Nemotron-Nano-9B-v2-Japanese...

GPU & Inference

Personal AI Development Environment Built with RTX 5090 + WSL2 — A Practical Setup Fully Utilizing 32GB GPU

Why RTX 5090 + WSL2? The 32GB VRAM of the RTX 5090 is a practical choice for local...

GPU & Inference

Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090

13 Project List The portfolio consists of the following categories: Legal...