Daily Tech News

Curated AI & dev news from 15+ international sources

hardware

CUDA for AMD Lemonade, Intel Arc Pro Linux Gains, XPU Manager 2.0

Today's top GPU news highlights include AMD's Lemonade SDK gaining NVIDIA CUDA support, significant performance improvem...

hardware

Vortex 3.0 RISC-V GPGPU, Pragtical SDL GPU Backend, NVIDIA RTX Spark Launch

Today's top stories highlight significant advancements in open-source GPU hardware with Vortex 3.0 adding a 3D pipeline ...

hardware

Linux 7.1 Boosts Intel Arc, Flatpak Integrates ROCm, Vintage AMD Driver Refined

Recent developments enhance GPU performance and accessibility, with the Linux 7.1 kernel providing significant gains for...

hardware

Linux Kernel & Mesa Boost GPU Gaming, Vulkan Video Decoding in Firefox

This week's highlights feature significant advancements in Linux GPU performance, with kernel scheduler patches aimed at...

hardware

New AMD Anti-Lag for RADV, Ape Vulkan Driver in Zig, and Linux DRM Security Fixes

This week brings significant updates in GPU drivers and Linux kernel patches, enhancing performance and addressing criti...

hardware

Rust for CUDA Kernels, NVIDIA Nova, and AMDGPU Driver Updates in Linux 7.2

This week's highlights include a promising Rust-to-CUDA compiler, CUDA-Oxide 0.2, enabling safer GPU kernel development....

hardware

AMD GPU Benchmarks, HDMI 2.1 FRL Driver, and Multi-Device AI with GAIA on Linux

This week's highlights include Phoronix benchmarks of AMD's new Radeon RX 9070 GRE/XT on Linux 7.1 with Mesa 26.1, and A...

hardware

GPU Driver & Compiler Updates: RADV 100% Pixel Throughput, KRAID for Mali, Ubuntu ROCm SRU

Today's top stories highlight significant advancements in GPU drivers and compilers, including a groundbreaking 100% pix...

hardware

NVIDIA's NVK Vulkan Driver Boosts Mesh Shaders; Wayland Dominates Linux Desktops; Jetson Updates for Physical AI

Today's top GPU news highlights a significant open-source driver update for NVIDIA, a major architectural shift in Linux...

hardware

NVIDIA RTX Spark Superchip Unveiled, NBD-VRAM for GPU Swap, Local AI on RTX

This week, NVIDIA launched the RTX Spark superchip for desktops and laptops, boosting local AI capabilities. Developers ...

hardware

Next-Gen AV2 v1.0 Video Spec; Wine-Staging 11.10 Fixes Linux GPU Display; NVIDIA's Power-Efficient AI Factories

Today's top stories feature the release of the AV2 v1.0 specification, a foundational update for next-generation video c...

hardware

AMD Linux 7.2 Graphics & SteamOS VRR Drivers, NVIDIA Vera CPU Benchmarks

This week's top stories feature significant driver updates for AMD GPUs on Linux, including kernel enhancements for Linu...

hardware

AMD ROCm 7.2.4, Radeon Software 26.12, & Fwupd 2.1.4 Boost Linux GPU Support

AMD releases ROCm 7.2.4 with performance fixes and Radeon Software 26.12 with Ubuntu 26.04 support. Fwupd 2.1.4 also add...

hardware

Intel Arc & Arm Mali: New GPUs, Drivers & Benchmarks for Linux

Today's highlights cover new Intel Arc GPU benchmarks on Linux, the launch of Intel's Arc G-Series for handhelds, and a ...

hardware

CUDA 13.3 Lands, AI Writes Blackwell Kernels, & FP4 VRAM Optimization for LLMs

NVIDIA releases CUDA Toolkit 13.3, bringing new features and optimizations for GPU developers. Meanwhile, an AI system d...

hardware

FlashAttention CUDA Kernel, Strix Halo MOE Boost, & NVIDIA DLSS 4.5 Driver Update

This week, discover a deep dive into FlashAttention CUDA kernel implementation for O(N) memory efficiency and a reported...

hardware

PatentLLM: CUDA TileLang/Triton B200 5x Speedup, RTX 5090 Power, PTX Grammar

Today's top stories reveal significant advancements in GPU performance optimization, with a 5x inference speedup on B200...

hardware

RTX 5080 Undervolt Benchmarks, CGO-Free CUDA API Binding, & AMD GPU Compatibility Fix

Today's top GPU news features detailed undervolting benchmarks for the NVIDIA RTX 5080, insights into a CGO-free CUDA Dr...

hardware

AMD GPU/AI Launches, Legacy Driver Update & CUDA Optimization Platform

This week's top GPU news features AMD's ambitious memory specifications for its upcoming Ryzen AI MAX 400 'Gorgon Halo' ...

hardware

RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains

NVIDIA's upcoming RTX 5090 cooling solutions are detailed, while driver-level optimizations like Resizable BAR deliver s...

hardware

Go+CUDA Optimization, LLM VRAM Benchmarks & NVIDIA G-SYNC Firmware 1.1.6

Today's top hardware news features significant advancements in GPU software optimization and performance. Discover how G...

hardware

LLM Compilers, GGUF Quantization, & Radeon RX 9060 Benchmarks

This week's top GPU news covers deep technical insights into LLM compiler autotuning for CUDA, practical benchmarks for ...

hardware

Intel Xe3P Leaks 160GB LPDDR5X; FlashAttention-2 in CuTe & Custom CUDA GPT-2 Engine

Intel's Xe3P "Crescent Island" GPU leaks reveal 160GB LPDDR5X VRAM, sidestepping HBM shortages and showcasing a powerful...

hardware

GPU Bottleneck Analyzer, NVIDIA Rubin VRAM Demands, and Qwen VRAM Optimization

This week's top GPU news features a new open-source tool for identifying PyTorch/CUDA bottlenecks, critical insights int...

hardware

GPU Hardware & Driver Update: RTX 5090 Benchmarks, llama.cpp MTP, Windows 11 Fix

This week's top GPU news features practical performance optimization on NVIDIA's RTX 5090, a critical driver fix for Win...

hardware

CUDA Cutile-rs Beta, AMD FSR 4.1 Release, & Forza Horizon 6 GPU Benchmarks

This week's top stories feature the beta launch of Cutile-rs, a Rust-based CUDA library leveraging Blackwell architectur...

hardware

Custom CUDA Kernels, Modded RTX 4090 48GB VRAM, & DLSS DLL Manager

This week's top stories dive into optimizing GPU performance, from architecting custom CUDA kernels for edge inference t...

hardware

RTX 5090, llama.cpp TurboQuant, & Blackwell CUDA Scheduling Boost GPU Performance

NVIDIA's new RTX 5090 introduces 32GB GDDR7 with advanced cooling, while the Blackwell architecture enhances CUDA throug...

hardware

AMD RDNA 4 & AI PRO GPUs Launch, FSR 4.1 Benchmarks, DGX Water Cooling

This week's top stories feature new AMD Radeon RDNA 4 and AI PRO GPU launches with details on VRAM and cooling, alongsid...

hardware

RTX 5080 Launched, Rust for CUDA, & LLM GPU Scheduling Deep Dive

This week's top GPU news highlights a new GeForce RTX 5080 variant, alongside advancements in GPU programming tools and ...

hardware

DeepSeek-V4-Flash Benchmarks, FlashRT CUDA Runtime, & V100 LLM Performance

This week highlights significant advancements in GPU-accelerated AI inference, with new benchmarks for optimized LLMs an...

hardware

CUDA-Oxide 0.1, RTX 5070 Launch, & BeeLlama.cpp Boost 3090 Inference

NVIDIA makes strides in developer tools with a Rust-to-CUDA compiler, while ZOTAC quietly launches an RTX 50 series GPU....

hardware

CUDA-Oxide 0.1 Lands; RTX 5090 Launches with 32GB & Hits 600 Tok/s

NVIDIA introduces CUDA-Oxide 0.1, an experimental Rust-to-CUDA compiler. Concurrently, the AORUS RTX 5090 INFINITY 32G o...

hardware

AMD MI350P, CUDA WarpReduction, & Adrenalin 26.5.1 Driver Updates

This week in hardware, AMD unveils the Instinct MI350P accelerator bringing CDNA 4 to PCIe cards, signaling new advancem...

hardware

RTX 5080 Sighted, ROCm 7.2.3 Released, & AMD RDNA4 Linux Drivers Emerge

Early sightings of NVIDIA's RTX 5080 mark a new GPU generation, while AMD pushes software with ROCm 7.2.3 and preps Linu...

hardware

AMD Ryzen AI Max+ PRO 495 Leak, RTX 5080 Tease, & Interactive CUDA Lessons

Today's highlights feature significant leaks on AMD's upcoming Ryzen AI Max+ PRO 495 APU with an integrated Radeon 8065S...

hardware

GPU Hardware & Drivers: Blackwell LLM Benchmarks, FPGA LLM Costs, AMDGPU HDMI 2.1

This week features practical GPU benchmarks on NVIDIA Blackwell GPUs for LLM inference, a deep dive into low-cost FPGA alter...

hardware

RTX 3090 vLLM Local LLM Speeds, NVIDIA NIM Inconsistencies, AMD Mesa Driver Plan

This week features new benchmarks for local LLM inference on the RTX 3090 using native vLLM for high token generation sp...

hardware

PFlash VRAM Optimization, NVIDIA 5090 NVFP4 Benchmarks, AMD HDMI 2.1 Linux Drivers

This week features a practical VRAM optimization technique achieving 10x speedup on NVIDIA GPUs, early benchmarks for NV...

hardware

GPU Hardware, VRAM Optimization & Next-Gen Driver Updates

This week features a deep dive into VRAM efficiency with a new Triton-based KV-cache compression engine, a look at DLSS ...

hardware

FlashQLA Kernels Accelerate AI; NVIDIA & AMD Unveil New GPUs

This week, Qwen introduced FlashQLA, high-performance attention kernels offering significant speedups for AI inference a...

hardware

NVIDIA RTX 5070 Laptop GPU Launches; AMD Preps AI Scheduler; Qwen GGUF Benchmarks

NVIDIA unveils the GeForce RTX 5070 Laptop GPU with GDDR7 memory, signaling a new era for mobile graphics. Meanwhile, AM...

hardware

CUDA & VRAM Optimization Shine: Custom Kernels, DFlash Throughput, Single-GPU LLM Arch

Today's highlights include cutting-edge CUDA developments for VRAM optimization, with a custom kernel for 1.58-bit terna...

hardware

RTX 5090 LLM 100 tps Benchmarks, RTX 5060 Ti eGPU with TBT5/OCuLink, NVIDIA Frame Gen

Today's top hardware news features cutting-edge GPU performance: NVIDIA's RTX 5090 clocks 100 tps with 256k context for ...

hardware

FlashAttention CUDA Speedup, RTX 5090 LLM Performance, & NVIDIA Blackwell GPU Launch

This week's top GPU news features a 40% FlashAttention speedup via CUDA memory optimization, breakthrough LLM inference ...

hardware

RTX 4090 Cooling, LLM KV Cache Quantization, & DeepSeek V4 Flash Models

Today's highlights include a deep dive into optimal GPU cooling solutions for the RTX 4090, alongside advanced VRAM opti...

hardware

DeepSeek TileKernels, RTX 3090 LLM Benchmarks & NVIDIA Inference Dashboard

This week's top stories include DeepSeek's new open-source CUDA kernel library for LLM inference, impressive Qwen3.6-27B...

hardware

CUDA Triton Optimization, RTX Remix VFX Update, and VSR Benchmarks

This week, we dive into advanced GPU optimization with custom Triton kernels for real-time AI inference on NVIDIA L4 GPU...

hardware

NVIDIA Pushes GPU Tech: DLSS 4.5, Streamline 2.11.1 SDKs & RTX Remix Updates

NVIDIA bolsters its GPU software ecosystem with the release of DLSS 4.5 SDK, featuring Dynamic Multi Frame Generation, a...

hardware

NVIDIA Vera Rubin 192GB SOCAMM2 Memory, SASS Reverse Engineering, & CUDA Kernel Dev

SK hynix has commenced mass production of 192GB SOCAMM2 memory for NVIDIA's future Vera Rubin platform, signaling a sign...

hardware

CUDA Kernels in Python, GDDR7 Memory Breakthrough, and Radeon RX 9060 XT Launch

This week brings significant advancements in GPU technology with a new Pythonic DSL for CUDA kernel development, a cruci...

hardware

NVIDIA Path Tracing, AMD RDNA 4m Drivers, & GPU MoE Offloading Benchmarks

This week features significant GPU advancements: NVIDIA's GDC presentation reveals faster path tracing techniques, while...

hardware

Qwen3.6 GGUF, RTX 4080 Cooling & Pragmata GPU Benchmarks Drive Performance

Today's highlights feature critical benchmarks for Qwen3.6 GGUF quantization, demonstrating significant VRAM optimizatio...

hardware

NVIDIA DLSS 4 & RTX VSR Updates, CUDA Shared Memory Optimization Challenges

This week, NVIDIA users get practical updates with DLSS 4 integration in Fortnite and an RTX VSR workaround for Edge, wh...

hardware

NVIDIA 50-Series GDDR7 Rumors, Mesa 26.1 AMD APU Drivers, WebGPU 1-bit LLMs

This week, NVIDIA's next-gen RTX 5060/Ti are rumored to adopt 9GB GDDR7 VRAM, signaling future memory bandwidth improvem...

hardware

LLM Auto-Tunes llama.cpp, SASS Latency Analysis, DLSS Frame Gen for RTX 40

This week features a significant performance boost for local LLMs via an AI-driven `llama.cpp` flag tuner. We also dive ...

hardware

CUDA-Accelerated EEG, AMD RX 9070 XT Power Melts, & Strix Halo LPDDR5X Specs

This week, discover a practical CUDA project for EEG data acceleration, crucial for scientific computing. Additionally, ...

hardware

CUDA Kernel Optimization & GPU Power Efficiency Tools

This week features cutting-edge CUDA kernel development, including an open-source repo for AI agents and theoretical ins...

hardware

RTX 5090 cuBLAS Bug, Neural Texture Compression, Multi-GPU vLLM Inference

Today's highlights include a significant performance bug found in cuBLAS for the unreleased RTX 5090, alongside a deep d...

hardware

CUDA SGEMM Bug on RTX 5090, Kernel-Fusing for SGEMV, & Radeon RX 9070 XT Price Surge

This week's top GPU news includes a critical cuBLAS performance bug affecting SGEMM on the NVIDIA RTX 5090, a deep dive ...

hardware

LLM GPU Breakthroughs: RT Cores, llama.cpp Parallelism, AMD Optimizations

This week's top GPU news features innovative techniques for accelerating LLMs, including a novel use of NVIDIA RT Cores ...

hardware

New AMD RX 9000 GPUs, DLSS/FSR Mod, & Deep Dive into CUDA LLVM Bitcode

This week features the expansion of AMD's RX 9000 GPU lineup with new Sapphire models entering the market. NVIDIA and AM...

hardware

CUDA Memory Hierarchy, Tile Programming, & DLSS 310.6 Driver Enhancements

This week's top GPU news features deep dives into CUDA memory optimization techniques with guides on GPU memory hierarch...

hardware

Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference

This week, NVIDIA developers received a deep dive into optimizing Hopper/Blackwell Tensor Cores for enhanced memory band...

hardware

Local LLMs, Rust CUDA Kernels, & K8s GPU Drivers: Build More with Less

This week, we dive into accelerating local LLMs like Gemma 4 on RTX, exploring the cutting edge of Rust for CUDA kernel ...