Show HN: sllm – Split a GPU node with other developers, unlimited tokens

Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

Building a zero-copy, GPU-accelerated screen recorder for Linux (Wayland)

Built a complete out-of-tree LLVM backend for a custom 32-bit SIMT GPU ISA

Burn wgpu draining my GPU memory

Building a GPU-based browser renderer from scratch (no Skia, no Chromium) — static Google homepage rendering

Zero-TVM: Replaced a TVM compiler pipeline with 10 hand-written GPU shaders — Phi-3 still runs in the browser

Aurora GPU rendering update (Rust browser engine)

I spent 6 days building Zweriz, a scripting language for GPU code that's as easy to write as a Python script [early project, Linux only]

WebGPU in a browser beats PyTorch on a datacenter GPU – paper + live benchmarks

Rust Threads on the GPU

MSI Warns RAM Shortage Is Reducing GPU Supply by 20%

Matlab Alternatives 2026: Benchmarks, GPU, Browser and Compatibility Compared

$500 GPU outperforms Claude Sonnet on coding benchmarks

Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

Pool spare GPU capacity to run LLMs at larger scale

Fun with CSF firmware (RK3588 GPU firmware)

AutoKernel: Autoresearch for GPU Kernels

beamterm 1.0 - sub-millisecond GPU terminal rendering for native and web

A CPU that runs entirely on GPU

nabla — Pure Rust GPU math engine: PyTorch-familiar API, zero C++ deps, 4 backends

Launch HN: Chamber (YC W26) – An AI Teammate for GPU Infrastructure

Reverse Engineering Apple's GPU Energy Model on the M4 Max

Analyzing Nvidia GB10's GPU

Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe

Why is GPU Python packaging still this broken?

I've built a multi-warp, 16-lane SIMT GPU core

Nvidia dominates gaming GPU market with 95 percent share as sales of AMD Radeon graphics plummet to a historical low of 5 percent

Show HN: Eyot, A programming language where the GPU is just another thread

Single-kernel fusion: fusing sequential GPU dispatches into one yields 159x over PyTorch on the same hardware