Modern GPU Programming for MLSys

Making AMD GPUs competitive for LLM inference (2023)

WebLLM: Llama2 in the Browser

GPU-Accelerated LLM on an Orange Pi

Bringing Open Large Language Models to Consumer Devices

Vicuna on iPhone

Web LLM

What Is ML Compilation