UCCL-EP: DeepEP-style expert parallelism on any NIC, no GPU-initiated comms

Anatomy of a high-performance EP kernel

The Economics of Speculative Decoding

Speculative KV coding: losslessly compressing KV cache by up to ~4×

Bringing Up DeepSeek-V4-Flash on AMD MI300X

Also-RANS: Asymmetric Numeral Systems for Entropy Coding