Field Notes on Scaling MoE Expert Parallelism with DeepEP

Hermes 4

Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark

Pre-training a 15B parameter language model over the internet

Hermes 3 – Nous Research

World_sim: LLM prompted to act as a sentient CLI universe simulator