A Survey of AI Agent Protocols

Backslash: Rate Constrained Optimized Training of Large Language Models

LLMs for Materials and Chemistry: 34 Real-World Examples

There Ain’t No Such Thing as a Free Custom Memory Allocator

Self Rewarding Self Improving: Autonomous LLM Improvement

Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs

eqsat: An Equality Saturation Dialect for Non-destructive Rewriting

Structuring Competency-Based Courses Through Skill Trees

Human-Like Episodic Memory for Infinite Context LLMs

Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats

The Algebra of Patterns (Extended Version)

Analyzing Modern Nvidia GPU Cores

CMU TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

RVSDG: An Intermediate Representation for Optimizing Compilers (2019)

Non-control-Data Attacks and Defenses: A review

Should We Respect LLMs? A Study on Influence of Prompt Politeness on Performance

My prediction after GPT-4o image generation

arXiv moving from Cornell servers to Google Cloud

Flat origami is Turing complete (2023)

A study of undefined behavior across foreign function boundaries in Rust libraries

Can reinforcement learning for LLMs scale beyond math and coding tasks? Probably

LLMs for Engineering: Teaching Models to Design High Powered Rockets

Garbage Collection for Rust: The Finalizer Frontier

Pydrofoil: Accelerating Sail-based instruction set simulators

Vision Transformers Need Registers

Lossless LLM compression for efficient GPU inference via dynamic-length float

The Leaderboard Illusion

Beyond Performance: Measuring the environmental impact of analytical databases

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
