LLM inference infrastructure for a systems audience