Writing an LLM compiler from scratch [Part 2]: Lowering to a GPU Schedule