rules_derive: deriving using macro_rules

MatX: Faster Chips for LLMs