Large Language Models, Knowledge Distillation, Mathematical Reasoning, Chain-of-Thought, Program-of-Thought
Neural Architecture Search, Neural Networks, Smoothness, Knowledge Distillation, Sharpness-Aware Minimization