Publications
* denotes equal contribution. Authors of the robust statistics papers are listed in alphabetical order.
Language Models
On-policy self-distillation via prompt optimization
Under review.
Looped diffusion language models
Under review.
Effective test-time scaling of discrete diffusion through iterative refinement
Under review. SPIGM workshop at ICML 2026.
Alignment as distribution learning: your preference model is explicitly a language model
Under review. FoPT workshop at COLT 2025.
Not all bits are equal: scale-dependent memory optimization strategies for reasoning models
ICLR 2026. ER workshop at NeurIPS 2025 (spotlight).
Can large language models develop strategic reasoning? Post-training insights from learning chess
ScalR workshop at COLM 2025.
Lexico: extreme KV cache compression via sparse coding over universal dictionaries
ICML 2025. ICLR Workshop on Sparsity in LLMs (spotlight).
Task diversity shortens the ICL plateau
TMLR 2025.
Robust Statistics
GLM regression with oblivious corruptions
COLT 2023.
ReLU regression with Massart noise
NeurIPS 2021.
Teaching
I served as a TA (* denotes Head TA) for the following courses during my undergraduate and graduate years.
UC Berkeley
UW Madison
Fa19: CS240 (Discrete Math)
Sp20, Sp21*: CS577 (Algorithms)
Fa20: CS787 (Advanced Algorithms)