Hi, I'm Shuyao. Chasing AGI with efficiency and agency.

RL Intern at Kimi; incoming MSCS student at Stanford. I explore two frontiers: improving the efficiency of individual models and designing agentic systems that are efficient and general.

Shuyao Xu

Experience

Kimi

Mar 2026 – Present
Intern · RL Team
  • Responsible for coding post-training of the agent swarm.

TikTok

Jun 2025 – Jan 2026
Machine Learning Engineer Intern
  • Improved account search relevance.

INF AI

Dec 2024 – May 2025
Research Intern · Host: Dr. Weidi Xu
  • Post-trained INFLogic-32B-RL via online RL, reaching state-of-the-art on ZebraLogicBench (85.1%).

Research

Efficiency

Shuyao Xu, C. Peng, J. Long, W. Xu, W. Chu, Y. Qi

Standard distillation discards incorrect teacher responses. We propose Reinforcement Distillation, utilizing negative reasoning traces as signals to improve student model performance on reasoning tasks.

Agency

Agentic Test-Time Scaling
Advisors: Prof. Yu Meng & Prof. Bryan Hooi

Parallel test-time scaling systems usually follow human-designed strategies, which are not always optimal. We explore how LLM-powered agents can autonomously decide when and how to scale compute.

Blog: Fully Agentic Test-Time Scaling — LLM agents that autonomously decide when and how to scale compute.
Blog: Tournament-based Test-Time Scaling — Competition-driven reasoning to solve hard problems.

Education

Stanford University · 2026 – 2028
M.S. in Computer Science
National University of Singapore · 2022 – 2026
B.Comp in Computer Science (Honours) · GPA: 4.8 / 5.0

Teaching & Open Source

CS2103T Software Engineering TA · NUS · Fall 2024 · Feedback: 4.4/5.0
MarkBind Contributor · Features & mentoring junior contributors