Hello, I’m Mayee!

I am a final-year PhD student in Computer Science at Stanford University, advised by Prof. Christopher Ré and part of the Hazy Research Lab.

I’m interested in studying and improving the fundamentals of modern machine learning through data (often known as data-centric AI). On the model training side, I work on data mixing, curriculum learning, synthetic data, and data labeling. On the inference side, I work on techniques such as ensembling, routing, and verification. Lately, I have been thinking about how to develop and operationalize a more principled understanding of how models learn from data: what skills does data teach the model, and does it matter whether the data is synthetic or real? I am currently a research intern at the Allen Institute for Artificial Intelligence (AI2), driving data mixing efforts for their OLMo open-source language models.

Previously, I graduated summa cum laude from Princeton University with a concentration in Operations Research and Financial Engineering (ORFE) and a certificate in Applications of Computing. At Princeton, I worked with Prof. Elad Hazan and Prof. Miklos Racz.

Please get in touch with me via email if you would like to chat about research or collaboration!

Publications and Preprints


For a chronological list of my publications, please check out my Google Scholar/CV.

Training Data

Test-time Techniques

Data Labeling

Data Representations

Science/Health Applications

Model Evaluation

Older