Hi, I'm Etash Guha

I'm a Member of Technical Staff at Anthropic on the pretraining team.
Previously, I was a Ph.D. student in the Stanford CS Department, where I researched how to design and improve data curation protocols for training large text and image models. I was extremely fortunate to be advised by the amazing Professors Ludwig Schmidt and Yejin Choi, and was supported by the NSF Graduate Research Fellowship.
I was both an undergraduate student and a research assistant at Georgia Tech, where I worked with Vidya Muthukumar, Ashwin Pananjady, Jacob Abernethy, and Xiaoming Huo.

Featured Research Publications

OpenThoughts
pre-print
DataComp-LM
Conference on Neural Information Processing Systems, Datasets and Benchmarks Track