Hi, I'm Etash Guha
I'm a Ph.D. student in the Computer Science and Engineering Department at the University of Washington.
I research how to design and improve data curation protocols for training large text and image models. This includes synthetic data generation, data filtering, and online data sampling. I am extremely fortunate to be advised by the amazing Professors Ludwig Schmidt and Yejin Choi. I'm gratefully supported by the NSF Graduate Research Fellowship.
I was a researcher at SambaNova Systems, working on the reliability of large language models. Most recently, I was a research intern under Dr. Emtiyaz Khan on the Approximate Bayesian Inference Team at RIKEN AIP in Tokyo, Japan. I was both an undergraduate student and a research assistant at Georgia Tech, where I worked with Vidya Muthukumar, Ashwin Pananjady, Jacob Abernethy, and Xiaoming Huo.
I have worked with researchers, traders, and software engineers at SambaNova Systems, FORT LP, and SAS.
Featured Research Publications
DataComp-LM
Conference on Neural Information Processing Systems, Datasets and Benchmarks Track