Hi, I'm Etash Guha
I'm a Ph.D. student at the University of Washington Computer Science and Engineering Department.
I research how to design and improve training data curation protocols for training large text and image models. This includes synthetic data generation, data filtering, and online data sampling.
I am also currently a Researcher at SambaNova Systems working on the reliability of Large Language Models. Most recently, I was a Research Intern under Dr. Emtiyaz Khan on the Approximate Bayesian Inference Team at RIKEN AIP in Tokyo, Japan. I was both an undergraduate student and a Research Assistant at Georgia Tech, where I worked with Vidya Muthukumar, Ashwin Pananjady, Jacob Abernethy, and Xiaoming Huo.
I have worked with researchers, traders, and software engineers while working at SambaNova Systems, FORT LP, and SAS.
Featured Research Publications
DataComp-LM
Conference on Neural Information Processing Systems Datasets and Benchmarks Track
Conformalization of Sparse Generalized Linear Models
International Conference on Machine Learning 2023
Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix Sketching
ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning
Solving Robust MDPs through No-Regret Dynamics
Transactions of Machine Learning Research
Inverse Reinforcement Learning
Conference on Uncertainty in Artificial Intelligence 2024
On Accelerated Perceptrons and Beyond
International Conference on Learning Representations 2023