Etash Guha
Researcher
I research how to design data curation protocols for training large text and image models. This includes synthetic data generation, data filtering, and online data sampling.
Education
Aug. 2024
Ph.D. in Computer Science and Engineering
University of Washington, Seattle, Washington
Researching data curation
Aug. 2018 - May 2022
B.S. in Computer Science
Georgia Institute of Technology, Atlanta, GA
Overall GPA: 3.97/4.00, Threads in Intelligence and Theory
Industry Research Experience
Oct. 2023 - Current
SambaNova Systems, Palo Alto, California
Researcher, NLP
Initiated work to improve the reliability and safety of open-source Large Language Models for large customers
May 2021 - May 2023
SambaNova Systems, Palo Alto, California
Research Intern, ML for PnR
Developed a learned cost model using Graph Neural Networks to predict the quality of chip placement, beating several hand-designed heuristics
May 2020 - Aug. 2021
FORT LP, New York City, New York
Quantitative Research Intern, Transaction Analysis Group
Implemented a pipeline for analyzing slippage data and a neural network strategy for price prediction using NLP data
May 2019 - Aug. 2019
SAS, Cary, North Carolina
Software Engineering Intern, SAS Model Manager
Integrated the Bidirectional Encoder Representations from Transformers (BERT) NLP model into SAS products using PyTorch
Jan. 2019 - May 2019
Parmonic, Atlanta, Georgia
Used Python with libraries such as SciKit and OpenCV for data and video analysis
Academic Research Experience
May 2023 - Oct. 2023
RIKEN Project for Artificial Intelligence, Tokyo, Japan
Advisor:
Mohammad Emtiyaz Khan
Collaborated with the Approximate Bayesian Inference Team under Emtiyaz Khan to use Bayesian principles to advance ML safety. Developed simple and practical uncertainty quantification methods for deep learning, resulting in an ICLR 2024 submission.
May 2022 - May 2023
Georgia Institute of Technology, Atlanta, Georgia
Advisor:
Jacob Abernethy,
Xiaoming Huo
Developed generalization bounds for magnitude-based pruning. Proved that sparse and wide neural networks forget less. Developed a no-regret framework to solve robust Markov Decision Processes.
Aug 2018 - May 2022
Georgia Institute of Technology, Atlanta, Georgia
Advisor:
Jacob Abernethy,
Vidya Muthukumar,
Ashwin Pananjady
Designed a general class of frameworks for two-player Fenchel games to model perceptron algorithms. Tested a Hebbian plasticity-based learning system and analyzed its computational capacity. Analyzed an efficient algorithm for generating conformal prediction sets. Developed an inverse reinforcement learning algorithm for linear stochastic bandits. Developed a learned methodology to efficiently and accurately generate solutions to NP-hard problems.
Aug. 2019 - May 2019
IVALab, Atlanta, Georgia
Undergraduate Research Assistant, IVALab
Advisor:
Patricio Vela
Developed a more efficient autonomous exploration method for robots using ROS and C++, with 9.8% higher accuracy than standard Frontier-Based Exploration
Honors and Awards
2018-2022
Stamps President's Scholarship at Georgia Tech
Full-ride merit scholarship awarded to 40 freshmen at Georgia Tech
2018-2022
Faculty List
Awarded to students with 4.0 GPA
2023
Fatima Fellowship
An International Mentorship Program for Aspiring Researchers in Computer Science given to 40 students
Patents
2022
US Patent on "Learned Cost Models For Performance Optimization On Dataflow Architecture"
Awarded a Patent based on work done at SambaNova Systems
2022
US Patent on "Performance Optimization of Dataflow Processors"
Awarded a Patent based on work done at SambaNova Systems
2022
US Patent on "Estimating Resource Costs for Computing Tasks for a Reconfigurable Dataflow Computing System"
Awarded a Patent based on work done at SambaNova Systems
2018
US Patent on "Volume controllable toilet flush systems and methods of use"
Awarded a Patent based on novel toilet design for adjustable water usage
Publications
Conference
C11
Anas Awadalla,
Le Xue,
Oscar Lo,
Manli Shu,
Hannah Lee,
Etash Guha,
Matt Jordan,
Sheng Shen,
Mohamed Awadalla,
Silvio Savarese,
Caiming Xiong,
Ran Xu,
Yejin Choi,
Ludwig Schmidt
Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
C10
Jeffery Li,
Alex Fang,
Georgios Smyrnis,
Maor Ivgi,
Matt Jordan,
Samir Gadre,
Hritik Bansal,
Etash Guha,
...,
Achal Dave,
Ludwig Schmidt,
Vaishaal Shankar
Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
C9
Etash Guha,
Vihan Lakshman
International Conference on Machine Learning 2024; ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning.
C8
Etash Guha,
Shlok Natarajan,
Thomas Möllenhoff,
Emtiyaz Khan,
Eugene Ndiaye
International Conference on Learning Representations 2024.
C7
Etash Guha,
Prasanjit Dubey,
Xiaoming Huo
ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning.
C6
Etash Guha
Transactions of Machine Learning Research.
C5
Etash Guha,
Jim James,
Krishna Acharya,
Ashwin Pananjady,
Vidya Muthukumar
Conference on Uncertainty in Artificial Intelligence 2024.
C4
Etash Guha,
Eugene Ndiaye,
Xiaoming Huo
International Conference on Machine Learning 2023.
C3
Guanghui Wang,
Rafael Hanashiro,
Etash Guha,
Jacob Abernethy
International Conference on Learning Representations 2023.
C2
Haoran Sun,
Etash Guha,
Hanjun Dai,
Le Song
International OPT Workshop on Optimization for Machine Learning @ NeurIPS 2023.
C1
Etash Guha,
Tianxiao Jiang,
Andrew Deng,
Muthu Annamalai,
Jian Zhang
Design Automation Conference (poster) 2022.
Poster
P1
Etash Guha,
Patricio Vela
National Conference on Undergraduate Research 2019.
Service
Reviewer
International Conference on Learning Representations
(ICLR)
2024
Conference on Neural Information Processing Systems
(NeurIPS)
2023
Conference on Artificial Intelligence and Statistics
(AISTATS)
2024
Optimization for Machine Learning Workshop @ NeurIPS
(OPT)
2023
Duality Principles for Modern Machine Learning Workshop @ ICML
(DP4ML)
2023