Noveen Sachdeva

I'm a research scientist at Google DeepMind, where I work on all things pretraining data for our flagship Gemini & Gemma series of models.

I received my PhD from UC San Diego, where I was advised by Prof. Julian McAuley and worked on data-efficient machine learning. In a past life, I was an undergrad at IIIT Hyderabad.

Google Scholar  /  Twitter

Selected Research
How to Train Data-Efficient LLMs
Noveen Sachdeva, Ben Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed Chi, James Caverlee, Julian McAuley, Derek Cheng
arXiv, 2024
arXiv
@article{sachdeva2024askllm,
  title={How to Train Data-Efficient LLMs},
  author={Sachdeva, Noveen and Coleman, Benjamin and Kang, Wang-Cheng and Ni, Jianmo and Hong, Lichan and Chi, Ed H and Caverlee, James and McAuley, Julian and Cheng, Derek Zhiyuan},
  journal={arXiv preprint arXiv:2402.09668},
  year={2024}
}
						
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley
arXiv, 2023
arXiv
@article{sachdeva2023farzi,
  title={Farzi Data: Autoregressive Data Distillation},
  author={Sachdeva, Noveen and He, Zexue and Kang, Wang-Cheng and Ni, Jianmo and Cheng, Derek Zhiyuan and McAuley, Julian},
  journal={arXiv preprint arXiv:2310.09983},
  year={2023}
}
						
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley
The Web Conference (WWW), 2024
arXiv
@inproceedings{sachdeva2024policy,
  title={Off-Policy Evaluation for Large Action Spaces via Policy Convolution},
  author={Sachdeva, Noveen and Wang, Lequn and Liang, Dawen and Kallus, Nathan and McAuley, Julian},
  booktitle={Proceedings of the Web Conference 2024},
  series={WWW '24},
  year={2024}
}
                                        
Data Distillation: A Survey
Noveen Sachdeva, Julian McAuley
TMLR, 2023
Open Review / arXiv
@article{sachdeva2023data,
  title={Data Distillation: A Survey},
  author={Sachdeva, Noveen and McAuley, Julian},
  journal={Transactions on Machine Learning Research},
  year={2023},
  url={https://openreview.net/forum?id=lmXMXP74TO},
  note={Survey Certification}
}
					
Infinite Recommendation Networks: A Data-Centric Approach
Noveen Sachdeva, Mehak Preet Dhaliwal, Carole-Jean Wu, Julian McAuley
NeurIPS, 2022
Open Review / arXiv / Code (∞-AE) / Code (Distill-CF) / Slides
@inproceedings{sachdeva2022b,
  title={Infinite Recommendation Networks: A Data-Centric Approach},
  author={Sachdeva, Noveen and Dhaliwal, Mehak Preet and Wu, Carole-Jean and McAuley, Julian},
  booktitle={Advances in Neural Information Processing Systems},
  publisher={Curran Associates, Inc.},
  year={2022}
}
					
Research Internships
Google DeepMind
Jun 2023 - Aug 2024
w/ W.C. Kang & D. Cheng
Netflix Research
Jun 2022 - Sep 2022
w/ D. Liang & N. Kallus
Pinterest
Jun 2021 - Sep 2021
w/ Jiajing Xu
Microsoft Research
Jan 2020 - Jun 2020
w/ Manik Varma
UC San Diego
Aug 2019 - Oct 2019
w/ Julian McAuley
Cornell University
Jun 2019 - Jul 2019
w/ Thorsten Joachims
Education
UC San Diego
Ph.D. in Computer Science & Engineering
San Diego, CA   ·   2020 - 2024
IIIT Hyderabad
B.Tech & M.S. (by research) in Computer Science & Engineering
Hyderabad, India   ·   2015 - 2020

Credits to Jon Barron for creating this minimalist template!