Noveen Sachdeva

I'm a 4th-year PhD candidate at UC San Diego advised by Prof. Julian McAuley. I'm currently working at Google DeepMind on data-efficient LLMs.

My current research interests are data pruning (arXiv'24, WSDM'22), data distillation (arXiv'23, TMLR'23, NeurIPS'22), and offline bandits (WWW'24).

Google Scholar  /  Twitter  /  GitHub

Recent News
[May'24] I passed my defense! My slides are here.
[Jan'24] "Off-Policy Evaluation for Large Action Spaces via Policy Convolution" accepted at WWW '24
[Jun'23] "Data Distillation: A Survey" accepted at TMLR '23
[Jun'23] Excited to start a research internship at Google DeepMind
[Sep'22] "Infinite Recommendation Networks" accepted at NeurIPS '22
[Jul'22] Gave a talk at Google Brain titled "Data-Centric Approaches to Recommendation" [slides]
Selected Research
How to Train Data-Efficient LLMs
Noveen Sachdeva, Ben Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed Chi, James Caverlee, Julian McAuley, Derek Cheng
arXiv, 2024
arXiv /
							@article{sachdeva2024askllm,
								title={How to Train Data-Efficient LLMs},
								author={Sachdeva, Noveen and Coleman, Benjamin and Kang, Wang-Cheng and Ni, Jianmo and Hong, Lichan and Chi, Ed H and Caverlee, James and McAuley, Julian and Cheng, Derek Zhiyuan},
								journal={arXiv preprint arXiv:2402.09668},
								year={2024}
							}
						
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley
arXiv, 2023
arXiv /
							@article{sachdeva2023farzi,
								title={Farzi Data: Autoregressive Data Distillation},
								author={Sachdeva, Noveen and He, Zexue and Kang, Wang-Cheng and Ni, Jianmo and Cheng, Derek Zhiyuan and McAuley, Julian},
								journal={arXiv preprint arXiv:2310.09983},
								year={2023}
							}
						
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley
The Web Conference (WWW), 2024
arXiv /
                                                @inproceedings{sachdeva2024policy,
                                                        title={Off-Policy Evaluation for Large Action Spaces via Policy Convolution},
                                                        author={Sachdeva, Noveen and Wang, Lequn and Liang, Dawen and Kallus, Nathan and McAuley, Julian},
                                                        booktitle={Proceedings of the Web Conference 2024},
                                                        series={WWW '24},
                                                        year={2024}
                                                }
                                        
Data Distillation: A Survey
Noveen Sachdeva, Julian McAuley
TMLR, 2023
Open Review / arXiv /
						@article{sachdeva2023data,
							title={Data Distillation: A Survey},
							author={Sachdeva, Noveen and McAuley, Julian},
							journal={Transactions on Machine Learning Research},
							year={2023},
							url={https://openreview.net/forum?id=lmXMXP74TO},
							note={Survey Certification}
						}
					
Infinite Recommendation Networks: A Data-Centric Approach
Noveen Sachdeva, Mehak Preet Dhaliwal, Carole-Jean Wu, Julian McAuley
NeurIPS, 2022
Open Review / arXiv / Code (∞-AE) / Code (Distill-CF) / Slides /
						@inproceedings{sachdeva2022b,
							title={Infinite Recommendation Networks: A Data-Centric Approach},
							author={Sachdeva, Noveen and Dhaliwal, Mehak Preet and Wu, Carole-Jean and McAuley, Julian},
							booktitle={Advances in Neural Information Processing Systems},
							publisher={Curran Associates, Inc.},
							year={2022}
						}
					
On Sampling Collaborative Filtering Datasets
Noveen Sachdeva, Carole-Jean Wu, Julian McAuley
[Oral] WSDM, 2022
ACM / Public PDF / Code / Slides /
							@inproceedings{sachdeva2022a,
								title = {On Sampling Collaborative Filtering Datasets},
								author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
								booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
								series = {WSDM '22},
								publisher = {Association for Computing Machinery},
								year = {2022}
							}
						
Experience
Google DeepMind
Research Intern   with   Dr. Wang-Cheng Kang & Dr. Jianmo Ni
Mountain View, CA   ·   Jun 2023 - Present

Data pruning for pre-training LLMs faster and better. [arXiv '24]

Netflix Research
Research Intern   with   Dr. Dawen Liang & Prof. Nathan Kallus
Los Gatos, CA   ·   Jun 2022 - Sep 2022

Scaling up off-policy evaluation for large action spaces. [WWW '24]

Pinterest
Research Intern   with   Dr. Jinying Zhang & Dr. Jiajing Xu
San Francisco, CA   ·   Jun 2021 - Sep 2021

Improving the ads ranking systems through smart negative sampling strategies.

Microsoft Research
Research Intern   with   Dr. Manik Varma
Bengaluru, India   ·   Jan 2020 - Jun 2020

Incorporating graph patterns for scalable classifiers at the million-label scale. [WWW '21]

UC San Diego
Research Assistant   with   Prof. Julian McAuley
San Diego, CA   ·   Aug 2019 - Oct 2019

Counter-intuitive: user reviews don't seem to help generalization in recommender systems. [SIGIR '20]

Cornell University
Research Assistant   with   Prof. Thorsten Joachims
Ithaca, NY   ·   Jun 2019 - Jul 2019

How to perform off-policy learning when the logged data doesn't satisfy the overlap assumption? [KDD '20]

ICAR-CNR (National Research Council of Italy)
Research Assistant   with   Dr. Giuseppe Manco
Cosenza, Italy   ·   May 2018 - Jul 2018

Leveraging VAEs to model next-item user behavior in recommender systems. [WSDM '19]

Google Summer of Code
Summer Participant   with   ownCloud
Remote   ·   May 2017 - Aug 2017

Implemented a JavaScript library for Node.js and the browser. [Media coverage]

Education
UC San Diego
Ph.D. in Computer Science & Engineering   ·   4.0
San Diego, CA   ·   2020 - Present
IIIT Hyderabad
B.Tech & M.S. (by research) in Computer Science & Engineering   ·   9.75 / 10.0
Hyderabad, India   ·   2015 - 2020

Thanks to Jon Barron for the minimalist template!