Learning from limited labeled data
Despite recent advances in unsupervised learning, the performance of modern deep learning methods still relies heavily on the size of labeled datasets. However, large-scale dataset annotation is both monotonous and painstaking. One focus of our group is enabling deep learning for structured prediction tasks in speech and natural language processing under low-resource, data-constrained settings. Our research explores better ways of harnessing human supervision beyond simply collecting gold labels [ICLR 2020, AAAI 2020], targeted and efficient data collection [ICASSP 2021], and adaptation of ML models trained in data-abundant domains to data-constrained ones [ACL 2021, NAACL 2021, Interspeech 2020].
Publications
- Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study
In ACL 2021, with Yash Khemchandani, Sarvesh Mehtani, Vaidehi Patil, Abhijeet Awasthi, Partha Talukdar, and Sunita Sarawagi
[Paper] [Code]
- Error-driven Fixed-Budget ASR Personalization for Accented Speakers
In ICASSP 2021, with Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, and Preethi Jyothi
[Paper] [Code] [Talk 📢]
- Training Data Augmentation for Code-Mixed Translation
In NAACL 2021, with Abhirut Gupta, Aditya Vavre, and Sunita Sarawagi
[Paper] [Data]
- Black-box Adaptation of ASR for Accented Speech
In Interspeech 2020, with Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, and Sunita Sarawagi
[Paper] [Code]
- Learning from Rules Generalizing Labeled Exemplars
In ICLR 2020 (Spotlight), with Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, and Sunita Sarawagi
[Paper] [Code and data] [Talk 📢]
- Data Programming using Continuous and Quality-Guided Labeling Functions
In AAAI 2020, with Oishik Chatterjee, Ganesh Ramakrishnan, and Sunita Sarawagi
[Paper] [Code and data]
- Labeled Memory Networks for Online Model Adaptation
In AAAI 2018, with Shiv Shankar and Sunita Sarawagi
[Paper]
Collaborators