Ongoing Projects
More information can be found in the following theme-specific pages.
- Algorithmic-Recourse: How can we train models to predict and guide recourse actions for input instances, where human intervention is employed to implement the recommended adjustments? This calls for causal inference to model the impact of recourse actions on outcomes before issuing recommendations (a minimal recourse-search sketch follows this list).
- Text-to-Code: Accurately, efficiently, and reliably converting natural language questions to code such as SQL or a semantic parse. We address topics like adapting models to new databases via few-shot learning, synthetic data generation, template- and grammar-constrained decoding, confidence calibration, schema subsetting, and handling ambiguous queries (see the constrained-decoding sketch after this list).
- Time-series: How best to forecast a predictive variable when we are interested in a specific aggregate measure of the forecasted values? For example, how should we model the series if we care not about the absolute values but only about the top quartile (see the quantile-loss sketch after this list)?
- Domain: Machine Learning models underperform when the input domain changes; examples of domain drift for an automatic speech recognition system include a new or unknown accent, dialect, or vernacular. How do we generalize or adapt to new domains?
- ML-service: ML services are trained with a one-size-fits-all policy and then deployed in the wild. How can we improve the generalizability of such services?
- Limited Labeled Data: In what ways can we reduce the need for labeled data when training large Deep Learning models? We explore ways of harnessing human supervision beyond collecting just the gold labels, effective data collection under strict budgets (see the active-learning sketch after this list), exploiting language relatedness for multilingual NLP, and adapting ML models trained in data-abundant domains to data-constrained domains.
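
For the Algorithmic-Recourse theme, a minimal sketch of gradient-based recourse search in the style of counterfactual explanations, assuming a simple logistic model. The weights `w`, bias `b`, and instance `x` are illustrative placeholders, and the causal effect of the suggested action on the downstream outcome is deliberately not modelled here; that is exactly the gap the causal-inference work targets.

```python
import numpy as np

# Hypothetical linear classifier: w, b, and x below are illustrative
# placeholders, not values from an actual project.
w = np.array([1.5, -2.0, 0.5])   # model weights
b = -0.25                        # model bias

def predict_proba(x):
    """Probability of the favourable outcome under the toy model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def recourse(x, target=0.5, lam=0.1, lr=0.05, steps=500):
    """Gradient search for a minimal-cost change that flips the outcome.

    Minimises (p(x') - target)^2 + lam * ||x' - x||^2, a standard
    counterfactual-explanation objective; causal effects of the
    recommended change are NOT modelled in this sketch.
    """
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        # Gradient of the squared prediction gap plus the proximity penalty.
        grad = 2 * (p - target) * p * (1 - p) * w + 2 * lam * (x_cf - x)
        x_cf -= lr * grad
    return x_cf

x = np.array([-1.0, 0.8, 0.2])        # instance currently denied the outcome
x_cf = recourse(x)
print(predict_proba(x), predict_proba(x_cf))  # low prob -> near the boundary
print(x_cf - x)                               # the recommended adjustment
```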
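
For Text-to-Code, a toy illustration of grammar-constrained decoding: at each step, tokens that the grammar forbids are masked out before picking the next token. The vocabulary and transition table are made up and far simpler than a real SQL grammar, and the random logits stand in for an actual language model.

```python
import numpy as np

# Toy vocabulary and transition table standing in for a real SQL grammar.
VOCAB = ["SELECT", "name", "age", "FROM", "users", "<eos>"]
ALLOWED_NEXT = {                 # crude grammar: last token -> legal next tokens
    "<bos>":  {"SELECT"},
    "SELECT": {"name", "age"},
    "name":   {"FROM"},
    "age":    {"FROM"},
    "FROM":   {"users"},
    "users":  {"<eos>"},
}

def constrained_decode(max_len=8, seed=0):
    rng = np.random.default_rng(seed)
    out, prev = [], "<bos>"
    for _ in range(max_len):
        logits = rng.normal(size=len(VOCAB))    # stand-in LM scores
        mask = np.array([t in ALLOWED_NEXT.get(prev, set()) for t in VOCAB])
        logits[~mask] = -np.inf                 # forbid ungrammatical tokens
        tok = VOCAB[int(np.argmax(logits))]
        if tok == "<eos>":
            break
        out.append(tok)
        prev = tok
    return " ".join(out)

print(constrained_decode())  # always grammatical, e.g. "SELECT age FROM users"
```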
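
For the Time-series theme, one standard way to target a quantile rather than the mean is the pinball (quantile) loss: training with q = 0.75 aims the forecaster at the top-quartile boundary directly. The sketch below checks this property on synthetic data; all numbers are made up for illustration.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q=0.75):
    """Pinball (quantile) loss, minimised when y_pred is the q-quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Synthetic check: the constant prediction minimising the q=0.75 pinball
# loss should land near the true 75th percentile of the data.
rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=100_000)
candidates = np.linspace(8.0, 14.0, 61)
losses = [pinball_loss(y, c) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best, np.quantile(y, 0.75))  # both close to ~11.35
```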
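
For Limited Labeled Data, a minimal uncertainty-sampling loop illustrating data collection under a strict labeling budget: the model repeatedly queries labels for the pool points it is least sure about. The dataset, seed set, and budget are synthetic placeholders rather than details of any actual project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification pool; in practice the queried labels
# would come from human annotators rather than a known array y.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

labeled = list(range(20))                 # small seed set of gold labels
pool = [i for i in range(len(X)) if i not in labeled]
budget, batch = 100, 10                   # total labels we can afford per run

clf = LogisticRegression()
while budget > 0:
    clf.fit(X[labeled], y[labeled])
    # Query the pool points the model is least sure about (prob near 0.5).
    probs = clf.predict_proba(X[pool])[:, 1]
    uncertain = np.argsort(np.abs(probs - 0.5))[:batch]
    picked = [pool[i] for i in uncertain]
    labeled.extend(picked)                # "annotate" the queried points
    pool = [i for i in pool if i not in picked]
    budget -= batch

clf.fit(X[labeled], y[labeled])           # final fit on everything labeled
print(clf.score(X, y))                    # accuracy after spending the budget
```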