Purvasha Chakravarti



About Me

I am a Lecturer (Assistant Professor) in Statistical Science in the Department of Statistical Science at University College London (UCL). Before joining UCL in June 2022, I was a Chapman Fellow in Mathematics in the Statistics Section of the Department of Mathematics at Imperial College London.

I received my Ph.D. in Statistics from the Department of Statistics & Data Science at Carnegie Mellon University (CMU), under the supervision of Professor Larry Wasserman. I also hold a Master's in Machine Learning from the Machine Learning Department at CMU, and a Bachelor's and a Master's in Statistics from the Indian Statistical Institute, Kolkata.

In my free time, I love performing music (especially Bollywood music), painting, and Zumba!

Research Interests

My research focuses on developing scalable and interpretable machine learning methods, with statistical significance guarantees, to analyze high-dimensional data. Specifically, it includes developing new hypothesis testing, clustering, anomaly detection, and interpretability algorithms and applying them to problems in science.

My current research includes the following themes:
  1. Inference for Clustering: developing and analyzing, both theoretically and empirically, high-dimensional clustering algorithms with significance guarantees.
  2. Signal Detection in Particle Physics: developing tests that can detect new signals in particle physics data sets in model-independent and model-dependent settings.
  3. Interpretability of High-dimensional Classifiers: developing active subspace search to model the surface of the classifier, identify directions with the most variability, and identify relationships between features that influence the classifier.

Publications and Reports

Purvasha Chakravarti, Mikael Kuusela, and Larry Wasserman.
Robust Signal Detection using a Classifier Decorrelated through Optimal Transport (CDOT).
In preparation.
We propose a signal detection algorithm, applied to particle physics, that uses a supervised classifier decorrelated from an auxiliary variable on the background data through optimal transport. We then fit a semi-parametric mixture model to the auxiliary variable at different thresholds of the transformed classifier to detect the presence of signal. This makes the signal detection procedure robust to systematic differences in the background distribution.

Purvasha Chakravarti, Mikael Kuusela, Jing Lei and Larry Wasserman.
Model-Independent Detection of New Physics Signals Using Semi-Supervised Classifier Tests.
Major revision at Annals of Applied Statistics (AOAS) (arXiv preprint arXiv:2102.07679). Supplementary Material (here)
In high energy physics, an important problem is to detect whether there is any significant difference between the distribution of background-only events (generated from an assumed Monte Carlo model) and the distribution of the actual observations, which could be a mixture of background and signal events. We propose three tests. Two are based on how well a classifier distinguishes the two distributions, as measured by the Area under the Curve (AUC) and the misclassification error (MCE) statistics. The third test is based on estimating the likelihood ratio using the classifier output.
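The idea behind the AUC-based test can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the classifier is a simple mean-difference linear score, the null distribution is calibrated by permuting the pooled sample, and the function names (`auc`, `heldout_auc`, `auc_permutation_test`) are hypothetical.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def heldout_auc(background, observed):
    """Fit a mean-difference linear score on the first half of each sample
    (an illustrative stand-in for a real classifier), then return the AUC
    on the held-out second halves."""
    hb, ho = len(background) // 2, len(observed) // 2
    mu_b = mean_vector(background[:hb])
    mu_o = mean_vector(observed[:ho])
    w = [o - b for o, b in zip(mu_o, mu_b)]

    def score(x):
        return sum(wj * xj for wj, xj in zip(w, x))

    return auc([score(x) for x in observed[ho:]],
               [score(x) for x in background[hb:]])


def auc_permutation_test(background, observed, n_perm=200, seed=0):
    """Return (held-out AUC, permutation p-value) for H0: same distribution.
    Permutation calibration is an assumption of this sketch."""
    rng = random.Random(seed)
    stat = heldout_auc(background, observed)
    pooled = list(background) + list(observed)
    n_b = len(background)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel events at random, then refit and rescore
        if heldout_auc(pooled[:n_b], pooled[n_b:]) >= stat:
            exceed += 1
    return stat, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(1)
    background = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    # Observations: mostly background, with a small shifted "signal" component.
    observed = [[rng.gauss(2 if rng.random() < 0.3 else 0, 1), rng.gauss(0, 1)]
                for _ in range(200)]
    print(auc_permutation_test(background, observed))
```

A held-out AUC close to 0.5 means the classifier cannot tell the observations from the simulated background; an AUC that is large relative to its permutation distribution is evidence of a difference.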

Purvasha Chakravarti.
Inference for Clustering and Anomaly Detection.
Ph.D. thesis at Carnegie Mellon University. Ph.D. defense slides (here).
In this thesis, we take an inferential approach to searching for evidence that indicates the presence of two or more collections of data, with different distributions, in a single data set. The thesis addresses two fundamental questions: (a) How can we perform clustering that results in statistically significant clusters? (b) In high energy physics, how can we detect new signals in experimental data that are not explained by known physics models, without assuming a model for the new signal?

Purvasha Chakravarti, Sivaraman Balakrishnan, and Larry Wasserman.
Gaussian Mixture Clustering Using Relative Tests of Fit.
Submitted to Journal of the Royal Statistical Society Series B (Statistical Methodology). (arXiv preprint arXiv:1910.02566) Ph.D. thesis proposal slides (here). R Code (here).
We develop a test for whether a mixture of Gaussians provides a better fit to the data relative to a single Gaussian, without assuming that either model is correct. We then use this test to develop a clustering algorithm that comes equipped with significance guarantees. We show how the test can be used in a hierarchical as well as a sequential manner for clustering.

Yotam Hechtlinger, Purvasha Chakravarti, and Jining Qin.
A Generalization of Convolutional Neural Networks to Graph-Structured Data.
arXiv preprint arXiv:1704.08165 (2017).
We propose a novel spatial convolution that can be applied to graph-structured data. We use a random walk to quantify a general notion of neighborhood for graph-structured data, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure.
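As a rough illustration of how a random walk can define a neighborhood on a graph, the sketch below ranks nodes by the expected number of visits within a fixed number of steps and keeps the top few for each node. The choice of expected visit counts, the tie-breaking by node index, and the function names are assumptions made for this example, not a statement of the paper's exact construction.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    P = []
    for row in adj:
        total = sum(row)
        P.append([v / total if total else 0.0 for v in row])
    return P


def matmul(A, B):
    """Plain dense matrix product for small illustrative graphs."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def expected_visits(adj, steps):
    """Q = I + P + ... + P^steps, where Q[i][j] is the expected number of
    visits to node j during a `steps`-step walk started at node i."""
    n = len(adj)
    P = transition_matrix(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    Q = [row[:] for row in power]
    for _ in range(steps):
        power = matmul(power, P)
        for i in range(n):
            for j in range(n):
                Q[i][j] += power[i][j]
    return Q


def neighborhoods(adj, steps, size):
    """For each node, the `size` nodes with the highest expected visit count
    (ties broken by node index, an arbitrary choice for this sketch)."""
    Q = expected_visits(adj, steps)
    n = len(adj)
    return [sorted(range(n), key=lambda j: (-Q[i][j], j))[:size] for i in range(n)]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3.
    path = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
    print(neighborhoods(path, steps=2, size=3))
```

Each node's ordered neighborhood then plays the role that a pixel's fixed spatial window plays in a standard convolution, so the same filter weights can be shared across nodes.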

Statistical Analysis of the Chikungunya Fever (ADA Final Report; 2016).
Joint work with Bill Eddy and Virginia Dato.
We develop a multi-country compartment model and a multi-country time series model to predict the spread of Chikungunya in the Americas.

Sunder Ram Krishnan, Chandra Sekhar Seelamantula, and Purvasha Chakravarti.
Spatially Adaptive Kernel Regression Using Risk Estimation.
IEEE Signal Processing Letters 21.4 (2014): 445-448.
An important question in kernel regression is one of estimating the order and bandwidth parameters from available noisy data. We propose to solve the problem within a risk estimation framework. Considering an independent and identically distributed (i.i.d.) Gaussian observations model, we use Stein's unbiased risk estimator (SURE) to estimate a weighted mean-square error (MSE) risk, and optimize it with respect to the order and bandwidth parameters. We consider the problem of image restoration from uniform/non-uniform data, and show that the SURE approach to spatially adaptive kernel regression results in better quality estimation compared with its spatially non-adaptive counterparts.

Selected Presentations and Posters

Search for New Physics Signals Using Interpretable Classifiers.
Statistics Seminar in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics (DPMMS), University of Cambridge, scheduled for February 2022.
Maths Inspirational Lecture, Imperial College London, scheduled for January 2022.
School of Mathematics, The University of Edinburgh, December 2021.
School of Mathematics, Cardiff University, December 2021.
School of Mathematics and Statistics, University of Sheffield, November 2021.
School of Mathematical Sciences, Queen Mary University of London, November 2021.

Clustering with Significance Guarantees Using Relative Tests of Fit (RIFT).
The Analytics and Operations Group, Imperial College Business School, November 2021.

Model-Independent Detection of New Physics Signals Using Interpretable Semi-Supervised Classifier Tests.
Inter-experiment Machine Learning (IML) Working Group, CERN, 2021 (slides, talk).
Contributed Talk at Joint Statistical Meetings, Virtual Conference, 2021.
4th Inter-experiment Machine Learning (IML) Workshop, CERN, 2020 (slides, talk).
Statistical Inference & Machine Learning Workshop in the Big Data and Machine Learning in the Natural Sciences Group, Imperial College London, 2021.
Centres for Doctoral Training (CDT) Seminar, Imperial College London, 2020.

Inference for Clustering and Anomaly Detection.
Statistics seminar, Imperial College London, 2020.
Chapman Fellowship Interview, Department of Mathematics, Imperial College London, 2020.

Gaussian Mixture Clustering Using Relative Tests of Fit (RIFTs)
Contributed Talk at Joint Statistical Meetings, Denver, Colorado, 2019 (slides).
Working Group on Model-Based Clustering Summer Session, Ann Arbor, Michigan, 2018.

A Generalization of Convolutional Neural Networks to Graph-Structured Data
Poster at National Academy of Sciences Arthur M. Sackler Colloquium, The Science of Deep Learning, Washington, D.C., 2019.

Hierarchical Significance Testing for Gaussian Mixture Clustering
Contributed Talk at Joint Statistical Meetings, Vancouver, Canada, 2018.

Statistical Significance of k-Means Clustering
Speed Presentation at Joint Statistical Meetings, Baltimore, Maryland, 2017.

Statistical Analysis of the Chikungunya Fever
Women in Statistics and Data Science Conference, Charlotte, North Carolina, 2016.

Women in Statistics at Carnegie Mellon University
Jointly presented with Shannon Gallagher.
Women in Statistics and Data Science Conference, Charlotte, North Carolina, 2016.

Awards and Recognition

Honorable Mention for the 2019 Do-Bui Travel Award.
Received an Honorable Mention for the Do-Bui Travel Award given by the Caucus for Women in Statistics (CWS), Joint Statistical Meetings, 2019.

National Level Scholarship
Obtained the INSPIRE Scholarship offered by the Department of Science and Technology (DST), Government of India, from 2009 to 2014.

Summer Fellowship
Received an Indian Academy of Science Fellowship, 2012.

Cyber Olympiad
Secured an All India rank of 19 in the 5th National Cyber Olympiad, held on 19 February 2006.

Teaching Experience

Instructor
Data Science (Imperial College London MATH97309)
Spring 2022 with 70 students and Spring 2021 with 62 students. Postgraduate course for the MSc Statistics program.

Statistical Graphics and Visualization (CMU 36-315)
Summer 2020 with 32 students. 3rd year undergraduate course.

Introduction to Probability Theory (CMU 36-225)
Summer 2019 with 43 students. 2nd year undergraduate course.

Reasoning with Data (CMU 36-200)
Summer 2018 with 8 students. 1st year undergraduate course.

Introduction to Statistical Inference (CMU 36-226)
Summer 2016 with 26 students and Summer 2017 with 31 students. 2nd year undergraduate course.

Teaching Assistant
Sampling, Survey and Society (CMU 36-303)
Spring 2019.

Modern Regression (CMU 36-401)
Fall 2018, Fall 2014.

Advanced Methods for Data Analysis (CMU 36-402)
Spring 2018, Spring 2017, Spring 2016.

Intermediate Statistics (CMU 36-705)
Fall 2017, Fall 2016.

Introduction to Probability Theory (CMU 36-225)
Fall 2015.

Probability and Mathematical Statistics (Hons., CMU 36-625)
Spring 2015.

Curriculum Vitae

Please download my CV from here.

Contact

Email: p [dot] chakravarti [at] ucl [dot] ac [dot] uk