“Robustly Learning Mixtures of (Clusterable)
Gaussians via the SoS Proofs to Algorithms Method”
Sam Hopkins (UC Berkeley) April 15, 2021 | 5pm CEST | Zoom
Gaussian Mixture Models (GMMs) are the canonical generative models for non-homogeneous data: they have been studied in statistics and (later) computer science for more than 100 years. A seminal algorithm of Moitra and Vailiant shows that for every fixed k, mixtures of k Gaussians in d dimensions can be learned in time polynomial in d. However, this algorithm is highly non-robust: if the underlying distribution of samples is merely close to being a GMM (or if any samples are corrupted), their algorithm fails to learn the distribution.
We design and analyze a new, robust algorithm to learn high-dimensional GMMs, under the assumption that the mixture is clusterable — that is, all pairs of Gaussians in the mixture have total variation distance close to 1. Prior robust algorithms for learning GMMs work only for spherical Gaussians — that is, with covariance (upper bounded by) identity.
In this talk I will attempt to emphasize the SoS Proofs to Algorithms method which we use to design and analyze our algorithm. This method, which offers a mechanical way to turn a sufficiently simple proof of identifiability of parameters of a distribution into an efficient algorithm to learn those parameters, has proven to be exceptionally powerful in a wide range of high-dimensional statistics settings.
Meeting ID: 982 7897 3506