Inference and Estimation in High Dimensional Data Analysis
Ph.D. Dissertation, Stanford University, July 2014
Abstract

Modern technologies generate vast amounts of fine-grained data at an unprecedented speed. High-dimensional data, in which the number of variables is much larger than the sample size, now arise in many applications, such as healthcare, social networks, and recommendation systems. The broad interest in these applications has spurred remarkable progress in high-dimensional data analysis, in terms of both point estimation and computation. However, one of the fundamental inference tasks, namely quantifying uncertainty or assessing statistical significance, is still in its infancy for such models. In the first part of this dissertation, we present efficient procedures and corresponding theory for constructing classical uncertainty measures, such as confidence intervals and p-values, for single regression coefficients in high-dimensional settings.

In the second part, we study the compressed sensing reconstruction problem, a well-known example of estimation in high-dimensional settings. We propose a new approach to this problem that departs sharply from conventional wisdom in this area. Our construction of the sensing matrix is inspired by the idea of spatial coupling in coding theory and by similar ideas in statistical physics. For reconstruction, we use an approximate message passing (AMP) algorithm, an iterative algorithm that exploits the statistical properties of the problem to speed up convergence. Finally, we prove that our method solves the reconstruction problem at the information-theoretically optimal undersampling rate, and we show that it is robust to measurement noise.
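To make the iterative structure of approximate message passing concrete, here is a minimal sketch of plain AMP with soft thresholding. It rests on several assumptions. The sensing matrix has i.i.d. N(0, 1/m) entries, so it is NOT the spatially coupled construction described above. The signal is sparse and the measurements are noiseless. The threshold is set as `alpha * tau`, where `alpha` is a hypothetical tuning parameter and `tau` is estimated from the residual. The function names (`soft_threshold`, `matvec`, `rmatvec`, `amp`) are illustrative only. The line that distinguishes AMP from ordinary iterative thresholding is the Onsager correction added to the residual.

```python
import math
import random


def soft_threshold(v, t):
    """Componentwise soft thresholding: eta(v; t) = sign(v) * max(|v| - t, 0)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def matvec(A, x):
    """Compute A @ x for A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, z):
    """Compute A^T @ z for A stored as a list of rows."""
    out = [0.0] * len(A[0])
    for zi, row in zip(z, A):
        for j, a in enumerate(row):
            out[j] += zi * a
    return out


def amp(A, y, n_iter=100, alpha=1.5):
    """Plain AMP with soft thresholding; threshold = alpha * tau (hypothetical tuning).

    Assumes A has i.i.d. N(0, 1/m) entries; this is NOT a spatially coupled matrix.
    """
    m, n = len(A), len(A[0])
    delta = m / n                      # undersampling ratio m/n
    x = [0.0] * n                      # current estimate x^t
    z = list(y)                        # current (corrected) residual z^t
    for _ in range(n_iter):
        # Estimate the effective noise level of the pseudo-data from the residual.
        tau = math.sqrt(sum(zi * zi for zi in z) / m)
        theta = alpha * tau
        # Pseudo-data x^t + A^T z^t, approximately x0 plus Gaussian noise of size tau.
        pseudo = [xi + gi for xi, gi in zip(x, rmatvec(A, z))]
        x = soft_threshold(pseudo, theta)
        # Average derivative of soft thresholding = fraction of entries above threshold.
        avg_deriv = sum(1 for p in pseudo if abs(p) > theta) / n
        # New residual with the Onsager correction term (1/delta) * z^t * avg_deriv.
        Ax = matvec(A, x)
        z = [yi - axi + zi * avg_deriv / delta for yi, axi, zi in zip(y, Ax, z)]
    return x
```

As a usage example, take m = 100 Gaussian measurements `y = matvec(A, x0)` of a 200-dimensional `x0` with 5 nonzero entries. In that setting, `amp(A, y)` would be expected to return an estimate close to `x0`. Removing the Onsager term turns the loop into ordinary iterative soft thresholding, which typically converges much more slowly.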