Selecting the number of principal components: Estimation of the true rank of a noisy matrix

Yunjin Choi, Jonathan Taylor, Robert Tibshirani

Research output: Contribution to journal, Article, peer-review

52 Scopus citations

Abstract

Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is choosing the number of principal components. To address this challenge, we propose distribution-based methods with exact type I error control for hypothesis testing and for constructing confidence intervals for signals in a noisy matrix with finite samples. Assuming Gaussian noise, we derive exact type I error control based on the conditional distribution of the singular values of a Gaussian matrix, utilizing a post-selection inference framework and extending the approach of [Taylor, Loftus and Tibshirani (2013)] to a PCA setting. In simulation studies, we find that our proposed methods compare well to existing approaches.
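
The setting described in the abstract, a low-rank signal observed through Gaussian noise, can be illustrated with a small simulation. The sketch below is not the authors' test; it only builds a matrix of known rank, adds noise, and prints the singular values that any rank-selection method would have to work from. The function name `simulate_noisy_low_rank`, its parameters, and the chosen signal strength are all assumptions made for illustration.

```python
import numpy as np


def simulate_noisy_low_rank(n=50, p=10, rank=2, signal=50.0, sigma=1.0, seed=0):
    """Return (X, B) with X = B + E, B of exact rank `rank`, E i.i.d. N(0, sigma^2).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal left and right factors from the QR decomposition
    # of Gaussian matrices.
    u, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    v, _ = np.linalg.qr(rng.standard_normal((p, rank)))
    # Every nonzero singular value of B equals `signal`.
    b = signal * (u @ v.T)
    e = sigma * rng.standard_normal((n, p))
    return b + e, b


x, b = simulate_noisy_low_rank()

# Singular values of the noisy matrix, returned in descending order.
singular_values = np.linalg.svd(x, compute_uv=False)
print(np.round(singular_values, 2))
```

With a signal this strong, the first two singular values stand well apart from the rest. Lowering `signal` toward the noise level blurs that gap, which is the regime where a formal test with error control becomes useful.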

Original language: English
Pages (from-to): 2590-2617
Number of pages: 28
Journal: Annals of Statistics
Volume: 45
Issue number: 6
State: Published - Dec 2017

Keywords

  • Exact p-value
  • Hypothesis test
  • Principal components
