TY - JOUR

T1 - Primal path algorithm for compositional data analysis

AU - Jeon, Jong June

AU - Kim, Yongdai

AU - Won, Sungho

AU - Choi, Hosik

N1 - Publisher Copyright: © 2020 Elsevier B.V.

PY - 2020/8

Y1 - 2020/8

N2 - We consider the LASSO estimator for compositional data, in which covariates are nonnegative and always sum to one. Because the sum-to-one condition imposes a linear constraint on the regression coefficients, standard algorithms for LASSO cannot be applied directly to compositional data. Hence, a specific regularized regression model with linear constraints is commonly used. However, linear constraints incur additional computational time, which becomes severe in high-dimensional cases. Additionally, exact computation of the regression solution has not been investigated under existing methods. In this paper, we first propose an exact solution path algorithm for an ℓ1-regularized regression with high-dimensional compositional data and extend it to a classification model. We also compare its computational speed with that of previously developed algorithms and then apply the proposed algorithm to analyze income inequality data in economics and human gut microbiome data in biology. By analyzing simulated and real data sets, we illustrate that our specialized algorithm is significantly more efficient than the generalized LASSO algorithm for compositional data.

AB - We consider the LASSO estimator for compositional data, in which covariates are nonnegative and always sum to one. Because the sum-to-one condition imposes a linear constraint on the regression coefficients, standard algorithms for LASSO cannot be applied directly to compositional data. Hence, a specific regularized regression model with linear constraints is commonly used. However, linear constraints incur additional computational time, which becomes severe in high-dimensional cases. Additionally, exact computation of the regression solution has not been investigated under existing methods. In this paper, we first propose an exact solution path algorithm for an ℓ1-regularized regression with high-dimensional compositional data and extend it to a classification model. We also compare its computational speed with that of previously developed algorithms and then apply the proposed algorithm to analyze income inequality data in economics and human gut microbiome data in biology. By analyzing simulated and real data sets, we illustrate that our specialized algorithm is significantly more efficient than the generalized LASSO algorithm for compositional data.

KW - Constraint

KW - Microbiome data

KW - Penalized regression

KW - Solution path algorithm

UR - http://www.scopus.com/inward/record.url?scp=85083100564&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2020.106958

DO - 10.1016/j.csda.2020.106958

M3 - Article

AN - SCOPUS:85083100564

SN - 0167-9473

VL - 148

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

M1 - 106958

ER -