TY - JOUR
T1 - Primal path algorithm for compositional data analysis
AU - Jeon, Jong June
AU - Kim, Yongdai
AU - Won, Sungho
AU - Choi, Hosik
N1 - Publisher Copyright: © 2020 Elsevier B.V.
PY - 2020/8
Y1 - 2020/8
N2 - We consider the LASSO estimator for compositional data, in which the covariates are nonnegative and always sum to one. Because the sum-to-one condition imposes a linear constraint on the regression coefficients, standard LASSO algorithms cannot be applied directly to compositional data. Hence, a regularized regression model with linear constraints is commonly used. However, the linear constraints incur additional computational cost, which becomes severe in high-dimensional settings. Moreover, exact computation of the regression estimates has not been investigated for existing methods. In this paper, we first propose an exact solution path algorithm for ℓ1-regularized regression with high-dimensional compositional data and then extend it to a classification model. We compare its computational speed with that of previously developed algorithms and apply the proposed algorithm to income inequality data in economics and human gut microbiome data in biology. Using simulated and real data sets, we illustrate that our specialized algorithm is significantly more efficient than the generalized LASSO algorithm for compositional data.
AB - We consider the LASSO estimator for compositional data, in which the covariates are nonnegative and always sum to one. Because the sum-to-one condition imposes a linear constraint on the regression coefficients, standard LASSO algorithms cannot be applied directly to compositional data. Hence, a regularized regression model with linear constraints is commonly used. However, the linear constraints incur additional computational cost, which becomes severe in high-dimensional settings. Moreover, exact computation of the regression estimates has not been investigated for existing methods. In this paper, we first propose an exact solution path algorithm for ℓ1-regularized regression with high-dimensional compositional data and then extend it to a classification model. We compare its computational speed with that of previously developed algorithms and apply the proposed algorithm to income inequality data in economics and human gut microbiome data in biology. Using simulated and real data sets, we illustrate that our specialized algorithm is significantly more efficient than the generalized LASSO algorithm for compositional data.
KW - Constraint
KW - Microbiome data
KW - Penalized regression
KW - Solution path algorithm
UR - http://www.scopus.com/inward/record.url?scp=85083100564&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2020.106958
DO - 10.1016/j.csda.2020.106958
M3 - Article
AN - SCOPUS:85083100564
SN - 0167-9473
VL - 148
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
M1 - 106958
ER -