TY - JOUR
T1 - Gene selection and prediction for cancer classification using support vector machines with a reject option
AU - Choi, Hosik
AU - Yeo, Donghwa
AU - Kwon, Sunghoon
AU - Kim, Yongdai
PY - 2011/5/1
Y1 - 2011/5/1
N2 - In cancer classification based on gene expression data, it would be desirable to defer a decision for observations that are difficult to classify. For instance, an observation for which the conditional probability of being cancer is around 1/2 would preferably require more advanced tests rather than an immediate decision. This motivates the use of a classifier with a reject option that reports a warning in cases of observations that are difficult to classify. In this paper, we consider the problem of gene selection with a reject option. Typically, gene expression data consist of expression levels of several thousands of candidate genes. In such cases, an effective gene selection procedure is necessary to provide a better understanding of the underlying biological system that generates data and to improve prediction performance. We propose a machine learning approach in which we apply the l1 penalty to the SVM with a reject option. This method is referred to as the l1 SVM with a reject option. We develop a novel optimization algorithm for this SVM, which is sufficiently fast and stable to analyze gene expression data. The proposed algorithm realizes an entire solution path with respect to the regularization parameter. Results of numerical studies show that, in comparison with the standard l1 SVM, the proposed method efficiently reduces prediction errors without hampering gene selectivity.
AB - In cancer classification based on gene expression data, it would be desirable to defer a decision for observations that are difficult to classify. For instance, an observation for which the conditional probability of being cancer is around 1/2 would preferably require more advanced tests rather than an immediate decision. This motivates the use of a classifier with a reject option that reports a warning in cases of observations that are difficult to classify. In this paper, we consider the problem of gene selection with a reject option. Typically, gene expression data consist of expression levels of several thousands of candidate genes. In such cases, an effective gene selection procedure is necessary to provide a better understanding of the underlying biological system that generates data and to improve prediction performance. We propose a machine learning approach in which we apply the l1 penalty to the SVM with a reject option. This method is referred to as the l1 SVM with a reject option. We develop a novel optimization algorithm for this SVM, which is sufficiently fast and stable to analyze gene expression data. The proposed algorithm realizes an entire solution path with respect to the regularization parameter. Results of numerical studies show that, in comparison with the standard l1 SVM, the proposed method efficiently reduces prediction errors without hampering gene selectivity.
KW - Classification
KW - Lasso
KW - Reject option
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=79251596679&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2010.12.001
DO - 10.1016/j.csda.2010.12.001
M3 - Article
AN - SCOPUS:79251596679
SN - 0167-9473
VL - 55
SP - 1897
EP - 1908
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 5
ER -