TY - JOUR
T1 - Variable selection in Bayesian multiple instance regression using shotgun stochastic search
AU - Park, Seongoh
AU - Kim, Joungyoun
AU - Wang, Xinlei
AU - Lim, Johan
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/8
Y1 - 2024/8
AB - In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag has only one label/response. Though many methods for MIL have been developed to date, few have paid attention to the interpretability of models and results. The proposed Bayesian regression model has a two-level hierarchy that transparently shows how explanatory variables explain, and instances contribute to, bag responses. Moreover, two selection problems are addressed simultaneously: instance selection, which identifies the instances in each bag responsible for the bag response, and variable selection, which searches for the important covariates. To explore the joint discrete space of indicator variables created for selecting both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit the MIL context. The proposed model also offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows that the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data to predict binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods and can identify variables relevant to modeling the responses.
KW - Binding affinity prediction
KW - Hierarchical model
KW - MCMC
KW - Model selection
KW - Multiple instance learning
KW - Musk data
UR - http://www.scopus.com/inward/record.url?scp=85189526498&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2024.107954
DO - 10.1016/j.csda.2024.107954
M3 - Article
AN - SCOPUS:85189526498
SN - 0167-9473
VL - 196
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
M1 - 107954
ER -