Gao, X. and Carroll, R. J. (2017). Data integration with high dimensionality. Biometrika, 104(2), pp. 251–272.
The website link: https://cran.r-project.org/web/packages/FusionLearn/index.html
The Fusion Learning software was developed by Xin Gao, Adam Zhong and Raymond Carroll.
Description:
The fusion learning method uses a model selection algorithm with group penalization to learn from multiple data sets collected on different experimental platforms. The responses of interest may include a mix of discrete and continuous variables. The responses may share the same set of predictors, but the models and parameters differ across platforms. Integrating information from the different data sets can enhance the power of model selection.
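To make the idea of group penalization concrete, the following is a minimal Python sketch, not the package's implementation. It assumes a group-lasso-style soft-threshold applied to a small coefficient table in which each row is one predictor and each column is one platform. The function names, the penalty level `lam`, and the numbers are all hypothetical and chosen only for illustration. The point it shows is that a predictor is kept or dropped for all platforms at once, based on its joint strength across them.

```python
import math

def group_soft_threshold(beta, lam):
    """Shrink each predictor's coefficients across platforms as one group.

    beta: list of rows, one row per predictor, one column per platform.
    lam:  non-negative penalty level (hypothetical tuning parameter).
    """
    shrunk = []
    for row in beta:
        # Joint strength of this predictor over all platforms (Euclidean norm).
        norm = math.sqrt(sum(b * b for b in row))
        # If the joint norm is below lam, the whole row is set to zero,
        # so the predictor is removed from every platform simultaneously.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        shrunk.append([scale * b for b in row])
    return shrunk

def selected_predictors(beta):
    """Return indices of predictors with a nonzero coefficient on any platform."""
    return [j for j, row in enumerate(beta) if any(b != 0.0 for b in row)]

# Hypothetical coefficient estimates: 3 predictors, 2 platforms.
beta = [
    [0.9, 1.2],     # strong on both platforms
    [0.05, -0.02],  # weak on both platforms
    [0.0, 0.8],     # strong on one platform only
]

shrunk = group_soft_threshold(beta, lam=0.5)
print(selected_predictors(shrunk))  # prints [0, 2]
```

In this toy example the third predictor survives even though it has no effect on the first platform, because the penalty acts on its combined signal across platforms. This is the sense in which a predictor is selected if it affects any of the responses.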
The goal is to select the predictors that affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. Each experiment has its own marginal likelihood. We specify a pseudolikelihood that combines the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion with unbounded true model size. The proposed method includes a Bayesian information criterion with an appropriate penalty term as a special case. Numerical results indicate that fusion learning can dramatically improve upon using only one data source.
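As a rough sketch of the quantities involved, suppose there are K experiments with marginal log-likelihoods \ell_k and platform-specific parameters \theta_k. This notation is chosen here for illustration and is not taken from the paper. One simple way to combine the marginals is to sum them, and a generic information criterion for a candidate model s then trades off fit against model size:

```latex
\ell_p(\theta) = \sum_{k=1}^{K} \ell_k(\theta_k),
\qquad
\mathrm{IC}(s) = -2\,\ell_p(\hat{\theta}_s) + \gamma_n\,|s| .
```

Here \hat{\theta}_s is the estimate under model s, |s| is the number of selected predictors, and \gamma_n is a penalty level that grows with the sample size n. With a single data source and \gamma_n = \log n this reduces to the familiar BIC form. The criterion proposed in the paper uses its own penalty term, which may differ from this simplified version.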
The package "FusionLearn" was developed to perform these fusion learning tasks.
There are two built-in examples. The first example demonstrates how to learn a predictive model of breast cancer from two types of microarray data sets containing over 20,000 genes. The second example demonstrates how to build predictive models for three different economic indexes simultaneously from a panel of stock index predictors.