Variable selection for multifactorial genomic data
By:
S. TARAZONA, S. PRADO-LOPEZ, J. DOPAZO, A. FERRER and A. CONESA
Published:
15 Jan 2012
Abstract:
Dimension reduction techniques are used to explore genomic data. Because studies of this kind include a large number of variables (genes), variable selection methods are needed to identify the most responsive genes, either to better interpret the results or to conduct more specific experiments. These methods should be consistent with the amount of signal in the data. For this purpose, we introduce a novel selection strategy called minAS and also adapt other existing strategies, such as Gamma approximation and resampling techniques. All of them are based on studying the distribution of statistics measuring the importance of the variables in the model. These strategies have been applied to the ASCA-genes analysis framework and, more generally, to dimension reduction techniques such as PCA. The performance of the different strategies was evaluated using simulated data. The best-performing methods were then applied to an experimental dataset containing the transcriptomic profiles of human embryonic stem cells cultured under different oxygen concentrations. The ability of the methods to extract relevant biological information from the data is discussed. (C) 2011 Elsevier B.V. All rights reserved.
Affiliations:
:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain
Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, E-46071 Valencia, Spain
S. PRADO-LOPEZ:
Ctr Invest Principe Felipe, Cellular Reprogramming Lab, Valencia, Spain
:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain
A. FERRER:
Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, E-46071 Valencia, Spain
:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain