Variable selection for multifactorial genomic data

By: S. TARAZONA, S. PRADO-LOPEZ, J. DOPAZO, A. FERRER and A. CONESA

Published: 15 Jan 2012

Abstract:
Dimension reduction techniques are used to explore genomic data. Due to the large number of variables (genes) included in this kind of study, variable selection methods are needed to identify the most responsive genes in order to get a better interpretation of the results or to conduct more specific experiments. These methods should be consistent with the amount of signal in the data. For this purpose, we introduce a novel selection strategy called minAS and also adapt other existing strategies, such as Gamma approximation, resampling techniques, etc. All of them are based on studying the distribution of statistics measuring the importance of the variables in the model. These strategies have been applied to the ASCA-genes analysis framework and more generally to dimension reduction techniques such as PCA. The performance of the different strategies was evaluated using simulated data. The best performing methods were then applied to an experimental dataset containing the transcriptomic profiles of human embryonic stem cells cultured under different oxygen concentrations. The ability of the methods to extract relevant biological information from the data is discussed. (C) 2011 Elsevier B.V. All rights reserved.

Affiliations:
:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain

Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, E-46071 Valencia, Spain

S. PRADO-LOPEZ:
Ctr Invest Principe Felipe, Cellular Reprogramming Lab, Valencia, Spain

:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain

A. FERRER:
Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, E-46071 Valencia, Spain

:
Ctr Invest Principe Felipe, Bioinformat & Genom Dept, Valencia, Spain

ISSN: 0169-7439

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS

Publisher
ELSEVIER SCIENCE BV, PO BOX 211, 1000 AE AMSTERDAM, NETHERLANDS

Document type: Article
Volume: 110 Issue: 1
Pages: 113-122

DOI: 10.1016/j.chemolab.2011.10.012

WOS Id: 000299712500014
