MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration

Bibliographic Details
Authors: Hernandez-Ferrer, Carles, 1987-; Ruiz-Arenas, Carlos; Beltran-Gomila, Alba; González Ruiz, Juan Ramón
Resource type: article
Status: Published version
Publication date: 2017
Country: Spain
Institution: Universitat Pompeu Fabra
Repository: Repositorio Digital de la UPF
OAI Identifier: oai:repositori.upf.edu:10230/34978
Online access: http://hdl.handle.net/10230/34978
http://dx.doi.org/10.1186/s12859-016-1455-1
Access level: open access
Keywords: Genomics; Software
Description
Summary: BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments on the same subjects. While Bioconductor's methods and classes implemented in different packages manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. RESULTS: To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS: MultiDataSet is a suitable class for data integration under the R and Bioconductor framework.