Incorporating alignment uncertainty into Felsenstein's phylogenetic bootstrap to improve its reliability

Motivation: Most evolutionary analyses are based on pre-estimated multiple sequence alignment. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They were able to show that in many cases different aligners produce differen...

Bibliographic Details
Authors: Chang, Jia-Ming, 1978-; Floden, Evan Wade; Herrero, Javier; Gascuel, Olivier; Di Tommaso, Paolo; Notredame, Cedric
Resource type: article
Status: Published version
Publication date: 2019
Country: Spain
Institution: Universitat Pompeu Fabra
Repository: Repositorio Digital de la UPF
OAI Identifier: oai:repositori.upf.edu:10230/51908
Online access: http://hdl.handle.net/10230/51908
http://dx.doi.org/10.1093/bioinformatics/btz082
Access level: open access
Keywords: Phylogeny (Filogènia); Genetics (Genètica)
Publisher: Oxford University Press
Language: English
Grant agreement: info:eu-repo/grantAgreement/ES/3PN/BFU2008-00419
Funding: This work was supported by the Spanish Ministry of Science Plan Nacional [BFU2008-00419 to P.D.T. and C.N.]; the Wellcome Trust [WT095908 to P.F.]; the INCEPTION project [PIA/ANR-16-CONV-0005 to O.G.]; the Taiwan Ministry of Science and Technology [106-2221-E-004-011-MY2 to J.-M.C.]. We acknowledge support of the European Molecular Biology Laboratory, the Spanish Ministry of Economy and Competitiveness, “Centro de Excelencia Severo Ochoa 2013-2017” and “The Human Project from Mind, Brain and Learning” of NCCU from the Higher Education Sprout Project by the Ministry of Education in Taiwan.
Rights: © Jia-Ming Chang et al. 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Description:
Motivation: Most evolutionary analyses are based on a pre-estimated multiple sequence alignment (MSA). Wong et al. established that multiple sequence alignment induces uncertainty when reconstructing phylogenies. They showed that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives.
Results: We demonstrate that incorporating MSA-induced uncertainty into bootstrap sampling can significantly increase the correlation between the correctness of a clade and its corresponding bootstrap value. Our procedure involves concatenating several alternative multiple sequence alignments of the same sequences, produced using different commonly used aligners. We then draw bootstrap replicates while favoring columns from the more unique aligners among those concatenated. We named this concatenation and bootstrapping method Weighted Partial Super Bootstrap (wpSBOOT). We show on three simulated datasets of 16, 32 and 64 tips that our method improves the predictive power of bootstrap values. We also used as a benchmark an empirical collection of 853 1-to-1 orthologous genes from seven yeast species and found that wpSBOOT significantly improves the capacity to discriminate between topologically correct and incorrect trees. Bootstrap values of wpSBOOT are comparable to similar readouts estimated using a single method. However, for trees reduced at the 50% and 95% bootstrap thresholds, wpSBOOT yields the lowest Type I error (fewer false positives).
Availability: The automated generation of replicates has been implemented in the T-Coffee package, which is available as open-source freeware from www.tcoffee.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
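Illustrative sketch (not taken from the article or from T-Coffee): the abstract describes concatenating alternative alignments and drawing bootstrap columns with a preference for the more unique aligners, but it does not give the weighting formula. The Python below is a minimal sketch under stated assumptions. The function names, the uniqueness weight (the smoothed fraction of an alignment's columns that no other alignment reproduces), and the choice of replicate length are all hypothetical.

```python
import random


def indexed_columns(msa):
    """Encode each column as a tuple of residue indices (None for a gap).

    Two columns from different alignments are treated as the same homology
    statement only if they pair exactly the same residues.
    """
    counters = [0] * len(msa)
    columns = []
    for col in zip(*msa):
        key = []
        for s, ch in enumerate(col):
            if ch == "-":
                key.append(None)
            else:
                key.append(counters[s])
                counters[s] += 1
        columns.append(tuple(key))
    return columns


def uniqueness_weights(msas):
    """ASSUMED weighting, not the published one: smoothed fraction of an
    alignment's columns that appear in none of the other alignments."""
    encoded = [indexed_columns(m) for m in msas]
    weights = []
    for i, cols in enumerate(encoded):
        others = set()
        for j, other in enumerate(encoded):
            if j != i:
                others.update(other)
        novel = sum(1 for c in cols if c not in others)
        # +1 smoothing keeps every weight strictly positive.
        weights.append((novel + 1) / (len(cols) + 1))
    return weights


def weighted_replicate(msas, length, rng):
    """Draw one bootstrap replicate of `length` columns from the concatenation
    of all alignments, sampling with replacement and weighting each column by
    the uniqueness of the alignment it came from."""
    pool, pool_weights = [], []
    for msa, w in zip(msas, uniqueness_weights(msas)):
        for col in zip(*msa):
            pool.append(col)
            pool_weights.append(w)
    sampled = rng.choices(pool, weights=pool_weights, k=length)
    n_seqs = len(msas[0])
    return ["".join(col[s] for col in sampled) for s in range(n_seqs)]


# Two toy alignments of the same three sequences (ACGT, ACTGT, AGT).
msa_a = ["AC-GT", "ACTGT", "A--GT"]
msa_b = ["ACG-T", "ACTGT", "A-G-T"]

# Replicate length set to one alignment's length (an assumption).
replicate = weighted_replicate([msa_a, msa_b], len(msa_a[0]), random.Random(0))
for row in replicate:
    print(row)
```

Each replicate produced this way would then be passed to a tree-building program, and clade support would be counted across replicates as in an ordinary bootstrap. The actual replicate generation referred to in the abstract is the one implemented in the T-Coffee package.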