Linking taxa to function through contig clustering of microbial metagenomes
Update: 2014-03-31
Description
Co-authors: Johannes Alneberg (KTH Royal Institute of Technology, Stockholm, Sweden), Brynjar Smaari Bjarnason (KTH Royal Institute of Technology, Stockholm, Sweden), Ino de Bruijn (KTH Royal Institute of Technology, Stockholm, Sweden), Melanie Schirmer (University of Glasgow), Joshua Quick (University of Birmingham), Nicholas J. Loman (University of Birmingham), Anders F. Andersson (KTH Royal Institute of Technology, Stockholm, Sweden), Konstantinos Gerasimidis (University of Glasgow)
Taxonomic profiling of microbial communities can answer the question of “Who is there?” This can be achieved either through marker gene sequencing or true shotgun metagenomics. The latter because the functional genes of all community members are sequenced allows us to answer the additional question: “What are they doing?” However, there is a third question that is key to understanding microbial communities: “Who is doing what?” This question has received much less attention because to answer it requires the extraction of complete genomes from metagenomes. Assembly of metagenomes can generate millions of contigs, assembled genome fragments, with no information on which contig derives from which genome. Here I will present CONCOCT, a novel algorithm that combines sequence composition, coverage across multiple samples, and read-pair linkage to automatically cluster contigs into genomes. CONCOCT uses a dimensionality reduction coupled to a Gaus sian mixture model, fit using a variational Bayesian algorithm which automatically identifies the optimal number of clusters. We demonstrate high recall and precision rates on artificial as well as real human gut metagenome datasets. Linking contigs into genome clusters, allows the frequencies of those clusters to be related to metadata, revealing function. We apply this approach to fecal metagenomes obtained from the E. coli O104:H4 epidemic (Germany, 2011) and are able to directly extract the outbreak genome. We also use it to identify organisms associated with inflammation in samples from children with Crohn’s disease.
Related Links
http://arxiv.org/abs/1312.4038 - arXiv preprint
Taxonomic profiling of microbial communities can answer the question of “Who is there?” This can be achieved either through marker gene sequencing or true shotgun metagenomics. The latter because the functional genes of all community members are sequenced allows us to answer the additional question: “What are they doing?” However, there is a third question that is key to understanding microbial communities: “Who is doing what?” This question has received much less attention because to answer it requires the extraction of complete genomes from metagenomes. Assembly of metagenomes can generate millions of contigs, assembled genome fragments, with no information on which contig derives from which genome. Here I will present CONCOCT, a novel algorithm that combines sequence composition, coverage across multiple samples, and read-pair linkage to automatically cluster contigs into genomes. CONCOCT uses a dimensionality reduction coupled to a Gaus sian mixture model, fit using a variational Bayesian algorithm which automatically identifies the optimal number of clusters. We demonstrate high recall and precision rates on artificial as well as real human gut metagenome datasets. Linking contigs into genome clusters, allows the frequencies of those clusters to be related to metadata, revealing function. We apply this approach to fecal metagenomes obtained from the E. coli O104:H4 epidemic (Germany, 2011) and are able to directly extract the outbreak genome. We also use it to identify organisms associated with inflammation in samples from children with Crohn’s disease.
Related Links
http://arxiv.org/abs/1312.4038 - arXiv preprint
Comments
Top Podcasts
The Best New Comedy Podcast Right Now – June 2024The Best News Podcast Right Now – June 2024The Best New Business Podcast Right Now – June 2024The Best New Sports Podcast Right Now – June 2024The Best New True Crime Podcast Right Now – June 2024The Best New Joe Rogan Experience Podcast Right Now – June 20The Best New Dan Bongino Show Podcast Right Now – June 20The Best New Mark Levin Podcast – June 2024
In Channel