Application of Bayesian model averaging and population Monte Carlo to inference from metagenomic mixture
Update: 2014-03-31
Description
Co-author: Vincent Plagnol (University College London Genetics Institute)
For many practical applications, for example uncovering the pathogen that caused an infection after the acute phase has passed, very deep short-read sequencing can be effective, provided that we can reliably assign short sequencing reads to species. This assignment problem is complicated by the fact that, in the absence of very large contigs, most short reads match multiple species. The problem is essentially a mixture model, in which complete knowledge of the species present in the mixture provides information about the assignment of each individual read. However, metagenomic data analysis rarely formulates the problem in these terms, because the very large number of potential species typically makes the inference intractable. Here, we propose a Bayesian model averaging strategy designed to explore the high-dimensional space of species present in a metagenomic mixture. We use approximate Bayesian computation and a Monte Carlo strategy to search for the most appropriate mixture models. Because the work is computationally intensive, we use a population Monte Carlo Markov chain approach to take advantage of parallel computing. We find that the methodology provides a full Bayesian inference for samples with > 10M reads, and hence yields interpretable Bayes factors and posterior probabilities for practical problems that regularly arise in a clinical context.
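To illustrate why knowing which species are present helps assign ambiguous reads, here is a minimal sketch of the underlying mixture model, fitted with a plain EM algorithm. This is not the Bayesian model averaging or population Monte Carlo method described in the abstract; it is a simpler stand-in technique, and the function name `em_mixture`, the toy likelihood matrix, and the three species are all hypothetical.

```python
def em_mixture(likelihoods, n_iter=200):
    """Fit species proportions of a read mixture by EM (illustrative only).

    likelihoods[i][j] is an assumed value of P(read i | species j),
    e.g. 1.0 if read i matches species j and 0.0 otherwise.
    Returns (weights, resp), where resp[i][j] is the posterior
    probability that read i came from species j.
    """
    n_reads = len(likelihoods)
    n_species = len(likelihoods[0])
    weights = [1.0 / n_species] * n_species  # start from uniform proportions
    resp = []
    for _ in range(n_iter):
        # E-step: posterior assignment of each read given current proportions
        resp = []
        for row in likelihoods:
            joint = [w * p for w, p in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                raise ValueError("a read has zero likelihood under every species")
            resp.append([x / total for x in joint])
        # M-step: proportions are the average posterior assignment
        weights = [sum(r[j] for r in resp) / n_reads for j in range(n_species)]
    return weights, resp


# Toy data (hypothetical): 6 reads match only species A, 3 reads match
# both A and B, and 1 read matches only species C.
reads = [[1.0, 0.0, 0.0]] * 6 + [[1.0, 1.0, 0.0]] * 3 + [[0.0, 0.0, 1.0]]
weights, resp = em_mixture(reads)
```

In this toy case no read is unique to species B, so its estimated proportion shrinks towards zero and the three ambiguous reads are attributed almost entirely to species A. A point estimate like this says nothing about uncertainty over which species are present, which is the gap that model averaging over candidate mixtures is meant to address.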