In this episode of the Micro Binfie Podcast, host Andrew Page is live from the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. He sits down with David Mahoney, a PhD student from Dalhousie University in Halifax, Nova Scotia. David shares his research on characterizing antimicrobial resistance (AMR) genes and their transfer within metagenomes, focusing on metagenomic assembly graphs. They delve into David’s background in food safety microbiology and his interest in the public health implications of genomics. He explains his work analyzing how AMR genes transfer across different environments, such as food production plants and clinical settings, using both new and existing data from Canada’s Genomics Research and Development Initiative. David also highlights his use of assembly graphs and other graph-based approaches to uncover AMR gene flow and lateral gene transfers, including the potential of machine learning techniques such as graph convolutional neural networks.
In this episode of the Micro Binfie Podcast, host Andrew Page catches up with Torsten Seemann at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. They discuss the rapid evolution of bioinformatics, the challenges faced by labs worldwide, and the explosion of tools post-COVID. Torsten shares insights into his work at Melbourne’s Microbiological Diagnostic Unit (MDU), the development of platforms like AusTrakka for bacterial genomics, and how his lab plays a national and international role in data sharing. The conversation dives into the future of the widely used variant calling tool Snippy, as Torsten reveals updates funded by the Chan Zuckerberg Initiative, including nanopore read support and the ability to process pre-assembled genomes. They also explore the importance of maintaining open-source bioinformatics tools to prevent them from becoming obsolete. Tune in for an in-depth discussion on the state of genomics, software development, and the challenges and rewards of open-source collaboration.
In this episode of the Micro Binfie Podcast, host Andrew Page sits down with Tim Dallman at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Tim shares insights from his work at Utrecht University in the Netherlands, where he focuses on genomic surveillance and machine learning models to predict disease risk and severity. They discuss the challenges of integrating genomic variation into predictive models, the importance of high-quality metadata, and the complexities of working with pathogens like Shiga toxin-producing E. coli. Tim also talks about his role at the WHO Pandemic and Epidemic Intelligence Hub and how global collaboration can drive innovation in public health genomics. Tune in to hear about cutting-edge research, the importance of interdisciplinary teamwork, and how genomic data can be harnessed for future pandemic preparedness.
Host Andrew Page is joined by Robert Petit from the Wyoming Public Health Laboratory. Robert, a key developer of the Bactopia pipeline, shares insights into how this end-to-end tool is transforming bacterial genomic surveillance. They dive into the origins of Bactopia, its applications in public health, and Robert's experience leading genomic projects in a rural setting. Discover how Bactopia streamlines pathogen detection, improves documentation, and integrates with other tools to deliver fast and accurate results. Listen in as they discuss new innovations in bioinformatics, including visualizations and human-read filtering, and explore future projects like camlhmp, designed to simplify sequence-based typing. Recorded live at the Microbial Bioinformatics Hackathon in Bethesda, Maryland, this episode brings you the latest in pathogen genomics and the challenges and rewards of working on the frontier of public health.
Andrew and Lee talk with Christine and Cynney about the Haiti cholera outbreak. Links: * Cynney Walters: https://www.linkedin.com/in/cynney-walters-763111190 * Walters et al., "Genome sequences from a reemergence of Vibrio cholerae in Haiti, 2022 reveal relatedness to previously circulating strains": https://journals.asm.org/doi/abs/10.1128/jcm.00142-23
Nabil and Lee have a quick chat about minimum spanning trees (MST). eBURST paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-152
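As a rough illustration of the idea, a minimum spanning tree over allelic profiles can be sketched in Python with Kruskal's algorithm. This is a minimal sketch, not the eBURST method from the linked paper: the profile names, the seven-locus profiles, and the function names are hypothetical, and ties between equal-distance edges are broken only by sort order, whereas typing tools apply their own tie-breaking rules.

```python
from itertools import combinations

def allelic_distance(profile_a, profile_b):
    """Count the loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))

def minimum_spanning_tree(profiles):
    """Kruskal's algorithm: add the shortest edges that do not form a cycle."""
    names = sorted(profiles)
    edges = sorted(
        (allelic_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        # Follow parent links to the component root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for distance, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # joining two components cannot create a cycle
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree

# Hypothetical seven-locus MLST profiles, for illustration only.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 1, 1, 1, 1, 1),
}
tree = minimum_spanning_tree(profiles)
for a, b, distance in tree:
    print(f"{a} -- {b} ({distance} allele differences)")
```

The tree connects all four profiles with three edges while keeping the total number of allele differences along those edges as small as possible. When several edges share the same distance, more than one minimum spanning tree can exist, which is why the tie-breaking rules matter in practice.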
We go over tree visualizations! * Microreact https://microreact.org/ * Phylocanvas https://www.phylocanvas.gl/ * GrapeTree https://github.com/achtman-lab/GrapeTree * Auspice/Nextstrain https://nextstrain.org/ * Taxonium https://taxonium.org/ * iTOL https://itol.embl.de/ * PHYLOViZ https://online.phyloviz.net/index * Phandango https://jameshadfield.github.io/phandango/#/main
125 Kostas Konstantinidis returns to talk to us about ANI and metagenomics by Microbial Bioinformatics
We talk with Kostas! For more information please visit https://enve-omics.gatech.edu/
In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics. The discussion begins with the challenges faced by bioinformaticians due to siloed MLST databases across the globe, which hinder synchronization and effective genomic surveillance. To address these issues, the concept of using hash databases for allele identification is introduced. Hashing allows for the creation of unique identifiers for genetic sequences, enabling easier database synchronization without the need for extensive system support or resources. Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) results in a different hash, which makes hashing well suited to distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and the risk of hash collisions. Longer hashes significantly reduce the probability of such collisions. The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact matching algorithms due to the nature of hashing. Existing tools like EToKi and upcoming software are mentioned as examples of applications that can utilize hash databases. Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types.
Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. This episode provides an overview of how hash databases can address long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
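The allele-hashing idea described in this episode can be sketched in a few lines of Python using the standard library's hashlib. This is a minimal illustration, not the implementation of any tool mentioned in the episode: the function name `allele_hash`, the normalisation step, and the example sequences are assumptions made for the example.

```python
import hashlib

def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return a hex digest that identifies an allele by its exact sequence."""
    # Assumed normalisation: strip whitespace and upper-case the bases so
    # that formatting differences do not change the identifier.
    normalised = "".join(sequence.split()).upper()
    return hashlib.new(algorithm, normalised.encode("ascii")).hexdigest()

# Two made-up alleles that differ by a single SNP at the last base.
allele_a = "ATGGCTAAAGGTCTGA"
allele_b = "ATGGCTAAAGGTCTGG"

hash_a = allele_hash(allele_a)
hash_b = allele_hash(allele_b)

print(hash_a)
print(hash_b)
print("identifiers differ:", hash_a != hash_b)

# MD5 produces a shorter digest (32 hex characters versus 64 for SHA-256).
print(allele_hash(allele_a, "md5"))
```

Because the identifier is derived from the sequence alone, two laboratories that never exchange data would still assign the same identifier to the same allele. The flip side is the exact-matching requirement mentioned in the episode: a sequence that differs by even one base gets an unrelated identifier, so the hash says nothing about how similar two alleles are.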
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach. GAMBIT software: https://github.com/gambit-suite/gambit GAMBIT suite: https://github.com/gambit-suite GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575 TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full
In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development. Topics discussed: k-mers for analyzing sequencing data, and how Sourmash builds on MinHash; how Sourmash handles k-mers for metagenomic comparisons vs. Mash; the modhash and bottom sketch approaches used in Sourmash; dealing with sequencing errors and noise in k-mer data; Sourmash as a reference-based method, and applications for metagenomics; Titus' focus on building reusable libraries and APIs vs. one-off tools; recruiting collaborators through "nerd sniping" with interesting problems; the open source philosophy that motivates Titus' software work. Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools. Papers: * Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 * Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 * IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
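The two sketching approaches named in the topic list can be illustrated with a short, self-contained Python sketch. It does not use the Sourmash library or reproduce its internals: the hash function (the first 8 bytes of SHA-256), the function names, the parameter values, and the example sequence are all assumptions chosen for illustration, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib

def kmers(sequence: str, k: int) -> set:
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (assumed stand-in hash function)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")

def bottom_sketch(sequence: str, k: int, num: int) -> set:
    """Bottom sketch: keep the `num` smallest hash values (fixed size)."""
    hashes = sorted(hash_kmer(kmer) for kmer in kmers(sequence, k))
    return set(hashes[:num])

def mod_sketch(sequence: str, k: int, modulus: int) -> set:
    """Modulo sketch: keep hashes divisible by `modulus` (size scales with input)."""
    return {h for h in map(hash_kmer, kmers(sequence, k)) if h % modulus == 0}

# Made-up 40-base sequence, for illustration only.
sequence = "ATGGCTAAAGGTCTGACCGATTTGCAGGCTAACGTTCAGG"

bottom = bottom_sketch(sequence, k=5, num=10)
modded = mod_sketch(sequence, k=5, modulus=4)

print("distinct 5-mers:", len(kmers(sequence, 5)))
print("bottom sketch size:", len(bottom))
print("modulo sketch size:", len(modded))
```

The design difference is in how the sample size behaves. A bottom sketch has a fixed size regardless of how large the input is, which suits comparing genomes of similar size. A modulo sketch keeps a roughly constant fraction of all k-mers, so it grows with the input, which is what makes comparisons between a small genome and a large metagenome workable.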
In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash. Topics discussed: an overview of what Sourmash does (sketching and comparing large k-mer datasets); how the sampling approach enables analyses like containment estimation; capabilities of the Branchwater tool for multi-threaded real-time SRA search; scaling to search across millions of metagenomes in seconds with WebAssembly; potential public health applications for tracking and sourcing pathogens; important caveats around resolution limits and the need for follow-up analyses; ongoing work to characterize the technique's specificity and sensitivity. Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues. Papers: * Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 * Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 * IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
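Containment estimation from a sampled set of k-mer hashes can be sketched as follows. This is a minimal sketch under stated assumptions rather than Sourmash's own code: it keeps every hash below a threshold of 2^64 divided by a `scaled` parameter, which is how the "scaled" sampling idea is commonly described, and the hash function, function names, parameter values, and toy sequences are all hypothetical.

```python
import hashlib

MAX_HASH = 2 ** 64  # size of the assumed 64-bit hash space

def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (assumed stand-in hash function)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")

def fractional_sketch(sequence: str, k: int = 21, scaled: int = 1000) -> set:
    """Keep roughly 1/scaled of all k-mer hashes: those below a fixed threshold."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(sequence[i:i + k]) for i in range(len(sequence) - k + 1))
    return {h for h in hashes if h < threshold}

def containment(query: set, target: set) -> float:
    """Estimate the fraction of the query's k-mers that occur in the target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)

# Toy data: a made-up "genome" embedded in a slightly larger "metagenome".
genome = "ATGGCTAAAGGTCTGACCGATTTGCAGGCTAACGTTCAGG"
metagenome = "TTTTACGCGT" + genome + "GGGCCCATAT"

query_sketch = fractional_sketch(genome, k=11, scaled=2)
target_sketch = fractional_sketch(metagenome, k=11, scaled=2)

print("genome in metagenome:", containment(query_sketch, target_sketch))
print("metagenome in genome:", containment(target_sketch, query_sketch))
```

Because both sketches are filtered by the same threshold, any sampled k-mer from the genome that also occurs in the metagenome is guaranteed to be present in the metagenome's sketch. Containment is asymmetric by design, which is what makes it useful for asking whether a small genome is present in a much larger sample. The estimate is only as fine-grained as the number of sampled hashes allows, which ties in with the caveats about resolution limits raised in the episode.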
In this episode, Andrew Page and Lee Katz continue their chat with Titus Brown, focusing on taxonomy assignment in metagenomics. Topics discussed: dealing with contamination and low-quality genomes in reference databases; Sourmash as a versatile search tool, not a curated database; the need for high confidence in taxonomic assignment in public health; most microbial assignment tools have low specificity or sensitivity; possible ways to achieve perfect species classification (in theory); the challenges around defining species based on small genomic differences; the cryptography concept of "unicity distance" applied to classification; conveying the nuances and uncertainties in taxonomic assignment. The conversation highlights the difficulties around taxonomic classification, especially at the species level, but explores ideas for improving accuracy. Overall it emphasizes the complexities of biology and the need to convey uncertainties transparently. Papers: * Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 * Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 * IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips: * https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC * https://youtu.be/RLz5Api676Y?si=D6nps33O42Fmk4Ac * https://youtu.be/0Nz8YrCC9X8?si=KNwkeFoS6Bu4LOpv * https://youtu.be/dxIPcbmo1_U?si=TOBw5AONYVCW0JzV * https://youtu.be/m1lc8GwBKFE?si=rj7Oiq51l2_dB6ro More information on the secret tattoo here: https://underunderstood.com/podcast/episode/jeff-goldblums-secret-tattoo-jurassic-park-ian-malcolm/
In this episode of the Micro Binfie Podcast, Andrew Page and Lee Katz interview Titus Brown about his journey from studying math and physics as an undergrad to becoming a bioinformatician focusing on metagenomics and software development. Topics discussed: Titus' background in math, physics, digital evolution research, and developmental biology; his transition into bioinformatics to analyze the influx of genomic data in the 1990s; developing early tools for comparative genomics and sequence analysis; the philosophy of creating usable software with good documentation; work on transcriptomics, metagenomics, and k-mers at Michigan State; digital normalization and dealing with large sequencing datasets; moving to UC Davis and continuing work on metagenomics and software like khmer and sourmash; thoughts on challenges around data reuse and accessibility in science. Papers: * Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 * Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 * IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
The podcast discusses an article co-authored by Andrew Page, examining the use of GPT-4 for research publication. The conversation focuses on the authorship of articles generated by GPT-4 and the implications for academic publishing. Authorship and Ethics: Andrew discusses the question of authorship when AI-generated content is involved in research articles. He explores the ethical implications and potential biases associated with AI-assisted writing, such as the omission of minority figures and novel discoveries. He emphasizes the importance of transparency when using AI and its potential to democratize research, as long as ethical guidelines are maintained. AI & Scientific Journals: The podcast delves into the current landscape of AI in academic publishing. It addresses the commercial use of AI in crafting manuscripts for research articles and the necessity of distinguishing between manual and AI-generated contributions. The possible misalignment of GPT-4's commercial objectives with academic goals is highlighted. Risks and Benefits: Andrew outlines the risks of using AI in publishing, such as unintentional plagiarism, biases, and outdated methods. He provides an example of bioinformatics software recommending deprecated methods, illustrating the need for caution. The conversation also touches upon AI's potential to introduce bias unintentionally, citing past incidents where AI models quickly adopted extremist views. Andrew's co-authors, Niamh Tumelty and Sam Sheppard, bring different perspectives on ethics and the impact of AI on publishing. Niamh, associated with the London School of Economics, emphasizes ethical considerations, while Sam, editor-in-chief of Microbial Genomics, underscores the need to adapt to the reality of AI contributions in journal submissions. In conclusion, the podcast underscores the importance of recognizing and navigating the ethical challenges posed by AI in academic publishing.
It suggests that the technology may evolve faster than policies can adapt, necessitating an ongoing conversation among researchers, publishers, and AI developers. Links: https://microbiologysociety.org/blog/microbe-talk-ai-a-useful-tool-or-dangerous-unstoppable-force.html https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001049
We continue our conversation with Wytamma Wirth about write-the and all things AI. The episode starts by discussing the use of language models, specifically ChatGPT, for writing boilerplate code, and how they can assist in generating code snippets, unit tests, and even documentation strings. The participants also explore the potential of incorporating them into code editors to make coding more efficient and less error-prone. The conversation then shifts to the generation of research papers, specifically software announcements, by leveraging code documentation. The participants believe ChatGPT could be useful in generating introductions and backgrounds for such publications. They also touch upon the utility of language models in translating documentation into different human languages to assist non-native English speakers. The discussion returns to code documentation, focusing on the tool "Write the Docs", which auto-generates well-structured and searchable documentation websites. The participants appreciate the tool's ease of use and its potential for maintaining proper project documentation. The conversation ends with an acknowledgment of the importance of human oversight when automating tasks with language models. Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
In this episode, we dive deep into the world of automated code documentation and conversion using ChatGPT through the write-the software developed by Dr Wytamma Wirth from The University of Melbourne. Our guest, an experienced software engineer, takes us on a journey through the challenges and nuances of writing code documentation and the role AI can play in easing this process. We explore the intersection of ChatGPT's capabilities with Write the Docs, a documentation system widely used by developers. From highlighting ChatGPT's ability to understand and generate code snippets, to demonstrating real-time code conversion across multiple programming languages, this episode is a treasure trove for developers looking to enhance their workflow. Whether you're a seasoned developer or just getting started, tune in to discover how the synergy of AI and coding can elevate your documentation game to the next level! Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
Lee and Andrew are at the Global Microbial Identifier conference (GMI13) in Vancouver, Canada. On day 3 we catch up with Dr Ruth Timme.