
Micro binfie podcast

Author: Microbial Bioinformatics


Description

Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest, this podcast is for you.

The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), and Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK), who bring together years of experience in microbial bioinformatics.

The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience.

Intro music : Werq - Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Outro music : Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Questions and comments? microbinfie@gmail.com
126 Episodes
In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into the fascinating world of hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics. The discussion begins with the challenges faced by bioinformaticians due to siloed MLST databases across the globe, which hinder synchronization and effective genomic surveillance. To address these issues, the concept of using hash databases for allele identification is introduced. Hashing allows for the creation of unique identifiers for genetic sequences, enabling easier database synchronization without the need for extensive system support or resources. Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) results in a different hash, making it well suited to distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and the risk of hash collisions. Hashes with longer digests significantly reduce the probability of such collisions. The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact matching algorithms due to the nature of hashing. Existing tools like EToKi and upcoming software are mentioned as examples of applications that can utilize hash databases. Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types.
Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. This episode provides an overview of how hash databases could address long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
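To make the allele hashing idea concrete, here is a minimal Python sketch. It is not taken from the episode or from any particular tool: the function name `allele_hash`, the example sequences, and the choice to uppercase the sequence before hashing are all assumptions made for illustration.

```python
import hashlib


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return a hex digest that can act as a database-independent allele ID.

    Normalising case and whitespace first is an assumption of this sketch;
    a real scheme would need to agree on such rules across databases.
    """
    normalized = sequence.strip().upper()
    return hashlib.new(algorithm, normalized.encode("ascii")).hexdigest()


# Two hypothetical alleles that differ by a single SNP at the last base.
allele_a = "ATGGCTAGCTAGGCTTAA"
allele_b = "ATGGCTAGCTAGGCTTAG"

hash_a = allele_hash(allele_a)
hash_b = allele_hash(allele_b)

print(hash_a)
print(hash_b)
print(hash_a == hash_b)  # the digests differ because the sequences differ
```

Because the identifier is derived only from the sequence, two laboratories that never exchange data would still assign the same ID to the same allele. The flip side is that only exact matches can be looked up: a near-identical allele produces an unrelated digest.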
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach. GAMBIT software: https://github.com/gambit-suite/gambit GAMBIT suite: https://github.com/gambit-suite GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575 TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full
In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development. Topics discussed: k-mers for analyzing sequencing data, and how Sourmash builds on MinHash; how Sourmash handles k-mers for metagenomic comparisons vs. Mash; the modhash and bottom sketch approaches used in Sourmash; dealing with sequencing errors and noise in k-mer data; Sourmash as a reference-based method, and applications for metagenomics; Titus' focus on building reusable libraries and APIs vs. one-off tools; recruiting collaborators through "nerd sniping" with interesting problems; the open source philosophy that motivates Titus' software work. Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools. Papers: Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
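As a rough illustration of the bottom sketch idea mentioned in these notes, the following Python sketch keeps the smallest k-mer hashes of each sequence and estimates Jaccard similarity from them. It is a toy under stated assumptions, not Sourmash or Mash code: the function names, the SHA-256 based hash, the tiny `k` and `size` values, and the example sequences are all invented here, and reverse complements are ignored.

```python
import hashlib


def kmer_hashes(sequence: str, k: int) -> set[int]:
    """Hash every k-mer of the sequence to a 64-bit integer."""
    hashes = set()
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode("ascii")).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def bottom_sketch(sequence: str, k: int = 5, size: int = 8) -> list[int]:
    """Keep only the `size` smallest k-mer hashes (a bottom sketch)."""
    return sorted(kmer_hashes(sequence, k))[:size]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int], size: int = 8) -> float:
    """Estimate Jaccard similarity from two bottom sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them are present in both inputs.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


# Hypothetical sequences sharing a common prefix.
genome_a = "ATGGCTAGCTAGGCTTAACGTTAGC"
genome_b = "ATGGCTAGCTAGGCTTAACCCGGGA"

estimate = estimate_jaccard(bottom_sketch(genome_a), bottom_sketch(genome_b))
print(estimate)
```

The sketch has a fixed size no matter how long the input is, which is what makes comparisons cheap. It also means that a small genome compared against a large metagenome is poorly represented, which is one motivation for the alternative sampling approaches discussed in the episode.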
In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash. Topics discussed: an overview of what Sourmash does (sketching and comparing large k-mer datasets); how the sampling approach enables analyses like containment estimation; exciting capabilities of the Branchwater tool for multi-threaded real-time SRA search; scaling to search across millions of metagenomes in seconds with WebAssembly; potential public health applications for tracking and sourcing pathogens; important caveats around resolution limits and the need for follow-up analyses; ongoing work to characterize the technique's specificity and sensitivity. Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues. Papers: Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
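The link between sampling and containment estimation can be sketched in a few lines of Python. This is a simplified modulo-based sampling written for illustration only; Sourmash's actual implementation may differ, and the function names, the `modulus` parameter, and the example sequences are assumptions of this sketch.

```python
import hashlib


def sampled_hashes(sequence: str, k: int = 4, modulus: int = 2) -> set[int]:
    """Keep the hash of each k-mer only if it is divisible by `modulus`.

    The same k-mer is kept or dropped in every dataset, so sketches
    from different samples remain directly comparable.
    """
    kept = set()
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode("ascii")).digest()
        value = int.from_bytes(digest[:8], "big")
        if value % modulus == 0:
            kept.add(value)
    return kept


def containment(query: set[int], reference: set[int]) -> float:
    """Fraction of the query's sampled hashes that also occur in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Hypothetical data: the query is an exact substring of the metagenome.
metagenome = "TTGACCATGGCTAGCTAGGCTTAACGTTAGCAAT"
query_genome = "ATGGCTAGCTAGGCTTAACG"

query_sketch = sampled_hashes(query_genome)
reference_sketch = sampled_hashes(metagenome)
print(containment(query_sketch, reference_sketch))
```

Unlike a fixed-size sketch, the number of retained hashes grows with the size of the dataset, so a small genome can still be found inside a much larger metagenome. The estimate is only as good as the sample, which ties in with the caveats about resolution limits raised in the episode.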
In this episode, Andrew Page and Lee Katz continue their chat with Titus Brown, focusing on taxonomy assignment in metagenomics. Topics discussed: dealing with contamination and low quality genomes in reference databases; Sourmash as a versatile search tool, not a curated database; the need for high confidence in taxonomic assignment in public health; most microbial assignment tools have low specificity or sensitivity; possible ways to achieve perfect species classification (in theory); the challenges around defining species based on small genomic differences; the interesting cryptography concept of 'unicity' distance for classification; conveying the nuances and uncertainties in taxonomic assignment. The conversation highlights the difficulties around taxonomic classification, especially at the species level, but explores ideas for improving accuracy. Overall it emphasizes the complexities of biology and the need to convey uncertainties transparently. Papers: Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips: * https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC * https://youtu.be/RLz5Api676Y?si=D6nps33O42Fmk4Ac * https://youtu.be/0Nz8YrCC9X8?si=KNwkeFoS6Bu4LOpv * https://youtu.be/dxIPcbmo1_U?si=TOBw5AONYVCW0JzV * https://youtu.be/m1lc8GwBKFE?si=rj7Oiq51l2_dB6ro More information on the secret tattoo here: https://underunderstood.com/podcast/episode/jeff-goldblums-secret-tattoo-jurassic-park-ian-malcolm/
In this episode of the Micro Binfie Podcast, Andrew Page and Lee Katz interview Titus Brown about his journey from studying math and physics as an undergrad to becoming a bioinformatician focusing on metagenomics and software development. Topics discussed: Titus' background in math, physics, digital evolution research, and developmental biology; his transition into bioinformatics to analyze the influx of genomic data in the 1990s; developing early tools for comparative genomics and sequence analysis; the philosophy of creating usable software with good documentation; work on transcriptomics, metagenomics, and k-mers at Michigan State; digital normalization and dealing with large sequencing datasets; moving to UC Davis and continuing work on metagenomics and software like khmer and sourmash; thoughts on challenges around data reuse and accessibility in science. Papers: Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4 Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2 IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
The podcast discusses an article co-authored by Andrew Page, examining the use of GPT-4 for research publication. The conversation focuses on the authorship of articles generated by GPT-4 and the implications for academic publishing. Authorship and Ethics: Andrew discusses the question of authorship when AI-generated content is involved in research articles. He explores the ethical implications and potential biases associated with AI-assisted writing, such as the omission of minority figures and novel discoveries. He emphasizes the importance of transparency when using AI and its potential to democratize research, as long as ethical guidelines are maintained. AI & Scientific Journals: The podcast delves into the current landscape of AI in academic publishing. It addresses the commercial use of AI in crafting manuscripts for research articles and the necessity of distinguishing between manual and AI-generated contributions. The possible misalignment of GPT-4's commercial objectives with academic goals is highlighted. Risks and Benefits: Andrew outlines the risks of using AI in publishing, such as unintentional plagiarism, biases, and outdated methods. He provides an example of bioinformatics software recommending deprecated methods, illustrating the need for caution. The conversation also touches upon the AI's potential to introduce bias unintentionally, citing past incidents where AI models quickly adopted extremist views. Andrew's co-authors, Niamh Tumelty and Sam Sheppard, bring different perspectives on ethics and the impact of AI on publishing. Niamh, associated with the London School of Economics, emphasizes ethical considerations, while Sam, editor-in-chief of Microbial Genomics, underscores the need to adapt to the reality of AI contributions in journal submissions. In conclusion, the podcast underscores the importance of recognizing and navigating the ethical challenges posed by AI in academic publishing. 
It suggests that the technology may evolve faster than policies can adapt, necessitating an ongoing conversation among researchers, publishers, and AI developers. Links: https://microbiologysociety.org/blog/microbe-talk-ai-a-useful-tool-or-dangerous-unstoppable-force.html https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001049
We continue our conversation with Wytamma Wirth about write-the and all things AI. It starts with discussing the usage of language models, specifically ChatGPT, in writing boilerplate code, and how it can assist in generating code snippets, unit tests, and even documentation strings. The participants also explore the potential of incorporating it into code editors to make coding more efficient and less error-prone. The conversation then shifts to discuss the generation of research papers, specifically software announcements, by leveraging code documentation. The participants believe ChatGPT could be useful in generating introductions and backgrounds for such publications. They also touch upon the utility of language models in translating documentation into different human languages to assist non-native English speakers. The discussion returns to code documentation, focusing on the tool "write the docs" which auto-generates well-structured and searchable documentation websites. The participants appreciate the tool's ease of use and the potential it has in maintaining proper documentation for projects. The conversation ends with an acknowledgment of the importance of human oversight in automating tasks using language models. Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
In this episode, we dive deep into the world of automated code documentation and conversion using ChatGPT through the write-the software developed by Dr Wytamma Wirth from The University of Melbourne. Our guest, an experienced software engineer, takes us on a journey through the challenges and nuances of writing code documentation and the role AI can play in easing this process. We explore the intersection of ChatGPT's capabilities with Write the Docs, a documentation system widely used by developers. From highlighting ChatGPT's ability to understand and generate code snippets, to demonstrating real-time code conversion across multiple programming languages, this episode is a treasure trove for developers looking to enhance their workflow. Whether you're a seasoned developer or just getting started, tune in to discover how the synergy of AI and coding can elevate your documentation game to the next level! Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
Lee and Andrew are at the Global Microbial Identifier conference (GMI13) in Vancouver Canada. On day 3 we catch up with Dr Ruth Timme.
Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver Canada. We talk to Dr William Hsiao, one of the organisers of the conference.
Andrew and Lee are at the Global Microbial Identifier conference 13 in Vancouver Canada. On the first day they talked to Dr Finlay Maguire and Dr Emma Griffiths about microbial genomics and Tim Hortons.
In this episode there is a comprehensive discussion on the influence of AI, especially GPT-4, in the sphere of microbial bioinformatics. They reflect on a study testing GPT-4's problem-solving capabilities, which raises concerns about its potential impact on employment practices and academic integrity. There's speculation that AI's proficiency in tackling standard technical problems could interfere with genuinely evaluating a candidate's knowledge during interviews. Drawing parallels with calculators, the hosts deliberate on whether AI tools should be permitted during assessments. They stress the necessity for individuals to possess a deep understanding of their domain to accurately interpret and validate AI solutions. Discussing the AI's limitations, the hosts highlight its struggles with regular expressions and handling larger scripts. They observe the AI tends to loop and repeat itself, performing better with shorter scripts but faltering on more complex tasks often seen in bioinformatics. This prompts a discussion on how educators should address these developments in their teaching strategies. Moreover, the hosts explore the potential of large language models to improve base calling and read correction in sequencing, drawing on the structured and predictable nature of language and genetic code. They also discuss the idea of introducing randomness in these models to generate creative and varied solutions, potentially predicting future alleles or gene configurations. Ultimately, they express a blend of enthusiasm and apprehension towards the swift advances in this field and the ensuing implications for bioinformatics. They end on a note of anticipation for future developments, with a humorous nod towards AI's potential for automating mundane tasks like auto-correcting sample sheets. References: What Is ChatGPT Doing … and Why Does It Work? 
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/ Many bioinformatics programming tasks can be automated with ChatGPT https://arxiv.org/ftp/arxiv/papers/2303/2303.13528.pdf ChatGPT for bioinformatics https://medium.com/@91mattmoore/chatgpt-for-bioinformatics-404c6d0817a1 Empowering Beginners in Bioinformatics with ChatGPT https://www.biorxiv.org/content/10.1101/2023.03.07.531414v1 Lawyer uses GPT and get ethics violation https://simonwillison.net/2023/May/27/lawyer-chatgpt/ Can ChatGPT solve bioinformatic problems with Python? https://dmnfarrell.github.io/bioinformatics/chatGPT-python
In this episode of the Micro Binfie Podcast, titled "AI Unleashed: Navigating the Opportunities and Challenges of AI in Microbial Bioinformatics", Lee, Nabil, and Andrew unpack the implications of generative predictive text AI tools, notably GPT, on microbial bioinformatics. They kick off the conversation by outlining the various applications of AI tools in their work, which range from generating boilerplate programs, drafting documents, to summarizing vast tracts of data. Andrew talks about his experience with GPT in coding, specifically via VS Code and GitHub Copilot, highlighting how GPT can generate nearly 90% of the necessary code based on a brief description of the task, thereby accelerating his work. He goes on to discuss the use of GPT in clarifying lines of code and notes that they used AI to generate a paper on the ethical considerations of employing AI in microbial genomics research during a recent hackathon. The conversation then switches gears as Nabil shares his experience of using GPT to standardize date formats in tables and summarize paper abstracts. While GPT is generally accurate in performing simple tasks, he warns that the tool can sometimes provide erroneous answers. Nabil also highlights GPT's ability to generate plausible but inaccurate responses for complex prompts, as illustrated by his experience when he used it to find a route in a video game. Andrew then talks about a script they created during a hackathon, which produces podcast episodes reviewing math tools. He points out the issues encountered, such as GPT providing wrong factual information. Looking ahead, Andrew envisions a future awash with GPT-generated content that may or may not be correct, raising the challenge of discerning real and false information. However, they also acknowledge the potential benefits of AI technologies for those with visual impairments, though it's far from a perfect solution at present. 
The conversation veers to the use of AI tech in handling boilerplate code and generating code snippets based on predictive text. The hosts further discuss the potential for this tool in rapid language learning. A live experiment ensues where Nabil and Andrew use a Perl script and utilize GPT-4 to convert this script into Python and back again to assess its capabilities in language translation. The AI tool proves proficient, considering comments, usage, and authorship and employing popular libraries like BioPython intelligently, though it does leave a disclaimer about potential inaccuracies. They consider the possibility of using AI to optimize coding, similar to minifying JavaScript, and even the idea of iterating through multiple languages and assessing the output. Nabil initiates a simpler task for the AI, asking it to write a Python script translating DNA into protein, which then gets translated into Rust. Andrew shares his experience of using AI to generate a Python class that compares two spreadsheets using pandas, demonstrating AI's comprehension and execution of complex tasks. In summary, this episode underscores the power and potential of AI in coding and the need for human oversight to ensure the quality and effectiveness of AI-generated content. It offers a glimpse into a future where AI tools, despite their limitations, can revolutionize many aspects of programming, bringing in new efficiencies and methods of working.
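For readers curious what the simpler task from the episode looks like, here is one way a DNA-to-protein translation script could be written in Python. It is not the script generated during the recording. The `translate` function, its `to_stop` flag, and the use of "X" for unrecognised codons are choices made for this sketch; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = False) -> str:
    """Translate a DNA sequence in frame 1 into a protein string.

    Stop codons become "*", codons containing unexpected characters
    become "X", and a trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))                # MA*
print(translate("ATGGCCTAA", to_stop=True))  # MA
```

A short script like this is also the kind of output that is easy to check by hand, which fits the episode's point that AI-generated code still needs human review.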
The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and difficulties in keeping track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have established libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases and their primary keys and unique fields. SQL is useful, particularly in dealing with large datasets. It is consistent across flavors and unlikely to go away soon. It takes a lot of skill to optimize queries to work in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look to who is in your space and what they are using and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages, and it is just a question of understanding the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and it is something that needs to be appreciated. They highlight the difference between just learning the basics of a language and really getting into the depths of it and the frameworks and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. 
However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R, mentioning that while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it offers several separate paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages. One should choose based on one's goals, environment, and personal preferences. Python is a useful language to learn, even if one is not interested in bioinformatics. Additionally, they note that the fundamentals of databases and how they work are crucial to understand and are used across fields.
We are back talking about systematics and the SeqCode, a nomenclatural code for prokaryotes described from sequence data. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC). Link to paper: https://www.nature.com/articles/s41564-022-01214-9 History paper: https://www.sciencedirect.com/science/article/pii/S0723202022000121 Palmer explained that the impetus for the SeqCode was the need to accommodate previously uncultivated organisms under a specific nomenclature code. She emphasized that the SeqCode was written to allow any peer-reviewed publication, but noted that the authors have designed three paths of validation in the SeqCode. They hope that anyone proposing a name will work with the curation team to ensure the best quality descriptions, names, etymology, and solidification. Rodriguez discussed the SeqCode's governance, which is already in place and has been made public so that anyone interested can join the SeqCode community. The governance structure comprises an executive board, committees, and working groups. Some committee positions are held by co-opted members, while others are chosen by ballot. The hosts sought to clarify the relationship between ISME, the society backing the SeqCode, and the wider field in general. Rodriguez explained that ISME is simply providing support as an umbrella organization for the SeqCode. Palmer and Rodriguez clarified that the SeqCode is not a competing code but rather a parallel one that aims to accommodate previously uncultivated organisms.
Palmer noted that most scientists culture prokaryotes not for naming but to advance their knowledge of these organisms through physiology experiments. They emphasized that the new system is the result of a long collaborative effort that involved many different viewpoints and philosophies. The episode also discussed the practical requirements for naming under the new system, which include standards for the completeness and contamination levels required in the genome sequence data. Palmer noted that while the 16S rRNA gene sequence was not required for naming, it was recommended for improved accuracy in cross-talk between different taxonomies. The conversation highlighted the importance and challenges of naming microorganisms and the ongoing efforts to create a system that is inclusive of all microorganisms, both cultivated and uncultivated. Rodriguez and Palmer agreed that high-quality genomes should be the main control types to ensure the system builds up rather than breaks down. They noted the challenge of obtaining full genomes of some organisms, such as obligate intracellular parasites, but suggested obtaining housekeeping genes as a potential solution. They further explained the technical difficulty of estimating completeness or contamination for many taxa, but Palmer confirmed that registering a name on the SeqCode registry requires adding such estimates. The episode emphasized the importance of collaboration within the scientific community. It also highlighted the challenges inherent in naming microorganisms, while showing that this is an ongoing process and that scientists are working to create a system that is accurate, practical, and beneficial for all.
Today we are talking about systematics, and specifically the SeqCode, a nomenclatural code for prokaryotes described from sequence data. Joining us to talk about it are two co-authors of the recent publication, Marike Palmer and Miguel Rodriguez. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC).
An honest discussion about the upsides and downsides of doing a postdoc, held in front of an audience of first-year PhD students. Guests Dr Emma Waters, Dr Heather Felgate and Dr Muhammad Yasir are joined by Dr Andrew Page. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich UK. Emma starts the conversation by sharing that she enjoys research and solving problems with different tools. The thrill of discovery and exploration that comes with the postdoc position is something she loves. Heather echoes Emma's thoughts and says that she is happy where she is, rather than chasing after a higher paying job in industry. She appreciates the flexibility that academia offers, which has enabled her to balance her family and personal life. The conversation takes a turn when PhD students ask if any of the postdocs regret choosing academia despite the evident pay gap between industry and academia. Emma points out that although she might have earned more in industry, she is happy where she is, and finds satisfaction in helping people through her work. Chasing profits in industry would not offer her that kind of gratification. Yasir shares his success story of sequencing 600 samples of the SARS-CoV-2 virus in Pakistan, and how it contributed to the fight against the pandemic. He credits the freedom and flexibility of academia, which allow him to collaborate with colleagues from all over the world. In conclusion, Andrew advises students to explore their options and to keep their careers open-ended. He suggests that if they are after a higher paycheck, they should consider the bioinformatics data science path, which offers more earning opportunities in industry. The postdocs stress the importance of following what makes one happy in life, rather than chasing big salaries.
This is a panel discussion on mobile genetic elements, guest chaired by Dr Muhammad Yasir with guests Dr Emma Waters, Dr Heather Felgate and Dr Andrew Page. We cover AMR, Salmonella Typhi and Staphylococci and outbreaks and the role of MGEs. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich UK.