123 The Revolution of Hash Databases in cgMLST

Update: 2024-03-21

Description

In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into the fascinating world of hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics.

The discussion begins with the challenges bioinformaticians face because MLST databases around the world are siloed, which hinders synchronization and effective genomic surveillance. To address this, the hosts introduce the idea of using hash databases for allele identification. A hash derives an identifier directly from the allele's sequence, so independent databases assign the same identifier to the same allele and can be synchronized without a central authority, extensive system support, or significant resources.
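
As a rough illustration of why hash-derived identifiers make synchronization easier, the following Python sketch builds two small allele databases independently and then merges them. The function names (`allele_id`, `build_db`, `merge_dbs`), the toy sequences, and the choice of SHA-256 are assumptions made for this example; they are not taken from the episode or from any particular tool.

```python
import hashlib


def allele_id(sequence: str) -> str:
    """Return an identifier computed only from the sequence itself."""
    # Normalizing case and whitespace is an assumption of this sketch;
    # every participating database would have to agree on the same rule.
    normalized = sequence.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_db(sequences):
    """Map hash identifier -> normalized sequence for one laboratory."""
    return {allele_id(s): s.strip().upper() for s in sequences}


def merge_dbs(first, second):
    """Merge two databases; identical alleles collapse onto the same key."""
    merged = dict(first)
    merged.update(second)
    return merged


# Two laboratories curate their alleles without talking to each other.
lab_a = build_db(["ATGACC", "ATGACT"])
lab_b = build_db(["ATGACT", "ATGGCT"])

merged = merge_dbs(lab_a, lab_b)
shared_ids = lab_a.keys() & lab_b.keys()
```

Because both laboratories compute the identifier from the sequence alone, the allele they have in common receives the same key in each database, and the merge reduces to a dictionary union.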

Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) produces a different hash, which makes hashing well suited to distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and the risk of hash collisions, in which two different sequences receive the same hash. Hashes with longer outputs, such as SHA-256, make such collisions far less probable.
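
The sketch below, using Python's standard `hashlib` module, shows a single-base change producing an unrelated digest under both MD5 and SHA-256. It also estimates the chance of an accidental collision with the standard birthday-bound approximation. The sequences, the helper names, and the figure of one million alleles are illustrative assumptions, not values quoted in the episode.

```python
import hashlib
import math


def hash_allele(sequence: str, algorithm: str = "sha256") -> str:
    """Hex digest of an uppercase nucleotide sequence."""
    return hashlib.new(algorithm, sequence.upper().encode("ascii")).hexdigest()


def collision_probability(n_alleles: int, hash_bits: int) -> float:
    """Birthday-bound estimate: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_alleles * (n_alleles - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


reference = "ATGACCGTAGGCTAA"
snp = "ATGACCGTAGGCTGA"  # one base differs from the reference

md5_ref = hash_allele(reference, "md5")        # 128-bit digest, 32 hex chars
md5_snp = hash_allele(snp, "md5")
sha_ref = hash_allele(reference, "sha256")     # 256-bit digest, 64 hex chars
sha_snp = hash_allele(snp, "sha256")

# Hypothetical database size, chosen only to compare the two digest lengths.
p_md5 = collision_probability(1_000_000, 128)
p_sha256 = collision_probability(1_000_000, 256)
```

The estimate covers accidental collisions only. MD5 is also known to be vulnerable to deliberately constructed collisions, which is a separate concern from the random case estimated here.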

The episode also explores practical aspects of implementing hash databases in bioinformatics software. Because a hash reveals nothing about how similar two sequences are, the software must extract each allele exactly before hashing it; a near match cannot be looked up. Existing tools like EToKi and upcoming software are mentioned as examples of applications that can use hash databases.
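
To make the exact-matching requirement concrete, here is a minimal sketch of an allele lookup against a set of known hashes. The `call_allele` helper and the toy sequences are hypothetical and do not reflect the interface of any existing tool.

```python
import hashlib


def allele_id(sequence: str) -> str:
    """SHA-256 hex digest of the normalized sequence (assumed convention)."""
    return hashlib.sha256(sequence.strip().upper().encode("ascii")).hexdigest()


def call_allele(candidate: str, known_ids: set) -> tuple:
    """Return (hash, is_known). Only an identical sequence can match."""
    digest = allele_id(candidate)
    return digest, digest in known_ids


known_ids = {allele_id("ATGACCGTA")}

exact = call_allele("ATGACCGTA", known_ids)      # identical sequence
one_snp = call_allele("ATGACCGTT", known_ids)    # last base differs
extra_base = call_allele("ATGACCGTAA", known_ids)  # locus boundary off by one
```

Only the identical sequence is found. A single SNP or a boundary that is off by one base both look like entirely new alleles, so the step that extracts the locus has to be exact.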

Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types.
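
One conceivable way to name an allelic profile without a central registry is to hash the per-locus allele hashes themselves. This is an illustrative assumption: it is not a scheme described in the episode, and it is not how allele codes work. The locus names and the `profile_id` helper are hypothetical.

```python
import hashlib


def allele_id(sequence: str) -> str:
    """SHA-256 hex digest of the normalized sequence (assumed convention)."""
    return hashlib.sha256(sequence.strip().upper().encode("ascii")).hexdigest()


def profile_id(profile: dict) -> str:
    """Hash a locus -> allele-hash mapping into one profile identifier."""
    # Sort by locus name so the result does not depend on insertion order.
    lines = [f"{locus}\t{digest}" for locus, digest in sorted(profile.items())]
    return hashlib.sha256("\n".join(lines).encode("ascii")).hexdigest()


isolate_1 = {"locusA": allele_id("ATGACC"), "locusB": allele_id("ATGTTT")}
isolate_2 = {"locusB": allele_id("ATGTTT"), "locusA": allele_id("ATGACC")}
isolate_3 = {"locusA": allele_id("ATGACT"), "locusB": allele_id("ATGTTT")}

same_profile = profile_id(isolate_1) == profile_id(isolate_2)
different_profile = profile_id(isolate_1) != profile_id(isolate_3)
```

Any two laboratories would arrive at the same identifier for the same profile, but the identifier is long and says nothing about how closely two profiles are related, which is part of why naming sequence types in a decentralized system remains difficult.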

Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics.

This episode provides a comprehensive overview of how hash databases can revolutionize microbial genomics by solving long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.


In Channel

132 Unlocking the Secrets of Antimicrobial Resistance in Metagenomes

2024-11-21 (11:02)

131 Bioinformatics Evolution: Torsten Seemann on Snippy, Open-Source Support, and Global Genomics

2024-11-07 (12:47)

130 Exploring Genomic Innovation and Machine Learning in Public Health

2024-10-25 (12:35)

129 Genomics on the Frontier: Bactopia, Bioinformatics, and Pathogen Surveillance with Robert Petit

2024-10-11 (14:58)

128 Haiti cholera outbreak with Christine Lee and Cynney Walters

2024-09-19 (21:57)

127 Minimum spanning trees

2024-09-05 (15:49)

126 Tree viz

2024-08-22 (21:21)

125 Kostas Konstantinidis returns to talk to us about ANI and metagenomics

2024-06-06 (22:07)

124 Kostas Konstantinidis talks to us about ANI and metagenomics

2024-05-23 (22:58)

123 The Revolution of Hash Databases in cgMLST

2024-03-21 (17:42)

122 GAMBIT: Genomic Approximation Method for Bacterial Identification and Tracking

2024-03-09 (17:29)

121 K-mers, Sourmash, and Open Source Software - More Conversations with Titus Brown

2024-02-01 (20:03)

120 Scaling Metagenomic Search with Sourmash - Conversations with Titus Brown

2024-01-18 (25:25)

119 The Challenges of Microbial Taxonomy - Conversations with Titus Brown

2024-01-04 (22:53)

118 Real bioinformaticians react to Jurassic Park

2023-12-21 (01:00:44)

117 From Math to Metagenomics - Titus Brown on Career Journeys and Software Solutions

2023-12-07 (20:12)

116 AI Authorship and Ethics in Academic Publishing for Genomics

2023-11-23 (32:27)

115 Write-the: speeding up software development for bioinformatics

2023-11-09 (24:22)

114 Write-the: Automating Code Documentation ChatGPT

2023-11-04 (25:35)

113 Global Microbial Identifier conference with Ruth Timme

2023-09-18 (08:11)
