Trends from the Trenches
Tony Kerlavage on Data Lakes, Data Commons, and Empowering the Research of the Future

Update: 2022-02-21

Description

At the National Cancer Institute, Tony Kerlavage knows quite a bit about managing very large pools of data. When NCI launched the Genomic Data Commons, it aimed to democratize access to the genomic data in The Cancer Genome Atlas and other sources. Since then, though, Kerlavage points out that our data types and volumes have only grown. Now NCI is taking a “Commons of Commons” approach to link pools of well-structured data. “The more data we can bring together in a well-structured way, the more value it has in the long run,” he believes. He advocates for shareable Python notebooks and reusable R code, arguing that significant investments in data hygiene and interoperability deliver more value than simply mining data lakes with artificial intelligence tools, at least for now. The challenge for researchers, Kerlavage says, is to view their work with an eye to the future: how might someone else use this data going forward?

Links from this episode:  
Bio-IT World
BioTeam
NCI Launches Genomic Data Commons
Bob Grossman’s Vision of the Commons of Commons
BioTeam’s Approach to Collaborative Dictionary Authoring 
