#60 Differential gene expression and DESeq2 with Michael Love

Update: 2021-05-12

Description

In this episode, Michael Love joins us to talk about differential gene
expression analysis of bulk RNA-Seq data.

We talk about the history of Mike’s own differential expression package,
DESeq2, as well as other packages in this space, like edgeR and limma, and the
theory they are based upon. Mike also shares his experience of being the
author and maintainer of a popular bioinformatics package.

Links:

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
(Love, M.I., Huber, W. & Anders, S.)

DESeq2 on Bioconductor

Chan Zuckerberg Initiative: Ensuring Reproducible Transcriptomic Analysis with DESeq2 and tximeta

And a more comprehensive set of links from Mike himself:

limma, the original paper and limma-voom:

https://pubmed.ncbi.nlm.nih.gov/16646809/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053721/

edgeR papers:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796818/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/

The recent manuscript mentioned from the Kendziorski lab, which has a Gamma-Poisson hierarchical structure, although it does not in general reduce to the Negative Binomial:

https://doi.org/10.1101/2020.10.28.359901
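
As background to the remark above, here is a minimal sketch of the standard case in which a Gamma-Poisson hierarchy does reduce to the Negative Binomial: a Poisson count whose rate is Gamma-distributed with shape 1/alpha and scale alpha * mu. It is not the model from the manuscript linked above, and it is not code from DESeq2 or edgeR. The function names, the integration grid, and the example values mu = 10 and alpha = 0.5 are all assumptions made for illustration.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative Binomial pmf with mean mu and dispersion alpha,
    so that the variance is mu + alpha * mu**2."""
    r = 1.0 / alpha          # Gamma shape, often called "size"
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def gamma_poisson_pmf(k, mu, alpha, upper=200.0, steps=20000):
    """Marginal pmf of k when k ~ Poisson(lam) and
    lam ~ Gamma(shape=1/alpha, scale=alpha*mu), computed by
    midpoint-rule integration over lam (illustrative grid settings)."""
    shape = 1.0 / alpha
    scale = alpha * mu
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th interval, always > 0
        log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
        log_gamma = ((shape - 1.0) * math.log(lam) - lam / scale
                     - math.lgamma(shape) - shape * math.log(scale))
        total += math.exp(log_poisson + log_gamma) * h
    return total


if __name__ == "__main__":
    mu, alpha = 10.0, 0.5    # hypothetical mean and dispersion
    for k in (0, 3, 10, 25):
        print(k, nb_pmf(k, mu, alpha), gamma_poisson_pmf(k, mu, alpha))
```

The two columns printed for each count should agree up to the numerical integration error, which is the sense in which the Negative Binomial is a Gamma mixture of Poissons.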

We talk about robust steps for estimating the middle of the dispersion prior distribution; references are Anders and Huber 2010 (DESeq), Eling et al 2018 (one of the BASiCS papers), and Phipson et al 2016:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218662/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6167088/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/

The Stan software:

https://mc-stan.org/

We talk about using publicly available data as a prior; references I mention are the McCall et al paper, which uses publicly available data to ask whether a gene is expressed, and a new manuscript from my lab that compares splicing in a sample to GTEx as a reference panel:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013751/
https://doi.org/10.1101/856401

Regarding estimating the width of the dispersion prior, references are the Robinson and Smyth 2007 paper, McCarthy et al 2012 (edgeR), and Wu et al 2013 (DSS):

https://pubmed.ncbi.nlm.nih.gov/17881408/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590927/

Schurch et al 2016, an RNA-seq dataset with many replicates, helpful for benchmarking:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/

Stephens paper on the false sign rate (ash):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379932/

Heavy-tailed distributions for effect sizes, Zhu et al 2018:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581436/

I credit Kevin Blighe and Alexander Toenges, who help to answer lots of DESeq2 questions on the support site:

https://www.biostars.org/u/41557/

https://www.biostars.org/u/25721/

The EOSS award, which has funded vizWithSCE by Kwame Forbes, and nullranges by Wancen Mu and Eric Davis:

https://chanzuckerberg.com/eoss/proposals/ensuring-reproducible-transcriptomic-analysis-with-deseq2-and-tximeta/

https://kwameforbes.github.io/vizWithSCE/

https://nullranges.github.io/nullranges/

One of the recent papers from my lab, MRLocus for eQTL and GWAS integration:

https://mikelove.github.io/mrlocus/

If you enjoyed this episode, please consider supporting the podcast on Patreon.

Roman Cheplyaka