TY - JOUR
T1 - Differential expression analysis for RNAseq using Poisson mixed models
AU - Sun, Shiquan
AU - Hood, Michelle
AU - Scott, Laura
AU - Peng, Qinke
AU - Mukherjee, Sayan
AU - Tung, Jenny
AU - Zhou, Xiang
N1 - Publisher Copyright: © The Authors 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2017/6/1
Y1 - 2017/6/1
AB - Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independence. Previous count-based methods rely on simple hierarchical Poisson models (e.g. negative binomial) to model independent over-dispersion, but do not account for sample non-independence due to relatedness, population structure and/or hidden confounders. Here, we present a Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence. We also develop a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution. With simulations, we show that our method properly controls type I error and is generally more powerful than other widely used approaches, except in small samples (n < 15) with other unfavorable properties (e.g. small effect sizes). We also apply our method to three real datasets that contain related individuals, population stratification or hidden confounders. Our results show that our method increases power in all three datasets compared to other approaches, though the power gain is smallest in the smallest sample (n = 6). Our method is implemented in MACAU, freely available at www.xzlab.org/software.html.
UR - https://www.scopus.com/pages/publications/85026667037
U2 - 10.1093/nar/gkx204
DO - 10.1093/nar/gkx204
M3 - Article
C2 - 28369632
AN - SCOPUS:85026667037
SN - 0305-1048
VL - 45
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 11
M1 - e106
ER -