findDEG: an integrated software package for differential gene expression analysis with RNA sequencing data

WU Jiyan, YAO Dan, WU Hainan, TONG Chunfa

JOURNAL OF NANJING FORESTRY UNIVERSITY, 2019, Vol. 43, Issue 02: 93-99.

DOI: 10.3969/j.issn.1000-2006.201806029

Abstract

【Objective】With the rapid development of next-generation sequencing technology, transcriptome sequencing (RNA-seq) is widely used for differential gene expression analysis and gene annotation in many species. Many software packages for RNA-seq data analysis are available, but a practical analysis involves several complicated steps and many parameters, which makes it difficult for most researchers to carry it out correctly.

【Method】Building on existing packages such as Trinity, TopHat+Cufflinks and HISAT2+StringTie, we developed an integrated package for RNA-seq data analysis that supports different methods for computing gene expression abundance and for testing differential gene expression. The package also accounts for whether a reference genome is available, whether the samples include biological replicates, and whether the reads are paired-end or single-end.

【Result】An integrated software package for differential gene expression analysis, findDEG, was developed in Perl. It consists of three modules: Trinity, TopHat+Cufflinks and HISAT2+StringTie. The Trinity module provides three methods for calculating transcript expression abundance and four methods for testing differential expression; the TopHat+Cufflinks module lets users choose either the new or the old version of Cufflinks; and the HISAT2+StringTie module offers a single strategy. The software is freely available at http://www.bioseqdata.com/findDEG/findDEG.htm. Using three analytical strategies (the old and new versions of Cufflinks, and the Trinity module), we analyzed RNA-seq data from Populus simonii under normal and drought stress conditions.
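The module counts in the Result paragraph can be checked by enumeration. The sketch below is not findDEG code (findDEG is written in Perl and its internals are not described here); it is a Python illustration that assumes every Trinity quantification method can be paired with every testing method. The option labels (`quant_1`, `test_1`, and so on) and the function name `enumerate_strategies` are hypothetical, because the abstract gives only how many methods each module has, not their names. The one general fact it encodes is that TopHat and HISAT2 align reads to a reference genome, so those modules presuppose one, whereas Trinity can assemble transcripts de novo.

```python
from itertools import product

# Hypothetical labels: the abstract reports how many options each module has,
# not their names, so these placeholders stand in for the real methods.
TRINITY_QUANT = [f"quant_{i}" for i in range(1, 4)]      # 3 abundance methods
TRINITY_TESTS = [f"test_{j}" for j in range(1, 5)]       # 4 DE testing methods
CUFFLINKS_VERSIONS = ["cufflinks_old", "cufflinks_new"]  # old or new Cufflinks
HISAT2_STRINGTIE = ["hisat2_stringtie"]                  # one fixed strategy


def enumerate_strategies(has_reference_genome: bool = True) -> list[tuple[str, str]]:
    """List (module, option) pairs, assuming every Trinity quantification
    method can be combined with every Trinity testing method."""
    strategies = [
        ("Trinity", f"{q}+{t}") for q, t in product(TRINITY_QUANT, TRINITY_TESTS)
    ]
    # TopHat and HISAT2 align reads to a reference genome, so these modules
    # only apply when one is available; Trinity can assemble de novo.
    if has_reference_genome:
        strategies += [("TopHat+Cufflinks", v) for v in CUFFLINKS_VERSIONS]
        strategies += [("HISAT2+StringTie", s) for s in HISAT2_STRINGTIE]
    return strategies


if __name__ == "__main__":
    for module, option in enumerate_strategies():
        print(f"{module:<18} {option}")
    print("total:", len(enumerate_strategies()))  # 3 * 4 + 2 + 1 = 15
```

Under these assumptions the total is 3 × 4 + 2 + 1 = 15, consistent with the "more than a dozen strategies" stated in the Conclusion; with `has_reference_genome=False` only the 12 Trinity combinations remain.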
The new and old versions of Cufflinks identified 53 and 33 differentially expressed genes, respectively, 25 of which were shared. Trinity detected as many as 1,641 differentially expressed genes, of which 14 and 3 overlapped with the genes found by the new and old versions of Cufflinks, respectively.

【Conclusion】The newly developed findDEG provides more than a dozen strategies for differential gene expression analysis of RNA-seq data within a single program that runs the whole analysis, so users no longer need to handle many intermediate parameters and results manually.
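The overlap figures above can be summarized with inclusion-exclusion. This hedged sketch uses only the gene counts reported in the abstract; it does not reproduce how Trinity transcripts were matched to Cufflinks genes, which the abstract does not describe, and `overlap_summary` is a hypothetical helper name.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict[str, float]:
    """Summarize two DEG lists from their sizes and the size of their overlap."""
    if not 0 <= n_shared <= min(n_a, n_b):
        raise ValueError("the shared count cannot exceed either list size")
    union = n_a + n_b - n_shared  # inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
    return {
        "union": union,
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "jaccard": n_shared / union if union else 0.0,
    }


# (size of list A, size of list B, shared genes), as reported in the abstract.
comparisons = {
    "new vs old Cufflinks": (53, 33, 25),
    "Trinity vs new Cufflinks": (1641, 53, 14),
    "Trinity vs old Cufflinks": (1641, 33, 3),
}

for name, counts in comparisons.items():
    s = overlap_summary(*counts)
    print(f"{name}: union={s['union']}, only_a={s['only_a']}, "
          f"only_b={s['only_b']}, jaccard={s['jaccard']:.3f}")
```

For the two Cufflinks versions this gives a union of 61 genes: 28 found only by the new version and 8 only by the old one. The very low Jaccard indices for the Trinity comparisons mainly reflect how much larger the Trinity list is.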

Cite this article

WU Jiyan, YAO Dan, WU Hainan, TONG Chunfa. findDEG: an integrated software package for differential gene expression analysis with RNA sequencing data[J]. JOURNAL OF NANJING FORESTRY UNIVERSITY, 2019, 43(02): 93-99. https://doi.org/10.3969/j.issn.1000-2006.201806029
