Course Overview
The Transcriptomic Data Analysis course is a comprehensive program that equips participants with the skills and knowledge needed to analyze transcriptomic datasets effectively. It covers fundamental concepts, hands-on tools, and advanced analytical techniques for both bulk and single-cell RNA sequencing (RNA-seq) data. With its focus on reproducibility, practical application, and the integration of computational and statistical methods, the course is aimed at biologists, bioinformaticians, and data scientists interested in transcriptomics.
Course Objectives
By the end of this course, students will be able to:
- Understand the fundamental concepts of transcriptome analysis and RNA sequencing.
- Perform upstream RNA-seq data pre-processing in a Linux environment.
- Utilize R and the Tidyverse ecosystem for data manipulation and visualization.
- Analyze RNA-seq data using Bioconductor packages, including DESeq2 for differential gene expression.
- Conduct advanced downstream RNA-seq analysis and identify differentially expressed genes.
- Apply Seurat for single-cell RNA-seq analysis and explore cellular heterogeneity.
- Analyze spatial transcriptomics data to understand spatial gene expression patterns.
What is RNA-Seq?
RNA-Seq applies Next-Generation Sequencing (NGS) technologies to map and quantify the transcriptome, revealing gene transcription levels, transcript structure and expression, RNA modifications, and non-coding RNAs, among other features. The transcriptome, the complete set of transcripts in a cell, provides vital information about transcript levels at a specific developmental stage or physiological state. Understanding the transcriptome is essential for interpreting the functional elements of the genome and for understanding development and disease. The key objectives of transcriptomics are to catalog all species of transcripts, to determine the transcriptional structure of genes, and to quantify the expression level of each transcript under varying conditions.
By offering an unbiased, high-resolution view of global transcription patterns, RNA-Seq provides an economical and accurate method for quantifying gene expression and for differential gene expression analysis across multiple sample groups. It can identify novel, previously unpredicted transcripts without relying on a reference genome, which makes de novo assembly of unstudied transcriptomes possible. It also allows the discovery of new gene architectures, alternatively spliced isoforms, gene fusions, SNPs and InDels, and allele-specific expression (ASE).
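As a rough illustration of what "expression quantification" and "differential expression" mean in practice, the sketch below scales raw read counts to counts per million (CPM) so that samples with different sequencing depths become comparable, then computes a log2 fold change per gene. The gene names, counts, and pseudocount are invented for illustration. This is a deliberately simplified calculation, not the statistical model used by DESeq2 or any other Bioconductor package.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Return log2(B / A) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical raw counts for three made-up genes.
# The treated sample was sequenced twice as deeply as the control.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
treated = {"geneA": 1000, "geneB": 200, "geneC": 800}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))

for gene, value in lfc.items():
    print(f"{gene}: log2 fold change = {value:+.2f}")
```

Although geneA has twice as many raw reads in the treated sample, its share of the library is unchanged, so its log2 fold change is zero after normalization. A real analysis would also use replicates and a statistical test to decide which changes are significant.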
Advantages of RNA-Seq
- Quantitative and precise measurements of RNA molecules at a single base-pair resolution
- Discovery of novel transcripts, splice variants, and gene fusions
- Applicability to any species, regardless of whether a reference genome is available
- Costs comparable to, or even lower than, those of many other methods
- Detection of many RNA types, including mRNA, miRNA, and lncRNA, giving a comprehensive view of the RNA present in cells or tissues
- Simultaneous analysis of multiple samples, which makes it well suited to high-throughput RNA studies
RNA-Seq Development
Sequencing technology has advanced significantly over the past two to three decades. Sanger sequencing was the first-generation sequencing method. It relies on chain-terminating dideoxynucleotides to determine the base sequence of a DNA fragment, so RNA can be sequenced this way only after reverse transcription into complementary DNA (cDNA). The 1990s saw notable progress in sequencing technology, coinciding with the start of the whole-genome sequencing projects. The introduction of high-throughput sequencing platforms, such as 454 sequencing, Illumina sequencing, and Ion Torrent sequencing, then made RNA sequencing feasible. Traditional (bulk) RNA sequencing often required a large number of cells to obtain enough RNA, so the measurement averaged over the cells and masked the heterogeneity among them. Subsequently, the emergence of single-cell RNA sequencing techniques, such as SMART-seq and 10x Genomics, allowed the transcriptomes of individual cells to be sequenced. These techniques reveal the gene expression characteristics of distinct cell types and states.
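The point about bulk measurements hiding cell-to-cell differences can be shown with a toy calculation. The sketch below uses invented expression values for one marker gene in six cells of two hypothetical cell types, and compares a bulk-like average with per-type averages. The cell-type labels are assumed to be known here; in real single-cell data they are usually inferred, for example by clustering.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cell type, expression of one marker gene) pairs for six cells.
cells = [
    ("type_X", 98.0),
    ("type_X", 102.0),
    ("type_X", 100.0),
    ("type_Y", 0.0),
    ("type_Y", 1.0),
    ("type_Y", 2.0),
]

# A bulk experiment pools all cells, so it reports roughly the overall average.
bulk_like = mean(value for _, value in cells)

# With single-cell resolution, expression can be summarized per cell type.
values_by_type = defaultdict(list)
for cell_type, value in cells:
    values_by_type[cell_type].append(value)
per_type = {cell_type: mean(values) for cell_type, values in values_by_type.items()}

print(f"bulk-like average: {bulk_like}")
for cell_type, value in per_type.items():
    print(f"{cell_type} average: {value}")
```

The bulk-like average suggests moderate expression everywhere, while the per-type averages show that one population expresses the gene strongly and the other barely at all.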
The Applications of RNA-Seq
RNA sequencing (RNA-seq) is a widely used technique with applications across many areas of biological and medical research. Several common applications are listed below:
- Gene expression analysis
- Differential gene expression analysis
- Discovery of novel genes
- Alternative splicing analysis
- Biomarker discovery
- Non-coding RNA research
- Gene function elucidation
- Population genetics and evolutionary biology