In the paper “Landscape of Transcription in Human Cells,” the authors address a general question: should our current understanding of the human genome, and of what constitutes a gene, be redefined? The study also illustrates the role of the genome in the synthesis, processing, transport, modification and translation of RNA, and the significance of building an RNA catalogue to explain genomic function. It works under the premise that the concept of a gene may need to be redefined on the basis of identifying and characterizing annotated and novel RNAs in the nuclear and cytosolic fractions of all 15 cell lines studied, and in the subnuclear compartments, such as chromatin, of a single cell line.
In conjunction with identifying and characterizing annotated and novel RNAs in these subcellular localizations, the paper investigates whether the RNA transcripts are modified at their 5′ terminus by a 7-methyl guanosine cap and at their 3′ terminus by polyadenylation. Both the 7-methyl guanosine cap and polyadenylation are vital for effective gene expression, and polyadenylation is also a step in mRNA processing.
The characterization of the long RNA expression landscape considers several aspects: detection of annotated and novel transcripts; the transcriptome of nuclear subcompartments; gene expression across cell lines; splicing patterns; and alternative transcription initiation and termination. Each aspect was examined with corresponding methods. To survey the human transcriptome, cells from all 15 lines were fractionated into nucleus and cytosol before RNA isolation, and the single cell line used for subnuclear analysis was further fractionated into nuclear subcompartments. Long RNAs were additionally separated into polyadenylated and nonpolyadenylated fractions. To characterize the RNA fractions by “their sequence, sites of initial transcription and sites of 5′ and 3′ transcript termini,” several complementary technologies were used, including Cap Analysis of Gene Expression (CAGE) for mapping transcription start sites and mass spectrometry for assessing protein products. RNA sequencing reads were mapped to identify de novo elements (i.e. exons, transcripts, genes, contigs, splice junctions and transcription start sites (TSSs)) as well as annotated GENCODE elements. These elements were then assessed with the irreproducible discovery rate (IDR) method, and only those found to be reproducible at the 90% level were analyzed further.
Results revealed that most annotated elements, about 70% of annotated splice junctions, transcripts and genes, were detected in both polyadenylated and nonpolyadenylated samples. The novel elements detected covered 78% of intronic nucleotides and 34% of intergenic nucleotides, and they expanded the catalogue of exons, splice sites, transcripts and genes. The 80% increase in the number of genes was ascribed to the detection of both polyadenylated and nonpolyadenylated mono-exonic transcripts. Mass spectrometric analyses of the transcript models suggested that most novel transcripts lack protein-coding capacity.
Analyses of RNA isolated from the chromatin, nucleolus and nucleoplasm showed that only a small fraction of annotated or novel elements was unique to any single nuclear subcompartment. Splicing was observed mostly in chromatin-associated RNA, which also showed a strong enrichment of spliceosomal small nuclear RNAs (snRNAs).
Apart from the link between chromatin and splicing, the transcripts detected in the subcellular compartments were analyzed to learn more about gene expression. The predicted novel antisense and intergenic genes account for one third of RNA clusters, with expression levels ranging from 10^-4 to 10^-1 r.p.k.m. Protein-coding genes were enriched only in the cytosol, which indicates that the nucleus is the main repository of non-coding RNA (ncRNA). Compartment-specific enrichment of pseudogenes and annotated ncRNAs was also noted. Across cell lines, protein-coding genes were expressed at higher levels than long non-coding RNAs (lncRNAs); at the level of individual transcripts, however, lncRNA expression was comparable to that of protein-coding transcripts. Correlation analysis suggested that lncRNAs contribute more than protein-coding genes to cell-line-specific expression. Whereas 53% of protein-coding genes were expressed in all cell lines and only 7% were cell-line specific, only 10% of lncRNAs were expressed in all cell lines and a much larger share were cell-line specific. To examine splicing patterns, an alternative isoform expression analysis was performed; it revealed that multiple isoforms of a gene are expressed simultaneously and at different levels, with a single isoform dominating in a given condition. Protein-coding genes can have at least two different principal isoforms depending on the cell line, and the number of major isoforms per gene grows in proportion to the number of annotated isoforms. Differences in transcript abundance across cell lines are explained more by variability in gene expression than by variability in splicing ratios. To study transcription initiation, RNA sequences and TSSs were compared with the chromatin and DNA features associated with the start of transcription.
These features include “DNase hypersensitivity, chromatin modification and DNA binding element.” TSSs were supported by both RNA sequencing and CAGE data. About half of all TSSs studied were closely linked to at least one of these initiation features, but only a small fraction showed all of them. For transcription termination, the ends of annotated transcripts were marked as potential polyadenylation sites.
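The r.p.k.m. values quoted for the novel antisense and intergenic genes (reads per kilobase of transcript per million mapped reads) normalize a read count for both feature length and sequencing depth. The Python sketch below shows the standard formula and its inverse; the function names and example numbers are hypothetical, and the paper's own pipeline may count reads and feature lengths differently.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads (standard definition)."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000       # feature length in kb
    millions = total_mapped_reads / 1_000_000   # library size in millions of reads
    return read_count / (kilobases * millions)


def expected_reads(rpkm_value: float, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Invert the formula: reads expected for a feature at a given RPKM."""
    return rpkm_value * (feature_length_bp / 1_000) * (total_mapped_reads / 1_000_000)


if __name__ == "__main__":
    # Hypothetical transcript: 500 reads on a 2,000 bp model in a 20-million-read library.
    print(rpkm(500, 2_000, 20_000_000))  # 12.5
    # Hypothetical 2 kb transcript in a 100-million-read library:
    # about 20 reads at 10^-1 r.p.k.m., about 0.02 reads at 10^-4 r.p.k.m.
    print(expected_reads(1e-1, 2_000, 100_000_000))
    print(expected_reads(1e-4, 2_000, 100_000_000))
```

The inverse function gives a rough sense of scale: under these assumed numbers, a transcript at the low end of the reported range would be expected to yield well under one read per library.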
In characterizing the short RNA expression landscape, the researchers detected both annotated and unannotated small RNAs. They noted that 28% of all annotated small RNAs were expressed in at least one cell line. The distribution of these annotated small RNAs by functional class distinguishes the cytosolic and nuclear compartments: microRNAs and transfer RNAs mark the cytosol, while small nucleolar RNAs mark the nucleus. Small nuclear RNAs were abundant in both the nucleus and the cytosol, and in the chromatin-associated RNA fraction of the cell line used for subnuclear analysis. Unannotated short RNAs comprised subfragments of annotated small RNAs as well as novel short RNAs. Subfragments derived from small nuclear RNAs were found at the 5′ terminus, middle and 3′ terminus of the parent RNA. Novel short RNAs were not enriched in either the cytosol or the nucleus.
It was also observed that 18% of both annotated protein-coding and long non-coding genes show allele-specific expression, and that genes with allele-specific expression occur in similar proportions across RNA fractions. Based on the cluster mapping, transcripts from repeat regions were found to be more cell-line specific than those from the rest of the human genome. In characterizing transcriptional activity at enhancers, the aggregate pattern of RNA sequencing and CAGE signal around predicted gene-distal enhancers with DNase I hypersensitive sites indicates that transcription initiates within the enhancer region and continues outward for several kilobases, in both polyadenylated and nonpolyadenylated RNA fractions. Chromatin is modified at transcribed enhancers but not at non-transcribed ones, which may indicate differences between the regulatory regions that express enhancer transcripts and those at the beginning of genic regions.
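The summary does not say how allele-specific expression was called. A common generic approach, assumed here and not necessarily the one used in the paper, counts the RNA sequencing reads carrying each allele at a heterozygous site and applies an exact two-sided binomial test against an even 50:50 split. In this Python sketch the function names, the 0.05 significance level and the 10-read minimum depth are hypothetical choices.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5), computed with exact integers.
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5, so the two-sided p-value is twice the tail.
    return min(1.0, 2 * lower)


def is_allele_specific(ref_reads: int, alt_reads: int,
                       alpha: float = 0.05, min_depth: int = 10) -> bool:
    """Flag a heterozygous site whose allele read counts depart from 50:50.

    alpha and min_depth are hypothetical thresholds for illustration.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return False  # too few reads to test reliably
    return binomial_two_sided_p(ref_reads, depth) < alpha


if __name__ == "__main__":
    print(is_allele_specific(30, 10))  # True: strongly skewed toward one allele
    print(is_allele_specific(22, 18))  # False: close to an even split
    print(is_allele_specific(5, 0))    # False: below the minimum depth
```

Doubling the smaller tail is exact only because the null proportion is 0.5; testing against an unequal null (for example, to account for reference-mapping bias) would need the general two-sided rule, as implemented in SciPy's `scipy.stats.binomtest`.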
Djebali, Sarah, Carrie A. Davis, Angelika Merkel, Alex Dobin, Timo Lassmann, Ali Mortazavi, Andrea Tanzer, Julien Lagarde, Wei Lin, Felix Schlesinger, Chenghai Xue, Georgi K. Marinov, Jainab Khatun, Brian A. Williams, Chris Zaleski, Joel Rozowsky, Maik Roder, Felix Kokocinski, Rehab F. Abdelhamid, Tyler Alioto, Igor Antoshechkin, Michael T. Baer, Nadav S. Bar, Philippe Batut, Kimberly Bell, Ian Bell, Sudipto Chakrabortty, Xian Chen, Jacqueline Chrast, Joao Curado, Thomas Derrien, Jorg Drenkow, Erica Dumais, Jacqueline Dumais, Radha Duttagupta, Emilie Falconnet, Meagan Fastuca, Kata Fejes-Toth, Pedro Ferreira, Sylvain Foissac, Melissa J. Fullwood, Hui Gao, David Gonzalez, Assaf Gordon, Harsha Gunawardena, Cedric Howald, Sonali Jha, Rory Johnson, Philipp Kapranov, Brandon King, Colin Kingswood, Oscar J. Luo, Eddie Park, Kimberly Persaud, Jonathan B. Preall, Paolo Ribeca, Brian Risk, Daniel Robyr, Michael Sammeth, Lorian Schaffer, Lei-Hoon See, Atif Shahab, Jorgen Skancke, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner, Diane Trout, Nathalie Walters, Huaien Wang, John Wrobel, Yanbao Yu, Xiaoan Ruan, Yoshihide Hayashizaki, Jennifer Harrow, Mark Gerstein, Tim Hubbard, Alexandre Reymond, Stylianos E. Antonarakis, Gregory Hannon, Morgan C. Giddings, Yijun Ruan, Barbara Wold, Piero Carninci, Roderic Guigó, & Thomas R. Gingeras. “Landscape of Transcription in Human Cells.” Nature 489 (2012): 101-108. Web.