Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression
Steve Lianoglou, Vidur Garg, Julie L. Yang, Christina S. Leslie, Christine Mayr

Overview of study

More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3' untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. The extent of 3' UTR variation across tissues and the functional role of ApA remain poorly understood. We developed a sequencing method to quantitatively map the 3' ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that cell type-specific gene expression is accomplished by two complementary programs. Tissue-restricted genes tend to have single 3' UTRs, whereas a majority of ubiquitously transcribed genes generate multiple 3' UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes mostly change 3' UTR isoform ratios to achieve tissue specificity. However, both regulation programs target genes that function in the same pathways and processes that characterize the new cell type. Instead of finding global shifts in 3' UTR length during transformation and differentiation, we identify tissue-specific groups of multi-UTR genes that change their 3' UTR ratios; these changes in 3' UTR length are largely independent from changes in mRNA abundance. Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.

Data Generated

We have mapped the 3' ends of polyadenylated RNAs from the human transcriptome across a diverse set of tissues and conditions. These data have been submitted to the SRA (project ID SRP029953) and are included here as alignments against hg19 in a variety of forms.

We are providing three different "views" (BAM files) over our raw data from each tissue in order to facilitate further investigation of our data at different levels of scrutiny. These views are outlined below and represent intermediate and final stages of the data as they are processed through our pipeline.

The denoised (final) alignments are the set of reads that was used for the analysis in this paper.

For each "view," we also provide unified BAM files which include all of the reads from each experiment in a single BAM file. These BAM files show the averaged cleavage pattern over each gene. The reads in these unified BAM files are annotated with @RG (read group) tags that indicate which experiment the read comes from.
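As a minimal sketch of how the read group annotation could be used, the snippet below tallies reads per experiment from SAM-formatted text (for example, the output of converting a unified BAM file to SAM). It relies only on the SAM convention that optional fields follow the 11 mandatory columns and that the read group is stored as `RG:Z:<id>`. The read group IDs `testis` and `brain` in the example are hypothetical placeholders, not the IDs used in our files.

```python
from collections import Counter


def read_group_of(sam_line):
    """Return the RG:Z: value of one SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


def count_by_read_group(sam_lines):
    """Count alignment records per read group, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        counts[read_group_of(line)] += 1
    return counts


def _record(name, read_group):
    # Helper for the example only: builds a syntactically valid SAM record.
    cols = [name, "0", "chr1", "100", "255", "36M", "*", "0", "0",
            "A" * 36, "I" * 36, "RG:Z:" + read_group]
    return "\t".join(cols)


# Hypothetical read group IDs, for illustration only.
example = [
    "@RG\tID:testis\tSM:testis",
    "@RG\tID:brain\tSM:brain",
    _record("r1", "testis"),
    _record("r2", "brain"),
    _record("r3", "testis"),
]

if __name__ == "__main__":
    for group, n in sorted(count_by_read_group(example).items()):
        print(group, n)
```

The same loop can be fed line by line from any tool that emits SAM text, so the unified file never needs to be held in memory.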


Views over the data

The pictures (1-3) below show how the cleavage profile is affected after key steps in our processing pipeline. The top three lanes in each picture are the cleavage profiles in testis, skeletal muscle, and brain. The bottom lane is the unified profile.

1. Raw alignments

2. Cleaned alignments

3. Denoised (final) alignments

  1. Raw alignments: These are the alignments of our reads without any further processing. They include multi-mapped reads, as well as reads that are likely artifacts of our sequencing method as described in the supplemental methods, namely (1) internally primed reads; and (2) antisense reads.
  2. Cleaned alignments: These files consist of the subset of the original reads that (1) map uniquely to the human genome (hg19); and (2) are unlikely to result from internal priming or to be spurious antisense reads.
  3. Denoised (final) alignments: These files are a stricter subset of the cleaned alignments and contain the set of reads that was used for the analysis in this paper. As described in the main manuscript, we set minimum expression and usage requirements for each cleavage event over each gene in order for it to be considered in our analysis. We did so in order to focus our analysis on events that were unlikely to originate from transcriptional or sequencing "noise". Intergenic cleavage events that are more than 5kb downstream of annotated 3' UTRs have been removed.
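The denoising criteria in step 3 can be illustrated with a small sketch. This is not our pipeline code: the record layout, the function name `denoise`, and the definition of usage as an event's share of its gene's reads are assumptions made for illustration. The thresholds `min_reads` and `min_usage` are left as parameters because their actual values are given in the main manuscript, not here. Only the 5kb downstream cutoff is taken from the text above.

```python
from collections import defaultdict

# From the text: intergenic events more than 5kb downstream of an annotated
# 3' UTR are removed.
MAX_DOWNSTREAM_BP = 5000


def denoise(events, min_reads, min_usage):
    """Keep cleavage events passing expression, usage, and distance filters.

    Each event is a dict with the (hypothetical) keys:
      gene          - gene identifier
      reads         - number of reads supporting the cleavage event
      downstream_bp - distance past the annotated 3' UTR end (0 if inside it)
    Usage is assumed here to be the event's fraction of all reads of its gene.
    """
    totals = defaultdict(int)
    for event in events:
        totals[event["gene"]] += event["reads"]

    kept = []
    for event in events:
        total = totals[event["gene"]]
        usage = event["reads"] / total if total else 0.0
        if event["reads"] < min_reads:
            continue  # below the minimum expression requirement
        if usage < min_usage:
            continue  # below the minimum usage requirement
        if event["downstream_bp"] > MAX_DOWNSTREAM_BP:
            continue  # too far downstream of the annotated 3' UTR
        kept.append(event)
    return kept


# Made-up events for a placeholder gene, for illustration only.
example_events = [
    {"gene": "GENE_A", "reads": 100, "downstream_bp": 0},
    {"gene": "GENE_A", "reads": 2, "downstream_bp": 0},
    {"gene": "GENE_A", "reads": 50, "downstream_bp": 6000},
]

if __name__ == "__main__":
    # The threshold values below are arbitrary examples.
    for event in denoise(example_events, min_reads=5, min_usage=0.1):
        print(event)
```

In this sketch the per-gene totals are computed before any event is dropped; whether usage should be computed before or after the distance filter is a design choice the text above does not settle.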

Accessing the data

The datasets are made available as BAM files of read alignments and hosted on a publicly accessible web server.

There are many ways to view BAM files over the internet. We describe how you can use (1) the Integrative Genomics Viewer (IGV); as well as (2) the UCSC Genome Browser to view our data.

Accessing the data with IGV

We have made it as easy as possible to use IGV to explore our data by setting up a custom Data Server, which allows you to pick and choose which datasets you would like to explore.

If you do not have IGV installed locally on your computer, you can launch IGV from your web browser by selecting an appropriate memory configuration from here. If you will be loading many datasets simultaneously, you will want to pick the largest RAM configuration your computer can support.

Once IGV is launched, you will want to configure it to use our Data Server to easily load remote datasets.

  1. Open IGV's preference window by navigating to the View menu and selecting Preferences....
  2. Navigate to the Advanced tab in the preferences window.
  3. Click on the Edit server properties checkbox and set the Data Registry URL field to: http://cbio.mskcc.org/public/Leslie/ApA/igv/data.registry.txt as shown below.
  4. Click the OK button on the bottom right-hand side of the preferences window to save your changes.

You can now easily load our alignment (BAM) files by navigating to the File menu and selecting Load from Server.... You will see our dataset at the top of the Available Datasets list. You can pick which alignment tracks (TDF tracks are coming soon) to show by expanding the triangles under ApA Atlas > Lianoglou et al., 2013 and selecting the samples you want to view, as shown below.

Viewing the data using the UCSC Genome Browser

Our data can be viewed using the UCSC Genome Browser by adding the URL to each BAM file as a custom track to your browsing session. There are detailed instructions on how to configure custom tracks at the UCSC Genome Browser site, but we will recapitulate the important steps below.

  1. Use your web browser to navigate to http://genome.ucsc.edu/cgi-bin/hgGateway and ensure that you have the proper organism (Human) and assembly (GRCh37/hg19) selected.
  2. Click the "add custom tracks" button.
  3. Copy and paste the appropriate data track URLs (below) for the BAM files you want to view. Each data track must be separated by a carriage return. Hit submit when you have selected all the tracks you are interested in viewing.
  4. Upon successfully uploading the track information, you will be taken to a page that allows you to manage your "loaded" custom tracks. From here you can jump to the genome browser, where you will now see the alignments integrated into the region of the genome you are exploring.
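For step 3, the text pasted into the custom track box is one `track` line per BAM file. The sketch below assembles such lines using the UCSC custom track attributes `type=bam`, `name`, and `bigDataUrl`. The URLs and file names shown are hypothetical placeholders on `example.org`; substitute the data track URLs listed on this page. Note that the UCSC Genome Browser expects each BAM file's index (`.bam.bai`) to be reachable at the same location.

```python
def ucsc_bam_track(name, bam_url):
    """Return one UCSC custom track line for a remotely hosted BAM file."""
    if '"' in name:
        raise ValueError("track name must not contain double quotes")
    return f'track type=bam name="{name}" bigDataUrl={bam_url}'


def custom_track_text(tracks):
    """Join (name, url) pairs into text with one track per line."""
    return "\n".join(ucsc_bam_track(name, url) for name, url in tracks)


# Placeholder URLs, for illustration only; these are not the real locations.
example_tracks = [
    ("Brain (final)", "http://example.org/ApA/brain.unique.final.bam"),
    ("Testis (final)", "http://example.org/ApA/testis.unique.final.bam"),
]

if __name__ == "__main__":
    print(custom_track_text(example_tracks))
```

The printed text can be pasted directly into the "add custom tracks" form.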

Data track URLs

Listed below are the samples analyzed in this study. Click on the raw, cleaned, or final links beside each sample to show (hide) the track information that you can copy and paste into your "custom track manager" at the UCSC Genome Browser (step 3 above).

  • B Cells (1) [raw | cleaned | final]
  • B Cells (2) [raw | cleaned | final]
  • B-LCL [raw | cleaned | final]
  • Brain [raw | cleaned | final]
  • Breast [raw | cleaned | final]
  • HEK293 [raw | cleaned | final]
  • HeLa [raw | cleaned | final]
  • hES [raw | cleaned | final]
  • MCF10A (1) [raw | cleaned | final]
  • MCF10A (2) [raw | cleaned | final]
  • MCF10A + HRAS (1) [raw | cleaned | final]
  • MCF10A + HRAS (2) [raw | cleaned | final]
  • MCF7 [raw | cleaned | final]
  • NTERA [raw | cleaned | final]
  • Ovary [raw | cleaned | final]
  • Skeletal muscle [raw | cleaned | final]
  • Testis [raw | cleaned | final]
  • unified-atlas [raw | cleaned | final]

Change unique to multimap in the URLs above to show alignments that include multimapped reads.
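If you are switching many URLs at once, the substitution can be scripted. The sketch below replaces `unique` with `multimap` in the file name portion of a URL only, so that the host and directory are left untouched. The example URL is a hypothetical placeholder, and the position of `unique` within the real file names is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_multimap(url):
    """Swap 'unique' for 'multimap' in the final path segment of a URL."""
    parts = urlsplit(url)
    directory, sep, filename = parts.path.rpartition("/")
    if "unique" not in filename:
        raise ValueError("no 'unique' in file name: " + url)
    # Replace only the first occurrence, and only within the file name.
    new_path = directory + sep + filename.replace("unique", "multimap", 1)
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(to_multimap("http://example.org/ApA/brain.unique.final.bam"))
```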