Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression
Steve Lianoglou, Vidur Garg, Julie L. Yang, Christina S. Leslie, Christine Mayr
Overview of study
More than half of human genes use alternative cleavage and polyadenylation (ApA)
to generate mRNA transcripts that differ in the lengths of their 3' untranslated
regions (UTRs), thus altering the post-transcriptional fate of the message and
likely the protein output. The extent of 3' UTR variation across tissues and the
functional role of ApA remain poorly understood. We developed a sequencing
method to quantitatively map the 3' ends of the transcriptome of diverse human
tissues and isogenic transformation systems. We found that cell type-specific
gene expression is accomplished by two complementary programs. Tissue-restricted
genes tend to have single 3' UTRs, whereas a majority of ubiquitously
transcribed genes generate multiple 3' UTRs. During transformation and
differentiation, single-UTR genes change their mRNA abundance levels, while
multi-UTR genes mostly change 3' UTR isoform ratios to achieve tissue
specificity. However, both regulation programs target genes that function in the
same pathways and processes that characterize the new cell type. Instead of
finding global shifts in 3' UTR length during transformation and
differentiation, we identify tissue-specific groups of multi-UTR genes that
change their 3' UTR ratios; these changes in 3' UTR length are largely
independent from changes in mRNA abundance. Finally, tissue-specific usage of
ApA sites appears to be a mechanism for changing the landscape targetable by
ubiquitously expressed microRNAs.
Data Generated
We have mapped the 3' ends of the polyadenylated RNAs from the human
transcriptome across a diverse set of tissues and conditions. These
data have been submitted to the SRA (project id
SRP029953)
and are included here as alignments against hg19 in a variety of forms.
We are providing three different "views" (BAM files) over
our raw data from each tissue in order to facilitate further investigation of
our data at different levels of scrutiny. These views are outlined
below and represent intermediate and final stages of the data as they are
processed through our pipeline.
The denoised (final) alignments are the set of reads that was used for the
analysis in this paper.
For each "view," we also provide unified BAM files which
include all of the reads from each experiment in a single BAM file.
These BAM files show the averaged cleavage pattern over each gene.
The reads in these unified BAM files are annotated with
@RG
(read group)
tags that indicate which experiment the read comes from.
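As a rough illustration of how the read group tags can be used, the sketch below tallies alignments per experiment. It assumes the unified BAM file has first been converted to SAM text (for example with `samtools view -h`), and it relies only on the SAM convention that the read group of an alignment is stored in an optional `RG:Z:` field. The read group IDs in the example (`brain`, `testis`) are hypothetical; the actual IDs should be taken from the `@RG` header lines of the file.

```python
from collections import Counter


def read_group(sam_line):
    """Return the RG:Z: value of one SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


def count_by_read_group(sam_lines):
    """Count alignments per read group, skipping '@' header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts[read_group(line)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; real input would come from the unified SAM text.
    example = [
        "@RG\tID:brain",
        "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:brain",
        "r2\t16\tchr1\t250\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tRG:Z:testis",
        "r3\t0\tchr2\t900\t255\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:brain",
    ]
    for group, n in sorted(count_by_read_group(example).items()):
        print(group, n)
```

Reads without an `RG:Z:` field are counted under `None`, which makes missing annotations easy to spot.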
Views over the data
The pictures (1-3) below show how the cleavage profile is affected after
key steps in our processing pipeline. The top three lanes in each
picture are the cleavage profiles in testis, skeletal muscle, and brain.
The bottom lane is the unified profile.
1. Raw alignments
2. Cleaned alignments
3. Denoised (final) alignments
- Raw alignments:
These are the alignments of our reads
without any further processing. They include multi-mapped reads, as well
as reads that are likely artifacts of our sequencing
method, as described in the supplemental methods, namely (1) internally primed reads
and (2) antisense reads.
- Cleaned alignments: These files
consist of the subset of the original reads that (1) map uniquely
to the human genome (hg19) and (2) are unlikely to be the result of internal
priming or to be spurious antisense reads.
- Denoised (final) alignments:
These files are a stricter subset of
the cleaned alignments and contain the set of reads that was used for the
analysis in this paper. As described in the
main manuscript, we set minimum expression and usage requirements for each cleavage
event over each gene in order for it to be considered in our analysis. We did
so in order to focus our analysis on events that were unlikely to originate from
transcriptional or sequencing "noise". Intergenic cleavage events that are more
than 5 kb downstream of annotated 3' UTRs have been removed.
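To make the idea of minimum expression and usage requirements concrete, here is a minimal sketch of such a filter. It is not the pipeline used in the paper: the thresholds `MIN_READS` and `MIN_USAGE` are hypothetical placeholders (the actual values are given in the manuscript, not on this page), and "usage" is assumed here to mean the fraction of a gene's reads that fall on a given cleavage event.

```python
from collections import defaultdict

MIN_READS = 5     # hypothetical minimum read count per cleavage event
MIN_USAGE = 0.10  # hypothetical minimum fraction of the gene's reads


def denoise(events, min_reads=MIN_READS, min_usage=MIN_USAGE):
    """Keep cleavage events that pass both thresholds.

    `events` is an iterable of (gene, site, reads) tuples, where `site`
    identifies a cleavage event and `reads` is its read count.
    """
    events = list(events)

    # Total reads per gene, needed to compute each event's usage.
    totals = defaultdict(int)
    for gene, _site, reads in events:
        totals[gene] += reads

    kept = []
    for gene, site, reads in events:
        usage = reads / totals[gene] if totals[gene] else 0.0
        if reads >= min_reads and usage >= min_usage:
            kept.append((gene, site, reads))
    return kept


if __name__ == "__main__":
    # Hypothetical counts for one gene with three cleavage events.
    example = [("GENE1", "siteA", 90), ("GENE1", "siteB", 8), ("GENE1", "siteC", 2)]
    print(denoise(example))
```

Applying both a count threshold and a usage threshold means that an event must be well supported in absolute terms and also account for a meaningful share of its gene's cleavage events.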
Accessing the data
The datasets are made available as BAM files of read alignments and hosted
on a publicly accessible web server.
There are many ways to view BAM files over the internet. We describe how
you can use
(1) the Integrative Genomics Viewer (IGV); as well as
(2) the UCSC Genome Browser to view our data.
Accessing the data with IGV
We have made it as easy as possible to use IGV to explore our data by setting up a custom Data Server, which allows you to pick and choose which datasets you would like to explore.
If you do not have IGV installed locally on your computer, you can launch IGV using your web browser by selecting an appropriate memory configuration from here. If you will be loading many datasets simultaneously, you will want to pick the largest RAM configuration your computer can support.
Once IGV is launched, you will want to configure it to use our Data Server to easily load remote datasets.
-
Open IGV's preference window by navigating to the View
menu and selecting Preferences....
-
Navigate to the Advanced tab in the preferences window.
-
Click on the Edit server properties checkbox and set the
Data Registry URL field to:
http://cbio.mskcc.org/public/Leslie/ApA/igv/data.registry.txt
as shown below.
-
Click the OK button on the bottom right-hand side
of the preferences window to save your changes.
You can now easily load our alignment (BAM) files by navigating to the
File menu and selecting Load from Server....
You will see our dataset at the top of the Available Datasets
list. You can pick which alignment tracks
(TDF tracks are coming soon)
to show by expanding the
triangles under ApA Atlas > Lianoglou et al., 2013
and selecting the samples you want to view, as shown below.
Viewing the data using the UCSC Genome Browser
Our data can be viewed using the UCSC Genome Browser by adding the URL to each
BAM file as a custom track
to your browsing session. There are
detailed instructions
on how to configure custom tracks at the UCSC Genome Browser site, but we will
recapitulate the important steps below.
- Use your web browser to navigate to
http://genome.ucsc.edu/cgi-bin/hgGateway
and ensure that you have the proper organism (Human) and assembly (GRCh37/hg19) selected.
-
Click the "add custom tracks" button
-
Copy and paste the appropriate data track URLs
(below) for the BAM files you want to view. Each data track must be placed
on its own line. Hit submit when you have added all the tracks
you are interested in viewing.
-
Upon successfully uploading the track information, you will be taken to
a page that allows you to manage your "loaded" custom tracks. From here
you can jump to the genome browser, where you will now see the alignments
integrated into the region of the genome you are exploring.
Data track URLs
Listed below are the samples analyzed in this study. Click on the
raw, cleaned, or final links beside
each sample to show (hide) the track information that you can copy and paste
into your "custom track manager" at the UCSC Genome Browser (step 3 above).
-
B Cells (1)
[raw |
cleaned |
final]
B Cells (1) Raw
track type="bam" name="bcells-1-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/bcells-1.bam" genome="hg19" visibility="squish"
B Cells (1) Cleaned
track type="bam" name="bcells-1-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/bcells-1.bam" genome="hg19" visibility="squish"
B Cells (1) Final
track type="bam" name="bcells-1-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/bcells-1.bam" genome="hg19" visibility="squish"
-
B Cells (2)
[raw |
cleaned |
final]
B Cells (2) Raw
track type="bam" name="bcells-2-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/bcells-2.bam" genome="hg19" visibility="squish"
B Cells (2) Cleaned
track type="bam" name="bcells-2-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/bcells-2.bam" genome="hg19" visibility="squish"
B Cells (2) Final
track type="bam" name="bcells-2-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/bcells-2.bam" genome="hg19" visibility="squish"
-
B-LCL
[raw |
cleaned |
final]
B-LCL Raw
track type="bam" name="blcl-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/blcl.bam" genome="hg19" visibility="squish"
B-LCL Cleaned
track type="bam" name="blcl-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/blcl.bam" genome="hg19" visibility="squish"
B-LCL Final
track type="bam" name="blcl-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/blcl.bam" genome="hg19" visibility="squish"
-
Brain
[raw |
cleaned |
final]
Brain Raw
track type="bam" name="brain-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/brain.bam" genome="hg19" visibility="squish"
Brain Cleaned
track type="bam" name="brain-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/brain.bam" genome="hg19" visibility="squish"
Brain Final
track type="bam" name="brain-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/brain.bam" genome="hg19" visibility="squish"
-
Breast
[raw |
cleaned |
final]
Breast Raw
track type="bam" name="breast-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/breast.bam" genome="hg19" visibility="squish"
Breast Cleaned
track type="bam" name="breast-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/breast.bam" genome="hg19" visibility="squish"
Breast Final
track type="bam" name="breast-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/breast.bam" genome="hg19" visibility="squish"
-
HEK293
[raw |
cleaned |
final]
HEK293 Raw
track type="bam" name="hek293-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/hek293.bam" genome="hg19" visibility="squish"
HEK293 Cleaned
track type="bam" name="hek293-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/hek293.bam" genome="hg19" visibility="squish"
HEK293 Final
track type="bam" name="hek293-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/hek293.bam" genome="hg19" visibility="squish"
-
HeLa
[raw |
cleaned |
final]
HeLa Raw
track type="bam" name="hela-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/hela.bam" genome="hg19" visibility="squish"
HeLa Cleaned
track type="bam" name="hela-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/hela.bam" genome="hg19" visibility="squish"
HeLa Final
track type="bam" name="hela-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/hela.bam" genome="hg19" visibility="squish"
-
hES
[raw |
cleaned |
final]
hES Raw
track type="bam" name="hES-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/hES.bam" genome="hg19" visibility="squish"
hES Cleaned
track type="bam" name="hES-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/hES.bam" genome="hg19" visibility="squish"
hES Final
track type="bam" name="hES-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/hES.bam" genome="hg19" visibility="squish"
-
MCF10A (1)
[raw |
cleaned |
final]
MCF10A (1) Raw
track type="bam" name="mcf10a-1-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/mcf10a-1.bam" genome="hg19" visibility="squish"
MCF10A (1) Cleaned
track type="bam" name="mcf10a-1-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/mcf10a-1.bam" genome="hg19" visibility="squish"
MCF10A (1) Final
track type="bam" name="mcf10a-1-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/mcf10a-1.bam" genome="hg19" visibility="squish"
-
MCF10A (2)
[raw |
cleaned |
final]
MCF10A (2) Raw
track type="bam" name="mcf10a-2-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/mcf10a-2.bam" genome="hg19" visibility="squish"
MCF10A (2) Cleaned
track type="bam" name="mcf10a-2-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/mcf10a-2.bam" genome="hg19" visibility="squish"
MCF10A (2) Final
track type="bam" name="mcf10a-2-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/mcf10a-2.bam" genome="hg19" visibility="squish"
-
MCF10A + HRAS (1)
[raw |
cleaned |
final]
MCF10A + HRAS (1) Raw
track type="bam" name="mcf10a.hras-1-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/mcf10a.hras-1.bam" genome="hg19" visibility="squish"
MCF10A + HRAS (1) Cleaned
track type="bam" name="mcf10a.hras-1-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/mcf10a.hras-1.bam" genome="hg19" visibility="squish"
MCF10A + HRAS (1) Final
track type="bam" name="mcf10a.hras-1-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/mcf10a.hras-1.bam" genome="hg19" visibility="squish"
-
MCF10A + HRAS (2)
[raw |
cleaned |
final]
MCF10A + HRAS (2) Raw
track type="bam" name="mcf10a.hras-2-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/mcf10a.hras-2.bam" genome="hg19" visibility="squish"
MCF10A + HRAS (2) Cleaned
track type="bam" name="mcf10a.hras-2-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/mcf10a.hras-2.bam" genome="hg19" visibility="squish"
MCF10A + HRAS (2) Final
track type="bam" name="mcf10a.hras-2-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/mcf10a.hras-2.bam" genome="hg19" visibility="squish"
-
MCF7
[raw |
cleaned |
final]
MCF7 Raw
track type="bam" name="mcf7-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/mcf7.bam" genome="hg19" visibility="squish"
MCF7 Cleaned
track type="bam" name="mcf7-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/mcf7.bam" genome="hg19" visibility="squish"
MCF7 Final
track type="bam" name="mcf7-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/mcf7.bam" genome="hg19" visibility="squish"
-
NTERA
[raw |
cleaned |
final]
NTERA Raw
track type="bam" name="ntera-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/ntera.bam" genome="hg19" visibility="squish"
NTERA Cleaned
track type="bam" name="ntera-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/ntera.bam" genome="hg19" visibility="squish"
NTERA Final
track type="bam" name="ntera-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/ntera.bam" genome="hg19" visibility="squish"
-
Ovary
[raw |
cleaned |
final]
Ovary Raw
track type="bam" name="ovary-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/ovary.bam" genome="hg19" visibility="squish"
Ovary Cleaned
track type="bam" name="ovary-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/ovary.bam" genome="hg19" visibility="squish"
Ovary Final
track type="bam" name="ovary-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/ovary.bam" genome="hg19" visibility="squish"
-
Skeletal muscle
[raw |
cleaned |
final]
Skeletal muscle Raw
track type="bam" name="skmuscle-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/skmuscle.bam" genome="hg19" visibility="squish"
Skeletal muscle Cleaned
track type="bam" name="skmuscle-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/skmuscle.bam" genome="hg19" visibility="squish"
Skeletal muscle Final
track type="bam" name="skmuscle-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/skmuscle.bam" genome="hg19" visibility="squish"
-
Testis
[raw |
cleaned |
final]
Testis Raw
track type="bam" name="testis-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/testis.bam" genome="hg19" visibility="squish"
Testis Cleaned
track type="bam" name="testis-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/testis.bam" genome="hg19" visibility="squish"
Testis Final
track type="bam" name="testis-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/testis.bam" genome="hg19" visibility="squish"
-
unified-atlas
[raw |
cleaned |
final]
unified-atlas Raw
track type="bam" name="unified-raw" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/raw/unified-atlas.bam" genome="hg19" visibility="squish"
unified-atlas Cleaned
track type="bam" name="unified-clean" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean/unified-atlas.bam" genome="hg19" visibility="squish"
unified-atlas Final
track type="bam" name="unified-final" bigDataUrl="http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments/unique/clean-final/unified-atlas.bam" genome="hg19" visibility="squish"
Change unique to multimap in the URLs above
to show alignments that include multimapped reads.
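Because the track lines above follow a regular pattern, they can also be generated rather than copied one at a time. The sketch below reproduces that pattern, including the swap from unique to multimap. The function names `track_line` and `custom_tracks` are illustrative and not part of any tool; the multimap URLs are assumed to mirror the unique ones exactly, as the note above suggests.

```python
BASE_URL = "http://cbio.mskcc.org/public/Leslie/ApA/atlas-lianoglou/alignments"

# Track-name suffix -> directory name used in the URLs listed above.
VIEW_DIRS = {"raw": "raw", "clean": "clean", "final": "clean-final"}


def track_line(sample, view, mapping="unique", name=None):
    """Build one UCSC custom track line for a sample and view.

    `sample` is the BAM file stem (e.g. "bcells-1"), `view` is one of
    "raw", "clean", or "final", and `mapping` is "unique" or "multimap".
    `name` overrides the default "<sample>-<view>" track name.
    """
    if view not in VIEW_DIRS:
        raise ValueError(f"unknown view: {view!r}")
    if mapping not in ("unique", "multimap"):
        raise ValueError(f"unknown mapping: {mapping!r}")
    url = f"{BASE_URL}/{mapping}/{VIEW_DIRS[view]}/{sample}.bam"
    label = name if name is not None else f"{sample}-{view}"
    return (
        f'track type="bam" name="{label}" bigDataUrl="{url}" '
        'genome="hg19" visibility="squish"'
    )


def custom_tracks(samples, view, mapping="unique"):
    """Join track lines, one per line, ready to paste into the UCSC form."""
    return "\n".join(track_line(s, view, mapping) for s in samples)


if __name__ == "__main__":
    print(custom_tracks(["brain", "testis", "skmuscle"], "final"))
    # The unified track uses a shorter name than its file stem.
    print(track_line("unified-atlas", "raw", name="unified-raw"))
```

The output of `custom_tracks` can be pasted directly into the "add custom tracks" form described in step 3 above.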