|
|
RepeatMap
RepeatMap is a set of resources enabling researchers to quickly
determine the uniqueness of a sequence. Some uses include:
- RNA Probe Design. Off-target effects seem to begin when there is
a 15-20 basepair region of perfect homology. RepeatMap reports the
exact number of times a candidate sequence occurs, so probes can be
chosen to minimize the probability of off-target effects.
- UCSC Genome Browser Track. Each kmer of the genome is annotated
with the exact number of times it occurs in the genome. The resulting
track can then be used to identify structures that correlate with
high or low repeat counts.
- Compression. A fundamental problem in compression is knowing which
strings occur at higher frequency and then using shorter symbols for
those strings. RepeatMap provides an efficient way to determine a
priori the exact number of times a kmer occurs. In essence, RepeatMap
is similar in idea to the Burrows-Wheeler (BW) transform, except that
we only look at kmers whereas BW is built over every suffix of the
full text.
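
The core idea behind all three uses is exact kmer counting. The
following is a minimal Python sketch of that idea, not RepeatMap's own
implementation. The function names count_kmers and unique_kmers and the
toy sequence are hypothetical, and the sketch assumes forward-strand
counting only (reverse complements are ignored).

```python
from collections import Counter


def count_kmers(sequence, k):
    """Return exact occurrence counts for every length-k substring.

    Hypothetical helper for illustration; counts the forward strand only.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def unique_kmers(sequence, k):
    """Return the kmers that occur exactly once, in sorted order."""
    counts = count_kmers(sequence, k)
    return sorted(kmer for kmer, n in counts.items() if n == 1)


# Toy "genome" used only to demonstrate the counting.
genome = "ACGTACGTTTACGA"
counts = count_kmers(genome, 4)

# A probe candidate is safer when its kmers occur only once.
for kmer in ("ACGT", "ACGA"):
    print(kmer, counts[kmer])
```

A Counter returns 0 for kmers that never occur, so a lookup doubles as
a membership test. For a real genome the dictionary would be far larger
than this toy example, which is why the memory requirements of the
Dictionary Server matter.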
Description
RepeatMap is composed of individual modules, each designed to enable
extremely rapid repeat counting. We provide intuitive, easy-to-use
interfaces to each of the tools. There are currently three parts of
the RepeatMap system:
- The RepeatMap Dictionary Server builds the repeat dictionaries and
holds the repeat counts so that they can be queried. This "server" can
run on the same computer as the client (see below) as long as the computer
has sufficient memory (see documentation).
- The RepeatMap Client queries the RepeatMap Dictionary Server
to determine the repeat counts of strings.
- The Annotator Client queries the RepeatMap Dictionary Server to
determine the repeat counts for very long strings (e.g. chromosomes).
It then outputs the results in a file that can be viewed in the UCSC
Genome Browser.
All components can be used independently and can be tweaked by the
user for any purpose under the GPL.
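
To illustrate what an annotation step of this kind might produce, here
is a minimal Python sketch that writes per-position kmer counts in the
UCSC wiggle fixedStep format. This is an assumption for illustration
only: the source does not state which file format the Annotator Client
emits, the function name annotate_wiggle and the track name are
hypothetical, and the counts are computed locally here rather than by
querying the Dictionary Server.

```python
from collections import Counter


def annotate_wiggle(chrom, sequence, k, counts=None):
    """Return wiggle (fixedStep) text with one repeat count per kmer start.

    Hypothetical helper: if no counts are supplied, they are computed
    from the sequence itself (forward strand only).
    """
    sequence = sequence.upper()
    n_kmers = len(sequence) - k + 1
    if counts is None:
        counts = Counter(sequence[i:i + k] for i in range(n_kmers))
    lines = [
        f'track type=wiggle_0 name="repeat_count_k{k}"',
        # Wiggle coordinates are 1-based; one value per base position.
        f"fixedStep chrom={chrom} start=1 step=1",
    ]
    for i in range(n_kmers):
        lines.append(str(counts[sequence[i:i + k]]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(annotate_wiggle("chr1", "ACGTACGTTTACGA", 4), end="")
```

Each value is the count of the kmer starting at that position, so the
last k-1 bases of the sequence carry no value in this sketch.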
|