Detecting Lateral Gene Transfer by Hierarchical Segmentation

We provide a method for segmenting genomes based on local heterogeneity/homogeneity of nucleotide usage. This "top-down" approach contrasts with window-based methods, which scan the genome in fixed-size sliding windows. In our paper "Detection of genomic islands via segmental genome heterogeneity", we show that our method is highly sensitive and specific.

The algorithms we use are based on the Markovian Jensen-Shannon divergence (MJSD). We first segment the genome recursively using the MJSD, continuing the recursion as long as the MJSD value is statistically significant. We then compute the MJSD between each segment and the genome. This ranks the segments by how likely they are to have been drawn from a probability distribution different from that of the genome, and the ranking in turn serves as a proxy for detecting the presence of foreign genomic material.
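As a rough illustration of the quantity involved, the sketch below assumes the commonly used definition of the order-k MJSD: the conditional entropy of the pooled order-k word counts minus the length-weighted conditional entropies of the two parts. The function names (word_counts, conditional_entropy, mjsd) are hypothetical, entropies are in bits, and this is not the code in so_jensen.cpp.

```python
from collections import Counter
from math import log2


def word_counts(seq, order):
    """Count every (order + 1)-letter word: an `order`-base context plus the next base."""
    return Counter(seq[i:i + order + 1] for i in range(len(seq) - order))


def conditional_entropy(counts, order):
    """H(next base | preceding `order` bases) in bits, estimated from word counts."""
    total = sum(counts.values())
    context = Counter()
    for word, n in counts.items():
        context[word[:order]] += n
    # -sum over words (w, a) of p(w, a) * log2 p(a | w)
    return -sum(n / total * log2(n / context[word[:order]])
                for word, n in counts.items())


def mjsd(left, right, order):
    """Markovian JSD between two subsequences (hypothetical helper, not so_jensen's API)."""
    c_left, c_right = word_counts(left, order), word_counts(right, order)
    n_left, n_right = sum(c_left.values()), sum(c_right.values())
    n = n_left + n_right
    pooled = c_left + c_right  # word counts of the mixture of both parts
    return (conditional_entropy(pooled, order)
            - n_left / n * conditional_entropy(c_left, order)
            - n_right / n * conditional_entropy(c_right, order))


print(mjsd("A" * 50, "C" * 50, order=0))          # 1.0
print(mjsd("ACGT" * 20, "ACGT" * 20, order=2))    # 0.0
```

Because conditional entropy is concave in the joint word distribution, this quantity is never negative, and it is zero when both parts share the same order-k statistics.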

Using MJSD on your data

Installation

Before installing the programs, you'll need to have the following:

  • A C++ compiler (tested with several versions of GNU g++)
  • Python 2 (Python 3 and later may not work)
  • The Numeric Python package

Download the code as a tgz or zip archive. Decompressing it creates an mjsd directory. Compile the C++ code with
g++ -o so_jensen so_jensen.cpp
Provided Python and the Numeric package are installed, installation is now complete.


Usage

To see all options, run either program without arguments:
./so_jensen
./assess_atypicality.py

To segment the sequence in test.fasta using 2nd-order MJSD at a significance level of 0.99, run:
./so_jensen -f test.fasta -s 0.99 -o 2 > test.seg
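The recursive, top-down segmentation described in the overview can be sketched as follows. This is a simplified illustration, not so_jensen's implementation: the hypothetical threshold argument (a minimum MJSD in bits) stands in for the statistical significance test that -s controls, min_len is an assumed minimum segment length, and every candidate split is rescored from scratch, which is quadratic in sequence length.

```python
from collections import Counter
from math import log2


def word_counts(seq, order):
    """Count every (order + 1)-letter word in `seq`."""
    return Counter(seq[i:i + order + 1] for i in range(len(seq) - order))


def conditional_entropy(counts, order):
    """Order-`order` conditional entropy in bits, from word counts."""
    total = sum(counts.values())
    context = Counter()
    for word, n in counts.items():
        context[word[:order]] += n
    return -sum(n / total * log2(n / context[word[:order]])
                for word, n in counts.items())


def mjsd(left, right, order):
    """Pooled conditional entropy minus the length-weighted entropies of the parts."""
    c_left, c_right = word_counts(left, order), word_counts(right, order)
    n_left, n_right = sum(c_left.values()), sum(c_right.values())
    n = n_left + n_right
    return (conditional_entropy(c_left + c_right, order)
            - n_left / n * conditional_entropy(c_left, order)
            - n_right / n * conditional_entropy(c_right, order))


def segment(seq, order, threshold, min_len=50, offset=0):
    """Return half-open (start, end) coordinates of the segments of `seq`.

    Hypothetical sketch: `threshold` replaces the significance test, and
    `min_len` (assumed > order) keeps every part long enough to score.
    """
    best_score, best_split = 0.0, None
    # Try every split point and keep the one with the largest divergence.
    for split in range(min_len, len(seq) - min_len + 1):
        score = mjsd(seq[:split], seq[split:], order)
        if score > best_score:
            best_score, best_split = score, split
    # Stop when no split is strong enough; otherwise recurse on both halves.
    if best_split is None or best_score < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:best_split], order, threshold, min_len, offset)
            + segment(seq[best_split:], order, threshold, min_len, offset + best_split))


print(segment("A" * 100 + "C" * 100, order=0, threshold=0.5, min_len=20))
# [(0, 100), (100, 200)]
```

The recursion ends naturally once a piece is shorter than 2 * min_len or no split clears the threshold, which mirrors the "continue only while significant" rule in spirit but not in its statistics.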

We now have a set of segments in test.seg. To compare these segments to the background sequence using 1st order MJSD, run the following:
./assess_atypicality.py --filename test --segfile test.seg --order=1
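The assessment step can be sketched in the same spirit. Assumptions of this sketch: each segment's word counts are compared with those of the rest of the genome (the sequence outside the segment) rather than the whole genome, segments are given as hypothetical half-open (start, end) pairs instead of being read from test.seg (whose format is not described here), and the helper names divergence and rank_segments are illustrative only. Segments are sorted by decreasing MJSD, so the most atypical candidates come first.

```python
from collections import Counter
from math import log2


def word_counts(seq, order):
    """Count every (order + 1)-letter word in `seq`."""
    return Counter(seq[i:i + order + 1] for i in range(len(seq) - order))


def conditional_entropy(counts, order):
    """Order-`order` conditional entropy in bits, from word counts."""
    total = sum(counts.values())
    context = Counter()
    for word, n in counts.items():
        context[word[:order]] += n
    return -sum(n / total * log2(n / context[word[:order]])
                for word, n in counts.items())


def divergence(c_a, c_b, order):
    """MJSD computed directly from two sets of word counts."""
    n_a, n_b = sum(c_a.values()), sum(c_b.values())
    n = n_a + n_b
    return (conditional_entropy(c_a + c_b, order)
            - n_a / n * conditional_entropy(c_a, order)
            - n_b / n * conditional_entropy(c_b, order))


def rank_segments(genome, segments, order):
    """Return (score, start, end) tuples sorted from most to least atypical."""
    ranked = []
    for start, end in segments:
        seg_counts = word_counts(genome[start:end], order)
        # Count the two flanks separately so no word spans the excised segment.
        rest_counts = word_counts(genome[:start], order) + word_counts(genome[end:], order)
        ranked.append((divergence(seg_counts, rest_counts, order), start, end))
    ranked.sort(reverse=True)
    return ranked


genome = "ACGT" * 50 + "A" * 40 + "ACGT" * 50
segments = [(0, 200), (200, 240), (240, 440)]
print(rank_segments(genome, segments, order=1)[0][1:])  # (200, 240)
```

The inserted poly-A stretch ranks first because its order-1 statistics differ most from the periodic flanks.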


Misc

If you have trouble installing or using the program, contact aarvey (--AT--) cs.ucsd.edu.

The software is licensed under the GPL. All components can be used independently and modified by the user as permitted by the GPL.



This page last modified Thursday, 30-Jul-2009 14:47:33 EDT