We strongly believe in the importance of reproducible computational results and in open source software. We make all our software tools freely available for research, education, and non-profit use as soon as they are ready for public release, and we provide source code wherever possible. In a few cases, our code depends on commercial software packages or proprietary code of collaborators; in these situations, we can usually still provide our portion of the source code.

  • MEDUSA

    MEDUSA is an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting.
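
    Here is a minimal Python sketch of boosting-based feature selection over candidate binding sequences. It is generic AdaBoost with one-feature decision stumps, not MEDUSA's actual algorithm. Each round selects the single k-mer whose presence or absence in a promoter best separates genes labeled +1 from genes labeled -1. Those labels are a hypothetical reduction of the expression data to up/down calls, and the names `adaboost_kmers` and `predict` are illustrative only.

```python
import math

def kmer_presence(seq, k):
    """Set of k-mers occurring in seq (candidate binding-sequence features)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def adaboost_kmers(seqs, labels, k=3, rounds=5):
    """Generic AdaBoost with one-k-mer decision stumps (illustrative only).

    labels are +1 or -1 per promoter (hypothetical up/down-regulation calls).
    Returns a list of (kmer, polarity, alpha) weak rules.
    """
    n = len(seqs)
    present = [kmer_presence(s, k) for s in seqs]
    candidates = sorted(set().union(*present))
    w = [1.0 / n] * n  # example weights, initially uniform
    model = []
    for _ in range(rounds):
        best = None
        for kmer in candidates:
            for polarity in (1, -1):
                # Stump predicts `polarity` if kmer is present, else -polarity.
                err = sum(wi for wi, p, y in zip(w, present, labels)
                          if (polarity if kmer in p else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, kmer, polarity)
        err, kmer, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((kmer, polarity, alpha))
        # Reweight: misclassified examples gain weight, correct ones lose it.
        for i, (p, y) in enumerate(zip(present, labels)):
            h = polarity if kmer in p else -polarity
            w[i] *= math.exp(-alpha * y * h)
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict(model, seq, k=3):
    """Sign of the weighted vote of the selected k-mer rules."""
    p = kmer_presence(seq, k)
    score = sum(a * (pol if km in p else -pol) for km, pol, a in model)
    return 1 if score >= 0 else -1

seqs = ["ACGTGACC", "TTGACGAA", "CCCTGAAA", "ACGCGCAT", "TTTAAACC", "GGGCCCAT"]
labels = [1, 1, 1, -1, -1, -1]
model = adaboost_kmers(seqs, labels, k=3, rounds=3)
print(model[0][:2])  # the k-mer and polarity chosen in round 1
```

Because each round commits to a single k-mer, the ensemble doubles as a ranked list of selected features. This is one way boosting performs feature selection in a high-dimensional space.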

  • Rankprop

    Rankprop is a publicly available web server that searches a protein database for proteins similar to a query protein sequence. At its core, Rankprop is a ranking algorithm that exploits the global network structure of similarity relationships among proteins in the database by performing a diffusion operation on a protein similarity network with weighted edges.
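
    A diffusion of this kind can be sketched in a few lines. This is a minimal illustration, not the web server's code. It assumes the iterative update y[i] <- K[q][i] + alpha * sum over j != q of K[j][i] * y[j], where q is the query, K holds nonnegative edge weights, and alpha is small enough for the iteration to converge. The function names and the toy network are hypothetical.

```python
def rankprop_scores(K, query, alpha=0.9, iters=100):
    """Diffuse similarity outward from `query` over a weighted network.

    K[i][j] >= 0 is the edge weight between proteins i and j. Assumes the
    weights are scaled so that alpha * (largest column sum of K) < 1, which
    makes this fixed-point iteration converge. Repeats
        y[i] <- K[query][i] + alpha * sum_{j != query} K[j][i] * y[j]
    starting from the query's direct similarities.
    """
    n = len(K)
    y = [K[query][i] for i in range(n)]
    for _ in range(iters):
        y = [K[query][i] + alpha * sum(K[j][i] * y[j] for j in range(n) if j != query)
             for i in range(n)]
    return y

def rank_database(K, query, **kwargs):
    """Database proteins (query excluded), best-ranked first."""
    y = rankprop_scores(K, query, **kwargs)
    return sorted((i for i in range(len(K)) if i != query), key=lambda i: -y[i])

# Toy network: protein 0 is the query and is directly similar only to protein 1.
K = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.4, 0.0],
    [0.0, 0.4, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
print(rank_database(K, 0))  # → [1, 2, 3]
```

Proteins 2 and 3 have no direct edge to the query, yet both receive nonzero scores through protein 1. This is the sense in which diffusion uses the global network structure rather than only pairwise query scores.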

  • SVM-Fold

    SVM-Fold is a publicly available web server that uses support vector machines (SVMs) to predict the family, superfamily, and fold of a query protein sequence according to the Structural Classification of Proteins (SCOP).

    SVM-Fold detects subtle protein sequence similarities by learning from all available annotated proteins and by using potential hits identified by PSI-BLAST. The SVMs can still predict membership in classes for which no known example has a significant pairwise PSI-BLAST E-value to the query.
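
    As a rough illustration of multiclass SVM prediction, the sketch below trains one linear SVM per class in a one-vs-rest scheme and predicts the class with the largest decision value. Training uses a Pegasos-style subgradient method that cycles through the examples instead of sampling them at random. The two-dimensional feature vectors and the class labels "a", "b", "c" are hypothetical stand-ins for real sequence representations. SVM-Fold's actual kernels, training procedure, and use of PSI-BLAST hits are not reproduced here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.1, epochs=500):
    """Pegasos-style subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}. A constant 1.0
    feature is appended so the last weight acts as a (regularized) bias.
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for x, yi in zip(Xb, y):  # cyclic pass instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            violated = yi * dot(w, x) < 1  # hinge loss is active for this example
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the regularizer
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]  # hinge subgradient
    return w

def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class (+1) against all others (-1)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}

def predict_class(models, x):
    """Return the class whose SVM gives x the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))

X = [(2.0, 0.0), (2.5, 0.5), (0.0, 2.0), (0.5, 2.5), (-2.0, -2.0), (-2.5, -1.5)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, labels)
print(predict_class(models, (2.25, 0.25)))
```

One-vs-rest keeps each binary problem independent, so the same pattern could in principle be applied separately at each level of a hierarchy such as SCOP's.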

  • String Kernels

    String Kernels represents our lab's pioneering work on "k-mer"-based string kernels for support vector machine classification of protein sequences into structural categories. These novel, efficiently computable string kernels incorporate biologically motivated notions of inexact string matching, based on shared approximate occurrences of short subsequences ("k-mers"). More recently, we introduced profile kernels, which leverage evolutionary information in the form of sequence "profiles" estimated from multiple alignments and achieve state-of-the-art performance for remote homology detection.
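
    The basic k-mer kernel definitions can be sketched in Python. The spectrum kernel compares two sequences by counts of exactly shared k-mers. The (k, m)-mismatch kernel also lets each k-mer count toward every k-mer within Hamming distance m, which is one simple form of inexact matching. This brute-force version enumerates mismatch neighborhoods explicitly to show the definitions; it is not an efficient implementation. Profile kernels are not shown, and all function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def spectrum_features(seq, k):
    """k-spectrum feature map: counts of each k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All strings over alphabet within Hamming distance m of kmer (m <= k)."""
    neighbors = set()
    # Let any m positions vary freely; the original letter is among the choices,
    # so every string at distance <= m is generated.
    for positions in combinations(range(len(kmer)), m):
        pools = [alphabet if i in positions else kmer[i] for i in range(len(kmer))]
        neighbors.update("".join(p) for p in product(*pools))
    return neighbors

def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """(k, m)-mismatch feature map: for each k-mer alpha, the number of
    k-mers in seq within Hamming distance m of alpha."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for alpha in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            phi[alpha] += 1
    return phi

def kernel(phi_x, phi_y):
    """Inner product of two sparse feature maps."""
    if len(phi_x) > len(phi_y):
        phi_x, phi_y = phi_y, phi_x
    return sum(c * phi_y[a] for a, c in phi_x.items())

# Exact matching sees no shared 3-mer between ACD and ACE ...
print(kernel(spectrum_features("ACD", 3), spectrum_features("ACE", 3)))  # → 0
# ... while the (3, 1)-mismatch kernel counts the 20 strings AC? near both.
print(kernel(mismatch_features("ACD", 3, 1), mismatch_features("ACE", 3, 1)))  # → 20
```

With m = 0 the mismatch feature map reduces to the spectrum feature map, so the mismatch kernel strictly generalizes exact k-mer matching.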