Supervised learning of oncogenic pathway signatures

Put your John Hancock right here ...

I initially helped create this tutorial (along with my adviser, Christina Leslie, and my lab-mate Xuejing Li) for a lab session that was presented at the Integrative Statistical Analysis of Genome Scale Data (2008) workshop at Cold Spring Harbor Labs. The goal of this lab was to use SVMs to build different classifiers that can be used to extract oncogenic pathway signatures from microarray expression data. The signatures are then used to classify different types of cancer in several mouse models.

All code is written in R and uses several Bioconductor packages.

I have since added details that were not relevant to the original lab, in order to fulfill the requirements for a class project in Jason Banfelder's Quantitative Understanding in Biology course.

I'm posting it here in hopes that people may find it useful and educational. Topics covered include:

Although it is mentioned in the documents, I should say here that I do not intend for this to be a rigorous presentation of any of the aforementioned topics. Rather, I intend for the material to give the reader an intuitive sense of what these techniques do and how they might be useful.

Please contact me with any questions, comments, or suggestions you might have.

The Goods

Miscellany

I think I've stressed the point enough in the manuscript, but just in case: the mathematical presentation of some of the concepts here (like the SVM) is not exactly correct and is only presented in a manner that I think provides intuition for pedagogical purposes.

I have references in the manuscript to resources one should read to get a complete and correct overview of the concepts presented.

Lastly, I've left out a section in the preliminaries that I had initially intended to write, providing an overview of principal components analysis. If you are after a light and intuitive explanation of this technique, there is a great tutorial that you can find here (PDF).

Last updated on: July 1, 2008