 Gary Bader, Ph.D. Sander Lab
    Imagine you are confronted one day by a pile of hundreds of tiny metal gears, springs, screws and such. Could you tell by looking at that pile what could be assembled from it? Now imagine that different people across the planet have bits of information on putting these parts together. Someone from Beijing can tell you that gear A is attached to spring B and someone from Vancouver can tell you that spring B connects to flywheel C. You somehow manage to collect this information in one place to create a plan and put the parts together accordingly. Surprisingly, what you have put together turns out to be an intricate Swiss watch with a mechanism that can be wound up and set in motion to tell time.

    Now imagine that you are given a list of parts for a person and want to know how an estimated 100 trillion cells in the human body function over a lifetime. Such a human parts list includes biomolecules like DNA, RNA, proteins and small molecules (such as vitamins, fats and sugars). We are at this stage right now in biology. The Human Genome Project has provided us with a large number of parts, but we don’t know how the parts fit together, that is, how the biomolecules interact.

    Finding and understanding this information is important, as biomolecules interact inside us and arrange themselves into intricate networks and pathways that control all aspects of a cell’s function. Metabolic pathways are like assembly lines that, for instance, make new parts so the cell can grow. Signal transduction pathways are like electrical networks that control the clockwork of the cell, for instance to make sure it doesn’t grow too fast, as in cancer. Understanding the cell on the level of biomolecular interactions will allow us to further understand how we work, how diseases arise and how to develop effective cures.

    Thanks to more sensitive and robust technologies, such as mass spectrometry, scientists around the world are finding out, at an increasing rate, how the parts of the cell fit together. DNA sequence information from the genome project automatically goes into a public, international database for all to use. Until now, however, biomolecular interaction data has mainly been stored in scientific journals, which are generally hard to access. We need to design and build computer systems that collect biomolecular interaction information in one place so we can easily analyze it to better understand ourselves.

    Some of these systems have already been built. The Biomolecular Interaction Network Database (BIND) can describe the biology of the cellular biomolecular interaction network in great detail, right down to the atomic level if necessary. This attention to detail will allow researchers to mine the data, using tools that are being developed, for knowledge and patterns that have not previously been noticed. It will also help us move towards goals such as visualizing the cell over time and in-depth computational modeling of biochemical pathways for in silico drug design.

    To achieve these objectives, the databases we build must be filled with data to become real knowledge resources. Tens of thousands of known molecular interactions must be entered into the databases from the scientific literature and, to ensure quality, they must be entered by people, although this process will be helped along by new computer algorithms that can extract relevant information from journal abstracts. All the while, our databases must keep up with the current flood of data from high-throughput interaction-finding experiments. It is my hope that, along with a funded curation effort, the scientific community at large will pitch in to create such a great resource.

    Science will continue to generate more and more information about the biology of the cell. By seamlessly integrating this information and making it freely available to researchers, and by continuing to develop bioinformatics tools to help study this fundamental data, we are right on track to finally see our own true blueprint.

Memorial Sloan-Kettering Cancer Center (MSKCC)
Computational Biology Lab (a.k.a. cBio)