March 17, Wednesday 14:15, Room 303, Jacobs

FragBag – a method for representing protein structures as a bag-of-words of backbone fragments for fast and accurate filtering of near structural neighbors

Lecturer: Inbal Budowski-Tal

Lecturer homepage: http://cs.haifa.ac.il/~ibudowsk/

Affiliation: CS dept, Haifa University


Abstract:

Proteins are large, complex molecules that play many critical roles in the body. Proteins are made up of hundreds or thousands of smaller units called amino acids, which are attached to one another in long chains. There are 20 different types of amino acids that can be combined to make a protein. The sequence of amino acids determines each protein’s unique 3-dimensional structure and its specific function.

Scientists often need to quickly identify proteins that are structurally similar to a given protein, for example when predicting protein structure and function. This is a difficult task, made harder by the rapid growth of the Protein Data Bank (PDB). Our study proposes FragBag, a concise representation of the protein backbone as a bag-of-words of short backbone segments, for rapidly measuring the similarity between protein structures. FragBag is designed to serve as a filter that quickly finds a small set of candidate structural neighbors; one can then run a computationally expensive, state-of-the-art structural alignment method on this small set to identify and align the closest structure.
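The filtering idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each overlapping backbone segment has already been mapped to the index of its closest fragment in some fragment library (that mapping step is not shown), and it uses cosine distance between count vectors as one plausible similarity measure. The library size, the protein names, and the function names are all hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_fragments(fragment_ids, library_size):
    """Count how often each library fragment occurs along one backbone.

    fragment_ids: one library index per backbone segment (assumed to be
    precomputed by some segment-to-fragment matching step, not shown here).
    """
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed choice of vector distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def filter_candidates(query_vec, database, k):
    """Return the names of the k database entries closest to the query."""
    ranked = sorted(
        database,
        key=lambda name: cosine_distance(query_vec, database[name]),
    )
    return ranked[:k]


# Toy data: a hypothetical library of 4 fragments and 3 database proteins.
LIBRARY_SIZE = 4
query = bag_of_fragments([0, 0, 1, 3], LIBRARY_SIZE)
database = {
    "protA": bag_of_fragments([0, 0, 1, 3, 3], LIBRARY_SIZE),
    "protB": bag_of_fragments([2, 2, 2, 1], LIBRARY_SIZE),
    "protC": bag_of_fragments([0, 1, 1, 3], LIBRARY_SIZE),
}

# Only these few candidates would be passed on to a full structural aligner.
candidates = filter_candidates(query, database, k=2)
print(candidates)
```

In this toy setup the expensive alignment step would run only on the returned candidates rather than on the whole database, which is the role the abstract assigns to the filter.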

Our analysis shows that FragBag performs comparably to computationally expensive, highly trusted structural alignment methods. It is also much faster: comparing vectors is orders of magnitude faster than computing a structural alignment of two structures.

This research was conducted jointly with Dr. Rachel Kolodny (Department of CS at the University of Haifa) and Dr. Yuval Nov (Department of Statistics at the University of Haifa); the resulting paper is forthcoming in Proceedings of the National Academy of Sciences (PNAS).