Rachel Kolodny

Global view of protein evolution

 

with Sergey Nepomnyachiy and Nir Ben-Tal
Proc. Natl. Acad. Sci. (USA) (2014) pdf

We've been mentioned in PNAS Highlights (text from there)

Just as the elements in the periodic table can be traced back to the Big Bang, the set of all proteins in terrestrial organisms reflects the history of evolution on Earth. A global view of this so-called protein universe would help reveal how proteins evolve and are related to one another, but empirical evidence exists for relatively few relationships between proteins. Sergey Nepomnyachiy et al. applied network theory to a representative set of all known protein domains drawn from the Structural Classification of Proteins (SCOP) database. The authors represented protein space using two network configurations: a domain network in which edges connect domains the segments of which share similar sequence and structural motifs, and a motif network in which edges connect recurring motifs that lie within the same domains. The authors demonstrate how networks suggest evolutionary paths between domains and provide clues about the mechanisms of protein evolution. The findings offer an approach to representing protein space that could aid protein design, according to the authors.



 

Representation of the Protein Universe using Classifications, Maps, and Networks

 

with Nir Ben-Tal
Israel Journal of Chemistry (2014) pdf

In an issue celebrating the 2013 Chemistry Nobel prize awarded to Michael Levitt (and others, but Michael is my advisor)

A meaningful and coherent global picture of the protein universe is needed to better understand protein evolution and the underlying biophysics. We survey the studies that tackled this fundamental challenge, providing a glimpse of the protein space. A global picture represents all known local relationships among proteins, and needs to do so in a comprehensive and accurate manner. Three types of global representations can be used: classifications, maps, and networks. In these, the local relationships are derived based on the similarity of the proteins' sequences, structures, or functions (or a combination of these). Alternatively, the local relationships can be co-occurrences of elements in the protein universe. The representations can be based on different objects: full polypeptide chains, fragments such as structural domains, or even smaller motifs. Different protein qualities were revealed in each study; many point out the uniqueness of domains of the alpha/beta SCOP (Structural Classification of Proteins) class.



 

 

 

Redundancy-weighting for better inference of protein structural features

 

with Chen Yanover, Natalia Vanetik, Michael Levitt, and Chen Keasar
Bioinformatics (2014) pdf

In this study we explore the concept of redundancy-weighted data-sets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted data-sets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted data-sets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant data-sets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials, and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology.
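As a minimal illustration of the weighting idea (a sketch, not the code or any of the three schemes used in the paper), the snippet below weights each structure by the inverse of the size of its homolog cluster and compares the entropy of the resulting feature distribution with that of plain counts. The cluster assignments, feature labels, and function names are all hypothetical, and counting the structure itself as one of its homologs is an assumption made here for simplicity.

```python
import math
from collections import Counter, defaultdict

def redundancy_weights(cluster_of):
    """Weight each structure by 1 / (size of its homolog cluster).

    Assumption: a structure counts as one of its own homologs, so a
    singleton cluster gets weight 1.
    """
    sizes = Counter(cluster_of.values())
    return {name: 1.0 / sizes[c] for name, c in cluster_of.items()}

def feature_distribution(feature_of, weight_of):
    """Normalized, weighted histogram of a categorical structural feature."""
    totals = defaultdict(float)
    for name, feature in feature_of.items():
        totals[feature] += weight_of[name]
    norm = sum(totals.values())
    return {f: t / norm for f, t in totals.items()}

def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical toy data: three homologous structures and one singleton.
cluster_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
feature_of = {"a1": "helix", "a2": "helix", "a3": "helix", "b1": "strand"}

weights = redundancy_weights(cluster_of)
weighted = feature_distribution(feature_of, weights)
unweighted = feature_distribution(feature_of, {n: 1.0 for n in feature_of})
```

In this toy case the over-represented family no longer dominates the distribution once the weights are applied, so the weighted distribution is flatter than the one built from raw counts. The example compares against raw counts only; it does not reproduce the comparison with non-redundant data-sets made in the paper.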



 

On the Universe of Protein Folds

with Leonid Pereyaslavets, Abraham O. Samson, and Michael Levitt
Ann. Rev. of Biophysics (2013) pdf

In the fifty years since the first atomic structure of a protein was revealed, tens of thousands of additional structures have been solved. Like all objects in biology, protein structures show common patterns that seem to define family relationships. Classification of protein structures, which started in the 1970s with about a dozen structures, has continued with increasing enthusiasm, leading to two main fold classifications, SCOP and CATH, as well as many additional databases. Classification is complicated by deciding what constitutes a domain, the fundamental unit of structure. Also difficult is deciding when two given structures are similar. Like all of biology, fold classification is beset by exceptions to all rules. Thus, the perspectives of protein fold space that the fold classifications offer differ from each other. In spite of these ambiguities, fold classifications are useful for prediction of structure and function. Studying the characteristics of fold space can shed light on protein evolution and the physical laws that govern protein behavior.



 

From Protein Structure to Function via Computational Tools and Approaches

 

with Mickey Kosloff
Isr. J. Chem. (2013) pdf

The 3D structures of proteins are often considered fundamental for understanding their function. Yet, because of the complexity of protein structure, extracting specific functional information from structures can be a considerable challenge. Here, we present selected approaches and tools that were developed in the Kolodny and Kosloff labs to study and connect protein sequence, structure, and function spaces. 



 

Maps of protein structure space reveal a fundamental relationship between protein structure and function

 

with Margarita Osadchy
Proc. Natl. Acad. Sci. (2011) 108(30):12301-6 pdf

We've been mentioned in f1000

We propose a new method to efficiently create three-dimensional maps of structure space using a very large data set of > 30,000 SCOP domains. In our maps, each domain is represented by a point, and the distance between any two points approximates the structural distance between their corresponding domains. We use these maps to study the spatial distributions of properties of proteins, and in particular those of local vicinities in structure space such as structural density and functional diversity. These maps provide a novel broad view of protein space, and thus reveal new fundamental properties thereof. At the same time, the maps are consistent with previous knowledge (e.g., domains cluster by their SCOP class), and organize previous observations concerning specific protein folds in a unified, coherent representation. To investigate the function-structure relationship, we measure the functional diversity (using the Gene Ontology controlled vocabulary) in local structural vicinities. Our most striking finding is that functional diversity varies considerably across structure space: the space has a highly diverse region, and diversity abates when moving away from it. Interestingly, the domains in this region are mostly alpha/beta structures, which are known to be the most ancient proteins.



 

A library of protein surface patches discriminates between native structures and decoys generated by structure prediction servers

with Roi Gamliel, Klara Kedem, and Chen Keasar  
BMC Structural Biology (2011) 11:20. online version

 

 

 

FragBag, a "bag-of-words" representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately

with Inbal Budowski-Tal and Yuval Nov
Proc. Natl. Acad. Sci. (2010) 107: 3481-3486 pdf, web-page

In FragBag, we describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments (our libraries, described below). Then, we represent the protein as a 'bag-of-fragments', a vector that counts the number of occurrences of each fragment, and measure the similarity between two structures by the similarity between their vectors.

We use ROC curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners (the same data set from our comparison study). Our best FragBag library finds more accurate candidate sets than three other filter methods: SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted, structural aligners STRUCTAL and CE.
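The bag-of-fragments representation described above can be sketched in a few lines. This is only an illustration under stated assumptions: each backbone segment is taken to be already assigned to the index of its best-matching library fragment (that matching step is omitted), the function names are hypothetical, and cosine similarity is used as one plausible way to compare the count vectors rather than as the measure FragBag necessarily uses.

```python
import math
from collections import Counter

def bag_of_fragments(fragment_labels, library_size):
    """Count how often each library fragment occurs in a structure.

    fragment_labels: one library index per overlapping backbone segment
    (assumed to be precomputed). The order of the segments is discarded.
    """
    counts = Counter(fragment_labels)
    return [counts.get(i, 0) for i in range(library_size)]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical label strings for three structures, with a 3-fragment library.
bag_a = bag_of_fragments([0, 1, 1, 2], library_size=3)
bag_b = bag_of_fragments([1, 1, 0, 2], library_size=3)  # same fragments, other order
bag_c = bag_of_fragments([0, 0, 0, 0], library_size=3)

sim_ab = cosine_similarity(bag_a, bag_b)
sim_ac = cosine_similarity(bag_a, bag_c)
```

Because the vector only counts occurrences, the first two toy structures get identical bags even though their fragments appear in a different order. Comparing two structures costs one pass over two short vectors, which is what makes this kind of representation suitable as a fast filter before running a full structural aligner.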

 

 

 

 

Sequence-Similar, Structure-dissimilar protein pairs in the PDB

with Mickey Kosloff
Proteins: Structure, Function, and Bioinformatics (2007) 71(2): 891-902 pdf, database

It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. This assumption underlies many computational studies and structure prediction methods.

Here, we compare sequence-based structural superpositions and geometry-based structural alignments and show that the former provides a better measure of structure dissimilarity. Using sequence-based structural superpositioning we find many examples in the PDB where two proteins that are similar in sequence have structures that differ significantly from one another, usually in direct relation to their function. We conclude that the assumption of two proteins with similar sequences having similar structures is often incorrect and can lead to the loss of structurally and functionally important information.

 

 

VISTAL - A two-dimensional visualization tool for structural alignments

with Barry Honig
Bioinformatics (2006) 22(17): 2166-2167 pdf, software

VISTAL describes structures as a series of secondary structure elements, and places matched residues one on top of the other, colored according to the three-dimensional distance of their Cα atoms.

 

Using an Alignment of Fragment Strings for Comparing Protein Structures.

with Iddo Friedberg, Tim Harder, Einat Sitbon, Zhanwen Li, and Adam Godzik
Bioinformatics (2006) 23(2): e219-e224 pdf

This work, by Iddo and Tim, compares protein structures that are described via strings of fragments from our libraries.

 

 

Protein Structure Comparison: Implications for the Nature of 'Fold Space', and Structure and Function Prediction.

with Donald Petrey and Barry Honig
Curr. Opin. Struct. Bio. (2006) 16: 393-398 pdf

We argue in favor of viewing protein structure space as continuous, with potential structural similarities between any pair of structures. This is different from the traditional perspective in which a structure is in a particular group (denoted fold) and only other structures within that fold are considered as its structural neighbors. We survey recent progress made in the prediction of protein structure and function by relying on these relationships.

 

Faster Algorithms for Optimal Multiple Sequence Alignment based on Pairwise Comparisons.

with Pankaj K. Agarwal and Yonatan Bilu
Lecture Notes in Computer Science (WABI 2005) 3692: 315-327 2005. pdf, online material

We consider the following version of the Multiple Sequence Alignment (MSA) problem: In a preprocessing stage pairwise alignments are found for every pair of sequences. The goal is to find an optimal alignment in which matches are restricted to positions that were matched at the preprocessing stage. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. In our formulation the MSA must conform with pairwise (local) alignments, and in return can be solved more efficiently. We prove that it suffices to find an optimal alignment of sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time.

 

Comprehensive Evaluation of Protein Structure Alignment: Scoring by Geometric Measures.

with Patrice Koehl and Michael Levitt
J. Mol. Biol. (2005) 346, 1173-1188. pdf, online material

We report a comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs (SSAP, STRUCTAL, DALI, LSQMAN, CE, and SSM) by aligning all 8,581,970 protein structure pairs in a test set of 2,930 sequence-diverse protein domains. We follow the traditional path and rely on a gold standard (the CATH classification) and compare the rates of true and false positives using ROC curves. However, due to limitations of this methodology, we also compare the alignments directly, using geometric match measures.
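Two details of this evaluation are easy to check with a short sketch. First, 8,581,970 is the number of ordered pairs of distinct domains among 2,930. Second, a ROC curve is obtained by ranking pairs by alignment score and sweeping a threshold down the ranking. The code below is a generic illustration, not the evaluation code from the paper: the function name and the toy scores are hypothetical, higher scores are assumed to mean more similar, tied scores are not treated specially, and both classes are assumed to be non-empty.

```python
def roc_points(scores, is_true_pair):
    """Return (false positive rate, true positive rate) points.

    scores: one similarity score per structure pair (higher = more similar).
    is_true_pair: gold-standard labels, e.g. whether the classification
    places the two domains in the same group.
    """
    ranked = sorted(zip(scores, is_true_pair), key=lambda item: -item[0])
    positives = sum(1 for label in is_true_pair if label)
    negatives = len(is_true_pair) - positives
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points

# Ordered pairs of distinct domains in a set of 2,930.
n_domains = 2930
n_pairs = n_domains * (n_domains - 1)

# Hypothetical scores and gold-standard labels for four pairs.
toy_points = roc_points([0.9, 0.8, 0.4, 0.2], [True, False, True, False])
```

A curve that rises quickly toward a true positive rate of 1 while the false positive rate is still low indicates that the aligner's scores agree well with the gold standard.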

 

Inverse Kinematics in Biology: The Protein Loop Closure Problem.

with Leonidas Guibas, Michael Levitt and Patrice Koehl
Int. Jour. Robotics Research. (2005) 24, 151-162. pdf

We address an inverse kinematics problem in structural biology: the loop closure problem. We describe a procedure for generating the conformations of candidate loops that fit in a gap in a protein structure framework. Our method concatenates small fragments of protein from small libraries of representative fragments. Our approach has the advantages of ab initio methods, since we are able to enumerate all candidate loops in the discrete approximation of the conformational space accessible to the loop, as well as the advantages of database search approaches, since the use of fragments of known protein structures guarantees that the backbone conformations are physically reasonable.

 

Approximate Protein Structural Alignment in Polynomial Time.

with Nathan Linial
Proc. Natl. Acad. Sci., (2004) 101 (33), 12201-12206.
pdf, online material

Protein structural alignment is a fundamental problem in computational structural biology. Here, we study it as a family of optimization problems and provide a polynomial time algorithm to solve them. We also show an NP-hardness proof of an alternative approach to this problem using internal distance matrices. Lastly, we visualize the scoring function for several pairs of structures.

 

Protein Decoy Assembly Using Short Fragments Under Geometric Constraints.

with Michael Levitt
Biopolymers, (2003) 68, 278-285. pdf


We use the libraries of fragments described below to generate decoys for several proteins. Coupled with a discriminating energy function, decoys are useful for predicting protein structure. It seems that this method works well for all-alpha proteins.

 

Small Libraries of Protein Fragments Model Native Protein Structures Accurately.

with Patrice Koehl, Leonidas Guibas and Michael Levitt
J. Mol. Biol. (2002) 323, 297-307. pdf, online material


We study efficient means of modeling protein structure. Our model concatenates elements from libraries of commonly observed protein backbone fragments into approximate structures. There are no additional degrees of freedom, so a string of fragment labels fully defines a three-dimensional structure; the set of all strings defines the set of structures (of a given length). By varying the size of the library and the length of its fragments, we generate structure sets of different resolution. With larger libraries, the approximations are better, but we get good fits to real proteins (less than 1 Å) with fewer than five states per residue.