|
Rachel Kolodny
|
|
|
|
|
|
|

|
|
Maps of protein structure space reveal a fundamental relationship between protein structure and function
with Margarita Osadchy
Proc. Natl. Acad. Sci. (2011) 108(30):12301-6 pdf
We've been mentioned in f1000
We propose a new method to
efficiently create three-dimensional maps of structure space using a very
large data set of more than 30,000 SCOP domains.
In our maps, each domain is represented by a point, and the distance
between any two points approximates the structural distance between their
corresponding domains. We use these
maps to study the spatial distributions of properties of proteins, and in
particular those of local vicinities in structure space such as structural
density and functional diversity. These
maps provide a novel broad view of protein space, and thus reveal new fundamental
properties thereof. At the same
time, the maps are consistent with previous knowledge (e.g., domains
cluster by their SCOP class), and organize in a unified, coherent
representation previous observations concerning specific protein folds. To investigate the function-structure
relationship, we measure the functional diversity (using the Gene Ontology
controlled vocabulary) in local structural vicinities. Our most striking finding is that functional
diversity varies considerably across structure space: the space has a
highly diverse region, and diversity abates when moving away from it. Interestingly, the domains in this region
are mostly alpha/beta structures, which are known to be the most ancient
proteins.
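
As a rough illustration of measuring functional diversity in local structural vicinities, the sketch below counts the distinct function labels found within a fixed radius of each point on a map. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `local_functional_diversity`, the fixed-radius neighborhood, the toy coordinates, and the placeholder labels such as `GO:A` are all hypothetical.

```python
import math

def local_functional_diversity(points, annotations, radius):
    """For each map point, count the distinct labels within `radius` of it.

    points      -- list of 3D coordinates, one per domain
    annotations -- list of sets of function labels, one set per domain
    radius      -- neighborhood size in map units (an assumed parameter)
    """
    diversity = []
    for center in points:
        labels = set()
        for point, terms in zip(points, annotations):
            # A domain is in the vicinity if its map point is close enough;
            # the center itself is always included (distance 0).
            if math.dist(center, point) <= radius:
                labels.update(terms)
        diversity.append(len(labels))
    return diversity

# Hypothetical toy map: three nearby domains and one isolated domain.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
annotations = [{"GO:A"}, {"GO:B"}, {"GO:A", "GO:C"}, {"GO:D"}]
diversity = local_functional_diversity(points, annotations, radius=1.5)
print(diversity)
```

In this toy map the three clustered domains share a vicinity with three distinct labels, while the isolated domain sees only its own label.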
|
|
|
A library of protein
surface patches discriminates between native structures and decoys
generated by structure prediction servers
with Roi Gamliel, Klara Kedem, and Chen Keasar
BMC Structural Biology
(2011) 11:20.
online version
|
|

|
|
FragBag, a "bag-of-words"
representation of protein structure, retrieves structural neighbors from
the entire PDB quickly and accurately
with Inbal Budowski-Tal and Yuval
Nov
Proc. Natl. Acad. Sci. (2010) 107:
3481-3486 pdf, web-page
In FragBag, we describe a protein structure by
the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments
(our libraries, described below). Then, we represent
the protein as a 'bag-of-fragments', a vector that counts the number of
occurrences of each fragment, and measure the similarity between two
structures by the similarity between their vectors.
We use ROC curve analysis to
quantify the success of FragBag in identifying
neighbor candidate sets in a dataset of over 2,900 structures. The gold
standard is the set of neighbors found by six state-of-the-art structural
aligners (the same data set from our comparison study).
Our best FragBag library finds more accurate
candidate sets than three other filter methods: SGM, PRIDE, and a method by
Zotenko et al. More interestingly, FragBag
performs on a par with the computationally expensive, yet highly trusted,
structural aligners STRUCTAL and CE.
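
The bag-of-fragments idea can be sketched in a few lines. This is a toy illustration only, with several assumptions that are not taken from the paper: segments are matched to library fragments by comparing their internal distances (a superposition-free stand-in for a proper fragment fit), the two-fragment library and the backbones are made up, and cosine similarity is just one possible way to compare the count vectors.

```python
import math
from itertools import combinations

def internal_distances(segment):
    """All pairwise distances within a segment (invariant to rotation)."""
    return [math.dist(a, b) for a, b in combinations(segment, 2)]

def nearest_fragment(segment, library):
    """Index of the library fragment whose internal distances are closest."""
    seg = internal_distances(segment)

    def gap(fragment):
        frag = internal_distances(fragment)
        return sum((s - f) ** 2 for s, f in zip(seg, frag))

    return min(range(len(library)), key=lambda i: gap(library[i]))

def bag_of_fragments(backbone, library, length):
    """Count how often each library fragment best matches a sliding window."""
    counts = [0] * len(library)
    for start in range(len(backbone) - length + 1):
        window = backbone[start:start + length]
        counts[nearest_fragment(window, library)] += 1
    return counts

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical library of two 3-point fragments: straight and bent.
library = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
]
straight_chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
bent_chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0)]

bag_a = bag_of_fragments(straight_chain, library, length=3)
bag_b = bag_of_fragments(bent_chain, library, length=3)
similarity = cosine_similarity(bag_a, bag_b)
print(bag_a, bag_b, similarity)
```

Because each structure is reduced to a fixed-length vector, comparing two structures costs one vector operation, which is what makes this kind of representation suitable as a fast filter ahead of a full structural aligner.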
|
|
|
|
|
|

|
|
Sequence-similar,
structure-dissimilar protein pairs in the PDB
with Mickey Kosloff
Proteins: Structure,
Function, and Bioinformatics (2007) 71(2): 891-902 pdf, database
It is often assumed that in the Protein Data Bank (PDB), two proteins with
similar sequences will also have similar structures. This assumption
underlies many computational studies and structure prediction methods.
Here, we compare sequence-based structural superpositions
and geometry-based structural alignments and show that the former provide
a better measure of structure dissimilarity. Using sequence-based
structural superposition, we find many examples
in the PDB where two proteins that are similar in sequence have structures
that differ significantly from one another, usually in direct relation to
their function. We conclude that the assumption that two proteins with
similar sequences have similar structures is often incorrect and can lead
to the loss of structurally and functionally important information.
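
To make the notion of a sequence-based structural superposition concrete, here is a minimal sketch assuming NumPy is available. Residue pairs are taken from a given gapped sequence alignment, and the paired coordinates are superposed with the standard Kabsch procedure before computing the RMSD. The helper names, the toy alignment, and the coordinates are hypothetical, and this is not claimed to be the exact protocol used in the paper.

```python
import numpy as np

def aligned_pairs(aligned_a, aligned_b):
    """Residue index pairs matched by a gapped sequence alignment ('-' = gap)."""
    pairs, i, j = [], 0, 0
    for x, y in zip(aligned_a, aligned_b):
        if x != "-" and y != "-":
            pairs.append((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs

def superposed_rmsd(coords_a, coords_b):
    """RMSD after optimally rotating and translating A onto B (Kabsch)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)  # remove translation by centering both sets
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip one axis if needed so the result is a proper rotation.
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))
    rotation = u @ vt
    diff = a @ rotation - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))

# Toy example: B is A rotated by 90 degrees about z and shifted,
# with one extra residue inserted at index 2.
coords_a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
coords_b = [(5, 5, 5), (5, 6, 5), (9, 9, 9), (3, 5, 5), (5, 5, 8)]
pairs = aligned_pairs("AC-DE", "ACGDE")
rmsd = superposed_rmsd([coords_a[i] for i, _ in pairs],
                       [coords_b[j] for _, j in pairs])
print(pairs, rmsd)
```

The key point is that the residue correspondence is fixed by the sequence alignment, so a large RMSD here reflects a real structural difference between equivalent residues rather than a geometric aligner choosing a different correspondence.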
|
|
|
|
|
|

|
|
VISTAL - A
two-dimensional visualization tool for structural alignments
with Barry Honig
Bioinformatics
(2006) 22(17):
2166-2167 pdf, software
VISTAL describes structures as a series of secondary structure elements,
and places matched residues on top of one another, colored according to
the three-dimensional distance between their Cα atoms.
|
|
|
|
|

|
|
Using an Alignment
of Fragment Strings for Comparing Protein Structures.
with Iddo Friedberg, Tim Harder, Einat
Sitbon, Zhanwen Li, and
Adam Godzik
Bioinformatics (2006)
23(2):
e219-e224 pdf
This work, by Iddo and Tim,
compares protein structures that are described via strings of fragments
from our libraries.
|
|
|
|
|
|
|
Protein Structure
Comparison: Implications for the Nature of 'Fold Space', and Structure and
Function Prediction.
with Donald Petrey
and Barry Honig
Curr. Opin.
Struct. Biol. (2006) 16: 393-398 pdf
We argue in favor of viewing protein structure space as continuous, with
potential structural similarities between any pair of structures. This is
different from the traditional perspective in which a structure is in a
particular group (denoted fold) and only other structures within that fold
are considered as its structural neighbors. We survey recent progress made
in the prediction of protein structure and function by relying on these
relationships.
|
|
|
|
|

|
|
Faster Algorithms for
Optimal Multiple Sequence Alignment based on Pairwise
Comparisons.
with Pankaj
K. Agarwal and Yonatan Bilu
Lecture Notes in Computer Science (WABI 2005) 3692:
315-327. pdf, online material
We consider the following version of the Multiple Sequence Alignment (MSA)
problem: in a preprocessing stage, pairwise
alignments are found for every pair of sequences. The goal is to find an
optimal alignment in which matches are restricted to positions that were
matched at the preprocessing stage. We present several techniques for
making the dynamic programming algorithm more efficient, while still
finding an optimal solution under these restrictions. In our formulation,
the MSA must conform to the pairwise (local) alignments, and in return
can be solved more efficiently. We prove that it suffices to find an
optimal alignment of sequence segments, rather than single letters, thereby
reducing the input size and thus improving the running time.
|
|
|
|
|

|
|
Comprehensive
Evaluation of Protein Structure Alignment: Scoring by Geometric Measures.
with Patrice Koehl
and Michael Levitt
J. Mol. Biol.
(2005) 346,
1173-1188. pdf, online material
We report a comprehensive comparison of protein structural alignment
methods. Specifically, we evaluate six publicly available structure
alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning
all 8,581,970 protein structure pairs in a test set of 2,930
sequence-diverse protein domains. We follow the traditional path and rely on a gold
standard (the CATH classification) and compare the rates of true and false
positives using ROC curves. However, due to limitations of this
methodology, we also compare the alignments directly, using geometric match
measures.
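
For readers unfamiliar with ROC analysis against a gold standard, the following minimal sketch shows how a ranked list of pair scores yields a ROC curve and its area. The function names and the toy scores are hypothetical; in the study the true/false labels would come from the gold standard classification, whereas here they are made up, and the sketch assumes distinct scores and at least one pair of each class.

```python
def roc_curve(scores, is_true_pair):
    """ROC points (false positive rate, true positive rate), best score first."""
    ranked = sorted(zip(scores, is_true_pair), key=lambda pair: -pair[0])
    positives = sum(is_true_pair)
    negatives = len(is_true_pair) - positives
    true_pos = false_pos = 0
    curve = [(0.0, 0.0)]
    for _, label in ranked:
        # Lowering the threshold past this pair admits one more prediction.
        if label:
            true_pos += 1
        else:
            false_pos += 1
        curve.append((false_pos / negatives, true_pos / positives))
    return curve

def area_under_curve(curve):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Toy data: alignment scores for five pairs and whether the gold
# standard places each pair in the same group.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [True, True, False, True, False]
curve = roc_curve(scores, labels)
auc = area_under_curve(curve)
print(curve, auc)
```

An area of 1.0 means every true pair outranks every false pair, while 0.5 corresponds to a random ranking.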
|
|
|
|
|

|
|
Inverse Kinematics
in Biology: The Protein Loop Closure Problem.
with Leonidas
Guibas, Michael Levitt and Patrice Koehl
Int. J. Robotics
Research (2005) 24,
151-162. pdf
We address an inverse kinematics problem in structural biology: the loop
closure problem. We describe a procedure for generating the conformations
of candidate loops that fit in a gap in a protein structure framework. Our
method concatenates small fragments of protein drawn from small libraries of
representative fragments. Our approach has the advantages of ab initio methods,
since we are able to enumerate all candidate loops in the discrete
approximation of the conformational space accessible to the loop, as well
as the advantages of database search approaches, since the use of fragments of
known protein structures guarantees that the backbone conformations are
physically reasonable.
|
|
|
|
|

|
|
Approximate Protein
Structural Alignment in Polynomial Time.
with Nathan Linial
Proc. Natl. Acad. Sci.,
(2004) 101 (33),
12201-12206.
pdf, online
material
Protein structural alignment is a fundamental problem in computational
structural biology. Here, we study it as a family of optimization problems
and provide a polynomial-time algorithm to solve them. We also show an
NP-hardness proof of an alternative approach to this problem using internal
distance matrices. Lastly, we visualize the scoring function for several
pairs of structures.
|
|
|
|
|

|
|
Protein Decoy
Assembly Using Short Fragments Under Geometric Constraints.
with Michael Levitt
Biopolymers,
(2003) 68,
278-285. pdf
We use the libraries of fragments described below to generate decoys for
several proteins. Coupled with a discriminating
energy function, decoys are useful for predicting protein structure. It
seems that this method works well for all-alpha proteins.
|
|
|
|
|

|
|
Small Libraries of
Protein Fragments Model Native Protein Structures Accurately.
with Patrice Koehl,
Leonidas Guibas and
Michael Levitt
J. Mol. Biol.
(2002) 323,
297-307. pdf, online material
We study efficient means of modeling protein structure. Our model
concatenates elements from libraries of commonly observed protein backbone
fragments into approximate structures. There are no additional degrees of
freedom so a string of fragment labels fully defines a three-dimensional
structure; the set of all strings defines the set of structures (of a given
length). By varying the size of the library and the length of its
fragments, we generate structure sets of different resolution. With larger
libraries, the approximations are better, but we get good fits to real
proteins (less than 1 Å) with fewer than 5 states per residue.
|
|
|
|
|
|