Publications of Andras Szilagyi


# | Authors, title, source | Links | Impact factor (year) | Times cited (WoS)
1. Szilagyi A, Zavodszky P (1995):
Structural basis for the extreme thermostability of D-glyceraldehyde-3-phosphate dehydrogenase from Thermotoga maritima: analysis based on homology modelling.
Protein Engineering, 8(8), 779-89. doi: 10.1093/protein/8.8.779
Abstract

D-Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from a hyperthermophilic eubacterium, Thermotoga maritima, is remarkably heat stable (Tm = 109 degrees C). In this work, we have applied homology modelling to predict the 3-D structure of Th. maritima GAPDH to reveal the structural basis of thermostability. Three known GAPDH structures were used as reference proteins. First, the rough model of one subunit was constructed using the identified structurally conserved and variable regions of the reference proteins. The holoenzyme was assembled from four subunits and the NAD molecules. The structure was refined by energy minimization and molecular dynamics simulated annealing. No errors were detected in the refined model using the 3-D profile method. The model was compared with the structure of Bacillus stearothermophilus GAPDH to identify structural details underlying the increased thermostability. In all, 12 extra ion pairs per subunit were found at the protein surface. This seems to be the most important factor responsible for thermostability. Differences in the non-specific interactions, including hydration effects, were also found. Minor changes were detected in the secondary structure. The model predicts that slight increases in alpha-helical propensities and in helix-dipole interactions also contribute to increased stability, but to a lesser degree.
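Ion-pair counts like the one above come from geometric analysis of atomic models. As a minimal sketch of how such a count can be made, assuming a common definition (a carboxylate oxygen of Asp/Glu within 4.0 Å of a side-chain nitrogen of Lys/Arg, with His left out) rather than the exact criteria used in this paper, the hypothetical helper `count_ion_pairs` works on a plain list of atom records:

```python
import math
from itertools import product

# Assumed definition of charged side-chain atoms (PDB residue/atom names).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def count_ion_pairs(atoms, cutoff=4.0):
    """Count distinct acidic/basic residue pairs with charged atoms within `cutoff` Å.

    `atoms` is a list of (chain, resseq, resname, atomname, (x, y, z)) tuples.
    Hypothetical helper for illustration; the 4.0 Å default is an assumption.
    """
    acids = [a for a in atoms if (a[2], a[3]) in ACIDIC]
    bases = [a for a in atoms if (a[2], a[3]) in BASIC]
    pairs = set()
    for acid, base in product(acids, bases):
        if math.dist(acid[4], base[4]) <= cutoff:
            # Key on residues, so several close atom pairs count as one ion pair.
            pairs.add(((acid[0], acid[1]), (base[0], base[1])))
    return len(pairs)


atoms = [
    ("A", 12, "ASP", "OD1", (0.0, 0.0, 0.0)),
    ("A", 12, "ASP", "OD2", (1.0, 0.0, 0.0)),
    ("A", 40, "LYS", "NZ", (3.5, 0.0, 0.0)),
    ("B", 77, "ARG", "NH1", (10.0, 0.0, 0.0)),
]
print(count_ion_pairs(atoms))  # 1: Asp12-Lys40; Arg77 is 9-10 Å from Asp12
```

Comparing two models (for example per subunit, or only pairs whose chains differ for intersubunit contacts) then amounts to filtering the atom list and running the same count on each.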

PubMed
PDF
Impact factor: 3.605 (1995) | Times cited (WoS): 31
2. Magyar C, Szilagyi A, Zavodszky P (1996):
Relationship between thermal stability and 3-D structure in a homology model of 3-isopropylmalate dehydrogenase from Escherichia coli.
Protein Engineering, 9(8), 663-70. doi: 10.1093/protein/9.8.663
Abstract

To reveal the structural basis of the increased thermal stability of 3-isopropylmalate dehydrogenase (IPMDH) from Thermus thermophilus, an extreme thermophile, a homology-based structural model of its mesophilic (Escherichia coli) counterpart was constructed. Both IPMDHs are homodimeric proteins. We built a model of one subunit using the 3-D structures of the Th. thermophilus IPMDH and the homologous E. coli isocitrate dehydrogenase. Energy minimization and molecular dynamics simulated annealing were performed on the dimer, including a surrounding solvation shell. No serious errors were detected in the refined model using the 3-D profile method. The resulting structure was scrutinized and compared with the structure of the Th. thermophilus IPMDH. Significant differences were found in the non-specific interactions, including the hydrophobic effect. The model predicts a higher number of ion pairs in the Th. thermophilus than in the E. coli enzyme. An increase was observed in the stabilities of alpha-helical regions in the thermophilic protein. The preliminary X-ray coordinates of the E. coli IPMDH were received after the completion of this work, allowing an assessment of the model against the X-ray structure. The comparison showed that most of the structural features underlying the stability differences between the two enzymes were predicted correctly.

PubMed
PDF
Impact factor: 1.975 (1996) | Times cited (WoS): 11
3. Wallon G, Lovett ST, Magyar C, Svingor A, Szilagyi A, Zavodszky P, Ringe D, Petsko GA (1997):
Sequence and homology model of 3-isopropylmalate dehydrogenase from the psychrotrophic bacterium Vibrio sp. I5 suggest reasons for thermal instability.
Protein Engineering, 10(6), 665-72. doi: 10.1093/protein/10.6.665
Abstract

The leuB gene from the psychrotrophic strain Vibrio sp. I5 has been cloned and sequenced. The gene codes for 3-isopropylmalate dehydrogenase, a 360-residue, dimeric enzyme involved in the biosynthesis of leucine. Three recently solved homologous isopropylmalate dehydrogenase (IPMDH) crystal structures from thermophilic and mesophilic organisms have been used to build a homology model for the psychrotrophic IPMDH and to deduce the possible structural reasons for its decreased thermostability. According to our model the psychrotrophic IPMDH contains fewer stabilizing interactions than its mesophilic and thermophilic counterparts. Elements that have been identified as destabilizing in the comparison of the psychrotrophic, mesophilic and thermophilic IPMDHs are a smaller number of salt-bridges, a reduction in aromatic-aromatic interactions, fewer proline residues and longer surface loops. In addition, there are a number of substitutions of otherwise strictly conserved residues that can be linked to thermostability.

PubMed
PDF
Impact factor: 1.631 (1997) | Times cited (WoS): 38
4. Nemeth A, Svingor A, Pocsik M, Dobo J, Magyar C, Szilagyi A, Gal P, Zavodszky P (2000):
Mirror image mutations reveal the significance of an intersubunit ion cluster in the stability of 3-isopropylmalate dehydrogenase.
FEBS Letters, 468(1), 48-52. doi: 10.1016/S0014-5793(00)01190-X
Abstract

The comparison of the three-dimensional structures of thermophilic (Thermus thermophilus) and mesophilic (Escherichia coli) 3-isopropylmalate dehydrogenases (IPMDH, EC 1.1.1.85) suggested that extra ion pairs found in the intersubunit region of the thermophilic enzyme may be an important factor in thermostability. To test this hypothesis, glutamine 200 in the E. coli enzyme was replaced with glutamate (Q200E mutant) to mimic the thermophilic enzyme at this site by creating an intersubunit ion pair that can join existing ion clusters. At the same site in the thermophilic enzyme, we changed glutamate 190 into glutamine (E190Q), thereby removing the corresponding ion pair. These single amino acid replacements resulted in increased thermostability of the mesophilic and decreased thermostability of the thermophilic enzyme, as measured by spectropolarimetry and differential scanning microcalorimetry.

PubMed
PDF
Impact factor: 3.440 (2000) | Times cited (WoS): 13
5. Szilagyi A, Zavodszky P (2000):
Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey.
Structure with Folding & Design, 8(5), 493-504. doi: 10.1016/S0969-2126(00)00133-7
Abstract

BACKGROUND: Proteins from thermophilic organisms usually show high intrinsic thermal stability but have structures that are very similar to their mesophilic homologues. From previous studies it is difficult to draw general conclusions about the structural features underlying the increased thermal stability of thermophilic proteins. RESULTS: In order to reveal the general evolutionary strategy for changing the heat stability of proteins, a non-redundant data set was compiled comprising all high-quality structures of thermophilic proteins and their mesophilic homologues from the Protein Data Bank. The selection (quality) criteria were met by 64 mesophilic and 29 thermophilic protein subunits, representing 25 protein families. From the atomic coordinates, 13 structural parameters were calculated, compared and evaluated using statistical methods. This study is distinguished from earlier ones by the strict quality control of the structures used and the size of the data set. CONCLUSIONS: Different protein families adapt to higher temperatures by different sets of structural devices. Regarding the structural parameters, the only generally observed rule is an increase in the number of ion pairs with increasing growth temperature. Other parameters show just a trend, whereas the number of hydrogen bonds and the polarity of buried surfaces exhibit no clear-cut tendency to change with growth temperature. Proteins from extreme thermophiles are stabilized in different ways from moderately thermophilic ones. The preferences of these two groups are different with regard to the number of ion pairs, the number of cavities, the polarity of exposed surface and the secondary structural composition.

PubMed
PDF
Suppl.mat.
Impact factor: 6.681 (2000) | Times cited (WoS): 426
6. Nemeth A, Kamondi Sz, Szilagyi A, Magyar C, Kovari Z, Zavodszky P (2002):
Increasing the thermal stability of cellulase C using rules learned from thermophilic proteins: a pilot study.
Biophysical Chemistry, 96(2-3), 229-41. doi: 10.1016/S0301-4622(02)00027-3
Abstract

Some structural features underlying the increased thermostability of enzymes from thermophilic organisms relative to their homologues from mesophiles are known from earlier studies. We used cellulase C from Clostridium thermocellum to test whether thermostability can be increased by mutations designed using rules learned from thermophilic proteins. Cellulase C has a TIM barrel fold with an additional helical subdomain. We designed and produced a number of mutants with the aim of increasing its thermostability. Five mutants were designed to create new electrostatic interactions. They all retained catalytic activity but exhibited decreased thermostability relative to the wild-type enzyme. In these mutants, the stabilizing contributions are evidently smaller than the destabilization caused by the introduction of the new side chains. In another mutant, the small helical subdomain was deleted. This mutant lost activity but its melting point was only 3 degrees C lower than that of the wild-type enzyme, which suggests that the subdomain is an independent folding unit and is important for catalytic function. A double mutant was designed to introduce a new disulfide bridge into the enzyme. This mutant is active and has an increased stability (deltaT(m)=3 degrees C, delta(deltaG(u))=1.73 kcal/mol) relative to the wild-type enzyme. Reduction of the disulfide bridge results in destabilization and an altered thermal denaturation behavior. We conclude that rules learned from thermophilic proteins cannot be used in a straightforward way to increase the thermostability of a protein. Creating a crosslink such as a disulfide bond is a relatively sure-fire method, but the stabilization may be smaller than calculated due to coupled destabilizing effects.

PubMed
PDF
Impact factor: 1.494 (2002) | Times cited (WoS): 17
7. Szilagyi A, Kovacs LK, Rakhely G, Zavodszky P (2002):
Homology modelling reveals the structural background of the striking difference in thermal stability between two related [NiFe]hydrogenases.
Journal of Molecular Modeling, 8(2), 58-64. doi: 10.1007/s00894-001-0071-8
Abstract

Hydrogenases are redox metalloenzymes in bacteria that catalyze the uptake or production of molecular hydrogen. Two homologous nickel-iron hydrogenases, HupSL and HydSL from the photosynthetic purple sulfur bacterium Thiocapsa roseopersicina, differ substantially in their thermal stabilities despite the high sequence similarity between them. The optimum temperature of HydSL activity is estimated to be at least 50 degrees C higher than that of HupSL. In this work, homology models of both proteins were constructed and analyzed for a number of structural properties. The comparison of the models reveals that the higher stability of HydSL can be attributed to increased inter-subunit electrostatic interactions: the homology models reliably predict that HydSL contains at least five more inter-subunit ion pairs than HupSL. The subunit interface of HydSL is more polar than that of HupSL, and it contains a few extra inter-subunit hydrogen bonds. A more optimized cavity system and amino acid replacements resulting in increased conformational rigidity may also contribute to the higher stability of HydSL. The results are in accord with the general observation that with increasing temperature, the role of electrostatic interactions in protein stability increases.

PubMed
PDF
Suppl.mat.
Impact factor: 1.235 (2002) | Times cited (WoS): 9
8. Skolnick J, Zhang Y, Arakaki AK, Kolinski A, Boniecki M, Szilagyi A, Kihara D (2003):
TOUCHSTONE: a unified approach to protein structure prediction.
Proteins, 53(S6), 469-79. doi: 10.1002/prot.10551
Abstract

We have applied the TOUCHSTONE structure prediction algorithm that spans the range from homology modeling to ab initio folding to all protein targets in CASP5. Using our threading algorithm PROSPECTOR that does not utilize input from metaservers, one threads against a representative set of PDB templates. If a template is significantly hit, Generalized Comparative Modeling designed to span the range from closely to distantly related proteins from the template is done. This involves freezing the aligned regions and relaxing the remaining structure to accommodate insertions or deletions with respect to the template. For all targets, consensus predicted side chain contacts from at least weakly threading templates are pooled and incorporated into ab initio folding. Often, TOUCHSTONE performs well in the CM to FR categories, with PROSPECTOR showing significant ability to identify analogous templates. When ab initio folding is done, frequently the best models are closer to the native state than the initial template. Among the particularly good predictions are T0130 in the CM/FR category, T0138 in the FR(H) category, T0135 in the FR(A) category, T0170 in the FR/NF category and T0181 in the NF category. Improvements in the approach are needed in the FR/NF and NF categories. Nevertheless, TOUCHSTONE was one of the best performing algorithms over all categories in CASP5.

PubMed
PDF
Suppl.mat.
Impact factor: 4.313 (2003) | Times cited (WoS): 57
9. Tompa P, Buzder-Lantos P, Tantos A, Farkas A, Szilagyi A, Banoczi Z, Hudecz F, Friedrich P (2004):
On the sequential determinants of calpain cleavage.
Journal of Biological Chemistry, 279(20), 20775-85. doi: 10.1074/jbc.M313873200
Abstract

The structural clues of substrate recognition by calpain are incompletely understood. In this study, 106 cleavage sites in substrate proteins compiled from the literature have been analyzed to dissect the signal for calpain cleavage and also to enable the design of an ideal calpain substrate and interfere with calpain action via site-directed mutagenesis. In general, our data underline the importance of the primary structure of the substrate around the scissile bond in the recognition process. Significant amino acid preferences were found to extend over 11 residues around the scissile bond, from P(4) to P(7)'. In agreement with earlier data, preferred residues in the P(2) position are Leu, Thr, and Val, and in P(1) Lys, Tyr, and Arg. In position P(1)', small hydrophilic residues, Ser and to a lesser extent Thr and Ala, occur most often. Pro dominates the region flanking the P(2)-P(1)' segment, i.e. positions P(3) and P(2)'-P(4)'; most notable is its occurrence 5.59 times above chance in P(3)'. Intriguingly, the segment C-terminal to the cleavage site resembles the consensus inhibitory region of calpastatin, the specific inhibitor of the enzyme. Further, the position of the scissile bond correlates with certain sequential attributes, such as secondary structure and PEST score, which, along with the amino acid preferences, suggests that calpain cleaves within rather disordered segments of proteins. The amino acid preferences were confirmed by site-directed mutagenesis of the autolysis sites of Drosophila calpain B; when amino acids at key positions were changed to less preferred ones, autolytic cleavage shifted to other, adjacent sites. Based on these preferences, a new fluorogenic calpain substrate, DABCYL-TPLKSPPPSPR-EDANS, was designed and synthesized. In the case of micro- and m-calpain, this substrate is kinetically superior to commercially available ones, and it can be used for the in vivo assessment of the activity of these ubiquitous mammalian calpains.
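Statements such as Pro occurring "5.59 times above chance" at P(3)' are observed-to-expected ratios. A minimal sketch of that calculation, assuming cleavage-site windows already aligned on the scissile bond and a caller-supplied background frequency for each residue (the paper's exact background model and any significance testing are not reproduced); `positional_enrichment` is a hypothetical name:

```python
from collections import Counter


def positional_enrichment(windows, background, position):
    """Observed/expected ratio of each residue at one window position.

    windows    -- equal-length peptide strings aligned on the scissile bond
    background -- dict mapping residue -> expected frequency (assumed given)
    position   -- index into each window (e.g. the index of P3')
    """
    counts = Counter(w[position] for w in windows)
    n = len(windows)
    return {
        aa: (count / n) / background[aa]
        for aa, count in counts.items()
        if background.get(aa)  # skip residues without a background estimate
    }


# Toy windows spanning P2-P1-P1'-P2' (the scissile bond lies between P1 and P1').
windows = ["LKSP", "VKSA", "LRTP", "TYSP"]
uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
print(positional_enrichment(windows, uniform, 3))  # Pro about 15x, Ala about 5x at P2'
```

With a uniform background every ratio is simply the observed frequency times 20; a realistic analysis would use residue frequencies from the substrate proteins themselves.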

PubMed
PDF
Impact factor: 6.355 (2004) | Times cited (WoS): 210
10. Szilagyi A, Grimm V, Arakaki AK, Skolnick J (2005):
Prediction of physical protein-protein interactions.
Physical Biology, 2(2), S1-16. doi: 10.1088/1478-3975/2/2/S01
Abstract

Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.

PubMed
PDF
Impact factor: 2.773 (2006) | Times cited (WoS): 55
11. Szilagyi A, Skolnick J (2006):
Efficient prediction of nucleic acid binding function from low-resolution protein structures.
Journal of Molecular Biology, 358(3), 922-33. doi: 10.1016/j.jmb.2006.02.053
Abstract

Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids as well as the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA-binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to another, published method that uses all-atom structures, time-consuming calculations and information on conserved residues.
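The prediction step described above is a thresholded linear score. The sketch below mirrors only that structure, under stated assumptions: two illustrative features (the fraction of Lys+Arg in the sequence and a crude dipole moment from unit charges placed on Cα atoms about the centroid) stand in for the paper's feature set, the spatial-asymmetry terms are omitted, and the weights and bias are placeholders, not the published logistic-regression coefficients.

```python
import math

# Assumed unit charges on C-alpha atoms (a crude approximation for C-alpha-only models).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def fraction_of(seq, residues):
    """Fraction of sequence positions occupied by any of `residues`."""
    return sum(seq.count(r) for r in residues) / len(seq)


def dipole_magnitude(seq, ca_coords):
    """|sum_i q_i (r_i - centroid)| for charges placed on C-alpha positions."""
    n = len(ca_coords)
    centroid = [sum(c[k] for c in ca_coords) / n for k in range(3)]
    mu = [0.0, 0.0, 0.0]
    for aa, coord in zip(seq, ca_coords):
        q = CHARGE.get(aa, 0.0)
        for k in range(3):
            mu[k] += q * (coord[k] - centroid[k])
    return math.sqrt(sum(m * m for m in mu))


def predict_dna_binding(features, weights, bias, threshold=0.0):
    """Linear score w.x + b; predict binding when it exceeds `threshold`.

    With logistic-regression weights, score > 0 corresponds to probability > 0.5.
    """
    score = bias + sum(w * f for w, f in zip(weights, features))
    return score > threshold, score


seq = "KDAG"
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
features = [fraction_of(seq, "KR"), dipole_magnitude(seq, coords)]
print(features)  # [0.25, 2.0]
print(predict_dna_binding(features, weights=[4.0, 0.5], bias=-1.5))  # (True, 0.5)
```

Because the features need only the sequence and Cα positions, the same code runs unchanged on a coarse model, which is the property the abstract emphasizes.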

PubMed
PDF
Impact factor: 4.890 (2006) | Times cited (WoS): 54
12. Szilagyi A, Kardos J, Osvath Sz, Barna L, Zavodszky P (2007):
Protein folding.
In: Lajtha A, Banik N (eds.): Handbook of Neurochemistry and Molecular Neurobiology, Volume 7, Chapter 10. pp. 303-344. Springer, 2007
Abstract

Since Anfinsen's famous experiments in the 1960s, it has been known that the complex three-dimensional structure of protein molecules is encoded in their amino acid sequences, and the chains autonomously fold under proper conditions. Cracking this code, which is sometimes called "the second part of the genetic code," has been one of the greatest challenges of molecular biology. Although a full understanding of how proteins fold remains elusive, theoretical and experimental studies of protein folding have come a long way since Anfinsen's findings. In the living cell, folding occurs in a complex and crowded environment, often involving helper proteins, and in some cases it can go awry: the protein can misfold, aggregate, or form amyloid fibers. It is increasingly recognized that misfolded proteins and amyloid formation are the root cause of a number of serious illnesses including several neurodegenerative diseases. Therefore, the study of protein folding remains a key area of biomedical research.

PDF
13. Graczer E, Varga A, Hajdu I, Melnik B, Szilagyi A, Semisotnov G, Zavodszky P, Vas M (2007):
Rates of unfolding, rather than refolding, determine thermal stabilities of thermophilic, mesophilic and psychrotrophic IPMDHs.
Biochemistry, 46(41), 11536-49. doi: 10.1021/bi700754q
Abstract

The relationship between the thermal stability of proteins and rates of unfolding and refolding is still an open issue. The data are very scarce, especially for proteins with complex structure. Here, time-dependent denaturation-renaturation experiments on Thermus thermophilus, Escherichia coli, and Vibrio sp. I5 3-isopropylmalate dehydrogenases (IPMDHs) of different heat stabilities are presented. Unfolding, as monitored by several methods, occurs in a single first-order step with half-times of approximately 1 h, several minutes, and a few seconds for the thermophilic, mesophilic, and psychrotrophic enzymes, respectively. The binding of Mn·IPM (the manganese complex of 3-isopropylmalate) markedly reduces the rates of unfolding; this effect is more prominent for the less stable enzyme variants. Refolding is a two-step or multistep first-order process involving one or more inactive intermediates. The restoration of the native structure and reactivation take place with a half-time of a few minutes for all three IPMDHs. Thus, the comparative experimental unfolding-refolding studies of the three IPMDHs with different thermostabilities have revealed a close relationship between thermostability and unfolding rate. Structural analysis has shown that the differences in the molecular contacts between selected nonconserved residues are responsible for the different rates of unfolding. On the other hand, the folding rates might be correlated with the absolute contact order, which does not significantly vary between IPMDHs with different thermostabilities. On the basis of our observations, folding rates appear to be dictated by global structural characteristics (such as native topology, i.e., contact order) rather than by thermodynamic stability.
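For a single first-order unfolding step like the one described above, each half-time converts directly into a rate constant. Writing k_u for the unfolding rate constant and f_N for the fraction of molecules still native (standard first-order kinetics, not a formula quoted from the paper):

```latex
f_N(t) = e^{-k_u t}, \qquad
f_N(t_{1/2}) = \tfrac{1}{2} \;\Longrightarrow\; t_{1/2} = \frac{\ln 2}{k_u}

% Example: t_{1/2} \approx 1\,\mathrm{h} = 3600\,\mathrm{s} (thermophilic enzyme)
k_u = \frac{\ln 2}{3600\,\mathrm{s}} \approx 1.9 \times 10^{-4}\,\mathrm{s}^{-1}
```

A half-time of a few seconds for the psychrotrophic enzyme therefore corresponds to an unfolding rate constant roughly three orders of magnitude larger.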

PubMed
PDF
Impact factor: 3.368 (2007) | Times cited (WoS): 14
14. Szilagyi A (2008):
A mathematically related singularity and the maximum size of protein domains.
Proteins, 71(4), 2086-8. doi: 10.1002/prot.22000
Abstract

In a paper titled "A topologically related singularity suggests a maximum preferred size for protein domains" (Zbilut et al., Proteins 2007;66:621-629), Zbilut et al. claim to have found a singularity in certain geometrical properties of protein structures, and suggest that this singularity may limit the maximum size of protein domains. They find further support for the singularity in their analysis of G-factors calculated by the PROCHECK program. Here, we show that the claimed singularity is a mathematical artifact with no physical meaning, and we reanalyze the G-factors to show that Zbilut et al.'s results are due to a single outlier in the data. Thus, the existence of an actual singularity in the topological properties of proteins is not supported by the findings of Zbilut et al.

PubMed
PDF
Impact factor: 3.419 (2008) | Times cited (WoS): 1
15. Szilagyi A, Gyorffy D, Zavodszky P (2008):
The twilight zone between protein order and disorder.
Biophysical Journal, 95(4), 1612-26. doi: 10.1529/biophysj.108.131151
Abstract

The amino acid composition of intrinsically disordered proteins and protein segments characteristically differs from that of ordered proteins. This observation forms the basis of several disorder prediction methods. These, however, usually perform worse for smaller proteins (or segments) than for larger ones. We show that the regions of amino acid composition space corresponding to ordered and disordered proteins overlap with each other, and the extent of the overlap (the "twilight zone") is larger for short than for long chains. To explain this finding, we used two-dimensional lattice model proteins containing hydrophobic, polar, and charged monomers and revealed the relation among chain length, amino acid composition, and disorder. Because the number of chain configurations grows exponentially with chain length, a larger fraction of longer chains than of shorter chains can reach a low-energy, ordered state. The amount of information carried by the amino acid composition about whether a protein or segment is (dis)ordered grows with increasing chain length. Smaller proteins rely more on specific interactions for stability, which limits the possible accuracy of disorder prediction methods. For proteins in the "twilight zone", size can determine order, as illustrated by the example of two-state homodimers.
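The argument above rests on the number of chain configurations growing exponentially with chain length. A brute-force count of self-avoiding walks on the 2D square lattice (an N-monomer chain is an (N-1)-step walk) makes that growth concrete. This is a minimal sketch for small chains only: it does not model hydrophobic, polar and charged monomers or their energies, and it does not remove symmetry-related duplicates.

```python
def count_saws(n_steps):
    """Count self-avoiding walks of `n_steps` on the square lattice, starting at the origin."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack before trying the next move
        return total

    return extend((0, 0), {(0, 0)}, n_steps)


counts = [count_saws(n) for n in range(1, 9)]
print(counts)  # [4, 12, 36, 100, 284, 780, 2172, 5916]
print([round(b / a, 2) for a, b in zip(counts, counts[1:])])
```

The ratio of successive counts settles between about 2.7 and 2.8 for these short walks, i.e. each added monomer multiplies the number of available configurations by a roughly constant factor (the asymptotic value for this lattice is about 2.638).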

PubMed
PDF
Impact factor: 4.683 (2008) | Times cited (WoS): 18
16. Hajdu I, Bothe C, Szilagyi A, Kardos J, Gal P, Zavodszky P (2008):
Adjustment of conformational flexibility of glyceraldehyde-3-phosphate dehydrogenase as a means of thermal adaptation and allosteric regulation.
European Biophysics Journal, 37(7), 1139-44. doi: 10.1007/s00249-008-0332-x
Abstract

Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Thermotoga maritima (TmGAPDH) is a thermostable enzyme (Tm = 102 degrees C), which is fully active at temperatures near 80 degrees C but has very low activity at room temperature. In search for an explanation of this behavior, we measured the conformational flexibility of the protein by hydrogen-deuterium exchange and compared the results with those obtained with GAPDH from rabbit muscle (RmGAPDH). At room temperature, the conformational flexibility of TmGAPDH is much less than that of RmGAPDH, but increases with increasing temperature and becomes comparable to that of RmGAPDH near the physiological temperature of Thermotoga maritima. Using the available three-dimensional structures of the two enzymes, we compared the B factors that reflect the local mobility of protein atoms. The largest differences in B factors are seen in the coenzyme and NAD binding regions. The likely reason for the low activity of TmGAPDH at room temperature is that the motions required for enzyme functions are restricted. The findings support the idea of "corresponding states" which claims that over the time span of evolution, the overall conformational flexibility of proteins has been preserved at their corresponding physiological temperatures.

PubMed
PDF
Impact factor: 2.409 (2008) | Times cited (WoS): 3
17. Kamondi S, Szilagyi A, Barna L, Zavodszky P (2008):
Engineering the thermostability of a TIM-barrel enzyme by rational family shuffling.
Biochemical and Biophysical Research Communications, 374(4), 725-40. doi: 10.1016/j.bbrc.2008.07.095
Abstract

A possible approach to generate enzymes with an engineered temperature optimum is to create chimeras of homologous enzymes with different temperature optima. We tested this approach using two family-10 xylanases from Thermotoga maritima: the thermophilic xylanase A catalytic domain (TmxAcat, T(opt)=68 degrees C), and the hyperthermophilic xylanase B (TmxB, T(opt)=102 degrees C). Twenty-one different chimeric constructs were created by mimicking family shuffling in a rational manner. The measured temperature optima of the 16 enzymatically active chimeras do not monotonically increase with the percentage of residues coming from TmxB. Only four chimeras had a higher temperature optimum than TmxAcat, the most stable variant (T(opt)=80 degrees C) being the one in which both terminal segments came from TmxB. Further analysis suggests that the interaction between the N- and C-terminal segments has a disproportionately high contribution to the overall thermostability. The results may be generalizable to other enzymes where the N- and C-termini are in contact.

PubMed
PDF
Impact factor: 2.648 (2008) | Times cited (WoS): 11
18. Nimrod G, Szilagyi A, Leslie C, Ben-Tal N (2009):
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Journal of Molecular Biology, 387(4), 1040-53. doi: 10.1016/j.jmb.2009.02.023
Abstract

DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
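The evaluation protocol mentioned above (10-fold cross-validation, reported as sensitivity and specificity) can be sketched independently of the classifier. The random forest and the PatchFinder features are not reproduced here; the hypothetical helpers below only split a dataset into folds and score binary predictions, with 1 meaning DNA-binding.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Dataset size from the abstract: 138 DNA-binding + 110 non-binding proteins.
folds = list(kfold_indices(138 + 110))
print([len(test) for _, test in folds])  # eight folds of 25, two of 24
print(sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # (0.666..., 0.5)
```

This split ignores class labels; a stratified split would keep the 138:110 ratio in every fold, which matters more for small or unbalanced datasets.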

PubMed
PDF
Impact factor: 3.871 (2009) | Times cited (WoS): 33
19. Hajdu I, Szilagyi A, Kardos J, Zavodszky P (2009):
A link between hinge-bending domain motions and the temperature dependence of catalysis in IPMDH.
Biophysical Journal, 96(12), 5003-12. doi: 10.1016/j.bpj.2009.04.014
Abstract

Enzyme function depends on specific conformational motions. We show that the temperature dependence of enzyme kinetic parameters can provide insight into these functionally relevant motions. While investigating the catalytic properties of IPMDH from Escherichia coli, we found that its catalytic efficiency (k(cat)/K(M,IPM)) for the substrate IPM has an unusual temperature dependence, showing a local minimum at approximately 35 degrees C. In search of an explanation, we measured the individual constants k(cat) and K(M,IPM) as a function of temperature, and found that the van 't Hoff plot of K(M,IPM) shows a sigmoid-like transition in the 20-40 degrees C temperature range. By means of various measurements, including hydrogen-deuterium exchange and fluorescence resonance energy transfer, we showed that the conformational fluctuations, including hinge-bending domain motions, increase more steeply at temperatures above 30 degrees C. The thermodynamic parameters of ligand binding determined by isothermal titration calorimetry as a function of temperature were found to be strongly correlated to the conformational fluctuations of the enzyme. Because the binding of IPM is associated with a hinge-bending domain closure, the more intense hinge-bending fluctuations at higher temperatures increasingly interfere with IPM binding, thereby abruptly increasing its dissociation constant and leading to the observed unusual temperature dependence of the catalytic efficiency.
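For reference, the van 't Hoff analysis mentioned above rests on the standard relation between an equilibrium constant K and the enthalpy and entropy of the process (applied here, as an approximation, to K(M,IPM) treated as an apparent dissociation constant):

```latex
\Delta G^\circ = -RT \ln K = \Delta H^\circ - T\,\Delta S^\circ
\quad\Longrightarrow\quad
\ln K = -\frac{\Delta H^\circ}{R}\cdot\frac{1}{T} + \frac{\Delta S^\circ}{R}
```

A plot of ln K against 1/T is a straight line with slope -ΔH°/R only while ΔH° stays constant; a sigmoid-like transition between 20 and 40 degrees C therefore indicates that the apparent binding enthalpy is not constant across that range.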

PubMed
PDF
Impact factor: 4.390 (2009) | Times cited (WoS): 10
20. Than NG, Romero R, Goodman M, Weckle A, Xing J, Dong Z, Xu Y, Tarquini F, Szilagyi A, Gal P, Hou Z, Tarca AL, Kim CJ, Kim JS, Haidarian S, Uddin M, Bohn H, Benirschke K, Santolaya-Forgas J, Grossman LI, Erez O, Hassan SS, Zavodszky P, Papp Z, Wildman DE (2009):
A primate subfamily of galectins expressed at the maternal-fetal interface that promote immune cell death.
Proceedings of the National Academy of Sciences USA, 106(24), 9731-6. doi: 10.1073/pnas.0903568106
Abstract

Galectins are proteins that regulate immune responses through the recognition of cell-surface glycans. We present evidence that 16 human galectin genes are expressed at the maternal-fetal interface and demonstrate that a cluster of 5 galectin genes on human chromosome 19 emerged during primate evolution as a result of duplication and rearrangement of genes and pseudogenes via a birth and death process primarily mediated by transposable long interspersed nuclear elements (LINEs). Genes in the cluster are found only in anthropoids, a group of primate species that differ from their strepsirrhine counterparts by having relatively large brains and long gestations. Three of the human cluster genes (LGALS13, -14, and -16) were found to be placenta-specific. Homology modeling revealed conserved three-dimensional structures of galectins in the human cluster; however, analyses of 24 newly derived and 69 publicly available sequences in 10 anthropoid species indicate functional diversification by evidence of positive selection and amino acid replacements in carbohydrate-recognition domains. Moreover, we demonstrate altered sugar-binding capacities of 6 recombinant galectins in the cluster. We show that human placenta-specific galectins are predominantly expressed by the syncytiotrophoblast, a primary site of metabolic exchange where, early during pregnancy, the fetus comes in contact with immune cells circulating in maternal blood. Because ex vivo functional assays demonstrate that placenta-specific galectins induce the apoptosis of T lymphocytes, we propose that these galectins reduce the danger of maternal immune attacks on the fetal semiallograft, presumably conferring additional immune tolerance mechanisms and in turn sustaining hemochorial placentation during the long gestation of anthropoid primates.

PubMed
PDF
Impact factor: 9.432 (2009); times cited (WoS): 73
21. Kucukural A, Szilagyi A, Sezerman O, Zhang Y (2010):
Protein homology analysis for function prediction with parallel sub-graph isomorphism.
In: Lodhi H, Yamanishi Y (eds.): Chemoinformatics and advanced machine learning perspectives: complex computational methods and collaborative perspectives. IGI Global, 2010, pp. 129-144.
Abstract

To annotate the biological function of a protein molecule, it is essential to have information on its 3D structure. Many successful methods for function prediction are based on determining structurally conserved regions because functional residues have been shown to be more conserved than others in protein evolution. Since the 3D conformation of a protein can be represented by a contact map graph, graph-matching algorithms are often employed to identify the conserved residues in weakly homologous protein pairs. However, general graph matching is computationally expensive because graph similarity searching is essentially an NP-hard problem. Parallel implementations of graph matching are often exploited to speed up the process. In this chapter, the authors review theoretical and computational approaches of graph theory and the recently developed graph-matching algorithms for protein function prediction.

Publisher
22. Mukherjee S, Szilagyi A, Roy A, Zhang Y (2010):
Genome-wide protein structure prediction.
In: Kolinski A (ed.): Multiscale approaches to protein modeling: structure prediction, dynamics, thermodynamics and macromolecular assemblies. Springer, 2010, pp. 255-280.
Abstract

The post-genomic era has witnessed an explosion of protein sequences in the public databases, but this has not been complemented by the availability of genome-wide structure and function information, due to the technical difficulties and labor expenses incurred by existing experimental techniques. The rapid advancements in computer-based protein structure prediction methods have enabled automated and yet reliable methods for generating three-dimensional (3D) structural models of proteins. Genome-scale structure prediction experiments have been conducted by a number of groups, starting as early as 1997, and some noteworthy efforts have been made using the MODELLER and ROSETTA methods. Along another line, TOUCHSTONE was used to predict the structures of all 85 small proteins in the Mycoplasma genitalium genome, which established template-refinement-based structure prediction as a practical approach for genome-scale experiments. This was followed by the development of the Threading ASSEmbly Refinement (TASSER) and Iterative Threading ASSEmbly Refinement (I-TASSER) algorithms, which use a combination of various approaches for threading, fragment assembly, ab initio loop modeling, and structural refinement to predict the structures. A successful structural prediction for all medium-sized open reading frames (ORFs) in the Escherichia coli genome was demonstrated by this approach, achieving high-accuracy models for 920 out of 1,360 proteins. G protein-coupled receptors (GPCRs) are an extremely important class of membrane proteins for which only very few structures are available in the Protein Data Bank (PDB). TASSER was used to predict the structures of all 907 putative GPCRs in the human genome, and the high accuracy of these models, confirmed by newly solved GPCR structures and recent blind tests, has demonstrated the usefulness and robustness of TASSER/I-TASSER models for the functional annotation of GPCRs.
Recently, the I-TASSER protein structure prediction method has been used as a basis for functional annotation of protein sequences. The increasing popularity and need for such automated structure and function prediction algorithms can be judged by the fact that the I-TASSER server has generated structure predictions for 35,000 proteins submitted by more than 8,000 users from 86 countries in the last 24 months. The success of these modeling experiments demonstrates significant new progress in high-throughput and genome-wide protein structure prediction.

Publisher
23. Nimrod G, Schushan M, Szilagyi A, Leslie C, Ben-Tal N (2010):
iDBPs: A web server for the identification of DNA binding proteins.
Bioinformatics, 26(5), 692-3. doi: 10.1093/bioinformatics/btq019
Abstract

The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies.

PubMed
PDF
iDBPs server
Impact factor: 4.877 (2010); times cited (WoS): 20
24. Wu S, Szilagyi A, Zhang Y (2011):
Improving protein structure prediction using multiple sequence-based contact predictions.
Structure, 19(8), 1182-91. doi: 10.1016/j.str.2011.05.004
Abstract

Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction due to their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 nonhomologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with p-values (Student's t-test) < 0.00001 and < 0.001, respectively. In several cases, the template modeling score increased by >30%, which essentially converts "nonfoldable" targets into "foldable" ones. In CASP9, I-TASSER employed ab initio contact predictions and generated models for 26 free-modeling (FM) targets with a GDT-score 16% and 44% higher than the second- and third-best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets.

PubMed
PDF
Impact factor: 6.347 (2011); times cited (WoS): 29
25. Than G, Romero R, Meiri H, Erez O, Xu Y, Tarquini F, Barna L, Szilagyi A, Ackerman R, Sammar M, Fule T, Karaszi K, Kovalszky I, Dong Z, Kim CJ, Zavodszky P, Papp Z, Gonen R (2011):
PP13, maternal ABO blood groups and the risk assessment of pregnancy complications.
PLoS ONE, 6(7), e21564. doi: 10.1371/journal.pone.0021564
Abstract

BACKGROUND: Placental Protein 13 (PP13), an early biomarker of preeclampsia, is a placenta-specific galectin that binds beta-galactosides, building blocks of ABO blood-group antigens, possibly affecting its bioavailability in blood. METHODS AND FINDINGS: We studied PP13 binding to erythrocytes, the effect of maternal blood group on serum PP13, and its performance as a predictor of preeclampsia and intrauterine growth restriction (IUGR). Datasets of maternal serum PP13 in Caucasian (n = 1078) and Hispanic (n = 242) women were analyzed according to blood groups. In vivo, in vitro and in silico PP13 binding to ABO blood-group antigens and erythrocytes were studied by PP13 immunostaining of placental tissue microarrays, flow cytometry of erythrocyte-bound PP13, and model building of the PP13/blood-group H antigen complex, respectively. Women with blood group AB had the lowest serum PP13 in the first trimester, while those with blood group B had the highest PP13 throughout pregnancy. In accordance, PP13 binding was strongest to blood-group AB erythrocytes and weakest to blood-group B erythrocytes. PP13 staining of maternal and fetal erythrocytes was revealed, and a plausible molecular model of PP13 complexed with blood-group H antigen was built. Adjustment of PP13 MoMs (multiples of the median) to maternal ABO blood group improved the prediction accuracy of first-trimester maternal serum PP13 MoMs for preeclampsia and IUGR. CONCLUSIONS: ABO blood group can alter PP13 bioavailability in blood, and it may also be a key determinant of other lectins' bioavailability in the circulation. The adjustment of PP13 MoMs to ABO blood group improves the predictive accuracy of this test.

PubMed
PDF
Impact factor: 4.092 (2011); times cited (WoS): 14
26. Szilagyi A, Zhang Y, Zavodszky P (2012):
Intra-chain 3D segment swapping spawns the evolution of new multidomain protein architectures.
Journal of Molecular Biology, 415(1), 221-35. doi: 10.1016/j.jmb.2011.10.045
Abstract

Multidomain proteins form in evolution through the concatenation of domains, but structural domains may comprise multiple segments of the chain. In this work, we demonstrate that new multidomain architectures can evolve by an apparent three-dimensional swap of segments between structurally similar domains within a single-chain monomer. By a comprehensive structural search of the current Protein Data Bank (PDB), we identified 32 well-defined segment-swapped proteins (SSPs) belonging to 18 structural families. Nearly 13% of all multidomain proteins in the PDB may have a segment-swapped evolutionary precursor, as estimated by more permissive search criteria. The formation of SSPs can be explained by two principal evolutionary mechanisms: (i) domain swapping and fusion (DSF) and (ii) circular permutation (CP). By large-scale comparative analyses using structural alignment and hidden Markov model methods, we found that the majority of SSPs have evolved via the DSF mechanism, and a much smaller fraction via CP. Functional analyses further revealed that segment swapping, which results in two linkers connecting the domains, may impart directed flexibility to multidomain proteins and contribute to the development of new functions. Thus, inter-domain segment swapping represents a novel general mechanism by which new protein folds and multidomain architectures arise in evolution, and SSPs have structural and functional properties that make them worth defining as a separate group.

PubMed
PDF
Web page
Impact factor: 3.905 (2012); times cited (WoS): 5
27. Gyorffy D, Zavodszky P, Szilagyi A (2012):
"Pull moves" for rectangular lattice polymers are not fully reversible.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(6), 1847-9. doi: 10.1109/TCBB.2012.129
Abstract

"Pull moves" is a popular move set for lattice polymer model simulations. We show that the earlier proof of its reversibility is flawed and that some moves are irreversible, which leads to biases in the parameters estimated from the simulations. We show how to make the move set fully reversible.

PubMed
PDF
arXiv
Web page
Impact factor: 1.616 (2012); times cited (WoS): 2
28. Abrusan G, Szilagyi A, Zhang Y, Papp B (2013):
Turning gold into 'junk': transposable elements utilize central proteins of cellular networks.
Nucleic Acids Research, 41(5), 3190-200. doi: 10.1093/nar/gkt011
Abstract

The numerous discovered cases of domesticated transposable element (TE) proteins led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts.

PubMed
PDF
Impact factor: 8.808 (2013); times cited (WoS): 2
29. Csermely P, Nussinov R, Szilagyi A (2013):
From allosteric drugs to allo-network drugs: State of the art and trends of design, synthesis, and computational methods.
Current Topics in Medicinal Chemistry, 13(1), 2-4. doi: 10.2174/1568026611313010002
Abstract

Allosteric drugs bind to sites that are usually less evolutionarily conserved than orthosteric sites. As such, they can discriminate between closely related proteins, have fewer side effects, and the consequently lower concentrations needed may reduce the likelihood of receptor desensitization. However, an allosteric mode of action may also make the results of preclinical and animal experiments less predictive. The sensitivity of the allosteric consequences to the environment further increases the importance of accounting for patient population diversity. Even subtle differences in protein sequence, in cellular metabolic states or in target tissues can result in different outcomes. This mini-hot-topic issue of CTMC showcases some successes and challenges of allosteric drug development through the examples of seven-transmembrane (GPCR), AMPA, NMDA and metabotropic glutamate receptors, as well as the morpheein model of allosterism involved in inborn errors of metabolism. Finally, the development of allo-network drugs, which are allosteric drugs acting indirectly on the neighborhood of the pharmacological target in protein-protein interaction or signaling networks, is described.

PubMed
PDF
Impact factor: 3.453 (2013); times cited (WoS): 9
30. Szilagyi A, Nussinov R, Csermely P (2013):
Allo-network drugs: Extension of the allosteric drug concept to protein-protein interaction and signaling networks.
Current Topics in Medicinal Chemistry, 13(1), 64-77. doi: 10.2174/1568026611313010007
Abstract

Allosteric drugs are usually more specific and have fewer side effects than orthosteric drugs targeting the same protein. Here, we overview the current knowledge on allosteric signal transmission from the network point of view, and show that most intra-protein conformational changes may be dynamically transmitted across protein-protein interaction and signaling networks of the cell. Allo-network drugs influence the pharmacological target protein indirectly using specific inter-protein network pathways. We show that allo-network drugs may have a higher efficiency to change the networks of human cells than those of other organisms, and can be designed to have specific effects on cells in a diseased state. Finally, we summarize possible methods to identify allo-network drug targets and sites, which may develop into a promising new area of systems-based drug design.

PubMed
PDF
Impact factor: 3.453 (2013); times cited (WoS): 28
31. Abrusan G, Zhang Y, Szilagyi A (2013):
Structure prediction and analysis of DNA transposon and LINE retrotransposon proteins.
Journal of Biological Chemistry, 288(22), 16127-38. doi: 10.1074/jbc.M113.451500
Abstract

Despite the considerable amount of research on transposable elements (TEs), no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology.

PubMed
PDF
Impact factor: 4.600 (2013); times cited (WoS): 1
32. Kucukural A, Szilagyi A, Sezerman O, Zhang Y (2013):
Protein homology analysis for function prediction with parallel sub-graph isomorphism.
In: Bioinformatics: Concepts, Methodologies, Tools, and Applications. IGI Global, 2013, pp. 386-399. doi: 10.4018/978-1-4666-3604-0.ch021
Abstract

To annotate the biological function of a protein molecule, it is essential to have information on its 3D structure. Many successful methods for function prediction are based on determining structurally conserved regions because functional residues have been shown to be more conserved than others in protein evolution. Since the 3D conformation of a protein can be represented by a contact map graph, graph-matching algorithms are often employed to identify the conserved residues in weakly homologous protein pairs. However, general graph matching is computationally expensive because graph similarity searching is essentially an NP-hard problem. Parallel implementations of graph matching are often exploited to speed up the process. In this chapter, the authors review theoretical and computational approaches of graph theory and the recently developed graph-matching algorithms for protein function prediction.

Publisher
33. Szilagyi A, Zhang Y (2014):
Template-based structure modeling of protein-protein interactions.
Current Opinion in Structural Biology, 24(Feb), 10-23. doi: 10.1016/j.sbi.2013.11.005
Abstract

The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement.

PubMed
PDF
Impact factor: 7.201 (2014); times cited (WoS): 18
34. Than NG, Balogh A, Romero R, Karpati E, Erez O, Szilagyi A, Kovalszky I, Sammar M, Gizurarson S, Matko J, Zavodszky P, Papp Z, Meiri H (2014):
Placental Protein 13 (PP13) - A placental immunoregulatory galectin protecting pregnancy.
Frontiers in Immunology, 5, 348. doi: 10.3389/fimmu.2014.00348
Abstract

Galectins are glycan-binding proteins that regulate innate and adaptive immune responses, and some confer maternal-fetal immune tolerance in eutherian mammals. A chromosome 19 cluster of galectins has emerged in anthropoid primates, species with deep placentation and long gestation. Three of the five human cluster galectins are solely expressed in the placenta, where they may confer additional immunoregulatory functions to enable deep placentation. One of these is galectin-13, also known as Placental Protein 13 (PP13). It has a "jelly-roll" fold, carbohydrate-recognition domain and sugar-binding preference resembling other mammalian galectins. PP13 is predominantly expressed by the syncytiotrophoblast and released from the placenta into the maternal circulation. Its ability to induce apoptosis of activated T cells in vitro, and to divert and kill T cells as well as macrophages in the maternal decidua in situ, suggests important immune functions. Indeed, mutations in the promoter and an exon of LGALS13 presumably leading to altered or non-functional protein expression are associated with a higher frequency of preeclampsia and other obstetrical syndromes, which involve immune dysregulation. Moreover, decreased placental expression of PP13 and its low concentrations in first-trimester maternal sera are associated with elevated risk of preeclampsia. PP13 has thus turned out to be a good early biomarker to assess maternal risk for the subsequent development of pregnancy complications caused by impaired placentation. Due to the ischemic placental stress in preterm preeclampsia, there is increased trophoblastic shedding of PP13-immunopositive microvesicles starting in the second trimester, which leads to high maternal blood PP13 concentrations. Our meta-analysis suggests that this phenomenon may enable the potential use of PP13 in directing patient management near to or at the time of delivery.
Recent findings on the beneficial effects of PP13 on decreasing blood pressure due to vasodilatation in pregnant animals suggest its therapeutic potential in preeclampsia.

PubMed
PDF
Impact factor: 5.695 (2015); times cited (WoS): 9
35. Graczer E, Bacso A, Konya D, Kazi A, Soos T, Molnar L, Szimler T, Beinrohr L, Szilagyi A, Zavodszky P, Vas M (2014):
Drugs against Mycobacterium tuberculosis 3-isopropylmalate dehydrogenase can be developed using homologous enzymes as surrogate targets.
Protein & Peptide Letters, 21(12), 1295-307. doi: 10.2174/0929866521666140606111019
Abstract

3-Isopropylmalate dehydrogenase (IPMDH) from Mycobacterium tuberculosis (Mtb) may be a target for specific drugs against this pathogenic bacterium. We have expressed and purified Mtb IPMDH and determined its physicochemical and enzymological properties. Size-exclusion chromatography and dynamic light scattering (DLS) measurements suggest a tetrameric structure for Mtb IPMDH, in contrast to the dimeric structure of most IPMDHs. The kinetic properties (kcat and Km values) of Mtb IPMDH and the pH dependence of kcat are very similar to those of both Escherichia coli (Ec) and Thermus thermophilus (Tt) IPMDHs. The stability of Mtb IPMDH in 8 M urea is close to that of the mesophilic counterpart, Ec IPMDH, both of them being much less stable than the thermophilic (Tt) enzyme. Two known IPMDH inhibitors, O-methyl oxalohydroxamate and 3-methylmercaptomalate, have been synthesised. Their inhibitory effects were found to be independent of the origin of the IPMDH. Thus, experiments with either Ec or Tt IPMDH would be equally relevant for designing specific inhibitory drugs against Mtb IPMDH.

PubMed
Impact factor: 1.068 (2014); times cited (WoS): 1
36. Abrusan G, Yant SR, Szilagyi A, Marsh JA, Mates L, Izsvak Zs, Barabas O, Ivics Z (2016):
Structural determinants of Sleeping Beauty transposase activity.
Molecular Therapy, 24(8), 1369-77. doi: 10.1038/mt.2016.110
Abstract

Transposases are important tools in genome engineering, and there is considerable interest in engineering more efficient ones. Here we seek to understand the factors determining their activity using the Sleeping Beauty transposase. Recent work suggests that protein co-evolutionary information can be used to classify groups of physically connected, co-evolving residues into elements called ‘sectors’, which have proven useful for understanding the folding, allosteric interactions, and enzymatic activity of proteins. Using extensive mutagenesis data, protein modeling and analysis of folding energies, we show that 1) the Sleeping Beauty transposase contains two sectors, which span across conserved domains and are enriched in DNA-binding residues, indicating that the DNA binding and endonuclease functions of the transposase coevolve; 2) sector residues are highly sensitive to mutations, and most mutations of these residues strongly reduce transposition rate; 3) mutations with a strong effect on the free energy of folding in the DDE domain of the transposase significantly reduce transposition rate; and 4) mutations that influence DNA and protein-protein interactions generally reduce transposition rate, although most hyperactive mutants are also located on the protein surface, including residues involved in protein-protein interactions. This suggests that hyperactivity results from the modification of protein interactions rather than from the stabilization of the protein fold.

PubMed
Publisher
Impact factor: 6.938 (2015)
37. Szilagyi A, Gyorffy D, Zavodszky P (2017):
Segment swapping aided the evolution of enzyme function: The case of uroporphyrinogen III synthase.
Proteins: Structure, Function, and Bioinformatics, 85(1), 46-53. doi: 10.1002/prot.25190
Abstract

In an earlier study, we showed that two-domain segment-swapped proteins can evolve by domain swapping and fusion, resulting in a protein with two linkers connecting its domains. We proposed that a potential evolutionary advantage of this topology may be the restriction of interdomain motions, which may facilitate domain closure by a hinge-like movement, crucial for the function of many enzymes. Here, we test this hypothesis computationally on uroporphyrinogen III synthase, a two-domain segment-swapped enzyme essential in porphyrin metabolism. To compare the interdomain flexibility between the wild-type, segment-swapped enzyme (having two interdomain linkers) and circular permutants of the same enzyme having only one interdomain linker, we performed geometric and molecular dynamics simulations for these species in their ligand-free and ligand-bound forms. We find that in the ligand-free form, interdomain motions in the wild-type enzyme are significantly more restricted than they would be with only one interdomain linker, while the flexibility difference is negligible in the ligand-bound form. We also estimated the entropy costs of ligand binding associated with the interdomain motions, and find that the change in domain connectivity due to segment swapping results in a reduction of this entropy cost, corresponding to ∼20% of the total ligand binding free energy. In addition, the restriction of interdomain motions may also help the functional domain-closure motion required for catalysis. This suggests that the evolution of the segment-swapped topology facilitated the evolution of enzyme function for this protein by influencing its dynamic properties.

PubMed
Publisher
PDF (author version)
Impact factor: 2.499 (2015)
38. Gyimesi G, Zavodszky P, Szilagyi A (2017):
Calculation of configurational entropy differences from conformational ensembles using Gaussian mixtures.
Journal of Chemical Theory and Computation, 13(1), 29-41. doi: 10.1021/acs.jctc.6b00837
Abstract

We present a novel, conceptually simple approach to calculate the configurational entropy difference between two conformational ensembles of a molecular system. The method estimates the full-dimensional probability density function of the system by a Gaussian mixture, using an efficient greedy learning algorithm with a cross-validation-based stopping criterion. Evaluating the method on conformational ensembles corresponding to substates of five small peptide systems, excellent agreement is found with the exact entropy differences obtained from a full enumeration of conformations. Compared with the quasiharmonic method and two other, more recently developed methods, the Gaussian mixture method yields more accurate results at smaller sample sizes. We illustrate the power of the method by calculating the backbone torsion angle entropy difference between disulfide-bonded and non-disulfide-bonded states of tachyplesin, a 17-residue antimicrobial peptide, and between two substates in the native ensemble of the 58-residue bovine pancreatic trypsin inhibitor. The program is available at http://gmentropy.szialab.org.

PubMed
Software
Publisher
ACS e-print
Impact factor: 5.301 (2015)
     
Total impact factor: 136.779
Total times cited (WoS): 1241 (1217 without self-citations)
h-index = 16

Per-article citation numbers last updated on Jun 13, 2016. Total citation numbers updated on Nov 30, 2016.