Learning the pattern-based CRF for prediction of a protein local structure

Zhalgas Mukanov, Rustem Takhanov


We describe a pattern-based conditional random field (CRF) model for predicting the dihedral angles of an all-alpha protein from its primary structure. Such conditional random fields arise naturally in sequence labeling problems in bioinformatics and can be regarded as relatives of hidden Markov models. The model parameters are learned with the structural SVM technique. The accuracy we achieved in predicting the dihedral angles φ and ψ is 22.8 and 48.3 degrees, respectively. The MDA score, defined as the percentage of residues that lie in correctly predicted eight-residue segments, reached 56.5%.
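The MDA definition above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The abstract does not say when an eight-residue segment counts as correctly predicted, so the sketch assumes a segment is correct when no φ or ψ angle in it deviates from the true value by more than a threshold. The default of 120 degrees, the function names, and the input layout (a list of (φ, ψ) pairs in degrees) are all assumptions made for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mda_score(true_angles, pred_angles, window=8, threshold=120.0):
    """Percentage of residues lying in at least one correctly predicted window.

    true_angles, pred_angles: lists of (phi, psi) pairs in degrees.
    threshold: assumed maximum allowed deviation per angle (not from the abstract).
    """
    n = len(true_angles)
    if n != len(pred_angles):
        raise ValueError("sequences must have the same length")
    if n == 0:
        return 0.0

    # A residue is "ok" if both of its angles are within the threshold.
    ok = [
        all(angular_diff(t, p) <= threshold for t, p in zip(tr, pr))
        for tr, pr in zip(true_angles, pred_angles)
    ]

    # Mark every residue covered by a window in which all residues are ok.
    covered = [False] * n
    for start in range(n - window + 1):
        if all(ok[start:start + window]):
            for i in range(start, start + window):
                covered[i] = True

    return 100.0 * sum(covered) / n
```

For example, with ten residues of which only the last is badly predicted, the first nine are each covered by a correct eight-residue window, so the score is 90%. Differences are taken on the circle, so 170 and -170 degrees are 20 degrees apart rather than 340.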



DOI: https://doi.org/10.31449/inf.v46i6.3787

This work is licensed under a Creative Commons Attribution 3.0 License.