PDPredictor: Predicting crystallizability from protein sequence




MCSG developed a computational method that assigns an MCSG Z-score, a predictor of crystallizability. In PSI:Biology, MCSG identified 576 high scorers that yielded 200 purified proteins and more than 40 structures determined to date, a success rate of about 20% of the purified proteins.


MCSG developed its crystallizability predictor from two datasets drawn from the MCSG pipeline. The first is a set of more than 1700 proteins that expressed well but were insoluble; the second comprises the more than 700 unique proteins that MCSG deposited into the PDB. Together the two datasets cover more than 130 species, and the sequence sets were clustered at 60% identity to eliminate redundancy.

In a similar fashion to the OB-score, a score matrix was constructed by two-dimensional binning of the calculated isoelectric point (pI) and grand average of hydropathy (GRAVY) values for the full-length and domain constructs of the protein targets in the two sets. The MCSG Z-score predicts insoluble proteins with 71% accuracy and proteins producing crystals amenable to structure determination with 60% accuracy.

In addition, MCSG developed a Support Vector Machine-based method for predicting protein crystallizability that uses a prioritized set of 10 amino acid attributes; it achieved 69% accuracy for the positive 'crystal' class and 66% accuracy for the negative 'insoluble' class. To further improve this predictor, sequence-based attributes were included: topology (transmembrane and signal peptide predictions), similarity profiles to existing structures and orthologs, sequence family profiles, secondary structure, and predictions of disordered, low-complexity, and coiled-coil regions.

In the current phase of the Protein Structure Initiative, PSI:Biology, MCSG has been using the MCSG Z-score to actively select high-scoring targets. Recently, 576 high scorers yielded 200 purified proteins and more than 40 structures determined to date, a success rate of about 20% of the purified proteins.
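The two-dimensional binning of pI and GRAVY described above can be illustrated with a short Python sketch. It is not the MCSG implementation: the pKa values, bin ranges, bin count, and the smoothed log-ratio used as the per-bin score are all assumptions chosen for illustration, and the published Z-score may be computed differently. Only the Kyte-Doolittle hydropathy scale is a standard published table.

```python
import math

# Kyte-Doolittle hydropathy scale, the standard basis for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative pKa values (assumption; published pKa sets differ slightly).
POS_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEG_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[a] for a in seq) / len(seq)


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    pos = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    neg = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for a in seq:
        if a in POS_PKA:
            pos += 1.0 / (1.0 + 10 ** (ph - POS_PKA[a]))
        elif a in NEG_PKA:
            neg += 1.0 / (1.0 + 10 ** (NEG_PKA[a] - ph))
    return pos - neg


def isoelectric_point(seq, tol=1e-3):
    """Find the pH of zero net charge by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def bin_index(value, low, high, n_bins):
    """Map a value to a bin in [0, n_bins - 1], clamping out-of-range values."""
    i = int((value - low) / (high - low) * n_bins)
    return max(0, min(n_bins - 1, i))


def count_matrix(seqs, n_bins=10, pi_range=(3.0, 12.0), gravy_range=(-2.0, 1.0)):
    """Count sequences per (pI bin, GRAVY bin) cell; ranges are assumptions."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for s in seqs:
        r = bin_index(isoelectric_point(s), pi_range[0], pi_range[1], n_bins)
        c = bin_index(gravy(s), gravy_range[0], gravy_range[1], n_bins)
        m[r][c] += 1
    return m


def score_matrix(crystal, insoluble, n_bins=10, pseudo=1.0):
    """Hypothetical per-bin score: log ratio of smoothed class frequencies."""
    pos = count_matrix(crystal, n_bins)
    neg = count_matrix(insoluble, n_bins)
    n_pos = len(crystal) + pseudo * n_bins * n_bins
    n_neg = len(insoluble) + pseudo * n_bins * n_bins
    return [
        [math.log(((pos[r][c] + pseudo) / n_pos) / ((neg[r][c] + pseudo) / n_neg))
         for c in range(n_bins)]
        for r in range(n_bins)
    ]
```

In this sketch, a new construct would be scored by computing its pI and GRAVY, locating the corresponding cell with `bin_index`, and reading the value from the matrix; positive values mark cells enriched in the 'crystal' set relative to the 'insoluble' set.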

Users can design nested truncations with the MCSG Z-score matrix in the online PDpredictor tool. Heat maps are generated for the calculated pI, GRAVY, and MCSG Z-score. The interface lets the user include a fusion or solubility tag, which changes the overall behavior of the protein. Matrix data can be downloaded from the website. A separate web utility calculates the crystallizability score for a set of input sequences, so it can be used to evaluate the propensity of these truncations to produce X-ray-quality crystals. Other utilities are available for converting FASTA-formatted sequences to the required format, 2D binning of input data, and visualizing 2D matrices.
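To make the idea of nested truncations concrete, the following Python sketch enumerates N- and C-terminal truncations of a sequence, optionally prepends a tag, and collects a score for each construct in a grid suitable for a heat map. It does not reproduce the PDpredictor interface or its input format: the function names, the `step` and `max_trim` parameters, and the pluggable `score_fn` are hypothetical and introduced only for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def nested_truncations(seq, step=5, max_trim=20, tag=""):
    """Yield (n_trim, c_trim, construct) for nested terminal truncations.

    n_trim and c_trim are the residues removed from the N- and C-terminus.
    The optional tag (e.g. a fusion or solubility tag) is prepended, since
    it changes the properties of the expressed construct.
    """
    for n_trim in range(0, max_trim + 1, step):
        for c_trim in range(0, max_trim + 1, step):
            if n_trim + c_trim >= len(seq):
                continue  # nothing would be left of the protein
            core = seq[n_trim:len(seq) - c_trim]
            yield n_trim, c_trim, tag + core


def score_grid(seq, score_fn, **kwargs):
    """Score every truncation; keys are (n_trim, c_trim) grid coordinates."""
    return {
        (n, c): score_fn(construct)
        for n, c, construct in nested_truncations(seq, **kwargs)
    }
```

Here `score_fn` can be any callable that maps a sequence to a number, such as a pI, GRAVY, or score-matrix lookup; the resulting dictionary maps directly onto the rows and columns of a heat map.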



Correlation of a protein's isoelectric point (pI) and grand average of hydropathy (GRAVY) with crystallizability, expressed as the MCSG Z-score.


Babnigg G, Joachimiak A. "Predicting protein crystallization propensity from protein sequence." J. Struct. Funct. Genomics. 11:71-80 (2010). PubMed ID: 20177794

Kim Y, Babnigg G, Jedrzejczak R, Eschenfeldt WH, Li H, Maltseva N, Hatzos-Skintges C, Gu M, Makowska-Grzyska M, Wu R, An H, Chhor G, Joachimiak A. "High-throughput protein purification and quality assessment for crystallization." Methods. 55:12-28 (2011). PubMed ID: 21907284


G. Babnigg, Email: gbabnigg@anl.gov

Midwest Center for Structural Genomics



Last edited: Wed 29 Feb 2012