energy calculations involving proteins: a physics-based potential function that focuses on the fundamental forces between atoms, and a knowledge-based potential that relies on parameters derived from experimentally solved protein structures [27]. Owing to the heavy computational complexity required by the first approach, we adopted the knowledge-based potential for our workflow. The energy functions used for the surface residues are those available at the Protein Structure Analysis website [28]. Furthermore, a study on LE prediction [29] showed that certain sequential residue pairs occur more frequently in LE epitopes than in non-epitopes. A similar statistical feature could therefore improve the performance of a CE prediction workflow. Hence, we combined the statistical distribution of geometrically related residue pairs found in verified CEs with the identification of residues that have relatively high energy profiles. We first located surface residues with relatively high knowledge-based energies within a sphere of specified radius and assigned them as the initial anchors of candidate epitope regions. We then extended these surfaces to include neighboring residues and thereby define CE clusters. For this report, the distributions of energies, combined with knowledge of geometrically related residue pairs in true epitopes, were analyzed and adopted as variables for CE prediction. The results indicate that the developed method predicts CEs with high specificity and accuracy.

Lo et al. BMC Bioinformatics 2013, 14(Suppl 4):S3
http://www.biomedcentral.com/1471-2105/14/S4/S3

Methods

CE-KEG workflow architecture

The proposed CE prediction method, which is based on a knowledge-based energy function and geometrical neighboring residue contents, is abbreviated as “CE-KEG”.
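The anchor-and-extend idea described above (pick high-energy surface residues as anchors, then grow each anchor into a cluster of neighbors) can be illustrated with a minimal Python sketch. It is not the CE-KEG implementation: the `Residue` record, the greedy highest-energy-first rule, the function names, and the `radius` and `max_anchors` parameters are all assumptions made for illustration, and the extension step here uses distance only, whereas the text also mentions energy indices and residue-pair statistics.

```python
from math import dist
from typing import List, NamedTuple, Tuple


class Residue(NamedTuple):
    """Hypothetical record for one surface residue."""
    name: str                          # illustrative label, e.g. "LYS45"
    xyz: Tuple[float, float, float]    # representative coordinate (angstroms)
    energy: float                      # knowledge-based energy from an external tool


def pick_anchors(residues: List[Residue], radius: float,
                 max_anchors: int) -> List[Residue]:
    """Greedy selection (an assumption): visit residues from highest to
    lowest energy and keep one only if it lies farther than `radius`
    from every anchor already chosen."""
    anchors: List[Residue] = []
    for res in sorted(residues, key=lambda r: r.energy, reverse=True):
        if len(anchors) == max_anchors:
            break
        if all(dist(res.xyz, a.xyz) > radius for a in anchors):
            anchors.append(res)
    return anchors


def extend_cluster(anchor: Residue, residues: List[Residue],
                   radius: float) -> List[Residue]:
    """Return every residue within `radius` of the anchor (anchor included),
    in the order the residues were given."""
    return [r for r in residues if dist(r.xyz, anchor.xyz) <= radius]


# Toy input: two well-separated patches on an imaginary surface.
demo = [
    Residue("A", (0.0, 0.0, 0.0), 5.0),
    Residue("B", (1.0, 0.0, 0.0), 4.0),
    Residue("C", (10.0, 0.0, 0.0), 3.0),
    Residue("D", (11.0, 0.0, 0.0), 1.0),
]
demo_anchors = pick_anchors(demo, radius=5.0, max_anchors=2)
demo_clusters = {a.name: [r.name for r in extend_cluster(a, demo, 2.0)]
                 for a in demo_anchors}
```

In this toy input the second-highest-energy residue is skipped because it sits inside the exclusion sphere of the first anchor, which is one simple way to keep anchors at separate positions; the radii are arbitrary values, not parameters taken from the paper.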
CE-KEG is performed in four stages: analysis of a grid-based protein surface, energy-profile computation, anchor assignment, and CE clustering and ranking (Figure 1). The first module in “Grid-based surface structure analysis” accepts a PDB file from the Research Collaboratory for Structural Bioinformatics Protein Data Bank [30] and performs protein data sampling (structure discretization) to extract surface information. Subsequently, three-dimensional (3D) mathematical morphology computations (dilation and erosion) are applied to extract the solvent-accessible surface of the protein in the “Surface residue detection” submodule [31], and surface rates for atoms are calculated by evaluating the exposure ratio contacted by solvent molecules. The surface rates of the side-chain atoms of each residue are then summed, expressed as the residue surface rate, and exported to a look-up table. The next module, “Energy profile computation”, uses calculations performed by the ProSA web program to rank the energies of each residue on the targeted antigen surface(s) [28]. Surface residues with higher energies that are located at mutually exclusive positions are considered the initial CE anchors. The third module, “Anchor assignment and CE clustering”, extends the initial CE anchors to retrieve neighboring residues according to energy indices and the distances between anchor and extended residues. In addition, the frequencies of occurrence of pairwise amino acids are calculated to select suitable potential CE residue clusters. For the final module, “CE ranking and output result”, the values of the knowledge-based energy propens.