The binding of antigenic peptides to major histocompatibility complex (MHC) molecules is a core step in the adaptive immune response. There are two major classes of MHC molecules: MHC class I (MHC-I) and MHC class II (MHC-II). Whereas MHC-I molecules mainly bind peptides derived from intracellular antigens, MHC-II molecules mainly bind peptides derived from extracellular antigens. The bound peptides are then presented on the cell surface to the receptors of T helper (Th) cells, through which the adaptive immune system recognizes the antigen and initiates specific responses, including the activation of B cells to secrete antibodies that neutralize the pathogen [1]. Accurate identification of peptides that bind to MHC molecules is therefore of great importance for understanding the mechanism of immune recognition and for facilitating epitope-based vaccine design [2]. Compared with MHC-I, MHC-II peptide binding prediction is a more challenging problem. One reason is the high polymorphism of MHC molecules; the other is that the binding groove of MHC-II is open at both ends, which allows considerable flexibility in the length of binding peptides (usually 11-20 amino acids) [3].

Many computational methods have been developed in the last few years to predict MHC class II binding peptides. Among these, pan-specific methods [4] such as TEPITOPEpan [5], NetMHCIIpan [6, 7] and MultiRTA [8] achieve better performance. TEPITOPEpan extrapolates the binding specificities of the HLA-DR molecules characterized by TEPITOPE [9] to uncharacterized molecules and scores peptides with a position-specific scoring matrix (PSSM). Both NetMHCIIpan-1.0 and NetMHCIIpan-2.0 are artificial neural network (ANN) based methods that encode both the peptides and the pocket sequences of MHC-II molecules. MultiRTA is based on a regularized thermodynamic model in which the binding affinity is computed as a weighted average over all possible binding core configurations.
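To make the idea of PSSM-based scoring concrete, the following minimal sketch scores every candidate binding core of a peptide against a position-specific matrix and keeps the best one. It is an illustration only, not the TEPITOPEpan implementation: the 9-residue core length, the names `TOY_PSSM`, `score_core` and `best_core`, and all matrix values are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Assumption: a 9-residue binding core, so the matrix has nine position rows.
CORE_LEN = 9

# Hypothetical toy matrix: one dict of residue scores per core position.
# Residues not listed at a position contribute 0. All values are invented.
TOY_PSSM: List[Dict[str, float]] = [{} for _ in range(CORE_LEN)]
TOY_PSSM[0] = {"F": 2.0, "Y": 1.5}
TOY_PSSM[3] = {"D": 1.0}
TOY_PSSM[5] = {"T": 0.5}
TOY_PSSM[8] = {"L": 1.0}


def score_core(core: str, pssm: List[Dict[str, float]]) -> float:
    """Sum the per-position scores of one candidate core."""
    if len(core) != len(pssm):
        raise ValueError("core length must match the number of PSSM positions")
    return sum((pssm[i].get(aa, 0.0) for i, aa in enumerate(core)), 0.0)


def best_core(peptide: str, pssm: List[Dict[str, float]]) -> Tuple[float, str]:
    """Slide a window over the peptide and return (score, core) of the best window."""
    width = len(pssm)
    if len(peptide) < width:
        raise ValueError("peptide is shorter than the binding core")
    candidates = [peptide[i:i + width] for i in range(len(peptide) - width + 1)]
    scored = [(score_core(core, pssm), core) for core in candidates]
    # On ties, max() keeps the first (leftmost) window.
    return max(scored, key=lambda pair: pair[0])


score, core = best_core("AAFAADATAALAA", TOY_PSSM)
print(score, core)
```

Taking the maximum over windows is only one way to combine candidate cores; a model such as MultiRTA instead averages over all core configurations with weights, as described above.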

Compared with feature-vector-based methods, kernel-based methods can handle the variable length of peptides more naturally. With carefully designed kernels, these methods can perform very well without the complicated tasks of feature extraction and selection [10]. We develop a novel string kernel method, MHC2SK (MHC-II String Kernel), to measure the similarity between peptides of variable length. By taking into account the distinct features of the MHC-II peptide binding prediction problem, MHC2SK differs significantly from the recently developed kernel-based GS (Generic String) kernel [11] in the way it computes similarities. Furthermore, we extend MHC2SK to MHC2SKpan for pan-specific MHC-II peptide binding prediction by leveraging the binding data of various MHC molecules.
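As a rough illustration of why string kernels cope naturally with variable peptide lengths, the sketch below compares two peptides through their shared k-mers. It uses a plain k-mer spectrum kernel as a stand-in; it is neither MHC2SK nor the GS kernel, and the function names and the default k = 3 are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(peptide: str, k: int = 3) -> Counter:
    """Count every contiguous k-mer in the peptide."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Inner product of the k-mer count vectors of two peptides."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    shared = counts_a.keys() & counts_b.keys()
    return float(sum(counts_a[m] * counts_b[m] for m in shared))


def normalized_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel, so a peptide's similarity to itself is 1."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


# Peptides of different lengths are compared directly, with no padding
# or fixed-length feature vector.
print(normalized_kernel("PKYVKQNTLKLAT", "YVKQNTLKL"))
```

The resulting kernel values can be passed to any kernel-based learner, for example kernel ridge regression or a support vector machine, as a precomputed similarity matrix.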


  1. Janeway C, Travers P, Walport M, Shlomchik M. 2005. Immunobiology: the immune system in health and disease. 6th edition. Garland Science Publishing, New York.
  2. Lund O, Nielsen M, Lundegaard C, Kesmir C, Brunak S. 2005. Immunological bioinformatics. MIT Press.
  3. Sette A, Adorini L, Colon S, Buus S, Grey H. 1989. Capacity of intact proteins to bind to MHC class II molecules. The Journal of Immunology. 143(4):1265–1267.
  4. Zhang L, Udaka K, Mamitsuka H, Zhu S. 2012. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools. Brief Bioinform. 13(3):350–364.
  5. Zhang L, Chen Y, Wong HS, Zhou S, Mamitsuka H, Zhu S. 2012. TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules. PLoS One. 7(2):e30483.
  6. Nielsen M, Justesen S, Lund O, Lundegaard C, Buus S. 2010. NetMHCIIpan-2.0 - Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure. Immunome Research. 6:9.
  7. Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S, Buus S, Lund O. 2008. Quantitative Predictions of Peptide Binding to Any HLA-DR Molecule of Known Sequence: NetMHCIIpan. PLoS Computational Biology. 4(7):e1000107.
  8. Bordner AJ, Mittelmann HD. 2010. MultiRTA: a simple yet reliable method for predicting peptide binding affinities for multiple class II MHC allotypes. BMC Bioinformatics. 11:482.
  9. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, Braxenthaler M, Gallazzi F, Protti MP, Sinigaglia F, Hammer J. 1999. Generation of tissue-specific and promiscuous HLA ligand database using DNA microarrays and virtual HLA class II matrices. Nat. Biotechnol. 17:555–561.
  10. Scholkopf B, Tsuda K, Vert JP. 2004. Kernel methods in computational biology. MIT Press, Cambridge, Mass.
  11. Giguere S, Marchand M, Laviolette F, Drouin A, Corbeil J. 2013. Learning a peptide-protein binding affinity predictor with kernel ridge regression. BMC Bioinformatics. 14:82.