In silico identification of cytotoxic T lymphocyte epitopes encoded by RD5 region of Mycobacterium tuberculosis

Introduction: While CD8+ T cells (cytotoxic T lymphocytes, CTLs) play important roles in immunity against Mycobacterium tuberculosis, only a small number of human leukocyte antigen (HLA) class I-restricted CTL epitopes for TB have been identified. The current study evaluates CTL epitopes of the Rv3117 and Rv3120 proteins, two newly found M. tuberculosis region of difference 5 (RD5)-encoded antigens, and their population coverage. Methodology: The amino acid sequences of the two proteins were subjected to epitope analysis under HLA-A2, A3 and B7 supertype restriction using the NetCTL, SYFPEITHI, BIMAS, NetCTLPan, IEDB, NetMHC and NetMHCPan online prediction servers. Results: Eight RD5-encoded CTL epitopes were identified in the two proteins, and the average population coverage of these epitopes was 87.2% among populations worldwide. Conclusion: These CTL epitopes, identified in silico, may have potential use in CD8+ T cell-mediated TB vaccine design.


Introduction
Tuberculosis (TB), caused by the intracellular pathogen Mycobacterium tuberculosis (M. tuberculosis) [1,2], remains one of the world's biggest health threats [3]. The Bacillus Calmette-Guerin (BCG) vaccine is an attenuated strain of Mycobacterium bovis and represents the world's most widely used live vaccine; however, this vaccine lacks efficacy in preventing pulmonary TB in adults [4]. To combat this ongoing scourge, the development of a more effective vaccine for TB is a global priority.
Effective cell-mediated immunity is essential to control M. tuberculosis infection [2], and T cell-mediated adaptive immunity plays an important role in controlling initial M. tuberculosis infection and preventing reactivation of latent M. tuberculosis bacilli residing in granulomas [1]. Although it is generally accepted that CD4+ T cells are an essential component of protective immunity against TB [2], recent studies have shown that major histocompatibility complex (MHC) class I-restricted CD8+ T cells (cytotoxic T lymphocytes, CTLs) are also important to immunity against M. tuberculosis [5-7]. In humans, the MHC is known as the human leukocyte antigen (HLA) system. HLA Class I molecules have variable epitope-binding specificities, and the frequency of HLA Class I polymorphisms differs across ethnicities [8]. However, only a small number of HLA class I-restricted CTL epitopes for TB have been identified; therefore, it is urgent to identify more promiscuous CTL epitopes and to estimate the population coverage of these epitopes for the design of an effective TB vaccine.
Comparative genomic studies have identified several M. tuberculosis-specific genomic regions of difference (RDs) that are absent in avirulent M. bovis BCG strains [9] and have paved the way for the identification of new candidate antigens for protective TB vaccines. Rv3117 and Rv3120 were found to be two M. tuberculosis RD5-encoded B-cell antigens in our previous study [10], and Rv3117 was able to evoke specific humoral and cellular immune responses in C57BL/6 mice [11]. Based on these findings, we hypothesized that the Rv3117 and Rv3120 proteins may have several CTL epitopes capable of inducing a robust CD8+ T cell immune response in humans. However, the CTL epitopes of Rv3117 and Rv3120 and their population coverage have not been investigated.
In the current study, based on the distribution characteristics of HLA Class I alleles across populations worldwide, we developed a combined immunoinformatics approach to identify potential CTL epitopes of Rv3117 and Rv3120 and to estimate the population coverage of these epitopes.

Protein sequence retrieval and analysis
The amino acid sequences of Rv3117 and Rv3120 of M. tuberculosis H37Rv (NC_000962) were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) and compared with those of M. tuberculosis and human proteins using BLASTP (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The molecular weights and isoelectric points of the Rv3117 and Rv3120 proteins were calculated using Accelrys Gene 2.5 software (Accelrys Inc., San Diego, USA).
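As an illustration of the molecular weight calculation, the following Python sketch sums standard average residue masses and adds one water molecule for the free termini. It is a simplified stand-in and not the algorithm used by Accelrys Gene; the function name `molecular_weight_kda` is hypothetical, post-translational modifications are ignored, and the isoelectric point is not computed here.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Average molecular weight of an unmodified protein chain, in kDa.

    Hypothetical helper for illustration only.
    """
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    daltons = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
    return daltons / 1000.0
```

Applied to the full Rv3117 and Rv3120 sequences retrieved from GenBank, a function of this kind should give values close to, but not necessarily identical with, those reported by dedicated software.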

Epitope prediction
Nine-mer CTL epitopes of the M. tuberculosis Rv3117 and Rv3120 proteins were predicted based on HLA Class I supertypes using six online prediction servers (Table 1). First, the NetCTL 1.2 Server was used, and the threshold value for epitope identification was set at 0.75 [12]. The HLA-A2, A3, and B7 supertypes were included in this analysis, as they are the most common HLA Class I supertypes across ethnicities, with an estimated frequency of > 80% [8]. Second, the CTL epitopes predicted by the NetCTL 1.2 server were analyzed with the remaining online prediction tools: SYFPEITHI [13], BIMAS [14], NetCTLPan [15], Immune Epitope Database (IEDB) Analysis Resource [16], NetMHC [17], and NetMHCPan [18]. The default prediction value of each tool was used. If a server did not offer an HLA Class I supertype list, representative HLA Class I alleles were selected based on a previous study [19] (Supplementary Table 1). Any 9-mer peptide predicted by all six online prediction servers was identified as a potential CTL epitope in the Rv3117 and Rv3120 proteins. Finally, potential CTL epitopes were evaluated for inclusion in the CD4+ T cell epitopes of the Rv3117 and Rv3120 proteins using the NetMHCIIpan 2.2 server (available at http://www.cbs.dtu.dk/services/NetMHCII/), one of the most accurate prediction servers based on an artificial neural network algorithm [20]. Ten HLA-DR alleles (HLA-DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1302, DRB1*1501, and DRB5*0101) were selected for the prediction because of their inclusion in an HLA-DR supertype [21].
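The two-step selection described above (a first-pass threshold followed by agreement among the remaining servers) can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, the servers themselves are web tools whose outputs would have to be collected separately, and treating a score at or above the threshold as positive is an assumption about how the cutoff is applied.

```python
from typing import Dict, List, Set, Tuple


def nine_mers(sequence: str, k: int = 9) -> List[Tuple[int, str]]:
    """Enumerate all overlapping k-mers with 1-based start positions."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def passes_threshold(scores: Dict[str, float], threshold: float = 0.75) -> Set[str]:
    """First pass: keep peptides scoring at or above the threshold.

    Assumption: 'scores' holds one combined score per peptide, collected
    by hand or by parsing the first server's output.
    """
    return {peptide for peptide, score in scores.items() if score >= threshold}


def consensus(first_pass: Set[str], confirmations: Dict[str, Set[str]]) -> Set[str]:
    """Second pass: keep only peptides that every other server also predicts."""
    kept = set(first_pass)
    for predicted in confirmations.values():
        kept &= predicted
    return kept
```

Requiring agreement among all servers is a strict intersection: it favors specificity, at the cost of discarding peptides that any single tool misses.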

Population coverage calculation
For the short-listed potential CTL epitopes, population coverage was calculated using the population coverage tool of the IEDB Analysis Resource, available at http://tools.immuneepitope.org/tools/population/iedb_input [22].
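The idea behind a population coverage estimate can be sketched in Python under simplifying assumptions. The sketch below is not the IEDB tool's exact algorithm: it assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, so that an individual is "covered" if at least one of their alleles at any locus restricts at least one epitope. The function name and any frequencies passed to it are hypothetical.

```python
from typing import Dict, List


def population_coverage(covered_allele_freqs: Dict[str, List[float]]) -> float:
    """Fraction of individuals carrying at least one epitope-restricting allele.

    covered_allele_freqs maps a locus name (e.g. "HLA-A") to the population
    frequencies of the alleles at that locus that restrict any epitope.
    Assumptions: Hardy-Weinberg proportions, independent loci.
    """
    uncovered = 1.0
    for locus, freqs in covered_allele_freqs.items():
        if any(f < 0 for f in freqs):
            raise ValueError(f"negative allele frequency at {locus}")
        total = sum(freqs)
        if total > 1.0 + 1e-9:
            raise ValueError(f"allele frequencies at {locus} sum to more than 1")
        # Probability that both chromosomes carry a non-covered allele.
        uncovered *= (1.0 - total) ** 2
    return 1.0 - uncovered
```

For example, if the covered alleles at a single locus have a combined frequency of 0.5, the sketch gives a coverage of 1 - 0.5^2 = 0.75; adding a second independent locus with the same combined frequency raises it to 1 - 0.5^4 = 0.9375.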

Rv3117 and Rv3120 protein analysis
The Rv3117 and Rv3120 protein sequences encoded by RD5 of M. tuberculosis were conserved in the M. tuberculosis complex and exhibited no more than 35% identity to human proteins. The Rv3117 and Rv3120 proteins had molecular weights of 31.0 kDa (isoelectric point 4.97) and 21.8 kDa (isoelectric point 5.10), respectively.

Epitope prediction and population coverage
A total of eight potential CTL epitopes in the M. tuberculosis RD5-encoded Rv3117 and Rv3120 proteins were successfully predicted against three HLA class I supertypes (Table 2). More CTL epitopes were predicted in the Rv3117 protein than in the Rv3120 protein: five CTL epitopes were identified in Rv3117, while only three were found in Rv3120. The scores from all six CTL epitope prediction tools showed that P99-107 (KLYGHEWVK) held the greatest potential in the Rv3117 protein and was restricted to the HLA-A3 supertype. In the Rv3120 protein, p165-173 (KPGQRESEL) held the greatest potential and was restricted to the HLA-B7 supertype. No predicted CTL epitope in the Rv3120 protein was restricted to HLA-A3. In addition, for the Rv3117 protein, 9-mer CTL epitopes overlapped with CD4+ T cell epitopes (Supplementary Table 2). For example, p267-275 (SLVGAPIEL) was included in the CD4+ T cell epitope P261-277 (SWTEYGSLVGAPIELGS) restricted to HLA-DRB1*1501 and HLA-DRB1*0701 (Supplementary Table 2). However, for the Rv3120 protein, no predicted CTL epitope was included in CD4+ T cell epitopes restricted to any HLA Class II allele.
Population coverage rates of the eight CTL epitopes are shown in Figure 1. The combination of eight CTL epitope sequences from the M. tuberculosis RD5-encoded Rv3117 and Rv3120 proteins provides a population coverage rate exceeding 75.0% for people in most areas, but only 2.9% and 69.1% for people in Central America and Oceania, respectively. The cumulative worldwide population coverage of these epitopes was 87.2% (Figure 1).
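The inclusion of a CTL epitope within a longer CD4+ T cell epitope can be checked from the residue positions and sequences alone. The Python sketch below uses the two peptides quoted above; the function name `is_contained` is hypothetical, and the check is a plain positional comparison, not a prediction.

```python
def is_contained(ctl_peptide: str, ctl_start: int,
                 cd4_peptide: str, cd4_start: int) -> bool:
    """True if the CTL peptide lies inside the CD4+ peptide at the
    position implied by the two 1-based start coordinates."""
    offset = ctl_start - cd4_start
    if offset < 0:
        return False
    return cd4_peptide[offset:offset + len(ctl_peptide)] == ctl_peptide


# Peptides and positions as reported in the text for Rv3117.
contained = is_contained("SLVGAPIEL", 267, "SWTEYGSLVGAPIELGS", 261)
```

Here `contained` is True: residue 267 falls six positions after residue 261, and the nine residues starting there match SLVGAPIEL exactly.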

Discussion
A more effective vaccine for TB should generate both strong CD8+ T cell and CD4+ T cell responses, and a BCG vaccine with additional CTL epitopes may induce a more balanced T cell response to enhance efficacy. In this study, we developed a combined immunoinformatics approach to predict eight RD5-encoded HLA Class I-restricted CTL epitopes for TB in the Rv3117 and Rv3120 proteins. Three of the eight epitopes were also contained within predicted CD4+ T cell epitopes. In addition, the eight CTL epitopes in the Rv3117 and Rv3120 proteins were found to have 87.2% worldwide population coverage, indicating that they may be suitable candidates for a novel TB vaccine design. To the best of our knowledge, this is the first report on population coverage of CTL epitopes.
The number of predicted epitopes differed between the two proteins, which might be due to the difference in their sizes. The relatively large protein Rv3117, with a molecular mass of 31.0 kDa, was found to have more CTL epitopes than the Rv3120 protein.
There are about 30 tools available for the prediction of CTL epitopes [23]; however, no tool has an overall 100.0% prediction sensitivity or specificity. The six online prediction servers in this study were applied independently or in combination to predict peptide binding to HLA Class I, proteasomal C-terminal cleavage, TAP transport efficiency, and the half-time of dissociation of the peptide from HLA Class I molecules. Thus, theoretically, the integrated application of six servers should improve the accuracy of in silico identification of CTL epitopes in the Rv3117 and Rv3120 proteins. Indeed, these online servers have been successfully used for the identification of T cell epitopes against different pathogens [24,25] and cancers [26]. Since NetCTL provides epitope prediction with 54-89% sensitivity and 94-99% specificity, we first employed NetCTL to predict CTL epitopes in the Rv3117 and Rv3120 proteins at the default prediction value of 0.75, which had 80% sensitivity and 97% specificity in a previous study [12]. Predicted CTL epitopes with high scores were further evaluated with the SYFPEITHI, BIMAS, NetCTLPan, IEDB, NetMHC, and NetMHCPan online servers. Using this methodology, we aimed to predict CTL epitopes in the Rv3117 and Rv3120 proteins that may bind to multiple HLA alleles promiscuously and serve as potential CTL epitopes for novel TB vaccine development. For comparison, P118 (LIASNVAGV), a CTL epitope in the RD11-encoded protein Rv3425, was predicted to be a possible epitope for lysing target cells using three of the above six online servers [27].
In this study, the eight CTL epitopes identified in silico could cover most individuals worldwide, making them favorable candidates for novel TB vaccine design. However, further experiments, such as in vitro and in vivo assays of peptide-sensitized peripheral blood mononuclear cells and isolated CD8+ CTL responses, are necessary to confirm these CTL epitopes.

Conclusions
Taken together, these eight CTL epitopes identified in silico could be used as potential candidates for the development of a novel TB vaccine based on CD8+ T cell immune responses. Further studies are warranted to investigate the immunogenicity and efficacy of a TB vaccine containing these epitopes in vitro and in vivo.