Comparative sequence, antigenic and phylogenetic analysis of avian influenza (H9N2) surface proteins isolated in Pakistan between 1999 and 2008

Introduction: Influenza A viruses possess a unique genomic structure which leads to genetic instability, especially in the products of the neuraminidase and hemagglutinin genes. These surface proteins play major roles in viral entry and release, and in the activation of the host immune system. Methodology: This study involved in silico sequence, phylogenetic and antigenic analyses of hemagglutinin and neuraminidase proteins of avian influenza A (H9N2) strains that circulated in Pakistan’s poultry flocks from 1999 to 2008, and determined variations among these sequences at different levels. Results: Sequence and phylogenetic analysis revealed a large number of similar substitution mutations and a close evolutionary relationship among sequences of both proteins. Changes were observed in the N-glycosylation sites of both surface proteins, along with the appearance of a new glycosylation site in a neuraminidase sequence isolated in 2007. Epitopes for hemagglutinin remained conserved, whereas for neuraminidase, epitopes from older strains reappeared in present sequences. Conclusions: Because of the rapidly mutating nature of avian influenza subtype H9N2, constant surveillance of annual sequence variations is important. Preventive measures and vaccine products can be evaluated by keeping track of changes that may lead to reassortment among different circulating strains in Pakistan’s commercial poultry flocks or in humans.


Introduction
Influenza viruses are members of the Orthomyxoviridae family and are separated into three types: A, B and C. Type A influenza viruses can cause infection in a diverse range of hosts including birds and mammals, whereas type B influenza viruses are only known to affect humans [1]. Influenza type C viruses cause mild illness in humans and do not cause epidemics or pandemics. Influenza virus types A and B both contain eight single-stranded RNA segments of negative polarity, whereas type C comprises only seven single-stranded RNA segments and lacks one of the envelope glycoproteins. These RNA segments encode ten proteins, which include two surface glycoproteins, namely hemagglutinin (HA) and neuraminidase (NA), along with nucleoprotein (NP), three polymerase proteins (PA, PB1, PB2), two matrix proteins (M1, M2) and two non-structural proteins (NS1, NS2) [2-6]. Avian influenza is caused by type A influenza virus, which is divided into subtypes on the basis of the two surface glycoproteins, HA and NA [2,5,6]. There are 16 HA and 9 NA subtypes responsible for causing infection. HA plays an essential role in the early stages of infection: it binds the virus to its receptor, sialic acid, on the host cell surface, promotes fusion of the viral and endosomal membranes, and thereby facilitates viral entry into the host cell [5]. The NA surface glycoprotein of influenza viruses prevents virus aggregation by cleaving the α-ketosidic linkage between sialic acid and the adjacent sugar residue. This destroys the receptors recognized by HA and facilitates movement of the virus to and from the site of infection [7].
Avian influenza virus (AIV) H9N2 is known to infect poultry populations throughout the world, and its strains are derived from the Eurasian and the North American influenza virus gene lineages [2,8]. Between 1994 and 1999, poultry populations in different countries, including Germany, Italy, Ireland, Iran, Pakistan, Saudi Arabia, South Africa and the USA, were found to be infected by H9N2 virus [9,10]. The social behavior and migratory routines of avian species are the reason why birds, rather than mammals and humans, are the major hosts of these viruses. This is more significant in the case of domestic poultry, where there is extensive direct and indirect contact among the flocks and other living species [11].
Human infections with H9N2 virus were first reported in 1999, when two children in Hong Kong with mild upper respiratory tract infections tested positive for H9N2 virus [17,21]. In 2003, a five-year-old child in Hong Kong was again confirmed to have an H9N2 virus infection that was of purely avian origin [16]. Genetic studies of H9N2 viruses from Hong Kong live bird markets have shown preferential binding of the viruses to 2,6-linked sialic acid, which is a human-like receptor [18]. These findings indicate that H9N2 avian influenza viruses have the potential to emerge as new pandemic strains.
The aims of the present study were: to carry out comparative sequence analyses to characterize and establish the phylogenetic relationships of Pakistani isolates with neighboring and Eurasian sublineages; to describe changes that occurred in avian (H9N2) viruses during 1999-2008; and to identify and predict epitopes for antigenic changes, variations in glycosylation sites, and differences among circulating viruses on a yearly basis.

Taxon sampling
For comparison among H9N2 viruses circulating from 1999 to 2008, we conducted a computational search of all available sequences of avian influenza virus subtype H9N2 reported in poultry flocks in Pakistan. The search was performed with the National Center for Biotechnology Information (NCBI) Flu Database [22], and sequences of NA and HA were retrieved and downloaded (Table 1).

Divergences in sequence patterns
The protein sequences of NA and HA from viral strains (Table 1) were aligned and compared with selected strains from neighboring countries and the Eurasian G1 sublineage reference strain using the multiple sequence alignment software ClustalW2 [23] to determine sequence similarities, variations, and phylogenetic relationships.
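The comparison step can be illustrated with a minimal Python sketch. It assumes two protein sequences that have already been aligned (for example, rows exported from a ClustalW2 alignment) and lists substitutions in the reference-residue, position, query-residue notation used in this paper (such as E552G), together with the percent identity over ungapped columns. The function names and the toy sequences are hypothetical and are not part of the original analysis.

```python
def list_substitutions(reference: str, query: str) -> list[str]:
    """Return substitutions such as 'E552G' between two aligned sequences.

    Positions are numbered along the ungapped reference sequence (1-based).
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_position = 0
    for ref_residue, query_residue in zip(reference, query):
        if ref_residue != "-":
            ref_position += 1
        # Gap columns are insertions or deletions, not substitutions.
        if "-" in (ref_residue, query_residue):
            continue
        if ref_residue != query_residue:
            substitutions.append(f"{ref_residue}{ref_position}{query_residue}")
    return substitutions


def percent_identity(reference: str, query: str) -> float:
    """Percent of identical residues over columns without gaps."""
    compared = [(a, b) for a, b in zip(reference, query) if "-" not in (a, b)]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy aligned fragments (hypothetical, not real H9N2 sequences).
reference_row = "MKQLG-ST"
isolate_row = "MKLLGASS"
print(list_substitutions(reference_row, isolate_row))  # ['Q3L', 'T7S']
print(round(percent_identity(reference_row, isolate_row), 1))
```

Numbering along the reference keeps the reported positions comparable across isolates even when an isolate carries an insertion.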

Determination of H9N2 host binding and release factors
The attachment and release of viruses from their host cells exploit the phenomenon of glycosylation. To determine variations in the sites of viral attachment, we used the online server application ScanProsite [24] to identify and compare N-glycosylation sites and to determine whether there were any inter-strain differences or year-to-year variations that could have affected or changed the glycosylation sites.
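A scan of this kind can be approximated with a short Python sketch. It assumes the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) and reports four-residue sites with 1-based start and end positions, in the same form as the sites listed in this paper. It is an illustration only; the output of ScanProsite itself may differ in detail, and the toy sequences are hypothetical.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression; the lookahead allows
# overlapping sites to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for each predicted N-glycosylation site."""
    return [
        (match.start() + 1, match.start() + 4, match.group(1))
        for match in SEQUON.finditer(sequence.upper())
    ]


# Toy fragments (hypothetical): the second carries a Y-to-H change at position 12.
earlier = "MANSTDPLNVTYKNPSG"
later = "MANSTDPLNVTHKNPSG"

print(find_sequons(earlier))  # [(3, 6, 'NSTD'), (9, 12, 'NVTY')]
print(find_sequons(later))    # [(3, 6, 'NSTD'), (9, 12, 'NVTH')]

# Sites whose motif differs between the two sequences.
changed = set(find_sequons(earlier)) ^ set(find_sequons(later))
print(sorted(changed))
```

In the toy example, the NPSG stretch is not reported because proline occupies the second position of the motif.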

Epitope analysis of HA and NA for antigenic variations
Epitope prediction was performed by the CTL (cytotoxic T lymphocyte) epitope prediction method [25]. The amino acid sequences of HA and NA proteins from selected strains were used for this step, and the predicted antigenic sites from each sequence were then compared for similarities and differences in composition.

Phylogenetic analysis and tree construction
Phylogenetic analysis of HA and NA protein sequences of avian influenza H9N2 viruses isolated from Pakistan between 1999 and 2008 was performed using MEGA 4.0.2 [26]. The sequences were first aligned using the multiple sequence alignment tool ClustalW2. Unrooted phylogenetic trees were constructed by the minimum evolution method. Internal branching probabilities were determined by bootstrap analysis of 1,000 replicates and are indicated by a percentage value on each branch.
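The resampling step behind a bootstrap analysis can be sketched as follows. Each replicate is built by drawing alignment columns with replacement until the replicate has as many columns as the original alignment; a tree is then rebuilt from every replicate, and the support for a branch is the percentage of replicate trees that contain it. The Python sketch below covers only the column resampling and is not MEGA's implementation; the function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate by resampling columns with replacement."""
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    # The same sampled column indices are applied to every sequence so that
    # each resampled column is a genuine column of the original alignment.
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {
        name: "".join(sequence[i] for i in sampled)
        for name, sequence in alignment.items()
    }


# Toy alignment (hypothetical names and residues).
toy_alignment = {
    "isolate_1999": "MKQLGST",
    "isolate_2005": "MKLLGSS",
    "isolate_2008": "MRLLGSS",
}

rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

Tree construction and the counting of branches across replicates would be carried out by the phylogenetics software.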

Results and discussion
Sequence and phylogenetic analysis
H9N2 viruses continue to circulate widely in domestic poultry in Asia [2,6,9,11]. The ability of H9N2 to transmit among humans gives it pandemic potential, which could arise through mutations in H9N2 itself or through reassortment between avian and human influenza viruses. The chances of this occurring are high in developing countries such as Pakistan and among people who are in direct contact with poultry flocks, such as farmers. Computational sequence analysis has the potential to be an effective approach towards vaccine development and towards measures against potential pandemics before they become a threat to poultry (as with H5N1) or to humans (as with the 2009 pandemic swine flu, H1N1). Therefore, the present study was performed to assess the extent of reassortment and of antigenic shifts and drifts in the surface proteins of H9N2 avian viruses isolated in Pakistan.
By studying the evolution of sequences, we tried to highlight how the selective pressure on the viral proteins changes with time, leading to changes in antigenicity and host specificity. A total of 13 sequences for each protein, NA and HA, were retrieved from the NCBI Flu Database by restricting the search query to Pakistan (Table 1). Sequence comparison was performed in two stages. First, full-length viral protein sequences of HA and NA isolated from poultry flocks in Pakistan (1999-2008) were compared with the Eurasian G1 sublineage reference strain (A/Quail/Hong Kong/G1/97), as shown in Tables 2 and 3 respectively. This comparison showed the number of substitution mutations in both proteins. We observed that the H9N2 strain is mutating continuously, and the strains isolated during 2008 had many new mutations that have not been found in previously reported entries from Pakistan. Second, amino acid variations in the HA receptor binding pocket and the NA hemadsorbing site were examined, as shown in Table 4. Analysis of the HA receptor binding pocket in particular showed that, except for the isolate from 1999, all H9N2 isolates from Pakistan contained Leucine (L) at position 234, which has a preference for 2,6-linked sialic acid (human receptors), instead of Glutamine (Q), which has a preference for 2,3-linked sialic acid (avian receptors). The role of this substitution at position 234 (226 in H3 numbering) has also been reported in other experimental studies, in which avian H9N2 viruses carrying it replicated with 100-fold higher peak titers in cultured human cells [27-30].
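The position-234 check described above can be expressed as a small Python sketch. It assumes an ungapped, full-length HA sequence numbered as in this study, and it maps only the two residues discussed in the text (L and Q) to a receptor preference; any other residue is reported as unclassified. The function name and the placeholder sequences are hypothetical.

```python
# Receptor preference for the two residues discussed in the text.
RECEPTOR_PREFERENCE = {
    "L": "2,6-linked sialic acid (human-like receptor)",
    "Q": "2,3-linked sialic acid (avian-like receptor)",
}


def receptor_preference(ha_sequence: str, position: int = 234) -> str:
    """Classify the residue at a 1-based position of an ungapped HA sequence."""
    if position < 1 or position > len(ha_sequence):
        raise ValueError("position lies outside the sequence")
    residue = ha_sequence[position - 1].upper()
    return RECEPTOR_PREFERENCE.get(residue, f"unclassified residue {residue}")


# Placeholder sequences (hypothetical): only position 234 is meaningful here.
with_leucine = "A" * 233 + "L" + "A" * 10
with_glutamine = "A" * 233 + "Q" + "A" * 10
print(receptor_preference(with_leucine))
print(receptor_preference(with_glutamine))
```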
The evolutionary relationship of avian H9N2 virus HA and NA protein sequences was determined by analyzing selected isolates from Pakistan (Table 1) and representative H9N2 viruses from neighboring countries including China, India, Iran, Japan and Dubai, along with the established Eurasian H9N2 G1 lineage represented by its prototype strain (A/Quail/Hong Kong/G1/97). The unrooted phylogenetic tree of HA showed two distinct groups (Figure 1). The G1 lineage reference virus and the strain isolated from poultry flocks in Pakistan in 1999 clustered in one group, along with the strains from Dubai, Japan, and China. On the other hand, H9N2 avian virus isolates from Pakistan in 2005-2008 clustered together in one group along with H9N2 avian viruses isolated from the neighboring countries Iran and India from 2003 to 2007. Clustering of all H9N2 Pakistani and neighboring viruses in one group suggests the emergence of a new lineage in the region of the subcontinent and Iran. This prediction could be related to another study, in which it was postulated that a United Arab Emirates lineage of H9N2 viruses may have emerged [12]. In the case of NA sequences (Figure 2), the phylogenetic tree showed greater divergence. All Pakistani isolates from 2005 to 2008 formed one cluster, while the G1 strain, viruses from Dubai, Japan, China and Iran, and a single isolate from Pakistan in 1999 formed another cluster. The virus isolated from Pakistan in 1999 is closer to the G1 strain (97% identity) than are the sequences isolated in later years.

Glycosylation and antigenic variations
Glycosylation of HA and NA reflects the ability of the pathogen to escape host defenses through co-evolution with the host and recognition of the host receptor [31]. Glycosylation sites were predicted by ScanProsite for HA and NA viral proteins isolated from Pakistan only (1999-2008), as shown in Tables 5 and 6 respectively. A total of seven predicted glycosylation sites (PGS) (29-32; 105-108; 141-144; 298-301; 305-308; 492-495; 551-554) were obtained for each HA protein sequence isolated during 2005-2008, whereas the viral strain from 1999 had an additional glycosylation site (218-221). This PGS was not found in other Pakistani isolates during sequence comparison, and its absence may represent the loss of a glycosylation site leading toward selected adaptation of avian H9N2 within poultry flocks. Results from ScanProsite revealed that the amino acid variation E552G within the 2005 isolates resulted in alteration of the N-glycosylation site at positions 551-554 in all the later sequences. Two amino acid variations, E32D (Glutamate to Aspartate) and Y144H (Tyrosine to Histidine), were found in all the 2008 isolates along with altered glycosylation sites, NSTE to NSTD (29-32) and NVTY to NVTH (141-144) respectively.
PGSs for NA protein sequences also showed amino acid substitutions and variations. Eight glycosylation sites were obtained for each sequence except for one isolate from 2007, which possessed a new glycosylation site, NESG, at position 342-345. Two out of three strains from 2008 also showed a new modification at the N-glycosylation site 61-64 (NITK). Comparison with previously reported entries shows that these sites and positions are conserved, but in the 2008 isolates, variations occurred through the substitution of a single amino acid (E64K; negatively charged polar to positively charged polar). It is important to note that all the above-mentioned substitutions in HA and NA protein sequences fell within the favorable region of mutations, which could still result in stable protein expression by retaining the hydrophobic/hydrophilic interactions and the 3D conformation of the glycosylation sites. Therefore, detailed study of viral protein 3D structure and hydrophobicity, together with in vivo analysis, would be useful for understanding the possible outcomes of such sequence changes on the activity of the viruses.
Results from CTLPred epitope prediction showed that HA protein sequences isolated from Pakistan tended to have the amino acid Valine (V) at position 113. Nevertheless, one of the strains isolated in 2007 had Isoleucine (I) at the same position. Another change was seen at position 138, where all previously isolated viruses had Threonine (T), except entries from 2008 and two strains that were reported in 2006 (Table 7). It was also observed that, in a strain from 1999, an epitope was present at

Conclusion
The high variation in amino acid sequences and the reassortment phenomena of influenza A viruses suggest that, although H9N2 virus infection is currently not severe, it has pandemic potential. Co-infections during bouts of influenza might play a crucial role in the evolution of H9N2 in birds carrying the highly virulent H5N1 virus and may cause the development of resistant viruses. Apart from viral factors, host factors might play an important role in the onset of new subtypes, for example, poultry flocks infected with H5N1 or H7N3, or workers infected with or previously exposed to H5N1 or the 2009 pandemic H1N1 subtype. Collectively, our analysis highlights the need for focused studies on the evolution of avian influenza A (H9N2) by developing a continuous surveillance system for the effective management of viral epidemics, particularly in developing countries such as Pakistan. There is a strong possibility that many cases remain unreported, and no new data are yet available for 2009-2010. Variations in the sequence, glycosylation and epitopes of NA need further attention, as they might be an indication of virus activity leading to complete replacement of NA segments. Computational and experimental studies of currently reported NA segments from influenza A viruses, especially NA segments from H5N1, H1N1 and H7N3, against N2 will be useful in determining possible reassortment or exchange of segments. Our findings demonstrate the instability of AIV (H9N2) circulating in poultry flocks in Pakistan and its potential to affect humans. They can be used as a reference for further studies, which may involve in vivo studies and detailed 3D structure and function analysis of the surface proteins, thus facilitating the development of better treatment and prevention approaches.

Fig. 1.
Phylogenetic relationship of HA protein sequences of AIV (H9N2)

Fig. 2.
Phylogenetic relationship of NA protein sequences of AIV (H9N2)

Table 1.
GenBank accession numbers of avian influenza A (H9N2) viral proteins, hemagglutinin and neuraminidase, isolated from different regions of Pakistan between 1999 and 2008

Table 2.
Comparison of variations in the HA protein sequence from A/Quail/Hong Kong/G1/97 (G1 lineage) and AIV (H9N2) isolates from poultry flocks in Pakistan from 1999 to 2008. Red colour indicates substitutions in sequences. HK, Hong

Table 3.
Comparison of variations in the NA protein sequence from A/Quail/Hong Kong/G1/97 (G1 lineage) and AIV (H9N2) isolates from poultry flocks in Pakistan from 1999 to 2008. Red colour indicates substitutions in sequences.

Table 4.
Amino acid residues at the receptor binding site of HA and the hemadsorbing (HB) site of NA proteins from avian H9N2 viruses isolated from Pakistan, China, Dubai, India, Iran, and Japan. Residue differences and similarities as compared to the G1 lineage reference (A/Quail/Hong Kong/G1/97) are indicated. Amino acid Leucine (L) at position 234 has a preference for 2,6-linked sialic acid (human receptors). A, Avian; Ck, Chicken; Qa, Quail; Pa, Partridge; ST, Shantou; Wc, Watercoot.

Table 5.
Comparison of predicted N-glycosylation sites (PGS) within amino acid sequences of HA proteins isolated between 1999 and 2008 (H9N2). Red colour indicates the differences between N-glycosylation sites of isolates. A, Avian; Ck, Chicken; PK, Pakistan.

Table 6.
Comparison of predicted N-glycosylation sites within amino acid sequences of NA proteins isolated between 1999 and 2008 (H9N2). Sequences with the same glycosylation sites are grouped together. A, Avian; Ck, Chicken; PK, Pakistan.