Genetic structure of apical membrane antigen-1 in Plasmodium falciparum isolates from Pakistan
Abstract
Plasmodium falciparum apical membrane antigen-1 (PfAMA-1) is a major candidate for the blood-stage malaria vaccine. Genetic polymorphisms of global pfama-1 suggest that the genetic diversity of the gene can hinder effective vaccine development targeting this antigen. This study was conducted to explore the genetic diversity and gene structure of pfama-1 among P. falciparum isolates collected in the Khyber Pakhtunkhwa (KP) province of Pakistan. A total of 19 full-length pfama-1 sequences were obtained from KP-Pakistan P. falciparum isolates, and genetic polymorphism and natural selection were investigated. KP-Pakistan pfama-1 exhibited genetic diversity, wherein 58 amino acid changes were identified, most of which were located in domains I, II, and III of the ectodomain. The amino acid changes commonly found in the ectodomain of global pfama-1 were also detected in KP-Pakistan pfama-1. Interestingly, 13 novel amino acid changes not reported in the global population were identified in KP-Pakistan pfama-1. KP-Pakistan pfama-1 shared similar levels of genetic diversity with global pfama-1. Evidence of natural selection and recombination events was also detected in KP-Pakistan pfama-1.
Introduction
Malaria caused by Plasmodium species is a major infectious disease in humans, causing a significant global public health burden. Despite massive control efforts to eliminate malaria over the past years, malaria is still prevalent in several endemic areas. The World Health Organization reported 247 million malaria cases and 619,000 deaths in 2021 [1]. Malaria control and elimination efforts have been challenged by the spread of antimalarial drug-resistant parasites and insecticide-resistant Anopheles mosquitoes. The lack of an effective vaccine remains a major obstacle to malaria control, underscoring the need to develop one. Several plasmodial proteins such as circumsporozoite protein (CSP), Duffy binding protein, merozoite surface proteins, apical membrane antigen-1 (AMA-1), and thrombospondin-related anonymous protein have been considered promising vaccine candidates because of their antigenic properties and expression in either preerythrocytic or erythrocytic stages of malaria parasites [2,3]. However, the genetic polymorphisms in these antigens among clinical isolates are significant hurdles in the development of effective malaria vaccines [4,5].
AMA-1 is a type I integral membrane protein commonly expressed on the surfaces of merozoites and sporozoites and plays important roles in parasite invasion into host cells [6,7]. It comprises a signal sequence, a cysteine-rich ectodomain, a conserved cytoplasmic region, and a transmembrane region [8]. The ectodomain of AMA-1 is further subdivided into 3 domains, viz., domains I (DI), II (DII), and III (DIII). The ectodomain is highly immunogenic and induces natural immune responses in individuals infected with P. falciparum [9,10]. Antibodies against AMA-1 prevent the invasion of erythrocytes by malaria parasites and establish a protective immune response [11,12]. This information suggests that the antigen is a feasible vaccine candidate. Similar to that in other major surface antigens of malaria parasites, substantial levels of genetic polymorphisms of ama-1 have also been recognized in wild parasite populations; however, AMA-1 has been considered less variable than other potential vaccine candidate antigens such as CSP and merozoite surface proteins, supporting the notion that it is a promising blood-stage vaccine candidate [10,13,14]. Nevertheless, genetic diversities observed in global ama-1 have emphasized the importance of continuous monitoring of genetic variations of ama-1 among global malaria parasites for designing an effective vaccine targeting this antigen [15].
Pakistan is a malaria-endemic country, with reports of millions of cases per year [16]. P. falciparum and P. vivax are the prevalent species, accounting for 32% and 67% of malaria cases, respectively. Khyber Pakhtunkhwa (KP) and Balochistan provinces have been the most critical malaria hot spots in the country [16]. A genetic analysis of the hypervariable DI of P. falciparum AMA-1 (pfama-1) in P. falciparum isolates from the Hazara division, KP, Pakistan, has been reported [17]. However, the sequences did not cover the full-length gene, providing only limited information on the genetic nature of Pakistan pfama-1. This study was conducted to analyze the genetic nature of full-length pfama-1 among P. falciparum isolates collected from the KP province of Pakistan.
Materials and Methods
Ethics statement
The study protocol was approved by the Ethical Review Committee of Abdul Wali Khan University Mardan under the letter of AWKUM/ERC/578. Consent was obtained from all participants before conducting the study.
Parasite samples and DNA purification
Blood samples were collected from 19 patients infected with P. falciparum, which was confirmed by microscopy and rapid diagnostic tests in different hospitals and private sector laboratories in KP, Pakistan (Supplementary Fig. S1). During the 2 malaria seasons from March to May and August to November 2019, the area had an average annual rainfall of 384 mm. The mean temperature in the region was 20°C–40°C. The blood samples collected before treatment were spotted on filters, air-dried, and stored in individually sealed plastic bags at ambient temperature until use. Genomic DNA was extracted from the spotted blood samples using a QIAamp blood mini kit (Qiagen, Redwood City, CA, USA) according to the manufacturer’s instructions and stored at −20°C.
PCR amplification and DNA sequencing
The full-length pfama-1 was amplified by PCR using specific primer sets and amplification conditions described previously [18,19]. The PCR products were analyzed on a 1.5% agarose gel, purified, and cloned into the T&A vector (Real Biotech, Banqiao City, Taiwan). The ligation mixture was transformed into Escherichia coli DH5α competent cells, and positive clones were selected by colony PCR. The nucleotide sequence of the cloned insert was analyzed by automatic DNA sequencing with M13 forward and M13 reverse primers (Genotech, Daejeon, Korea). Sequencing was also conducted using 2 additional internal primers (5′-CAGGGAAATGTCCAGTATTTGGTA-3′ and 5′-TTCCATCGACCCATAATCCG-3′) to obtain clear sequences for the central part of pfama-1 [18]. To ensure accuracy, sequencing of at least 2 different clones from each isolate was performed. Raw data were filtered for quality assessment using the DNASTAR Lasergene software (DNASTAR, Madison, WI, USA). The 19 KP-Pakistan pfama-1 nucleotide sequences were deposited in GenBank under accession numbers OM628702–OM628720.
Polymorphism analysis of pfama-1
The DNA sequences generated in this study were analyzed in comparison with a reference pfama-1 sequence from the P. falciparum 3D7 strain (GenBank Accession No.: U65407). The following global pfama-1 sequences deposited in GenBank were also included for analysis: Thailand (AB715735–AB715814), Myanmar (KU893276–KU893333), Philippines (AB715815–AB715869), Vietnam (MW938322–MW938452), Vanuatu (AB716010–AB716094), Solomon Islands (SI; AB715960–AB716009), Papua New Guinea (PNG; AB715870–AB715959), Ghana (AB715698–AB715734), and Tanzania (AB715636–AB715697) (Supplementary Table S1). Comparative sequence analyses were conducted to identify polymorphic loci using the MEGA6 program [20].
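The comparison against a reference described above can be illustrated with a minimal Python sketch that walks two aligned, in-frame coding sequences codon by codon and labels each changed codon as synonymous or nonsynonymous. This is not the MEGA6 procedure used in the study; the function name `classify_codon_changes` and the toy sequences are hypothetical, and the sketch assumes the standard genetic code, no gaps, and no ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed for a nuclear-encoded gene).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(reference, sample):
    """Return (codon position, change label, kind) for every differing codon.

    Both inputs must be aligned, gap-free, in-frame coding sequences of
    equal length. A codon carrying two SNPs is reported once, so the
    result counts changed codons rather than individual SNPs.
    """
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    changes = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        alt_codon = sample[start:start + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        position = start // 3 + 1  # 1-based amino acid position
        kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
        changes.append((position, f"{ref_aa}{position}{alt_aa}", kind))
    return changes


# Toy sequences for illustration only (not pfama-1 data).
toy_reference = "ATGGAAAAA"  # M E K
toy_sample = "ATGGACAAG"     # M D K
toy_changes = classify_codon_changes(toy_reference, toy_sample)
```

In this toy case the second codon changes the encoded residue (E to D) and the third does not (K to K), which mirrors the distinction drawn in the Results between all SNPs and nsSNPs.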
Statistical and population genetic analyses
The DnaSP v6.12 software package [21] was used to estimate the number of parsimony informative sites, total number of mutations, pairwise nucleotide diversity (π), haplotype diversity, number of segregating sites, number of haplotypes, recombination parameter between adjacent nucleotides per generation, and the minimum number of recombination events (Rm). Linkage disequilibrium between polymorphic sites was estimated using the R² index in DnaSP v6.12 [21]. Tajima’s D and Fu and Li’s D and F indices were calculated by a sliding window method in the same package [21]. Population genetic parameters, including the pairwise fixation index (FST) and haplotype frequencies, were evaluated by analysis of molecular variance. The significance of the analysis of molecular variance was estimated with 1,000 permutations, and the nucleotide diversity based on Nei’s net distance was computed using Arlequin v3.5 [22]. The haplotype network plot was generated using the PopArt software [23].
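For readers unfamiliar with the summary statistics listed above, the following Python sketch shows how segregating sites, nucleotide diversity (π), haplotype diversity, and Tajima’s D can be computed from a gap-free alignment using the textbook formulas. It is an illustration only, not DnaSP’s implementation: the function name `diversity_stats` and the toy alignment are hypothetical, sites with more than two alleles are counted once as segregating, and gaps or missing data are not handled.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Compute S, pi, haplotype diversity, and Tajima's D for aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    if n < 4 or any(len(s) != length for s in seqs):
        # With fewer than 4 sequences the variance term of Tajima's D is zero.
        raise ValueError("need at least 4 aligned sequences of equal length")

    # Number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Mean number of pairwise differences (k) and per-site diversity (pi).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / length

    # Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).
    counts = Counter(seqs)
    hap_div = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Tajima's D constants (Tajima 1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    if seg_sites:
        variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
        tajima_d = (k - seg_sites / a1) / sqrt(variance)
    else:
        tajima_d = float("nan")  # undefined without polymorphism

    return {
        "segregating_sites": seg_sites,
        "pi": pi,
        "haplotypes": len(counts),
        "haplotype_diversity": hap_div,
        "tajima_d": tajima_d,
    }


# Toy alignment for illustration only (not pfama-1 data).
toy_alignment = ["ACGT", "ACGA", "ACTA", "TCTA"]
toy_stats = diversity_stats(toy_alignment)
```

A positive Tajima’s D arises when the mean pairwise difference exceeds the value expected from the number of segregating sites, which is the pattern the study interprets as a signature of balancing selection.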
Assessments of natural selection signatures
The global pfama-1 sequences were aligned and filtered using the GUIDANCE server with the confidence score threshold (i.e., best score ~1) [24]. This low-quality alignment filtration is essential for the accuracy of the natural selection analysis [25]. The good-quality, reliable alignment was subjected to the Datamonkey server of the HYPHY package for the identification of selected loci with a default P value [26–28]. Individual sites underlying positive selection were inferred using the following 3 algorithms: fixed effects likelihood, internal branches fixed effects likelihood, and mixed effects model of evolution [28].
Results
Genetic polymorphic features of KP-Pakistan pfama-1
The 19 full-length pfama-1 sequences were successfully amplified from the 19 KP-Pakistan P. falciparum isolates. The gene length was 1,869 bp, and no size polymorphism was identified in the sequences. Comparative analysis of the 19 KP-Pakistan pfama-1 sequences with the 3D7 pfama-1 reference sequence (U65407) revealed genetic polymorphisms in KP-Pakistan sequences. Across the sequences, 69 single nucleotide polymorphisms (SNPs) were identified, among which 58 were nonsynonymous SNPs (nsSNPs), resulting in amino acid substitutions at 58 positions and 14 distinct haplotypes. Most amino acid changes were found in DI (n=24), DII (n=7), and DIII (n=6) (Fig. 1; Supplementary Table S2). A tetramorphic change (E197D/G/H) and 3 trimorphic changes (E187N/K, H200R/D, and K243E/N) were detected in DI. A trimorphic amino acid change (R503N/H) was identified in DIII. The other 53 amino acid changes throughout the sequences were dimorphic. We also compared KP-Pakistan pfama-1 with previously reported global pfama-1 and identified 113 nsSNPs causing amino acid substitutions (92 dimorphic, 17 trimorphic, 2 tetramorphic, and 2 pentamorphic) in global pfama-1, including KP-Pakistan pfama-1. Interestingly, 13 amino acid changes detected in KP-Pakistan pfama-1 (W28D, H30D, R45S, K49T, Q57L, S66P, I97N, and M114V in the signal and prosequence region; D333E and N401S in DII; I454T in DIII; and K570R and P614S in the transmembrane and cytoplasmic domain) were novel and had not been reported previously, although their frequencies were relatively low (Fig. 1; Supplementary Table S2). Meanwhile, most of the amino acid changes in DI, DII, and DIII were commonly detected in global pfama-1 (Supplementary Table S2). Global pfama-1 exhibited similar patterns of amino acid changes, but the frequency of each amino acid change differed by country.
Nucleotide diversity and natural selection
We identified 69 segregating sites and 69 mutations in KP-Pakistan pfama-1 isolates. Haplotype diversity and nucleotide diversity (π) were 0.982±0.022 and 0.0109, respectively (Table 1). Tajima’s D, Fu and Li’s D, and Fu and Li’s F values were positive, suggesting that positive natural selection affected KP-Pakistan pfama-1. The overall nucleotide diversity (π) across global pfama-1 ranged from 0.0043±0.0006 (Vietnam) to 0.0141±0.0003 (Ghana), suggesting mild levels of genetic diversity in global pfama-1 populations. The π in KP-Pakistan pfama-1 was lower than or similar to that in Asia and Pacific pfama-1 populations but lower than that in Africa pfama-1 populations (Table 1). All pfama-1 sequences, except Vietnam pfama-1 sequences, demonstrated positive Tajima’s D values, indicating the role of balancing selection in global pfama-1 (Table 1). The positive values of both Fu and Li’s D and F also suggested evidence for the role of balancing selection in global pfama-1, except Vietnam pfama-1. A sliding window plot of π suggested that global pfama-1 shared highly similar patterns of π across the sequences (Fig. 2). The highest peak of π was commonly identified at cluster 1 of the loop I (C1-L) region in DI of all global isolates. Similar profiles of Tajima’s D across the gene were also identified in global pfama-1, except Vietnam pfama-1 (Fig. 2). The FST analysis between KP-Pakistan and global pfama-1 populations indicated genetic differentiation of KP-Pakistan isolates. KP-Pakistan pfama-1 showed the lowest FST values against pfama-1 from the Philippines, PNG and SI, but it showed higher FST values against pfama-1 from Myanmar, Thailand, and Vietnam (Table 2).
Recombination and linkage disequilibrium
The Rm of KP-Pakistan pfama-1 was estimated to be 9. The recombination parameter values between adjacent sites (Ra) and per gene (Rb) were 0.0238 and 44.4, respectively (Table 3). Possible recombination events were also identified in global pfama-1. The Rm values of Africa pfama-1 were greater than those of Asia and Pacific pfama-1. The linkage disequilibrium index (R²) declined with increasing nucleotide distance across the gene in global pfama-1, suggesting that recombination could be a major force contributing to the genetic diversity of pfama-1 (Supplementary Fig. S2).
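The linkage disequilibrium index referred to above can be illustrated for a single pair of biallelic sites. The sketch below uses the standard definition, the squared disequilibrium coefficient divided by the product of the allele frequencies at both sites; it is not DnaSP’s code, and the function name `r_squared` and the toy allele strings are hypothetical.

```python
def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic sites.

    site_a and site_b hold one allele per haplotype, in the same
    haplotype order (for example, two columns of an alignment).
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must cover the same non-empty set of haplotypes")
    alleles_a, alleles_b = sorted(set(site_a)), sorted(set(site_b))
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        raise ValueError("both sites must be biallelic and polymorphic")

    n = len(site_a)
    a, b = alleles_a[0], alleles_b[0]  # arbitrary focal allele at each site
    p_a = site_a.count(a) / n
    p_b = site_b.count(b) / n
    # Frequency of haplotypes carrying both focal alleles.
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == a and y == b) / n

    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Toy sites for illustration only (not pfama-1 data).
complete_ld = r_squared("AATT", "CCGG")   # alleles always co-occur
no_ld = r_squared("AATT", "CGCG")         # alleles combine at random
```

Plotting such values for all site pairs against the distance between the sites gives the kind of decay curve summarized in Supplementary Fig. S2: values near 1 for tightly linked sites and values approaching 0 where recombination has shuffled the alleles.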
Haplotype network analysis
The haplotype network analysis of 667 global pfama-1 and 3D7 reference sequences revealed a complicated network of 260 distinct haplotypes (Fig. 3). Most haplotypes were singletons. Haplotype 73 (H73) was the most predominant with a frequency of 10.6% and was shared by pfama-1 from different countries, including Myanmar, Thailand, the Philippines, Vanuatu, PNG, and SI. Haplotype 88 (H88) was the second major haplotype with a frequency of 10.2% and was shared by pfama-1 from Vanuatu, SI, PNG, and the Philippines. KP-Pakistan pfama-1 sequences formed 16 haplotypes that were scattered across the network. Only one sequence from Ghana shared a haplotype (H1) with 3D7.
Assessments of natural selection signatures
We analyzed the pattern of natural selection signatures across global pfama-1. The episodic positive selection analysis using the mixed effects model of evolution method suggested that 18 amino acid changes were under natural selection (P≤0.05) (Table 4). The pervasive positive selection signatures analyzed using fixed effects likelihood and internal branches fixed effects likelihood methods suggested that 20 amino acid changes were under natural selection (Table 4). All amino acid changes predicted to be under positive natural selection matched the amino acid changes commonly detected in global pfama-1.
Discussion
The complex biological properties of parasites and vectors and the large genetic and antigenic variations in prospective vaccine candidate antigens have hindered the development of an effective malaria vaccine despite extensive attempts over the past few decades. Although RTS,S/AS01, the first malaria vaccine endorsed by the World Health Organization for routine immunization of children in transmission areas, was developed based on CSP [29], its efficacy remains controversial because of modest and short-lived protection and insufficient effectiveness against parasites with different alleles of CSP [30,31]. To address these limitations, a multistage vaccine combining different candidate antigens such as Duffy binding protein and AMA-1 was proposed [32,33]. The biological significance and immunological functions of PfAMA-1 suggest that this antigen is an attractive vaccine candidate. Nonetheless, the genetic polymorphisms observed in global pfama-1 also emphasize the importance of continuous surveillance of the genetic diversity of the gene in the global parasite population [18,19].
Similar to that in pfama-1 from other geographical areas [18,19], KP-Pakistan pfama-1 also exhibited genetic polymorphisms causing amino acid changes. The most common amino acid changes in global pfama-1, especially those in DI, DII, and DIII, were also identified in KP-Pakistan pfama-1. These common amino acid changes observed in DI and DIII of global pfama-1 matched with B-cell epitopes 3, 4, 5, 9, and 10, supporting the notion that these are major regions under natural selection and contribute to host immune escape [18,19,34]. Meanwhile, 13 novel amino acid changes not reported in global pfama-1 were identified in KP-Pakistan pfama-1, most of which were distributed in the signal and prosequence region. Among these amino acid changes, Q57L, S66P, I97N, I454T, and K570R were located in the intrinsically unstructured/disordered region, and D333E was mapped in red blood cell-binding sites. The amino acid changes I97N, M114V, D333E, N401S, and I454T were also located in the intrinsically unstructured/disordered region or B-cell epitope. These findings suggest the potential roles of these amino acids in the modulation of host immune responses; however, further studies would be required to clarify the biological significance of these amino acid changes.
KP-Pakistan pfama-1 demonstrated patterns of genetic diversity and natural selection similar to those of pfama-1 from other geographical regions. The π value for the global pfama-1 population varied, with the π values of Asia and Pacific pfama-1 populations being relatively lower than that of the Africa pfama-1 population. Although the π value of global pfama-1 differed by country, similar patterns of π across pfama-1 were identified in global pfama-1, including KP-Pakistan pfama-1. The sliding window plot revealed that high levels of π were similarly observed in DI and DIII of global pfama-1, supporting the view that these domains are the central regions contributing to the genetic heterogeneity of pfama-1 [18,19]. The positive values of Tajima’s D and Fu and Li’s D and F of KP-Pakistan pfama-1 suggested that the gene was under balancing selection. Similar patterns of natural selection were also detected in global pfama-1, except Vietnam pfama-1 [18,19,35]. Meiotic recombination is also a driving force generating the genetic diversity of pfama-1 [18,19]. Potential recombination events were also detected in KP-Pakistan pfama-1, suggesting that interallelic recombination contributes to the genetic diversity of KP-Pakistan pfama-1. FST is a measure of population substructure and is the most common statistic used to analyze the overall genetic differentiation among populations, interpreted as follows: no differentiation (0), low genetic differentiation (0–0.05), moderate differentiation (0.05–0.15), or high differentiation (0.15–0.25) [36]. Global pfama-1 displayed low or moderate levels of FST between and among populations originating from different continents or countries. The only exception was Vietnam pfama-1 [19]. Although global pfama-1 demonstrated a complicated haplotype diversity with 260 distinct haplotypes, the low or moderate FST values between and among populations suggest that global pfama-1 has a relatively stable genetic structure in the global population.
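The FST ranges quoted above can be written as a small lookup, shown here as a Python sketch. The function name `classify_fst` is hypothetical, and three choices are assumptions not stated in the text: each upper bound is treated as inclusive, zero or negative estimates are reported as no differentiation, and values above 0.25 are returned with a separate label because no category is given for them.

```python
def classify_fst(fst):
    """Map a pairwise FST estimate to the differentiation ranges quoted in the text."""
    if fst <= 0:
        # Assumption: negative estimates are treated the same as zero.
        return "no differentiation"
    if fst <= 0.05:
        return "low differentiation"
    if fst <= 0.15:
        return "moderate differentiation"
    if fst <= 0.25:
        return "high differentiation"
    # Assumption: the text lists no category above 0.25.
    return "above the listed ranges"


# Hypothetical values for illustration only (not taken from Table 2).
example_labels = [classify_fst(value) for value in (0.0, 0.03, 0.10, 0.20, 0.40)]
```

Applying such a lookup to every population pair makes it easy to see at a glance which comparisons fall in the low or moderate bands discussed here.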
This study has some limitations. The number of global pfama-1 sequences was limited, and the sequences were obtained from parasites collected at different time points, so they may not reflect the genetic structure and evolutionary aspects of the current global pfama-1 population. Further in-depth analysis using a substantially larger number of global P. falciparum isolates is required to understand the genetic structure of pfama-1.
Supplementary Information
Notes
Author contributions
Conceptualization: Na BK, Afridi SG
Data curation: Zaib K, Khan A, Khan MU, Ullah I
Formal analysis: Zaib K, Khan A, Khan MU, Ullah I, Võ TC, Kang JM, Lê HG, Na BK, Afridi SG
Funding acquisition: Na BK
Investigation: Khan A, Na BK, Afridi SG
Methodology: Zaib K, Khan A, Khan MU, Ullah I, Võ TC, Kang JM
Project administration: Na BK, Afridi SG
Resources: Afridi SG
Software: Zaib K, Khan A, Khan MU, Ullah I, Võ TC, Kang JM, Lê HG, Na BK, Afridi SG
Supervision: Na BK, Afridi SG
Writing – original draft: Zaib K, Khan A, Afridi SG
Writing – review & editing: Khan A, Võ TC, Kang JM, Lê HG, Na BK, Afridi SG
The authors declare no conflicts of interest.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant (NRF-2024M3A9H5043141).