Codon Usage Patterns of Tyrosinase Genes in Clonorchis sinensis
Abstract
Codon usage bias (CUB) is a unique property of genomes and has contributed to a better understanding of the molecular features and evolutionary processes of particular genes. In this study, genetic indices associated with CUB, including relative synonymous codon usage and effective number of codons, as well as the nucleotide composition, were investigated in the Clonorchis sinensis tyrosinase genes and their platyhelminth orthologs, which play an important role in eggshell formation. The relative synonymous codon usage patterns substantially differed among the tyrosinase genes examined. In a neutrality analysis, the correlation between GC12 and GC3 was statistically significant, and the regression line had a relatively gradual slope (0.218). The NC plot, i.e., GC3 vs effective number of codons (ENC), showed that most of the tyrosinase genes were below the expected curve. The codon adaptation index (CAI) values of the platyhelminth tyrosinases had a narrow distribution between 0.685/0.714 and 0.797/0.837, and were negatively correlated with their ENC. Taken together, these results suggested that CUB in the tyrosinase genes was basically governed by selection pressures rather than mutational bias, although the latter factor provided an additional force in shaping CUB of the C. sinensis and Opisthorchis viverrini genes. It was also apparent that the equilibrium point between selection pressure and mutational bias was inclined much more toward selection pressure in highly expressed C. sinensis genes than in poorly expressed genes.
INTRODUCTION
Tyrosinases (EC 1.14.18.1), together with catechol oxidases, comprise the type-3 copper protein family, whose members contain an active di-copper center [1,2]. The enzymes mediate the biochemical conversion of tyrosine into dihydroxyphenylalanine (DOPA; monophenol oxidase activity) and of DOPA into DOPA quinone (diphenol oxidase activity) [3,4]. Tyrosinases are ubiquitously distributed across taxa ranging from bacteria to mammals and are engaged in diverse biological events, such as pigmentation, innate immunity, sclerotization, and wound healing, depending on their primary structures [2]. In parasitic trematodes, tyrosinases are produced in the mature vitellocytes and packed within secretory vacuoles termed vitelline droplets, together with eggshell precursor proteins [5]. Following secretion in the ootype, the tyrosinases convert tyrosine residues on the eggshell proteins into DOPA quinones, which are then cross-linked to other amino acids, such as tyrosine and lysine, on adjacent proteins to make the sclerotized eggshell [6].
Clonorchis sinensis is the causative agent of clonorchiasis in humans, the clinical manifestations of which include abdominal pain, mechanical obstruction of the hepatobiliary ducts, cholangiectasis, and biliary stones [7]. Chronic infection with C. sinensis also appears to lead to a high incidence of cholangiocarcinoma [8,9]. Human infection occurs by eating raw or undercooked freshwater fish containing C. sinensis metacercariae. Clonorchiasis is highly endemic in Asian countries, such as China, Vietnam, and Korea, where it causes great socio-economic and public health burdens [7]. In a previous report [10], the structural and biochemical properties of multiple tyrosinases, as well as their spatiotemporal expression patterns, were investigated in the liver fluke. Expression of these vitellocyte-specific tyrosinase genes was tightly related to the sexual maturation of C. sinensis. However, the relative expression levels of these tyrosinase paralogs were substantially different, although no molecular evidence of functional diversification could be detected among them [10].
In general, gene expression level is regulated by multiple factors, such as promoter activity [11], open reading frame (ORF) length and amino acid composition [12], intron length [13], and codon usage pattern [14]. Unequal usage of synonymous codons encoding the same amino acid during translation of a gene, which is known as codon usage bias (CUB), is a commonly observed phenomenon in a wide variety of organisms. The codon usage pattern is a unique property of a genome, and it may vary between genomes and between genes from the same genome [15,16]. Many genomic factors, such as gene length and GC content, are known to be highly associated with CUB [15]. Comprehensive studies with various organisms further demonstrated that multiple factors, including mutation pressure, natural/translational selection, secondary structure of the protein, hydropathic character of the protein, and numbers of tRNA gene copies, influence codon usage patterns [15,17]. It is also likely that highly expressed genes exhibit significant bias in their codon usage toward codons with abundant tRNA gene copy numbers. Therefore, the codon usage patterns of genes may be applied to predict the expression levels of the relevant genes [16,18].
In this study, the CUB of C. sinensis tyrosinase genes and their trematode orthologs was analyzed to understand the process of molecular evolution and the probable factors influencing the codon usage profile, which is ultimately related to the expression level of the corresponding gene.
MATERIALS AND METHODS
Sequence data
The genomic and proteomic databases of platyhelminths for which whole genome drafts were available were selected for the screening of tyrosinase genes as follows: C. sinensis in the GenBank (http://www.ncbi.nlm.nih.gov/), Opisthorchis viverrini in the GenBank, Schistosoma japonicum in the GenBank and the Chinese National Human Genome Center at Shanghai (http://lifecenter.sgst.cn/schistosoma/cn/schistosomaCnIndexPage.do), and Schistosoma mansoni in the GenBank and the Sanger Institute. Tyrosinase sequences in the respective databases were identified using the tBLASTn program with the amino acid sequences of C. sinensis tyrosinases (AGG11797-AGG11800 [10]). The genomic and expressed sequence tag (EST) sequences of Schmidtea mediterranea in the GenBank and the SmedGD (http://smedgd.neuro.utah.edu/), as well as the GenBank databases of other platyhelminths, were also targeted during the BLAST examinations. The retrieved sequences were finally filtered to eliminate redundant sequences by comparing the chromosomal locations of the respective genes. Where chromosomal locations could not be compared, a 95% similarity cutoff at the amino acid sequence level was used to distinguish paralogous genes. The translated amino acid sequences of matched ESTs were predicted using the ORF Finder (http://www.ncbi.nlm.nih.gov/). The similarity patterns and specific hidden Markov model (HMM) profiles were examined from the theoretical amino acid sequences using BLASTp (E-value cutoff, 1e-5) and InterProScan (version 5.0; http://www.ebi.ac.uk/Tools/pfa/iprscan5/), respectively.
Determination of base composition, codon adaptation index, and effective number of codons
The frequencies of nucleotide G+C at the first (GC1), second (GC2), and third (GC3) positions of tyrosinase gene codons were calculated using CAIcal (http://genomes.urv.cat/CAIcal/) to quantify the extent of base composition bias. The codon adaptation index (CAI), which reflected the transcriptional activity of the relevant gene [19], and the effective number of codons (ENC), which was used to quantify the codon usage bias of a gene independently of its length [20], were similarly determined with the program. The CAI values ranged from 0.0 to 1.0, and a higher value meant a likely stronger bias in codon usage and a potentially higher expression level [21]. During estimation of the CAI values, genes of Caenorhabditis elegans and Escherichia coli were used as reference groups. The ENC values ranged from 20, indicating that the gene used only 1 synonymous codon for each amino acid, to 61, indicating that the gene used all the synonymous codons equally for each amino acid [20].
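The position-specific G+C values described above can be illustrated with a minimal Python sketch. It is not the CAIcal implementation: the function name gc_by_position is hypothetical, the input is assumed to be an in-frame coding sequence, and every codon (including Met, Trp, and stop codons) is counted, which may differ from the filtering applied by CAIcal.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3, GC12) in percent for an in-frame coding sequence.

    Assumption of this sketch: all codons are counted, including ATG, TGG,
    and stop codons; dedicated tools may exclude some of them.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]  # G+C counts at codon positions 1, 2, and 3
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    gc1, gc2, gc3 = (100.0 * c / n_codons for c in counts)
    # GC12 is the mean of GC1 and GC2, as used in neutrality plots
    return gc1, gc2, gc3, (gc1 + gc2) / 2.0
```

For example, gc_by_position("ATGGCCTAA") treats the sequence as the codons ATG, GCC, and TAA, and reports one G or C out of three at the first and second positions and two out of three at the third position.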
Analysis of relative synonymous codon usage patterns
Relative synonymous codon usage (RSCU) was defined as the observed frequency of a codon divided by the frequency expected if all synonymous codons were used equally to encode a particular amino acid [22]. Therefore, a codon with an RSCU value greater than 1 was used more frequently than expected, whereas one with a value smaller than 1 was used less frequently than expected. The RSCU values of tyrosinase gene codons were calculated using CodonW (http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::CodonW). The RSCU values were further applied in a correspondence analysis using the same program, which partitions the variation along 59 orthogonal axes.
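The RSCU definition given above can be written out as a short Python sketch. It is an illustration of the definition only, not the CodonW code: the function name rscu is hypothetical, the standard genetic code is assumed, and amino acids that do not occur in the input are simply left out of the result. The 59 codons that remain after excluding Met, Trp, and the stop codons correspond to the 59 axes mentioned above.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return {codon: RSCU} for the degenerate amino acids present in cds."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    # Group codons into synonymous families
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met, and Trp carry no synonymous choice
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this gene
        for c in family:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(family) / total
    return values
```

With the toy input "GAAGAAGAGAAA", glutamate is encoded twice by GAA and once by GAG, so GAA receives an RSCU of 4/3 and GAG receives 2/3.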
RESULTS
Tyrosinase genes and their nucleotide contents
A total of 5 tyrosinase paralogs were identified in the genomes of C. sinensis and O. viverrini, while those of Schistosoma spp. encoded only 2 paralogous tyrosinase genes. The free-living turbellarian, S. mediterranea, possessed 6 genes homologous to the trematode tyrosinases. In addition to these genes in platyhelminths for which whole genomic sequence information was available, a single tyrosinase ortholog was further retrieved from the GenBank database for each of several other organisms, namely Paragonimus westermani and Spirometra erinaceieuropaei, as well as Capitella teleta (Annelida). The Capitella tyrosinase was used as an outgroup gene (Table 1). The similarity patterns of these tyrosinases toward the γ-subclass of the type-3 copper protein family [2] and the functional domains conserved in the subclass members, such as the epidermal growth factor-like domain and di-copper center [10], were verified by sequence analyses using the BLAST and InterProScan programs (data not shown).
G+C content of tyrosinase genes
The genome-wide G+C content (GC) varied from 28.7 (S. mediterranea) to 44.8 (C. sinensis) across platyhelminth genomes [23,24]. The values in the whole coding DNA sequences (CDSs) were found to be higher than the genome-wide values (36.2–48.0; Table 1). GC values in tyrosinase genes ranging from 31.1±0.21 (S. mansoni) to 47.2±1.41 (O. viverrini) were approximate to the values of CDSs in the respective genomes. The tyrosinase genes of C. sinensis (46.9±1.54) and other schistosomes (32.5±2.55 in S. japonicum and 31.7±0.42 in S. haematobium) showed GC contents similar to those of the O. viverrini and S. mansoni genes, respectively. The S. mediterranea genes exhibited an average GC content of 35.6. The GC contents of P. westermani, S. erinaceieuropaei, and C. teleta genes were estimated as 46.9, 49.8, and 53.6, respectively.
The GC1, GC2, and GC3 were also quite different among the tyrosinase genes. In C. sinensis and O. viverrini genes, the average values of GC1 and GC3 were similar to or slightly higher than that in total codon positions (51.0/47.8 vs 46.6 in C. sinensis and 51.0/48.3 vs 47.2 in O. viverrini), whereas the values were significantly reduced in the second positions (41.9 and 42.4 in C. sinensis and O. viverrini genes; P<0.01). The nucleotide composition patterns in the P. westermani and S. erinaceieuropaei tyrosinase codons were similar to those of the opisthorchiid genes. However, the values were found to be the lowest in the third codon positions in the Schistosoma spp. and S. mediterranea tyrosinase genes (P<0.05) (Table 1). In order to estimate the relationship of the GC content at the 3 codon positions, neutrality plots, i.e., GC3 vs GC12 (average GC content in the first and second codon positions), were drawn for the platyhelminth tyrosinase genes. As shown in Fig. 1, the correlation between GC12 and GC3 was statistically significant (R=0.8272, P<0.0001), and the slope of the regression line was 0.2168.
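The regression behind a neutrality plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for Fig. 1: the function name neutrality_fit is hypothetical, GC3 is assumed to be on the x-axis and GC12 on the y-axis, and an ordinary least-squares fit is assumed.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, r), where r is Pearson's correlation
    coefficient. The axis assignment is an assumption of this sketch.
    """
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long series with at least 2 genes")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must each vary across genes")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

As a made-up example (not data from Table 1), GC3 values of 10, 20, 30, and 40 paired with GC12 values of 40, 42, 44, and 46 give a slope of 0.2, that is, GC12 changes by only one fifth of the change in GC3.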
Relative synonymous codon usage (RSCU) in C. sinensis tyrosinase genes
The patterns of synonymous codon usage were analyzed in the 5 C. sinensis tyrosinase genes (Table 2). The usage patterns were found to be slightly different among the Clonorchis genes. Codons with an RSCU value greater than 1.0, which meant that the particular codon was used more frequently than the other synonymous codons for the corresponding amino acid, numbered 28, 29, 29, 30, and 28 in GAA27975, GAA32069, GAA48882, GAA48883, and GAA54899, respectively. Of these preferred codons, 9 (32.1%), 10 (34.5%), 13 (44.8%), 8 (26.7%), and 13 (46.4%) codons ended with G or C.
Relationship between nucleotide composition and selection pressure in association with CUB
The ENC values of tyrosinase genes ranged from 32.8 (AAW26996 of S. japonicum) to 61.0 (AJE29953 of P. westermani), demonstrating that there were significant differences in codon bias among these genes (Table 1). The relationship between the codon bias and nucleotide composition was examined by NC plot analysis (GC3 vs ENC) of the platyhelminth tyrosinase genes. As shown in Fig. 2A, most tyrosinase genes fell below the reference line, which shows the expected position of a gene whose codon bias is constrained solely by the nucleotide composition at the third codon positions. Only a few genes, such as GAA32069 of C. sinensis, AJE29953 of P. westermani, and ELU13195 of C. teleta, lay on the expected curve.
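The reference line of an NC plot can be sketched as follows. The text does not give the equation, so the sketch assumes the commonly used approximation from Wright's ENC method, ENC = 2 + s + 29/(s^2 + (1 - s)^2), where s is the GC3 fraction (0 to 1); the function names expected_enc and enc_deficit are hypothetical.

```python
def expected_enc(gc3_fraction):
    """Expected ENC when codon bias is shaped only by GC3 composition.

    Assumes the approximation ENC = 2 + s + 29 / (s^2 + (1 - s)^2),
    with s given as a fraction between 0 and 1 (not a percentage).
    """
    s = gc3_fraction
    if not 0.0 <= s <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)

def enc_deficit(observed_enc, gc3_fraction):
    """Positive values mean the gene lies below the expected curve."""
    return expected_enc(gc3_fraction) - observed_enc
```

Under this approximation the curve peaks at 60.5 for a GC3 fraction of 0.5 and falls to 31 and 32 at the two extremes, so a gene with an observed ENC well below the curve at its own GC3 shows more codon bias than composition alone would predict.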
Unlike ENC, which depends on the nucleotide composition of a gene, CAI is a directional measure of codon usage bias that is highly related to the degree of translational selection acting upon the gene and to the relative expression level of the corresponding gene [20,21]. Therefore, comparison between ENC and CAI values provided insights into the relationship between nucleotide composition and selection pressure affecting CUB. The CAI values of platyhelminth genes obtained by referencing C. elegans and E. coli gene sets (CAI-1 and CAI-2 in Table 1) had a relatively narrow distribution between 0.685/0.714 and 0.797/0.837, and were negatively correlated with their ENC (R=0.8561, P<0.0001; Fig. 2B).
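The way a CAI value depends on a reference gene set can be illustrated with a minimal Python sketch. It follows the usual definition (the geometric mean of the relative adaptiveness of each codon) but is not the CAIcal implementation: the function names are hypothetical, the reference counts are supplied by the caller, codons absent from the reference receive an assumed pseudocount of 0.5, and amino acids with no reference data are skipped.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Weight of each codon relative to the most used synonym in the reference."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met, and Trp are excluded from CAI
        if sum(ref_counts.get(c, 0) for c in family) == 0:
            continue  # no reference data for this amino acid (sketch choice)
        # Assumption: unseen codons get a pseudocount so that log(w) is defined
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in family}
        top = max(counts.values())
        for c in family:
            weights[c] = counts[c] / top
    return weights

def cai(cds, weights):
    """Geometric mean of the weights of the scorable codons in cds."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codons with reference weights in sequence")
    return math.exp(sum(logs) / len(logs))
```

With a toy reference in which GAA is used three times as often as GAG, the sequence "GAAGAG" scores the geometric mean of 1 and 1/3, about 0.577, while "GAAGAA" scores 1.0. Changing the reference set changes the weights, which is why two CAI values (CAI-1 and CAI-2) are reported per gene.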
Correspondence analysis of tyrosinase gene RSCU
The RSCU values of platyhelminth tyrosinase genes were computed using the CodonW program, and the values were used in a correspondence analysis to investigate synonymous codon usage variation. The correspondence analysis of these genes generated a series of orthogonal axes reflecting the trends responsible for variation in codon usage, of which the first 2 principal axes accounted for 55.0% and 10.0% of the overall variation observed in RSCU of platyhelminth tyrosinase genes. As shown in Fig. 3A, the tyrosinase orthologs were widely scattered along the primary axis according to their donor organisms, while genes with a paralogous relationship were properly resolved along the second axis. The correspondence analysis of synonymous codons also separated G-/C-ending codons from A-/T-ending codons largely along the first axis (Fig. 3B). Correlation analyses between the positions of genes along the axes and the nucleotide compositions of the respective genes demonstrated that the first axis had a strong negative correlation with the overall GC content (correlation coefficient=−0.972), especially with the GC contents at the first (correlation coefficient=−0.887) and third (correlation coefficient=−0.995) codon positions (Table 3). ENC values, which depend on nucleotide composition as described above, were similarly correlated with the primary axis (correlation coefficient=−0.961). Of these compositional criteria, GC2 exhibited the strongest correlation with gene positions along the second axis (correlation coefficient=−0.409).
DISCUSSION
CUB, which denotes unequal usage of synonymous codons to encode the same amino acid, is a unique property of genomes, and it may differ even between paralogous genes [15,16]. Of the diverse factors associated with CUB, compositional constraints under mutational bias and selection pressures, including translational selection, were considered to be the major determinants of codon usage variation among different organisms [15,25,26]. Translational selection was extrapolated from the number of tRNA copies and inevitably from the expression level of the respective genes [25], while the mutational bias was largely associated with the GC content of the donor genome. Genes encoded in a GC-rich genome tended to use synonymous codons ending with G and C, whereas those in an AT-rich genome preferred codons with A and T at the third position [27]. The fraction of G- and C-ending codons (34.5±6.6) among the frequently used ones in Clonorchis tyrosinase genes was significantly smaller than the average GC3 content (47.8±2.4) of the corresponding genes (P<0.05), while the average GC3 content was similar to that of all CDSs encoded in the liver fluke genome (48.0) (Tables 1 and 2). As in the Clonorchis genes, the average GC3 content of O. viverrini tyrosinase genes (48.3±2.5) was also similar to that of whole CDSs (47.8). However, the values were greatly reduced in the S. mediterranea (29.6±6.5) and Schistosoma spp. (14.7±3.3) genes compared to the respective GC contents of whole CDSs (36.0 and 36.5±0.36; P<0.05) (Table 1). In the correspondence analysis of RSCU, the tyrosinase gene positions moved toward the right side of the primary axis, which had a strong negative correlation with the GC3 content, compared to codons (Fig. 3; Table 3). These facts seemed to demonstrate that the codon usage pattern, or that of frequently used synonymous codons, was biased to some extent toward A- and T-ending codons in the platyhelminth tyrosinase genes.
The mutation-selection equilibrium in shaping CUB could be estimated by neutrality analysis, which examined the relationship between GC12 and GC3 of the respective genes [28]. If the relationship was statistically significant and the slope of the best-fitting line was close to 1.0, mutational bias was taken as the main force affecting CUB. Conversely, selection predominating over mutational bias resulted in no correlation between GC12 and GC3 and/or a smaller slope of their regression line [29,30]. GC12 and GC3 were significantly correlated in platyhelminth tyrosinase genes, although the slope was much closer to 0.0 than to 1.0 (a=0.2168, R=0.8272, P<0.0001; Fig. 1). Interestingly, the relationship was found to be quite different among paralogous gene sets. In the neutrality analyses performed at the intragenome level, the slope of the regression line was slightly increased among C. sinensis (a=0.3825; R=0.7693) and O. viverrini (a=0.2284; R=0.4475) genes. However, the value came close to 0.0 among genes encoded in Schistosoma spp. (a=0.0580; R=0.2391) and S. mediterranea (a=0.0508; R=0.1690). Taken together, these data strongly suggested that selection pressure, such as translational selection, rather than mutational bias was the main force governing bias in the nucleotide composition of tyrosinase codons (see [31] and references therein), even though it was apparent that mutational bias also provided a significant force causing the bias at least in the C. sinensis and O. viverrini genes. The predominant impact of selection pressure on CUB of tyrosinase genes was further supported by the plots of ENC against GC3 and of ENC against CAI (Fig. 2) [20,31].
The expression of tyrosinase genes was specific to the vitellocytes and thus coincided with sexual maturation in trematodes, including C. sinensis [10,32–34]. Structural and biochemical analyses of the Clonorchis tyrosinases did not show any evidence of functional diversification, while the relative expression levels were found to be quite different among them (GAA27975>GAA32069>GAA48883>GAA48882; [10]). CUB-related criteria, such as ENC and CAI, have been known to reflect the expression level of the corresponding gene [20,35,36]. However, the ENC and CAI values of Clonorchis genes were not strongly related to their empirically determined expression levels (Table 1). Meanwhile, C. sinensis tyrosinase genes, as well as O. viverrini orthologs, were widely scattered only along the second axis in the correspondence analysis of their RSCU values (Fig. 3A). Correlation analysis between the Clonorchis gene positions along the axis and nucleotide compositions demonstrated that the positions were positively correlated with the GC contents of codons (correlation coefficient=0.971), especially at the third positions (correlation coefficient=0.933). Therefore, the highly expressed tyrosinase genes of C. sinensis, such as GAA27975 and GAA32069, were likely to prefer synonymous codons with lower GC content, a preference that was also governed by selection pressures, including translational selection [37]. Investigations of the relationship between tRNA gene copy numbers and CUB of tyrosinase genes will be helpful to address the issue in various platyhelminth species.
In conclusion, the codon usage patterns of tyrosinase genes identified in the genomes of diverse platyhelminths, including C. sinensis, were comparatively analyzed to gain insights into the evolutionary process and molecular clues related to the differential expression of paralogous genes. The CUB detected in the platyhelminth tyrosinase genes was basically governed by selection pressures, rather than mutational bias, although the latter factor provided an additional force in shaping CUB of the C. sinensis and O. viverrini genes. It was also apparent that the equilibrium point between selection pressure and mutational bias was skewed much more toward selection pressure in the highly expressed C. sinensis genes than in the poorly expressed paralogs. Further studies on the physicochemical properties of tyrosinase proteins, tRNA gene copy numbers, and promoter activity of the respective genes will be needed to confirm these results.
ACKNOWLEDGMENT
This work was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), which was funded by the Ministry of Science, ICT and Future Planning (no. NRF-2013R1A1A2012011).
Notes
CONFLICT OF INTEREST
The author has no conflict of interest related to this study.