Construction of EST Database for Comparative Gene Studies of Acanthamoeba
Abstract
The genus Acanthamoeba can cause severe infections such as granulomatous amebic encephalitis and amebic keratitis in humans. However, little genomic information on Acanthamoeba has been reported. Here, we constructed an Acanthamoeba expressed sequence tag (EST) database (Acanthamoeba EST DB) derived from our 4 Acanthamoeba cDNA libraries. The Acanthamoeba EST DB contains 3,897 ESTs generated from amebae under various conditions (long-term in vitro culture, mouse brain passage, or encystation), together with Acanthamoeba data downloaded from the National Center for Biotechnology Information (NCBI) and the Taxonomically Broad EST Database (TBestDB). Nearly all reported cDNA/genomic sequences of Acanthamoeba are searchable through a stand-alone BLAST system with nucleotide (BLAST NT) and amino acid (BLAST AA) sequence databases. In the BLAST results, each gene is linked to relevant information, including sequence data, gene orthology annotations, relevant references, and a BlastX result. This is the first attempt to construct an Acanthamoeba database of genes expressed under diverse conditions. These data were integrated into a database (http://www.amoeba.or.kr).
INTRODUCTION
Free-living amebae belonging to the genus Acanthamoeba are the causative agents of granulomatous amebic encephalitis (GAE), a fatal disease of the central nervous system (CNS), and of amebic keratitis (AK) [1]. The recent increase in the incidence of Acanthamoeba infections is due in part to infections in patients with acquired immune deficiency syndrome, whereas the increase in keratitis is due to the growing use of contact lenses [2]. In addition to its medical importance, Acanthamoeba is well known as a good model system for studying eukaryotic cell biology because of its relatively large size, rapid growth in culture, active motility, and well-developed cytoskeleton [3,4]. Owing to these diverse roles, Acanthamoeba has attracted increasing attention from the scientific community over the years [4].
Over the past decade, with the development of tools for genome studies, knowledge of the genomes of protozoan parasites has grown exponentially. Building on these genome studies, databases have been constructed for parasitic protozoa such as Plasmodium species [5,6], Entamoeba histolytica [7], and Trypanosoma cruzi [8,9], and for free-living protozoa such as Dictyostelium discoideum [10,11]. Although Acanthamoeba is considered an important organism in medicine and biological research, little genomic information on Acanthamoeba has been reported. The genome size of the ameba has been estimated at ~1 × 10^8 bp [3]. The complete sequence of the A. castellanii mitochondrial genome (41,591 bp) has been determined [12], and a small-scale expressed sequence tag (EST) analysis of Acanthamoeba healyi has been reported [13]. Recently, gene discovery in A. castellanii was performed [14], and a taxonomically broad EST database (TBestDB) covering 49 organisms, including 13,814 ESTs of A. castellanii, was constructed [15]. TBestDB (http://tbestdb.bcm.umontreal.ca), which contains ~370,000 clustered EST sequences from 49 organisms, provides 5,262 clustered EST sequences from A. castellanii trophozoites [15]. However, these genes appear to have been obtained from amebae under normal culture conditions, in which some genes may be silenced. The virulence of Acanthamoeba can be attenuated by long-term in vitro cultivation, and the cyst form of Acanthamoeba is resistant to immune responses and antibiotics. It is therefore difficult to obtain information on genes associated with enhanced virulence or mediating encystation from these databases.
In this study, we constructed a dedicated database from our previously reported EST sequences, which were generated from Acanthamoeba in a highly virulent state induced by mouse brain passage or during encystation. This new database could provide more information on genes involved in the pathogenesis and encystation of this cyst-forming protozoan.
MATERIALS AND METHODS
Our previously reported EST sequences, randomly selected from 4 kinds of cDNA libraries [16, processing], were used to construct a database for studying various types of Acanthamoeba genes, including genes related to pathogenicity, differentiation, or stress conditions.
The BLAST server for the Acanthamoeba EST database (Acanthamoeba EST DB) was built on a dual Xeon CPU system. After the CentOS operating system was installed and the web server was configured for CGI (Common Gateway Interface), the NCBI wwwblast package was installed (http://www.amoeba.or.kr). The stand-alone BLAST server was then built in 3 steps. First, our own Acanthamoeba EST sequences were pooled with the Acanthamoeba-related nucleotide and amino acid sequences downloaded from NCBI and TBestDB [15]. Second, these sequences were converted into multi-FASTA format and stored as BLAST databases using the formatdb program provided by NCBI. Third, the BLAST results of our own EST sequences were converted into a table with the following fields: QueryID (clone name), SubjectID (NCBI gi number), KOG (euKaryotic Orthologous Groups), QLen (query sequence length), CovQ (coverage of the query sequence by its alignment with the subject), SLen (subject sequence length), CovS (coverage of the subject sequence by its alignment with the query), Pid (percent identity in the HSP), Psi (percent similarity in the HSP), Frame, E-value, database type, annotation, Source (species), and links to the original sequence and BLAST results.
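The third step, turning raw BLAST hits into result-table rows, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes hits arrive in the default 12-column BLAST tabular format (legacy `-m 8` / BLAST+ `-outfmt 6`), that sequence lengths are supplied separately from the FASTA files, and that CovQ and CovS are the aligned span divided by the sequence length. The helper names (`parse_hits`, `HitRow`) are hypothetical, and fields such as Psi and KOG are omitted because the default tabular output does not carry them.

```python
from dataclasses import dataclass

# Default 12-column BLAST tabular output (legacy "-m 8" / BLAST+ "-outfmt 6").
TABULAR_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class HitRow:
    """One row of a hypothetical result table (subset of the published fields)."""
    query_id: str    # QueryID (clone name)
    subject_id: str  # SubjectID
    qlen: int        # QLen
    cov_q: float     # CovQ: % of the query spanned by the HSP (assumed definition)
    slen: int        # SLen
    cov_s: float     # CovS: % of the subject spanned by the HSP (assumed definition)
    pid: float       # Pid: percent identity in the HSP
    evalue: float    # E-value


def span(start: int, end: int) -> int:
    # Coordinates are reversed for minus-strand or negative-frame hits.
    return abs(end - start) + 1


def parse_hits(lines, qlens, slens):
    """Convert tabular BLAST lines into HitRow objects.

    qlens/slens map sequence IDs to lengths. Each coverage is computed in the
    units of its own sequence (nt for a blastx query, aa for its subject).
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(TABULAR_FIELDS, line.rstrip("\n").split("\t")))
        qlen = qlens[rec["qseqid"]]
        slen = slens[rec["sseqid"]]
        rows.append(HitRow(
            query_id=rec["qseqid"],
            subject_id=rec["sseqid"],
            qlen=qlen,
            cov_q=round(100 * span(int(rec["qstart"]), int(rec["qend"])) / qlen, 1),
            slen=slen,
            cov_s=round(100 * span(int(rec["sstart"]), int(rec["send"])) / slen, 1),
            pid=float(rec["pident"]),
            evalue=float(rec["evalue"]),
        ))
    return rows
```

For example, a blastx hit whose query span is 300 nt of a 600-nt clone and whose subject span is 100 aa of a 200-aa protein yields CovQ = 50.0 and CovS = 50.0.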
RESULTS
Composition of Acanthamoeba EST DB
Based on our previous report [16], a dedicated Acanthamoeba EST database (Acanthamoeba EST DB) was constructed (http://www.amoeba.or.kr: inaccessible). The sequence data of Acanthamoeba EST DB consisted of 3,897 Acanthamoeba ESTs from our previous studies (Table 1), 33,648 Acanthamoeba-related sequences from NCBI, and 5,260 Acanthamoeba nucleotide sequences from TBestDB. In total, 42,805 sequences were used to construct the database (Tables 1, 2).
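As a quick arithmetic check, the three sources reported above sum to the stated total. The short sketch below only tallies the published counts and prints each source's share; the source labels are descriptive and hypothetical.

```python
# Sequence counts per source, as reported for Acanthamoeba EST DB.
sources = {
    "own ESTs (previous studies)": 3_897,
    "NCBI (Acanthamoeba-related)": 33_648,
    "TBestDB (nucleotide)": 5_260,
}

total = sum(sources.values())
for name, n in sources.items():
    # Share of each source in the combined database.
    print(f"{name:30s} {n:>7,d} ({100 * n / total:.1f}%)")
print(f"{'total':30s} {total:>7,d}")
```

The NCBI downloads make up most of the sequences, while the condition-specific ESTs are a small fraction.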
Information of Acanthamoeba EST DB
Acanthamoeba EST DB consisted of 3 search tools and 2 repositories of search results. BLAST NT, BLAST AA, and 2-Sequence were developed as the search tools, and BLAST results and statistics served as the repositories (Table 3). BLAST NT and BLAST AA contained the nucleotide and protein sequence databases, respectively, which could help predict the functions of Acanthamoeba genes or proteins in experiments through comparative analysis. BLAST NT searches used the blastn, tblastn, or tblastx program, while BLAST AA searches used blastp or blastx. 2-Sequence was an alignment tool for comparing the homology and similarity of 2 genes using the blastn, tblastn, tblastx, blastp, or blastx program. Each retrieved sequence was linked to the annotation of the matching gene and showed its similarity to the query sequence. BLAST results not only stored the results of analyses but also provided significant information for each gene, including sequence data, BlastX results, orthology annotations, KOG analysis, and relevant references. Statistics summarized the results of the Acanthamoeba EST analysis. Because the program and database in each search tool could be selected by the user, comparative analysis of Acanthamoeba genes could be applied to various investigations.
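The pairing of search tools and BLAST programs above follows the standard BLAST rules: the program is determined by whether the query and the database are nucleotide or protein. A minimal sketch of that lookup is given below; the function name `programs_for` is hypothetical, and only the tool names BLAST NT and BLAST AA come from the text.

```python
from __future__ import annotations

# Standard BLAST programs keyed by (query type, database type).
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],  # tblastx: translated vs translated
    ("protein", "nucleotide"): ["tblastn"],               # protein vs translated nucleotide DB
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],                # translated query vs protein DB
}

# Database type behind each search tool of Acanthamoeba EST DB.
TOOL_DB = {"BLAST NT": "nucleotide", "BLAST AA": "protein"}


def programs_for(query_type: str, tool: str) -> list[str]:
    """Return the BLAST programs usable for a query of `query_type` in `tool`."""
    db_type = TOOL_DB[tool]
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown query type: {query_type!r}") from None
```

For example, the PA domain search described below used a protein query against BLAST NT, which maps to tblastn.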
Specificity of Acanthamoeba EST DB
To show the specificity of our database, we compared the redundancy rates of TBestDB and Acanthamoeba EST DB (Table 4). TBestDB (http://tbestdb.bcm.umontreal.ca) provided 5,262 clustered EST sequences from A. castellanii trophozoites. Although the total number of ESTs in Acanthamoeba EST DB (3,897) was smaller than that in TBestDB (13,770), its redundancy was lower. The proportion of unique EST clusters was higher in Acanthamoeba EST DB (2,327 clones, 59.7%) than in TBestDB (5,260 clones, 38.2%). Among the unique clusters, the proportion of non-annotated clusters (unknown genes and hypothetical or novel proteins) was also higher in Acanthamoeba EST DB (704 clones, 30.3%) than in TBestDB (372 clones, 7.1%) (Table 4).
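The percentages above can be reproduced from the clone counts. The sketch assumes that redundancy is the share of ESTs not represented as unique clusters (100% minus the unique-cluster percentage); the text does not state its exact formula, so that definition and the helper name `cluster_stats` are assumptions.

```python
def cluster_stats(total_ests: int, unique_clusters: int, unannotated: int) -> dict:
    """Summarize EST redundancy for one database.

    Assumption: redundancy = 100% - (unique clusters / total ESTs).
    The non-annotated share is taken relative to the unique clusters.
    """
    unique_pct = 100 * unique_clusters / total_ests
    return {
        "unique_pct": round(unique_pct, 1),
        "redundancy_pct": round(100 - unique_pct, 1),
        "unannotated_pct": round(100 * unannotated / unique_clusters, 1),
    }


ours = cluster_stats(3_897, 2_327, 704)     # Acanthamoeba EST DB
tbest = cluster_stats(13_770, 5_260, 372)   # TBestDB (A. castellanii)
print(ours)
print(tbest)
```

Under this definition, redundancy is about 40.3% for Acanthamoeba EST DB versus 61.8% for TBestDB, consistent with the comparison in Table 4.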
Our Acanthamoeba EST DB included various genes related to enhanced virulence or to different developmental stages of Acanthamoeba. To confirm the specificity of our database, we examined its BLAST results. Using the amino acid sequence of the protease-associated (PA) domain from Acanthamoeba lugdunensis (ABY6399) as the query, PA domain-containing proteins were identified with the tblastn program in the BLAST NT search tool (Table 5). Our database provided more diverse information on PA domain-containing proteins than TBestDB or NCBI BLAST searches.
DISCUSSION
As strategies and techniques in molecular biology have advanced rapidly, databases of nucleotide and genome sequences have become powerful tools for identifying new genes and proteins and for predicting the functions of novel genes. Over the past decade, together with genome studies, databases have been constructed for many organisms, including parasitic protozoa [5,7-9]. The Entamoeba histolytica genome analysis was based on 12.5-fold coverage of the total genome [7], whereas that of A. castellanii was based on only 0.5-fold coverage [14].
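To illustrate why this difference in coverage matters, the idealized Lander-Waterman (Poisson) model gives the expected fraction of a genome covered by at least one read at mean coverage c, where N reads of length L are sampled from a genome of size G. This is a textbook approximation that ignores repeats and cloning bias, not an estimate reported in [7] or [14]. The sequenced-bases figure also assumes G ≈ 1 × 10^8 bp, the speculated genome size [3].

$$
c = \frac{N L}{G}, \qquad P(\text{base covered}) = 1 - e^{-c}
$$

$$
c = 0.5:\quad 1 - e^{-0.5} \approx 0.39, \qquad c = 12.5:\quad 1 - e^{-12.5} \approx 1 - 3.7 \times 10^{-6}
$$

$$
c = 0.5,\; G \approx 1 \times 10^{8}\ \text{bp} \;\Rightarrow\; N L \approx 5 \times 10^{7}\ \text{bp}
$$

Under these assumptions, a 0.5-fold survey would be expected to touch only about 39% of the genome, whereas 12.5-fold coverage leaves essentially no base unsampled.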
Several factors may explain the slow progress of Acanthamoeba genomic studies. First, the gene structure of Acanthamoeba may be more complex than expected. A previous genome study of Acanthamoeba estimated an average of 3.0 exons per gene, higher than the 1.3 exons per gene of E. histolytica [14]. The ploidy and chromosome number of the genus Acanthamoeba are also still unknown. Second, a transfection system, which is very useful for studying the function or localization of a putative gene, has not yet been fully established in Acanthamoeba. Kong and Pollard [17] recently developed a system for transient transfection in Acanthamoeba, and Peng [18] reported a system for stable transfection of Acanthamoeba castellanii. However, these systems must overcome their low transfection efficiency before they can be widely used [17,18]. Third, the scarcity of Acanthamoeba gene and protein data in public databases makes it harder to identify new genes or proteins and to predict their functions. When a new Acanthamoeba gene or protein is searched with NCBI BLAST, the top hits are usually genes or proteins of vertebrates. Acanthamoeba genes may therefore appear far down the list, or not at all, because of their low-scoring HSPs (high-scoring segment pairs). This shows that more genomic information on Acanthamoeba is required, and such information is also needed for comparative studies in proteomic research.
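The ranking problem described above can be illustrated with a small post-processing step that separates significant hits from a given genus and keeps each group in score order. This is a hypothetical sketch: the `Hit` record, the use of a species field like the Source column of the result table, and the E-value cutoff of 1e-5 are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    subject_id: str
    source: str       # species of the subject sequence
    bitscore: float
    evalue: float


def surface_taxon_hits(hits: List[Hit], genus: str = "Acanthamoeba",
                       max_evalue: float = 1e-5) -> Tuple[List[Hit], List[Hit]]:
    """Split significant hits into those from `genus` and all others.

    Both lists are sorted best-first by bit score, so a weak but genuine
    Acanthamoeba hit is no longer buried under stronger vertebrate hits.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    ranked = sorted(significant, key=lambda h: h.bitscore, reverse=True)
    in_genus = [h for h in ranked if h.source.startswith(genus)]
    others = [h for h in ranked if not h.source.startswith(genus)]
    return in_genus, others
```

Such filtering only reorders hits that the search already returned. If no Acanthamoeba sequence is present in the searched database, or if the search stops reporting hits before reaching it, nothing can be recovered after the fact, which is one argument for a dedicated Acanthamoeba database.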
In the present study, a dedicated Acanthamoeba database, the Acanthamoeba EST database (Acanthamoeba EST DB), was constructed. To promote Acanthamoeba gene studies, Acanthamoeba EST DB provides sequences expressed under specific conditions such as mouse brain passage or encystation. TBestDB contains 13,814 Acanthamoeba ESTs generated from trophozoites, whereas the 3,897 ESTs in our database were generated under diverse conditions. Although Acanthamoeba EST DB was smaller than TBestDB, its redundancy was lower, and the proportion of non-annotated clusters (unknown genes and hypothetical or novel proteins) was much higher. This suggests that Acanthamoeba EST DB may contain more diverse genes related to the life cycle or infection cycle of Acanthamoeba. Investigating the unknown or novel proteins that are expressed specifically during encystation or mouse infection should provide clues to the pathogenesis and encystation of Acanthamoeba.
This is the first attempt to build a dedicated database for comparative studies of Acanthamoeba. Because the entire genome of this organism has not yet been fully sequenced, the number of ESTs should be increased to make the database more useful for comparative genome studies. The database will be updated with new sequences of genes that mediate encystation. Acanthamoeba EST DB should facilitate gene studies of Acanthamoeba by providing sequence data for proteomics and by opening new opportunities for the scientific community. Acanthamoeba EST DB is freely accessible at http://www.amoeba.or.kr.
ACKNOWLEDGEMENTS
This work was supported by grant No. R01-2006-000-10757-0 from the Basic Research Program of the Korea Science & Engineering Foundation (KOSEF) and by the Brain Korea 21 Project in 2008. We thank the KOSEF program (System development for application of genomic sequence information), No. M107520000001-07N5200-00110, funded by the Korea Government (MEST).