The Genome of Model Malaria Parasites, and Comparative Genomics

The field of comparative genomics of malaria parasites has recently come of age with the completion of the whole genome sequences of the human malaria parasite Plasmodium falciparum and a rodent malaria model, Plasmodium yoelii yoelii. With several other genome sequencing projects of different model and human malaria parasite species underway, comparing genomes from multiple species has necessitated the development of improved informatics tools and analyses. Results from initial comparative analyses reveal striking conservation of gene synteny between malaria species within conserved chromosome cores, in contrast to reduced homology within subtelomeric regions, in line with previous findings on a smaller scale. Genes that elicit a host immune response are frequently found to be species-specific, although a large variant multigene family is common to many rodent malaria species and Plasmodium vivax. Sequence alignment of syntenic regions from multiple species has revealed the similarity between species in coding regions to be high relative to non-coding regions, and phylogenetic footprinting studies promise to reveal conserved motifs in the latter. Comparison of non-synonymous substitution rates between orthologous genes is proving a powerful technique for identifying genes under selection pressure, and may be useful for vaccine design. This is a stimulating time for comparative genomics of model and human malaria parasites, which promises to produce useful results for the development of antimalarial drugs and vaccines.


Introduction
Model malaria parasites have proven invaluable in the study of the human form of the disease, where host specificity represents a major constraint for laboratory-based experimentation. Before the development of in vitro cultivation of Plasmodium falciparum, animal models of malaria were widely used and provided researchers with a means to develop a better understanding of the biology of the parasite and its interactions with the mammalian host and vector (Waters, 2002; Waters and Janse, 2004). Their role in providing biological insight continues today, since certain aspects of malaria pathology and biology, for example invasion of hepatocytes by sporozoites (Mota et al., 2001), cannot be studied without the use of an animal model system. In many instances they also provide the only source of biological material for several life-cycle stages, such as ookinetes and zygotes (Janse et al., 1995). Moreover, their usefulness in functional characterization of genes through gene knock-out and modification studies is well established (de Koning-Ward et al., 2000). Three groups of model systems can be identified: (1) simian malaria species that naturally parasitize non-human primates, for example the Plasmodium knowlesi/macaque monkey model system; (2) species of bird malaria that infect domestic fowl, for example the Plasmodium gallinaceum/domestic chicken model system; and (3) species of African thicket rat parasite that have been adapted for growth in laboratory rodents. The last group, consisting of the four species Plasmodium berghei, Plasmodium chabaudi, Plasmodium vinckei and Plasmodium yoelii, has been the most widely used as a model for the study of P. falciparum malaria, primarily due to the ease of handling and maintaining rats and mice in the laboratory. In terms of evolutionary relatedness, studies involving the comparison of homologous genes from different Plasmodium species have shown that the genus comprises several deep branches. The four human malaria species P. falciparum, Plasmodium vivax, Plasmodium malariae and Plasmodium ovale form separate clades, with P. vivax showing distinct clustering with monkey malaria parasites such as P. knowlesi, and P. falciparum more closely associated with avian malaria species (Escalante et al., 1994; Waters et al., 1991). The rodent malaria species also form a distinct clade.
The notion of a universal model for the study of all human malaria species has been shown to be untenable; instead, it is more appropriate to select a model species according to how well its complement of genes fits the phenotypic trait under study. With the completion of the P. falciparum genome sequencing project, undertaken by an international consortium of sequencing centers and malaria researchers, additional genome sequencing projects have started to generate substantial information from other model and human Plasmodium species, enabling the full gene complement to be identified within each species. Thus, comparative analysis of genome data from multiple malaria species is now a tangible prospect.
A current list of malaria parasite genome initiatives is given in Table 1. The genomes of two species have been sequenced and published to date: the complete finished sequence of P. falciparum (Gardner et al., 2002), and the partial sequence of one of the four species of rodent malaria parasites, Plasmodium yoelii yoelii (Carlton et al., 2002). Other current sequencing projects include partial shotgun coverage of the monkey malaria parasite P. knowlesi and two more rodent malaria parasite species, P. berghei and P. chabaudi chabaudi, and the finished genome sequence of a second human malaria parasite, Plasmodium vivax (Carlton, 2003), with publications describing the annotation and comparative analysis of the genomes expected before the close of 2004 (definitions of genomic terms used throughout this review can be found in Box 1). All sequence data are being released by the sequencing centers to researchers in advance of final publication so that biological experimentation can be 'jump-started'. This has proven highly successful in the case of prior release of the P. falciparum genome sequence data, resulting in identification of parasite-specific pathways which may represent unique targets for intervention strategies (see for example Jomaa et al., 1999), while acknowledging the prerogative of the sequencing centers to publish a whole genome analysis of the final data.
Table 1 footnotes: (a) ESTs: expressed sequence tags from cDNA libraries. (b) GSSs: genome survey sequences from mung bean nuclease-digested gDNA libraries. Data may be accessed through PlasmoDB at http://plasmodb.org (Bahl et al., 2003).
Comparative analysis of genome data, or 'comparative genomics', encompasses several areas of research. Prior to the production of large-scale genome sequencing data, comparative gene mapping studies showed that relative gene location and order can be conserved over large regions of chromosomes of different species (Graves, 1998). This area of research established criteria for defining homologies between genes of different species, which are still adhered to today (Box 1). With the advent of computational biology, including algorithms such as the BLAST series for pairwise sequence alignment (Altschul et al., 1990), and the development of the International Nucleotide Sequence Database, comprising DDBJ, EMBL and GenBank, tools became available for comparative analysis of nucleotide and protein sequence data from different species in silico. With the arrival of the genomics and bioinformatics revolution, comparative genomics has scaled up, and whole genome comparisons are now used to describe relative genome composition and genome organization, identify orthologous and paralogous genes, classify species-specific genes, and chart the evolution of the organisms being compared, in all three domains of life: bacterial (Fraser et al., 2000), archaeal (Nelson et al., 2000) and eukaryotic (Rubin et al., 2000).
Comparative genomics of malaria parasite genomes is still a science in its infancy. This review will focus primarily on the rodent models of malaria and comparative genomic studies with the human malaria species P. falciparum, since these are the most advanced. A brief, general background concerning the Plasmodium genome and a description of published studies in comparative genomics are given, but since much of this has been recently reviewed (van Lin et al., 2000; Waters, 2002), a greater emphasis will be placed on more recent developments and future directions for research.

The Nuclear Genome and Gene Complement of Malaria Parasites
What does the nuclear genome of a typical malaria parasite look like? By taking data from a number of genome projects, both partial and finished, it is now possible to create a generalized view (Table 2A). The haploid genome has a standard size of approximately 22-26 Mb (Carlton et al., 2002; Gardner et al., 2002), distributed among 14 linear chromosomes that range in size from 500 kb to over 3 Mb (Carlton et al., 1999; Janse et al., 1994; Kemp et al., 1987). Note: Karyotype data are not available for all Plasmodium species; however, it is unlikely that any species deviates significantly from this number. Genome composition varies from species to species, and is not host lineage-specific. For example, the (A+T) genome composition of P. falciparum is 81% (Gardner et al., 2002) compared to 62% in P. vivax (Carlton, 2003). The rodent malaria species have similarly high (A+T)-rich genomes compared with P. falciparum, whereas P. knowlesi and P. vivax are less biased. The genomes of some species have an additional higher order structuring, in that sections of the genome are compartmentalized into discrete regions or 'isochores' of differing (A+T) content (McCutchan et al., 1984). An example is the simian malaria parasite Plasmodium cynomolgi in which protein coding genes have been localized to (G+C)-rich isochores, whereas chromosome ends containing the telomeres appear to be located in (A+T)-rich isochores (McCutchan et al., 1988). Evidence from the complete sequence of two P. vivax YACs, one containing a telomeric chromosome segment (del Portillo et al., 2001), and the other a more central chromosome region (Tchavtchitch et al., 2001), supports a similar organization of the P. vivax genome. In contrast, P. falciparum has a uniform genome composition, with the exception of short regions of >97% (A+T) on each chromosome which most likely contain the centromeres (Hall et al., 2002), and the bias exhibited between coding and non-coding regions (described below). It is tempting to speculate that isochores may encode genes that mediate phenomena specific to the pathophysiology of the species that harbour them (McCutchan et al., 1984), but evidence for this has yet to emerge.

Box 1. Glossary of Genome Sequencing and Comparative Genomics Terms

Raw sequence: Unassembled sequence reads produced from sequencing of inserts from individual recombinant clones of a genomic DNA library.
Finished sequence: Complete sequence of a genome with no gaps and an accuracy of > 99.9%.
Genome coverage: Average number of times a nucleotide is represented by a high-quality base in random raw sequence.
Partial shotgun coverage: Typically 3-6X random coverage of a genome, which produces sequence data of sufficient quality to enable gene identification but which is not sufficient to produce a finished genome sequence.
Paired reads: Sequence reads determined from both ends of a cloned insert in a recombinant clone.
Contig: Contiguous DNA sequence produced from joining overlapping raw sequence reads.
Singleton: Single sequence read that cannot be joined ('assembled') into a contig.
Scaffold: A group of ordered and orientated contigs known to be physically linked to each other by paired read information.
EST: Expressed sequence tag generated by sequencing one end of a recombinant clone from a cDNA library.
GSS: Genome survey sequence generated by sequencing one end of a recombinant clone from a genomic DNA library.
SNP: Single nucleotide polymorphism, i.e. a single nucleotide position for which two or more alternative alleles are present at a certain frequency.
ORF: Open reading frame, stretches of codons in the same reading frame uninterrupted by STOP codons and calculated from a six-frame translation of DNA sequence.

Comparative Genomics Terms
Homologs: Genes related to each other by descent from a common ancestral DNA sequence.
Orthologs: Homologous genes generated by speciation, i.e. related to each other by vertical descent.
Paralogs: Homologous genes generated by duplication, i.e. related to each other by horizontal descent.
Conserved synteny: Three or more genes located on the same chromosome in different species regardless of gene order.
Conserved linkage: A group of genes conserved in synteny and order between species.
Each Plasmodium species appears to have 5,000-6,000 predicted genes per genome (Buckee, 2002; Carlton et al., 2002; Gardner et al., 2002). Of these, 60% represent orthologous genes between the species, as determined by reciprocal best-match BLAST analysis (Buckee, 2002; Carlton et al., 2002). Many of the genes unique to each species are located within subtelomeric regions, and many are known to code for immunodominant antigens. The difference in gene number between species is due to (a) gene expansion/contraction in different lineages, for example the pyst-a gene family which has more than 150 members in P. y. yoelii but only one copy in P. falciparum (Table 2A); and (b) the presence of a large variant gene family in some Plasmodium species, predicted to be involved in antigenic variation. The family was first described in P. vivax [the vir family (del Portillo et al., 2001)], and latterly in P. yoelii [the yir family (Carlton et al., 2002)], P. berghei (the bir family), P. chabaudi [the cir family (Janssen et al., 2002)], and P. knowlesi [the kir family (Buckee, 2002)]. True homologs of this family have so far not been identified in P. falciparum, which contains other gene families involved in antigenic variation and evasion of immune responses [the var, rif, clag and stevor gene families (Craig et al., 2001)]. In P. knowlesi, the SICAvar gene family has also been described (al-Khedery et al., 1999), which is expressed on the surface of infected erythrocytes and is implicated in antigenic variation in this species. No significant homology exists between the var and SICAvar genes. As a cautionary note, however, discrepancies in the number of predicted genes between species may also reflect the incomplete nature of partial genome data, which can exacerbate the problems associated with accurate gene prediction.
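The reciprocal best-match criterion mentioned above can be illustrated with a minimal sketch. It assumes that BLAST has already been run in both directions and that the hits have been reduced to (query, subject, score) tuples; the gene identifiers and scores below are invented for illustration, and the published analyses may have applied additional thresholds that are not modelled here.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, bit score)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hit tables; identifiers and scores are invented.
pf_vs_py = [("pf1", "py1", 250.0), ("pf1", "py2", 90.0),
            ("pf2", "py2", 180.0), ("pf3", "py2", 120.0)]
py_vs_pf = [("py1", "pf1", 245.0), ("py2", "pf2", 175.0),
            ("py2", "pf3", 110.0), ("py3", "pf3", 60.0)]

orthologs = reciprocal_best_hits(pf_vs_py, py_vs_pf)
print(orthologs)
```

In this toy case pf3 has a best hit (py2) that is not reciprocated, so it is left unpaired, which is how species-specific genes and members of expanded families tend to fall out of such an analysis.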
A comparison of the P. falciparum and P. y. yoelii coding and non-coding regions (Table 2B) suggests that different Plasmodium species exhibit similar characteristics for these regions (Carlton et al., 2002). For example, coding regions of the genome have a lower (A+T) content (76%) than non-coding regions (80-87%), and a similar percentage of genes contain introns (54%). The main exception appears to be the mean length of genes, which in P. falciparum is almost twice that in P. y. yoelii, and also larger than the mean length of genes in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe (Gardner et al., 2002), both lower eukaryotes. The explanation for increased gene length in P. falciparum is at present not known.
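As a minimal sketch of how the (A+T) composition figures quoted in this section are obtained, the fraction of A and T bases can be computed for a whole sequence or in fixed windows along it. The function names, window size and toy sequence below are assumptions made for illustration and are not taken from the genome projects.

```python
from typing import List, Tuple


def at_content(seq: str) -> float:
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    return at / total if total else 0.0


def windowed_at(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(A+T) fraction in successive windows, as (start position, fraction)."""
    return [(start, at_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Toy sequence: an (A+T)-rich block, a (G+C)-rich block, another (A+T)-rich block.
toy = "ATATATTTAA" + "GCGCATGCGC" + "AATTTTATAT"
profile = windowed_at(toy, window=10, step=10)
for start, fraction in profile:
    print(start, round(fraction, 2))
```

Applied separately to annotated coding and non-coding intervals, the same function would give the kind of comparison summarized in Table 2B; a windowed profile is one simple way to look for compartments of differing composition.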
Besides gene families involved in antigenic variation, comparative analysis of several other nuclear gene families in different Plasmodium species is ongoing. For example, members of the P48/45 gene super family have been identified in P. falciparum, P. berghei, P. vivax and P. yoelii (Thompson et al., 2001). This is a large conserved family of proteins expressed during the sexual stages, of which there are ten members in P. falciparum (J. Thompson, pers. comm.), and it is likely that a similar number will be found in the other species. Given the stage-specific expression of P48/45 and its role in the development of transmission-blocking vaccines, rodent model orthologs of the family are proving to be immensely valuable in functional analyses of the genes, for example by gene knock-out (van Dijk et al., 2001). Members of the Py235 multi-gene family, first identified in P. yoelii as exhibiting a novel form of clonal antigenic variation whereby each merozoite from a parental schizont has the propensity to express a different Py235 protein (Preiser et al., 2002), have been identified in P. falciparum and P. vivax (Khan et al., 2001). Examination of their role in merozoite attachment and invasion of specific erythrocytes is proving to be of value for the determination of the mechanism of erythrocyte invasion in different species, such as P. vivax, which is restricted to growth in reticulocytes positive for the Duffy blood group antigen complex. Another group of genes involved in Plasmodium merozoite invasion and specific recognition of host cell receptors is the ebl gene family, which contains six members in P. falciparum: baebl, eba-175, ebl-1, jesebl, maebl and pebl, and the P. vivax and P. knowlesi Duffy-binding proteins (Adams et al., 2001). The Plasmodium ebl genes are single copy, have a multiexon structure encoding distinct functional domains, and conserved exon-intron splice junctions. Gene duplication has been found to be a common characteristic of the family, providing the molecular basis for the development of alternative invasion pathways. Cross-species analysis of the conserved cysteine-rich domains in members of the gene family has identified certain of the genes as having ancient origins which predate the speciation of Plasmodium (Michon et al., 2002).

Comparative Gene Expression Studies
Table 1 lists gene and protein expression data being generated for different life-stages of various Plasmodium species. Large-scale sequencing projects have generated a substantial number of ESTs and full-length cDNA sequences from P. falciparum (Chakrabarti et al., 1994; Watanabe et al., 2002), P. berghei (Carlton et al., 2001b; Matuschewski et al., 2002) and P. y. yoelii (Kappe et al., 2001), as well as several thousand mung bean nuclease GSSs from P. vivax, P. falciparum and P. berghei (Carlton et al., 2002; Reddy et al., 1993), enabling some preliminary comparative analyses of the transcriptome and proteome of malaria parasites. In one study, clustering algorithms were used to assemble the data and to create several thousand consensus sequences which were compared between P. falciparum, P. berghei and P. vivax (Carlton et al., 2001b). This comparison of partial data identified many protein motifs and signatures as being conserved between the species. Comparison of the Gene Ontology terms [GO terms represent a vocabulary designed to describe all known genes (Ashburner et al., 2000)] assigned to proteins of each species showed similar numbers of proteins in each class for each species, with the exception of the Cell Process and Defense and Immunity classes. This finding was later confirmed by whole proteome comparative analysis of P. falciparum and P. y. yoelii (Carlton et al., 2002), and reflects the non-homologous nature of the proteins involved in antigenic variation and evasion of immune responses in Plasmodium species. In another study, comparative analysis of genes expressed in salivary gland sporozoites versus those expressed in oocyst sporozoites identified genes that were upregulated in the former, signifying possible developmental changes in the infectious transmission stage of Plasmodium (Matuschewski et al., 2002). Several microarray analyses of gene expression of whole P. falciparum chromosomes (Le Roch et al., 2002) and the complete genome at different developmental stages (Ben Mamoun et al., 2001; Hayward et al., 2000) have been published, as well as serial analysis of gene expression (SAGE) studies (Patankar et al., 2001). The latter study purported to find a significant number of antisense messages in asexual and sexual stages, the first time this has been reported in species of Plasmodium. Microarrays of other rodent model species are also in progress (M. Karras and A. Waters, unpublished), with the specific aim of comparing results to the P. falciparum expression studies, and enabling a transcriptional profile of orthologous Plasmodium genes to be created. Microarrays of the mosquito vector have been constructed too, and pilot studies undertaken to determine mosquito genes induced through infection with P. berghei (Dimopoulos et al., 2002).
Two large-scale, high-throughput mass spectrometric analyses of P. falciparum proteins from sporozoite, merozoite, trophozoite, gametocyte and gamete stages were recently published (Florens et al., 2002; Lasonder et al., 2002), as was a smaller analysis of the proteins in sporozoite and gametocyte stages of P. y. yoelii (Carlton et al., 2002). These datasets provide validation of gene predictions in both species (52% of predicted P. falciparum genes were confirmed by proteomic data). A comparative analysis between these and other ongoing Plasmodium proteome projects is underway (D. Raine, L. Florens, R. Sinden and J. Yates, unpublished). Finally, data from a number of transcriptome and proteome projects, and mapping of the expression data to the genome sequence, will facilitate a thorough investigation of the phenomenon of "coordinated gene expression clustering", as shown to exist in certain eukaryotes (Blumenthal et al., 2002; Caron et al., 2001; Cohen et al., 2000). Gene clustering can be defined in a number of different ways (Carlton, 1999), depending upon whether the genes under study are functionally related, polycistronically transcribed, expressed in the same pathway, or paralogous gene copies generated by gene duplication events. Preliminary evidence exists for some undefined level of synchronized gene expression (Florens et al., 2002), but to what extent and with what consequence remains to be determined.

Chromosome Structure, Comparative Mapping and Gene Synteny Studies
Several features of chromosome structure appear to be well conserved in all Plasmodium species. All possess telomeres consisting of degenerate, canonical, tandem repeats, the most common motif being AACCCT(A/G) (Scherf et al., 2001). The mean length of the telomeric array (~800 to ~6700 bp) varies from species to species, although it remains remarkably constant within species (Figueiredo et al., 2002). Subtelomeric regions of Plasmodium chromosomes consist of a variable number of species-specific repeats that extend 10-40 kb towards the internal part of chromosomes, and which have extensive large-scale similarity between chromosomes, indicative of inter-chromosomal exchange (Carlton et al., 2002; Gardner et al., 2002). Low restriction maps and high-resolution YAC contig maps, in conjunction with the P. falciparum and P. y. yoelii genome sequence data, have now established that species-specific gene families coding for immunodominant antigens and proteins known to be involved in antigenic variation are predominantly found within these regions, whereas conserved 'housekeeping' genes are located within central chromosome regions. Thus Plasmodium chromosomes consist of a central conserved core flanked at each end by less conserved regions containing antigen genes. This chromosomal organization has been confirmed at the genomic level by construction of a SNP map of P. falciparum chromosome 2 from several parasite isolates using an oligonucleotide array (Volkman et al., 2002). Recently, P. falciparum chromosome ends were shown to cluster at the periphery of the nucleus, facilitating ectopic recombination among heterologous subtelomeric chromosome regions and thus providing a mechanism for the generation of different repertoires of antigen genes (Freitas-Junior et al., 2000). Whether this represents a common mechanism shared by other Plasmodium species remains to be seen, but it is interesting to note the shared features of chromosome organization between species which would facilitate this.
The chromosomes of P. falciparum (Kemp et al., 1985; van der Ploeg et al., 1985), P. vivax (Langsley et al., 1988) and rodent malaria species (Janse, 1993) are known to vary considerably in length. Such 'chromosome size polymorphisms' are found to occur in field isolates, most likely as a result of unequal recombination between homologous chromosomes of different parasite clones during meiosis, but also by gene amplification, and deletion and insertion of repeat sequences. P. falciparum chromosomes are also found to vary in size during in vitro culture, due to chromosome breakage followed by healing of the blunt end by the addition of telomeric repeats (Bottius et al., 1998). Most of these large-scale chromosomal rearrangements affect non-coding repeat sequences in the subtelomeric regions, since the conserved core of the chromosome appears less prone to rearrangement. An exception is the set of genome rearrangements that occur in parasites under selective pressure, which have caused changes in ploidy as well as 'amplicons' containing copies of the same gene in tandem (Carlton et al., 2001a). Thus, chromosomal rearrangements in Plasmodium are important for the evolution of the genome, although to what extent this occurs in natural populations of the parasite remains to be determined.
Given the range and diversity of karyotypes seen among different species, a surprising result of chromosome mapping experiments has been the high degree of conservation of gene synteny between Plasmodium species. Initial studies involving mapping of conserved genes to separations of Plasmodium chromosomes showed that gene location (conserved synteny) and gene order (conserved linkage) are preserved over large regions between all four species of rodent malaria (Janse et al., 1994), between species of rodent malaria and P. falciparum (Carlton et al., 1998), and between all four human malaria species (Carlton et al., 1999). These studies have now been extended and show that even exon/intron boundaries and the fine-scale organization of genes can be conserved between species (Tchavtchitch et al., 2001; van Lin et al., 2001; Vinkenoog et al., 1995). The degree of conservation of synteny is greatest when comparing genomes of more closely related species. The rodent malaria parasites, for example, show conservation of whole chromosome synteny (Janse et al., 1994), whereas synteny is reduced to the level of conservation of large chromosomal blocks between P. falciparum and the rodent malaria species (Carlton et al., 1998).
The initial comparative mapping studies of Plasmodium species described above involved hybridization of a limited number of conserved genes to chromosome separations, and the construction of partial genome synteny maps. With the advent of genome technology and bioinformatics, and the availability of large Plasmodium genome datasets, it is now possible to use computational methods for whole genome comparative analyses, as described below.
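The distinction between conserved synteny and conserved linkage (Box 1) can be made concrete with a small sketch. It assumes that each ortholog has already been assigned a chromosome and position in two species; the gene names, chromosome labels and coordinates are invented, and treating a fully inverted block as conserved in order is an assumption of this sketch rather than part of the Box 1 definition.

```python
from collections import defaultdict
from typing import Dict, Tuple

Locus = Tuple[str, int]  # (chromosome, position)


def classify_synteny(orthologs: Dict[str, Tuple[Locus, Locus]],
                     min_genes: int = 3) -> Dict[Tuple[str, str], str]:
    """Label each chromosome pair sharing at least min_genes orthologs."""
    groups = defaultdict(list)
    for gene, ((chr_a, pos_a), (chr_b, pos_b)) in orthologs.items():
        groups[(chr_a, chr_b)].append((pos_a, pos_b, gene))

    labels = {}
    for pair, members in groups.items():
        if len(members) < min_genes:
            continue  # too few genes to call conserved synteny
        members.sort()  # order the genes along the species A chromosome
        order_b = [pos_b for _, pos_b, _ in members]
        collinear = (order_b == sorted(order_b)
                     or order_b == sorted(order_b, reverse=True))
        labels[pair] = "conserved linkage" if collinear else "conserved synteny only"
    return labels


# Hypothetical ortholog positions in species A and species B.
toy = {
    "g1": (("A1", 100), ("B5", 10)),
    "g2": (("A1", 200), ("B5", 20)),
    "g3": (("A1", 300), ("B5", 30)),
    "g4": (("A2", 100), ("B7", 50)),
    "g5": (("A2", 200), ("B7", 10)),
    "g6": (("A2", 300), ("B7", 30)),
    "g7": (("A3", 100), ("B1", 5)),
}
labels = classify_synteny(toy)
print(labels)
```

Here the A1/B5 genes keep both location and order, the A2/B7 genes share a chromosome but not order, and the single A3/B1 gene is too little evidence for either call.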

Computational Algorithms for Cross-species Comparisons
To some extent, the availability of sequence data from a number of species has outpaced the computational and experimental methods used to compare and decode the information within the data. Whole genome shotgun sequencing has progressed to become a high-throughput science; however, the computational and analytical software needed to analyze the data coming from the pipeline has not developed at a similar pace. Comparative genomics tools are being designed to specifically address this problem (Frazer et al., 2003).
Figure 1 (caption, partial). The percent similarity at the amino-acid level is given for each contig on the y-axis. The majority of contigs could be linked by PCR into two syntenic groups, and physical map data identified these as being located on P. y. yoelii chromosomes 12 and 5. Note the paucity of P. y. yoelii contigs with matches to the telomeric/subtelomeric ends of the P. falciparum chromosome, indicative of the species-specific immunodominant antigen genes located there.
The first step in a comparison of two or more sequences from evolutionarily-related genomes is to align the sequences in order to identify conserved regions. Two types of alignment programs exist, 'local' and 'global'. Local alignment tools produce optimal similarity scores between subregions of sequences, for example in cases where sequences exhibit conservation of gene synteny but jumbled order. These algorithms find short common segments between sequences first, and then extend the match as far as possible. Examples of local alignment tools are BLASTZ (Schwartz et al., 2000) and MUMmer2 [(Delcher et al., 2002); used to generate the alignment depicted in Figure 1]. Global alignment tools produce optimal similarity scores over the entire length of the sequences being compared, for example in cases where the sequences are expected to share similarity over their full length. These methods attempt to find an all-inclusive map between sequences, but can be memory intensive and time consuming. Examples of global alignment tools are AVID (Bray et al., 2003), GLASS (Batzoglou et al., 2000) and OWEN [(Ogurtsov et al., 2002); used to generate the global alignment depicted in Figure 4]. Several visualization software tools are available for the production of graphical views of alignments, using either local [e.g. PipMaker (Schwartz et al., 2000) and ACT (http://www.sanger.ac.uk/Software/ACT/), used to visualize the alignment in Figure 3] or global [e.g. VISTA (Mayor et al., 2000)] alignment software. Both local and global approaches to aligning sequences are informative; however, a comparison of alignment programs and servers is outside the scope of this essay. MUMmer2, OWEN and the alignment visualization tool ACT have all been used for comparative analysis of Plasmodium sequences by the authors and examples of these are given below.
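The difference between local and global alignment can be illustrated with the textbook dynamic-programming scores (Smith-Waterman for local, Needleman-Wunsch for global). This is a sketch only: the tools named above rely on faster seed-and-extend or anchoring heuristics rather than this exhaustive calculation, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal alignment score of a and b with a linear gap penalty.

    local=True  -> best-scoring pair of subregions (Smith-Waterman).
    local=False -> score over the full length of both sequences
                   (Needleman-Wunsch).
    """
    cols = len(b) + 1
    # First row: zeros for local, accumulated gap penalties for global.
    prev = [0] * cols if local else [j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[-1]


short, long_ = "GATTACA", "CCCGATTACACCC"
print(alignment_score(short, long_, local=True))   # shared subregion only
print(alignment_score(short, long_, local=False))  # penalized for flanking gaps
```

The shared segment scores fully under the local scheme, whereas the global score is reduced by the six flanking positions that must be aligned against gaps, which is why local methods suit sequences that share only subregions.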

Whole Genome Synteny Maps of Plasmodium
Using a mixture of computational algorithms and laboratory-based methods, a whole genome synteny map of the complete sequence of P. falciparum and the partial sequence of P. y. yoelii (Carlton et al., 2002) has been created. MUMmer2 was used to identify local matches at least five amino acids long from six-frame translations of both sequences; these seed matches were extended to create a tiling path of P. y. yoelii contigs against the P. falciparum chromosomes. The contigs were linked where possible by means of 'paired reads' and PCR amplification of the intervening sequence between contigs. The syntenic groups were assigned to a P. y. yoelii chromosome through the use of physical map data. An example of the tiling path of P. y. yoelii contigs against one P. falciparum chromosome is shown in Figure 1. From a total of 4,787 P. y. yoelii genes in the tiling path, 3,525 (74%) were found to be conserved in order between the two species using Position Effect (Carlton et al., 2002) software. This compares with 41/48 (85%) of genes found to be conserved in order in a 200 kb region syntenic between P. falciparum and P. vivax (Tchavtchitch et al., 2001). The P. y. yoelii/P. falciparum syntenic map has identified long contiguous sections of the P. y. yoelii genome, and by extension, of the other rodent malaria parasite genomes, and their syntenic regions in P. falciparum. Studies are underway to complete and extend the map, which can be seen in its current form as an 'Oxford Grid' (a conventional method for displaying synteny between two species) in Figure 2. The construction of synteny maps between other Plasmodium species is also ongoing, although this is limited by the nature of the genome data. For example, creation of a map using partial genome data requires that one of the genomes be finished or at least in megabase 'scaffolds' and preferably with some karyotype and chromosome mapping data available. A synteny map of two human malaria species, P. falciparum and P. vivax, is planned, with preliminary tiling paths already suggesting a high degree of conservation of synteny between the two (Carlton, 2003). Synteny maps between Plasmodium species are particularly valuable for a number of studies: (1) as a means to chart the evolution of the genus, since syntenic break-points represent ancient evolutionary events that most likely occurred prior to speciation of the organisms being compared; (2) as a method of identifying true orthologs between species, for the comparison of molecular mechanisms underlying shared phenotypes; (3) for refinement of gene predictions through simultaneous annotation of multiple Plasmodium genomes; (4) for comparative analysis of gene expression, for example through identification of conserved non-coding regions of the Plasmodium genome ("phylogenetic footprinting"), and the evaluation of coordinated gene expression; and (5) as a means for the classification of genes under different evolutionary pressures. Examples of some of these studies are given below, many of which are works in progress due to the preliminary nature of Plasmodium comparative genomics.
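An Oxford Grid of the kind referred to above is essentially a table of ortholog counts per chromosome pair. The following sketch assumes that each ortholog has already been assigned to a chromosome in both species; the chromosome labels and counts are invented for illustration and do not reproduce the published map.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def oxford_grid(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count orthologs for each (species A chromosome, species B chromosome)."""
    return Counter(pairs)


def render(grid: Counter, rows: List[str], cols: List[str]) -> str:
    """Tab-separated grid: rows are species A chromosomes, columns species B."""
    lines = ["\t" + "\t".join(cols)]
    for r in rows:
        cells = (str(grid.get((r, c), 0)) for c in cols)
        lines.append(r + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Hypothetical chromosome assignments, one tuple per ortholog pair.
pairs = [("Pf3", "Ro8")] * 2 + [("Pf7", "Ro8")] + [("Pf3", "Ro5")] * 3
grid = oxford_grid(pairs)
print(render(grid, rows=["Pf3", "Pf7"], cols=["Ro5", "Ro8"]))
```

Non-zero cells mark chromosome pairs that share orthologs; a chromosome whose row or column has several filled cells is one whose content is distributed over more than one chromosome in the other species.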

Molecular Evolution Studies of the Plasmodium Genus
Comparison of syntenic regions between Plasmodium species can aid in the creation of an evolutionary map of the genus. For example, DNA alterations leading to the generation of paralogous gene families, or to the loss or gain of genes in certain lineages, can be identified. Figure 3 shows an analysis of three genomes using ACT, a tool which reads annotated DNA sequences and BLAST analyses of the sequences and generates a visual map of syntenic regions. Two genes identified as coding for reticulocyte binding protein-2 (RBP-2) proteins are present in the P. falciparum genomic segment but absent in the P. y. yoelii and P. knowlesi contigs. These represent a gene family that appears to have been gained in P. falciparum or lost from the other species. In close proximity, the tandemly arrayed MSP7 gene family appears to have undergone different degrees of gene expansion in the three species.
Figure 2. Genome-wide synteny map of Plasmodium plotted as an Oxford Grid. Syntenic regions conserved between P. falciparum and the rodent malaria species are shaded. For example, chromosome 8 in the rodent malaria parasites is syntenic to blocks of P. falciparum chromosomes 3, 7 and 9. The grid is incomplete as not all syntenic regions between the two species have been assigned to a rodent malaria chromosome. (Additional chromosome mapping data provided by T. Kooij and A. Waters, unpublished.)

Figure 3. Graphical display generated by ACT of an alignment of a section of P. falciparum chromosome 13 compared to the syntenic regions in P. y. yoelii and P. knowlesi. Shaded, directional boxes indicate predicted genes on either DNA strand; P. falciparum gene predictions were manually curated. Vertical lines signify BLAST hits between the genomes. Two genes identified as coding for reticulocyte binding protein 2 (RBP2) proteins are present in the P. falciparum chromosome segment but absent in the P. y. yoelii and P. knowlesi contigs. These represent a gene family that appears to have been inserted in P. falciparum or lost from the other species. The tandemly arranged MSP7 gene family, located next to the RBP2 genes, shows various levels of gene expansion in the three species (present as three copies in P. y. yoelii, four copies in P. knowlesi and five copies in P. falciparum).

Figure 4. Global alignment of a 40 kb syntenic segment between three species of malaria parasite, P. falciparum (chromosome 3, at coordinates 178 kb to 220 kb), P. vivax (YAC1H14, at coordinates 95 kb to 135 kb) and P. y. yoelii (contigs MALPY2504, MALPY141, MALPY1025), encompassing twelve orthologous genes and 13 intergenic regions. (A) The structure of the gene models used to estimate divergence is shown above the DNA strand (horizontal black line), and P. falciparum gene models refined through comparison with orthologous P. vivax and P. y. yoelii gene models are shown below the DNA strand. Gene orientation is represented by arrows. (B) Percent identity of the three pairwise nucleotide alignments, constructed using OWEN (Ogurtsov et al., 2002) and computed using a sliding window of 250 bases with an overlap of 60%. Note the regions of conserved nucleotides in intergenic regions which may represent conserved non-coding regulatory regions. (C) Number of non-synonymous mutations per non-synonymous site plotted for all pairwise comparisons of the three species (see Carlton et al., 2002 for methodology). Synonymous sites are saturated in all pairwise comparisons that include P. vivax (data not shown).
Generation of synteny maps between species can also help identify chromosomal rearrangement events that may have led to speciation. Several of the breaks in synteny between the P. falciparum and P. y. yoelii genomes were found to be located within areas containing the rRNA 18S-5.8S-28S gene units, of which there are seven in P. falciparum (Gardner et al., 2002), as well as in regions of the P. falciparum genome containing internal var and rif genes. Thus preliminary evidence exists for one possible mechanism underlying evolution of the Plasmodium genus, that of chromosome breakage and recombination at sites of rRNA genes (Carlton et al., 2002).
Finally, the evolution of P. falciparum has been a matter of much debate (Hartl et al., 2002), with one camp firmly of the view that P. falciparum is an ancient species, and the other that the species is of recent origin having emerged through one or several genetic bottlenecks. A genomics approach to tackling the question was undertaken recently with the creation of a SNP map of P. falciparum chromosome 3 from five parasite clones, which gave further credence to the view that the parasite is a genetically diverse and ancient species (Mu et al., 2002). Comparative SNP studies of the syntenic region in P. vivax are underway and have provided evidence that P. vivax too has a highly diverse genome with an evolutionary history possibly parallel to that of P. falciparum (Feng et al., 2003).

Comparative Studies of Molecular Mechanisms Underlying Shared Phenotypes
Identification in one species of the ortholog of a candidate gene from a second species is important for cross-species comparison of gene function, and evaluation of molecular mechanisms associated with a shared phenotype. As an example, identification of the ortholog of the P. falciparum chloroquine resistance gene, pfcrt, in the P. vivax genome enabled comparison of the molecular mechanism of resistance to chloroquine in both species (Nomura et al., 2001). A 350 kb YAC containing the P. vivax ortholog pvcg10 was partially sequenced, and orthologs of genes in the same order and orientation as those flanking the pfcrt gene in the P. falciparum genome were identified, distinguishing the pvcg10 gene as the true ortholog of pfcrt. However, mutations in the pvcg10 gene did not correlate with chloroquine resistance in P. vivax isolates, demonstrating that not all pleiotropic functions are necessarily shared between orthologs. Orthologs of the cg10 gene from P. knowlesi and P. berghei were also sequenced and used to infer the ancestral haplotype of pfcrt. Since chloroquine-resistant P. falciparum isolates contain pfcrt alleles that deviate significantly from this haplotype, construction of the canonical sensitive allele through analysis of the gene in other model malaria species enabled identification of the gene as being under strong selective pressure in P. falciparum.
Although in this instance the molecular mechanism underlying chloroquine resistance in two human malaria species was found to be different, rodent malaria models in particular have been used widely to study drug resistance in P. falciparum (Carlton et al., 2001a). While the mechanism of resistance in some instances has been found to be remarkably similar between the species (such as the molecular basis for pyrimethamine resistance, which in many malaria species involves a single point mutation in the drug target dihydrofolate reductase), the fact that the molecular mechanism can vary among different species does not negate the value of investigation into the phenotype in Plasmodium models. Such exploration provides an additional level of insight into the biology of the organism which may be valuable in other areas of Plasmodium research.

Gene Prediction and Annotation Refinement
Comparative genomics lends itself readily to the simultaneous annotation of syntenic regions in multiple species. Both gene models and accurate exon/intron boundaries can be difficult to predict in cases where little experimental evidence exists for verification, and where genome bias confounds the issue, as has been the case for gene prediction in P. falciparum (Gardner et al., 2002; Hall et al., 2002; Hyman et al., 2002). Access to gene models from two or more species provides a way to check and improve on existing models, as shown in Figure 4. A global alignment of a 40 kb syntenic region from P. falciparum, P. vivax and P. y. yoelii shows that the structure and length of the gene models predicted in the three species using various gene prediction algorithms are in good agreement with each other. Four gene models in P. falciparum were altered to match those in P. vivax and P. y. yoelii; in all cases, the alternative P. falciparum model corresponded to an initial prediction made by one of the algorithms and subsequently discarded as a candidate for the final model. One gene model in P. falciparum (between genes 8 and 9) was excluded since it was not detected in either of the other species by any of the gene prediction algorithms. Thus, annotation of multiple Plasmodium genomes can aid in the verification and perfection of gene models in syntenic regions.

Phylogenetic Footprinting
Figure 4 also shows the power of global alignments for identification of conserved intergenic motifs (phylogenetic footprints) that may be involved in gene regulation. Little is known concerning DNA elements that direct the transcription of Plasmodium genes (Horrocks et al., 1998; van Lin et al., 2000). However, promoter elements from one species can function in other species (Crabb et al., 1996), which indicates a significant functional conservation of elements between different Plasmodium species. As outlined above, alignment at the DNA level shows coding regions to be highly conserved between Plasmodium species, as shown by overlapping peaks and troughs of the pairwise comparisons that coincide with exons in the gene models in Figure 4B. (An exception in the example shown is gene 9, annotated as a hypothetical gene, for which very little similarity is found at the nucleotide level between P. vivax and the other two species. This difference is due at least in part to a marked shift in amino acid composition in this protein, with the (A+T)-rich codons coding for the amino acids isoleucine, tyrosine, asparagine and lysine making up 50% of the protein in P. falciparum but only 20% in P. vivax, which exhibits a more balanced amino acid composition.) However, the pattern of conservation within the coding regions differs markedly between genes; while some are conserved in their entirety (e.g., gene 10), others demonstrate fluctuation of conservation along the length of the gene (e.g., gene 1). In contrast, the similarity between species in intergenic regions is almost negligible, a situation mirrored in syntenic comparisons of mouse and human (Jareborg et al., 1999). Since non-coding and silent positions in intergenic regions are mostly saturated (Carlton et al., 2002), sequence similarity in these positions must be restricted to regions under selection. Prime phylogenetic footprint candidates are motifs conserved across all three species, some examples of which can be seen in Figure 4B. Phylogenetic footprinting has already been used successfully to detect conserved motifs in several eukaryotic lineages (Bergman et al., 2001; Wasserman et al., 2000; Webb et al., 2002). Studies in Plasmodium will continue and expand to encompass alignment of genes known to be expressed at certain stages of the life-cycle (J. Silva and J. Carlton, unpublished).
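The percent identity profile described for Figure 4B (a sliding window of 250 bases with an overlap of 60%) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce OWEN or the plotting used for the figure. It assumes that the two sequences have already been globally aligned to equal length with '-' as the gap character, and that gapped columns count as mismatches; both the function name and these conventions are assumptions.

```python
from typing import List, Tuple


def windowed_identity(
    aln_a: str,
    aln_b: str,
    window: int = 250,
    overlap: float = 0.6,
) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows along a pairwise alignment.

    Returns (window start, percent identity) pairs. Consecutive windows
    share `overlap` of their length, so 250 bases with 60% overlap gives
    a step of 100 alignment columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return []
    a, b = aln_a.upper(), aln_b.upper()
    step = max(1, round(window * (1 - overlap)))
    profile = []
    # Alignments shorter than one window yield a single, shorter window.
    for start in range(0, max(len(a) - window, 0) + 1, step):
        col_a = a[start:start + window]
        col_b = b[start:start + window]
        # Assumption: a column matches only if both bases agree and neither is a gap.
        matches = sum(1 for x, y in zip(col_a, col_b) if x == y and x != "-")
        profile.append((start, 100.0 * matches / len(col_a)))
    return profile


# Toy alignment (hypothetical): identical first half, divergent second half.
toy_a = "A" * 20
toy_b = "A" * 10 + "C" * 10
for start, identity in windowed_identity(toy_a, toy_b, window=10, overlap=0.6):
    print(start, identity)
```

Windows whose identity stays high across all three pairwise comparisons while the flanking intergenic sequence does not would be the candidate footprints; the threshold used to call such windows is a separate choice not specified in the text.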

Identification of Genes Under Selection Pressure
Multiple alignments of syntenic regions can be used in conjunction with simple molecular evolution methods to group Plasmodium genes according to the degree of selective pressure acting upon them. Similar methodology has been used on other organisms (Endo et al., 1996), and in a few single gene studies in species of Plasmodium (Black et al., 1999; Escalante et al., 1998). With the release of large Plasmodium genome datasets, however, this can now be achieved on an automated whole-genome scale. As a detailed example, Figure 4C shows the number of non-synonymous substitutions (those that give rise to a change in amino acid) per non-synonymous site (dN) for each of twelve orthologs in P. vivax, P. falciparum and P. y. yoelii. The degree of similarity in non-synonymous sites is roughly the same in the three pairwise comparisons for each gene, which suggests that these three species are approximately equidistant in evolutionary terms. However, the genes exhibit a wide spectrum of evolutionary rates, with some genes evolving under very strong 'stabilizing selection' (e.g., gene 10; dN = 0.01) while others seem to be evolving under 'diversifying selection' (e.g., gene 12; dN > dS > 1.0). Differences in evolutionary rate among genes can be attributed to differences in the nature and degree of the selective constraints acting upon each gene. Comparison of dN rates with gene function for genes 10 and 12 reveals that the highly conserved gene 10 codes for the 60S ribosomal protein L44, a member of a highly conserved protein family found in widely divergent taxa such as mammals, protozoa and Archaea. In contrast, the highly divergent gene 12 codes for the circumsporozoite surface (CS) protein, a molecule found on the surface of Plasmodium sporozoites and known to interact directly with the host immune system. This class of gene is expected to differ greatly between species since its evolution is fast and dependent on interactions between each Plasmodium species and its host.
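To make the quantities dN and dS concrete, the following Python sketch estimates them for a pair of aligned coding sequences. The methodology actually used for Figure 4C is described in Carlton et al., 2002 and is not reproduced here; this sketch substitutes a simplified form of the widely used Nei-Gojobori counting approach with a Jukes-Cantor correction. The simplifications are assumptions: changes to stop codons are counted as non-synonymous, pathways through stop codons are not excluded, and codons containing gaps or ambiguous bases are skipped. All function names are illustrative.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))  # standard genetic code


def syn_sites(codon: str) -> float:
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += 1.0 / 3.0
    return s


def count_diffs(c1: str, c2: str):
    """Synonymous and non-synonymous differences, averaged over all pathways."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = list(permutations(diff_pos))
    sd = nd = 0.0
    for order in paths:
        cur = c1
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[cur] == CODON_TABLE[nxt]:
                sd += 1
            else:
                nd += 1
            cur = nxt
    return sd / len(paths), nd / len(paths)


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits; inf if saturated."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    s_sites = n_sites = s_diff = n_diff = 0.0
    usable = min(len(seq1), len(seq2)) // 3 * 3
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguous base (assumption: skip the codon)
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = count_diffs(c1, c2)
        s_diff += sd
        n_diff += nd
    p_n = n_diff / n_sites if n_sites else 0.0
    p_s = s_diff / s_sites if s_sites else 0.0
    return jukes_cantor(p_n), jukes_cantor(p_s)


# Toy sequences (hypothetical): one silent change, then one replacement change.
ref = "ATGAAAGGTGGTGGTGGT"
silent = "ATGAAGGGTGGTGGTGGT"       # AAA -> AAG, both lysine
replacement = "ATGCAAGGTGGTGGTGGT"  # AAA -> CAA, lysine -> glutamine
print(dn_ds(ref, silent))
print(dn_ds(ref, replacement))
```

Under this scheme a saturated comparison is reported as infinity rather than a number, which mirrors the observation that synonymous sites are saturated in comparisons involving P. vivax; in such cases only dN remains informative.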
Since proteins of genes evolving under strong diversifying selection are likely to be in contact with the host immune system or to be targets of drug therapy, they represent good candidates for further study. Studies are underway to use this method to identify additional genes in this class (J. Silva and J. Carlton, unpublished). Furthermore, this analysis should identify species-specific genes that appear to be under diversifying selection in one species but not in others. In addition, extending this evolutionary analysis to encompass the whole genome will allow us to determine whether a non-synonymous divergence rate of 30% to 50% between the oldest branches of the malaria tree is indeed the norm.

The Future of Plasmodium Comparative Genomics
Comparative studies of model malaria parasites with the human malaria species they exemplify provide an invaluable additional level of insight into the biology of the organism and its interaction with host and vector. There is no doubt that model malaria species provide important knowledge through analogy or contrast with what is known concerning human malaria species. This interaction is set to be transformed over the next few years as genome-wide comparisons of malaria species become possible on a scale not previously seen. Through the construction of genome-wide synteny maps, it will be possible to identify orthologs of human and model malaria parasites even in cases where sequence similarity is low in less well conserved genes, as is the case for many genes that encode surface-expressed proteins. Gene expression data from different transcriptome and proteome studies will enable the expression profile of a gene to be catalogued and compared in a variety of different species. However, further development of genetic manipulation technologies for use in Plasmodium will become increasingly necessary as a means to determine gene function and phenotype. High-throughput methods in particular, such as those developed for gene deletion-mutants in yeast (Giaever et al., 2002) and RNAi in Caenorhabditis elegans (Kamath et al., 2003), will be of immense value if they are transferable for use in Plasmodium.

Figure 1. Schematic showing the tiling path of P. y. yoelii contigs along chromosome 10 of P. falciparum determined using MUMmer. The x-axis represents chromosome 10 (1.7 Mb), with vertical bars representing each P. y. yoelii contig that matches the P. falciparum chromosome. The percent similarity at the amino-acid level is given for each contig on the y-axis. The majority of contigs could be linked by PCR into two syntenic groups, and physical map data identified these as being located on P. y. yoelii chromosomes 12 and 5. Note the paucity of P. y. yoelii contigs with matches to the telomeric/subtelomeric ends of the P. falciparum chromosome, indicative of the species-specific immunodominant antigen genes located there.

Table 2. Plasmodium genome characteristics.
A. Comparison of general genome characteristics from six Plasmodium genome datasets. a Likely to be an over-estimate due to inclusion of partial genes; b Determined from karyotype data; ND: not determined.
B. Comparison of Plasmodium coding and non-coding regions.