Genomes and Genome Projects of Protozoan Parasites

Protozoan parasites cause some of the most devastating diseases world-wide. It is now recognised that a major effort is needed to control or eliminate these diseases. Genome projects for the most important protozoan parasites have been initiated in the hope that their output will help to understand the biology of the parasites and identify new targets for urgently needed drugs. Here, I review the current status of protozoan parasite genome projects, present findings obtained as a result of the availability of genomic data and discuss the potential impact of genome information on disease control.


Introduction
The resurgence of infectious diseases world-wide has been a major impetus for increased research activity. The World Health Organisation (http://www.who.int/tdr/) has identified ten major, yet neglected, infectious diseases (African trypanosomiasis, Chagas disease, dengue fever, lymphatic filariasis, leishmaniasis, leprosy, malaria, onchocerciasis, schistosomiasis, and tuberculosis) that are the focus of intense efforts to control or even eradicate the organisms that cause them. Four of the diseases on this list are caused by protozoan parasites (African trypanosomiasis, Chagas disease, leishmaniasis, and malaria); together they cause more than 1.3 million deaths annually, possibly as many as 4 million. It is anticipated that the genome information of these parasites will be a major contributing factor in the translation of basic research into applications pertinent to disease control (Hoffman, 2000). In contrast to the human genome project, parasite genome data are released immediately into the public domain. It is hoped that free access to and dissemination of the data will de-monopolise research and lead to a strong involvement of research communities in developing countries, where these diseases are most prevalent (Varmus, 2002). Initiatives involving genome research centres world-wide were founded around 1996/97 and, as a first major landmark of their combined efforts, the complete genome sequence of the most important of these parasites, Plasmodium falciparum, has recently been published (Gardner et al., 2002a). In this review I illustrate the scientific framework of genome projects using three representative parasites (African trypanosomes, Leishmania and Plasmodium). Several other projects (Table 1) are at various stages of progress, but the reasoning and motivation for genomic research are virtually identical for all of them (Tarleton and Kissinger, 2001).

The Plasmodium falciparum genome
More than 1 billion people are estimated to carry malaria-causing parasites at any one time. Annual mortality is between 0.5 and 3 million people. A massive increase in population, a deterioration in public health services and infrastructure, and the problems associated with now widespread drug resistance have led to a re-emergence of malaria as one of the most serious diseases world-wide (http://www.who.int/tdr/diseases/malaria/default.htm). After a period of optimism, mainly due to the relatively successful reduction of malaria-transmitting mosquitoes with the pesticide DDT, more people now die of the disease than 40 years ago (Guerin et al., 2002a; Miller and Greenwood, 2002). An important factor contributing to the failure to reduce the burden of this disease is the ever decreasing number of drugs that are still effective in treatment. Chloroquine, once the most important anti-malaria drug, is almost useless due to the spread of resistant parasite populations in virtually all endemic areas. Likewise, resistance is now observed for every anti-malaria drug with the exception of artemisinin derivatives (Wellems and Plowe, 2001; Wellems, 2002). Projects to develop vaccines, a strategy probably more appropriate in areas where continuous drug supplies are difficult to maintain, have so far failed (Richie and Saul, 2002). Therefore, after four decades of relative stagnation in malaria research, it has been recognised that a massive effort at all levels is required to control this disease. Inspired by the successfully initiated large-scale genome projects (human and yeast) and the rapid completion of the first genomes of pathogenic bacteria (Haemophilus influenzae and Mycoplasma genitalium in 1995), the Plasmodium falciparum genome project consortium was founded in 1996 with the objective of sequencing the entire genome of Plasmodium falciparum, the most dangerous of the four malaria-causing Plasmodium species (Dame et al., 1996; Hoffman et al., 1997). The major motivation and justification for this project was the prospect that the complete genome information would be the basis for a catalogue of new drug and vaccine candidates.
The genome of P. falciparum is approximately 23 × 10^6 bp in size, comparable to the genome sizes of other protozoan parasites (see Table 1). It consists of 14 chromosomes, ranging from 0.6 to 3.5 × 10^6 bp (Gardner, 2001). The chromosomes, with the exception of 6, 7 and 8, can be resolved by pulsed-field gel electrophoresis and therefore, given also the technical and computational limitations at the beginning of the project, a whole chromosome shotgun sequencing strategy was used to sequence the genome. A major technical challenge was the extremely high A/T content of the genome. On average it is 80.6% but can rise to more than 90% in introns and intergenic regions. In some regions, short 2-3 kb imperfect tandem repeats that are assigned as putative centromeres, the A/T content is higher than 97%. This unusual base composition makes both sequencing and assembly difficult and is one of the reasons why sequencing the whole genome has taken a comparatively long time. New software and improved sequencing tools had to be developed to overcome this major obstacle. Optical mapping techniques were used to support the assembly of sequence data into large contigs (Lai et al., 1999). In 1998 and 1999, respectively, the sequences of chromosomes 2 and 3 were published, and in 2002 the entire genome sequence was published (Gardner et al., 1998; Bowman et al., 1999; Gardner et al., 2002a; Gardner et al., 2002b; Hall et al., 2002; Hyman et al., 2002). At the same time both the genome sequence of the model rodent malaria parasite Plasmodium yoelii yoelii and the genome sequence of the insect vector of P. falciparum, the mosquito Anopheles gambiae, were released (Carlton et al., 2002; Holt et al., 2002). The genome data for a range of different malaria parasites (see Table 1) will be useful in several ways (Carlton et al., 2001; Thompson et al., 2001; Waters, 2002). As mentioned above, the A/T bias and the presence of introns make the in silico identification of genes difficult. The availability of several genomes with a high degree of synteny, but differences in detail, will aid the completion of gene annotation. More than 60% of genes identified in P. falciparum have orthologues in P. yoelii. This degree of similarity is, however, surprisingly low given that both parasites cause malaria in mammals. Most strikingly, the P. yoelii and P. falciparum genes that are involved in antigenic variation and virulence share few sequence features (Doolittle, 2002). Many of these genes are, however, located in subtelomeric regions of chromosomes, suggesting that the underlying mechanisms that drive antigenic variation are shared between different species of Plasmodium (Scherf et al., 2001). The divergence between species of genes involved in antigenic variation makes comparative genomics and proteomics less useful for the study of host-parasite interactions than was hoped (Doolittle, 2002). Nevertheless, laboratory models of malaria will be of great use in functionally characterising and exploiting genomic information in the future (Carlton and Carucci, 2002).
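The A/T content figures quoted above (80.6% on average, more than 90% in introns and intergenic regions) are simple base-composition statistics. As a minimal sketch of how such values could be computed from sequence data, the following Python functions calculate the A/T fraction of a sequence and of sliding windows along it. The function names and the toy sequences are illustrative assumptions and are not taken from the genome project's own software.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return at / total


def windowed_at_content(seq: str, window: int, step: int):
    """Yield (start, A/T fraction) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, at_content(seq[start:start + window])


if __name__ == "__main__":
    toy = "ATATTTAAATGCATATATTTTAAA"  # hypothetical sequence, not real P. falciparum data
    print(f"overall A/T fraction: {at_content(toy):.3f}")
    for start, frac in windowed_at_content(toy, window=8, step=8):
        print(start, f"{frac:.3f}")
```

Windowed values of this kind are one way to make locally extreme regions, such as the putative centromeres mentioned above, stand out against the genome-wide average.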
The mosquito Anopheles gambiae is the most important vector for transmission of malaria parasites, but by no means the only one: about 60 other mosquito species are capable of malaria transmission (Spielman and D'Antonio, 2001). The major interest in the Anopheles genome will focus on the specific features of the insect that enable co-existence with the parasite (De Gregorio and Lemaitre, 2002). Firstly, comparison with the genome of Drosophila melanogaster, also of the order Diptera, will reveal potential differences that are the basis of the different lifestyles of the two insects. An example is the identification of 58 genes for fibrinogen-like molecules in Anopheles that probably act as anti-coagulants for an ingested blood meal. In Drosophila only 13 of these genes were identified (De Gregorio and Lemaitre, 2002). Secondly, using techniques such as microarray analysis, the impact of infection on the expression of certain genes can be measured by comparing infected and non-infected insects. Such data will be of practical consequence, as one of the strategies to combat malaria is to release transgenic mosquitoes into the wild that are incapable of parasite transmission. This will require extensive knowledge not only of parasite-vector interactions but also of other issues such as mating behaviour, as the modified mosquitoes have to breed efficiently with wild-type insects to drive the transgene into the population. Transcriptional profiling will also help to identify genes involved in insecticide resistance, an increasing problem for malaria control (Hemingway et al., 2002).
Returning to the Plasmodium genome: what are the main findings? A total of 5268 genes are predicted, 60% of which have no assigned function (Gardner et al., 2002a). The overall structure of the chromosomes confirms previous observations in considerable detail. The subtelomeric regions of all 14 chromosomes are enriched in genes involved in immune evasion and pathogenesis (Crabb and Cowman, 2002). The three known gene families (59 var, 149 rif and 28 stevor genes in the genome reference strain 3D7) are preferentially clustered in telomeric regions: a search of the P. falciparum genome database PlasmoDB (http://plasmodb.org) reveals that 29 var, 44 rif and 5 stevor genes lie within 20 kb of telomeres (Kissinger et al., 2002). The proteins encoded by var and rif genes (PfEMP1 and rifins) are expressed on the surface of infected erythrocytes (Kyes et al., 1999; Kyes et al., 2001), whereas stevor gene products appear to localise to internal membrane structures of erythrocytes (Kaviratne et al., 2002). The function of PfEMP1 proteins in sequestration of infected red blood cells is well established, but the involvement of rifins and stevor proteins in antigenic variation is hypothetical. The host is able to mount an efficient immune response against PfEMP1, which the parasite undermines by transcriptional switching between different var genes (Wahlgren and Bejarano, 1999; Cooke et al., 2000; Kyes et al., 2001). As we will see when discussing the genome data of another parasite capable of antigenic variation, Trypanosoma brucei, the telomeric localisation of genes involved in immune evasion is a common feature of such parasites (Barry et al., 2003). In addition to these protein-coding genes, a complex array of repetitive sequence elements has been identified in telomeric locations. Although some of these elements were known, the complexity and extent of their arrangement became clear only after large telomeric and subtelomeric chromosomal regions had been assembled (Gardner et al., 2002a). It has been hypothesised that the presence of large blocks of near-identical interchromosomal sequences leads to the observed clustering of chromosome ends, thus facilitating ectopic recombination (Freitas-Junior et al., 2000; Scherf et al., 2001; O'Donnell et al., 2002). The high degree of conservation observed between subtelomeric regions in P. falciparum could be the result of frequent inter-chromosomal exchange leading to sequence homogenisation. Genes involved in cytoadherence and antigenic variation, together with their associated non-coding regions (promoters, intergenic regions and subtelomeric repeats), occupy around 10% of the total genome of P. falciparum.
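The count of var, rif and stevor genes within 20 kb of telomeres is, in essence, a positional filter over annotated gene coordinates. The following Python sketch shows one way such a filter could be written. It is not the PlasmoDB query interface; the `Gene` record, the function names and the toy coordinates are hypothetical, and the end of an assembled chromosome is used here as a stand-in for the telomere position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    family: str  # e.g. "var", "rif", "stevor"


def near_chromosome_end(gene: Gene, chrom_lengths: dict, max_dist: int = 20_000) -> bool:
    """True if the gene lies within max_dist bp of either chromosome end."""
    length = chrom_lengths[gene.chrom]
    dist_left = gene.start - 1
    dist_right = length - gene.end
    return min(dist_left, dist_right) <= max_dist


def count_by_family(genes, chrom_lengths, max_dist: int = 20_000) -> dict:
    """Count genes per family that lie near a chromosome end."""
    counts = {}
    for gene in genes:
        if near_chromosome_end(gene, chrom_lengths, max_dist):
            counts[gene.family] = counts.get(gene.family, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy annotation, not real 3D7 coordinates.
    lengths = {"chrA": 1_000_000}
    genes = [
        Gene("g1", "chrA", 5_000, 12_000, "var"),
        Gene("g2", "chrA", 500_000, 503_000, "var"),
        Gene("g3", "chrA", 985_000, 990_000, "rif"),
    ]
    print(count_by_family(genes, lengths))
```

With real annotation, the distance threshold is the only parameter, so the same filter can be rerun at other cut-offs to see how sharply each family is concentrated at chromosome ends.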
A surprising result of the gene analysis is the parasite's lack of some enzymes (such as the F0 a and b subunits of the ATP synthase complex) that are required for a mitochondrion to produce energy (ATP) using an electrochemical membrane gradient (Wirth, 2002). Enzymes of other metabolic pathways (TCA cycle and oxidative phosphorylation) have been identified. Overall, the proportion of enzymes within the entire genome is much smaller than that calculated for other organisms. Whereas in P. falciparum ~8% of all predicted genes were assigned Enzyme Commission (EC) numbers, the proportion in S. cerevisiae is 17%. This could point to a reduced complexity of metabolic pathways, but could also be a consequence of the technical difficulty of identifying orthologous genes in silico. A similar picture emerges for the analysis of transporter proteins. In comparison to other eukaryotic genomes, P. falciparum possesses a very limited number of membrane transporters required for nutrient uptake. This is particularly true for transporters of organic nutrients and less obvious for inorganic ion transporters. A potential explanation for this observation is the lifecycle of the parasite. It is likely that a stable supply of nutrients is present both during the insect stage and during the mammalian (intracellular) stage of development. These relatively stable environments obviate the need for maximum adaptability to the qualitative and quantitative fluctuations in nutrients encountered by free-living organisms. It will be interesting to see whether genome statistics of other eukaryotic parasites mirror the situation in Plasmodium.
One of the most significant findings of the genome analysis so far is related to the function of an organelle unique to Apicomplexa, the apicoplast (Roos et al., 2002). The apicoplast is a spherical structure of approximately 1 µm in diameter, surrounded by four (in Toxoplasma) or three (in Plasmodium) membranes (Köhler et al., 1997; Hopkins et al., 1999). This plastid harbours a 35 kb circular genome (Wilson and Williamson, 1997). The presence of more than two membranes and the phylogenetic analysis of the 35 kb DNA suggest that this organelle was acquired by secondary endosymbiosis of a red alga that contained the evolutionary remnants of a cyanobacterium. A number of drugs (fluoroquinolones, clindamycin), known to inhibit certain functions in prokaryotes, have been shown to be effective against apicomplexan parasites by interfering with apicoplast function (Fichera and Roos, 1997). This not only supports the endosymbiosis hypothesis but also shows that a functional apicoplast is essential for parasite survival. As only a few apicoplast proteins are actually encoded by the 35 kb circle, most organelle-specific proteins are encoded by the nuclear genome and have to be targeted to and imported into the apicoplast (Waller et al., 1998; Roy et al., 1999). Targeting is achieved by a bipartite sequence at the N-terminus of apicoplast proteins, consisting of a hydrophobic signal sequence of 15-40 amino acids responsible for ER translocation and a subsequent transit peptide of 100-150 amino acids that targets proteins into the apicoplast, presumably by interacting with an as yet unidentified import machinery. By mining the Plasmodium database with algorithms that identify typical biochemical features of the transit peptide sequence (mainly amino acid features that affect charge distribution), it was possible to identify more than 450 putative apicoplast proteins (Zuegge et al., 2001; Foth et al., 2003). Given a total of about 5300 genes, this represents more than 8% of the predicted gene complement. Using these data it was possible to piece together a number of metabolic pathways that are either partially or entirely localised inside the apicoplast. Some of these pathways are not present in the mammalian host, and enzymes involved in such specific functions are potential drug targets. One enzyme (DOXP reductoisomerase) involved in the 1-deoxy-D-xylulose 5-phosphate (DOXP) pathway of isoprenoid synthesis is efficiently inhibited by fosmidomycin, a substance originally identified as an inhibitor of the same enzyme in bacteria and plants (Lell et al., 2003). By inhibiting this essential and parasite-specific metabolic pathway, parasitemia can be cleared in a mouse malaria model (P. vinckei), in red blood cell cultures of P. falciparum and, more recently, in initial clinical trials on malaria-infected humans (Jomaa et al., 1999; Missinou et al., 2002). Due to the high rate of recrudescence observed after a short treatment regime with fosmidomycin, combination therapies with clindamycin (see above) are now being explored to increase clinical suitability. Likewise, other components of the DOXP pathway have been identified and are now being evaluated as anti-malaria drug targets (Altincicek et al., 2002; Kollas et al., 2002). In the context of identifying new drug targets it is remarkable that the biocide triclosan, previously thought to be a fairly unspecific bactericide, has now been found to specifically inhibit the enzyme enoyl reductase of the type II fatty acid synthesis pathway in bacteria (Beeson et al., 2001; Kapoor et al., 2001; Suguna et al., 2001). Surprisingly, this pathway is also present in Apicomplexa and localised in the apicoplast (Waller et al., 2003). The currently widespread use of triclosan in many consumer products will most likely lead to the emergence of resistance against this drug and render it useless both as a bactericide and as a potential anti-malaria drug. A ban on the indiscriminate use of triclosan is therefore urgently needed (Schweizer, 2001; Tan et al., 2002).
Although some of the metabolic pathways are not found in the host, they are not specific to P. falciparum but are present in other Apicomplexa as well. This offers the possibility that model systems can be used to investigate the molecular components and characteristics of such pathways in much more detail than is possible in P. falciparum (Mota et al., 2001; Carlton and Carucci, 2002; Waters, 2002). For example, the related parasite Toxoplasma gondii is a much more tractable organism in which the whole gamut of molecular techniques (e.g. mutagenesis, gene replacement, ectopic expression) can be employed with ease, something that is currently still difficult in P. falciparum. This demonstrates one of the advantages of conducting parallel genome projects on related organisms, such as Toxoplasma gondii, Theileria spp. or Eimeria tenella. The latter two parasites are also of importance as they cause diseases in domestic animals (Table 1).

Analysing the P. falciparum proteome
The development of both drugs and vaccines against the parasite requires knowledge of the potential targets and of their expression profiles. Most parasites undergo complex lifecycles involving developmental changes in the host and vector. On a molecular level these developmental cycles are mirrored by differential expression patterns of a number of genes that encode proteins relevant to a particular stage of the life cycle. Whereas the genome data set gives a blueprint of the entire gene complement, only a detailed transcriptional or, preferably, proteomic analysis will provide information about the temporal and developmental implementation of this genetic information (Carucci et al., 2002). Therefore, concomitant with the genome project, a proteome analysis of different stages of the life cycle of P. falciparum was undertaken (Florens et al., 2002; Lasonder et al., 2002). Protein lysates prepared from various life cycle stages were analysed by tryptic peptide fingerprinting using HPLC and tandem mass spectrometry. Three main general findings emerged from this analysis. First, the number of differentially expressed proteins was surprisingly large. Almost half of the detected proteins of the sporozoite stage (the mosquito stage that is infective) were specific to that stage. Even different stages within the mammalian host (trophozoites, merozoites, gametocytes; Bannister et al., 2000; Smith et al., 2002) had between 20 and 30% unique proteins. Only 6% of proteins detected in this study were shared between all four stages (Florens et al., 2002). Also, in sexual stages of the parasite (gametocytes and gametes) 575 out of a total of 1289 proteins were not found in asexual stages (Lasonder et al., 2002). It remains to be seen whether these findings reflect in all cases true differences in expression patterns between stages or can partially be explained by technical issues. Between 75% (Lasonder et al., 2002) and 54% (Florens et al., 2002) of all predicted gene products were not detected by the two published studies. The reasons can be technical (solubility, sample preparation etc.) or biological (low abundance, expression in stages not analysed etc.). Secondly, the expression pattern of PfEMP1, one of the major virulence factors, and of rifins (encoded by var and rif genes, respectively) was unexpected because both were found to be expressed already at the sporozoite (insect) stage. This, however, is reminiscent of the situation in Trypanosoma brucei, where a particular variant of the major surface glycoprotein, which functions in immune evasion in the bloodstream of the mammalian host, is already expressed in the insect stage of the parasite that is infective to mammals (Ginger et al., 2002). It will be interesting to find out whether in Plasmodium a particular subset of var and rif genes is expressed in sporozoites. Thirdly, the proteome analysis identified several groups of co-expressed proteins. Some of these co-expressed groups are physically linked as small gene clusters on chromosomes. This observed co-expression indicates co-regulation of expression and possible functional associations between co-expressed proteins. It should be noted that a proteome analysis is not only helpful for analysing expression patterns of predicted genes but is also valuable for achieving a complete annotation of the genome. Currently, the cut-off length of automatically annotated putative genes is arbitrarily set at around 300 bp. Proteome analysis of yeast has, however, shown that even smaller open reading frames code for proteins (Oshiro et al., 2002). Also, missed exons in the genome of Plasmodium could lead to the identification of incomplete genes. Proteome analysis and comparative genomics will both be valuable tools to improve the quality of gene identification.
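The effect of a length cut-off on automated gene calling can be illustrated with a deliberately simple open reading frame (ORF) scanner. The Python sketch below is an assumption-laden toy, not the annotation pipeline used for P. falciparum: it ignores introns, considers only ATG start codons and the three standard stop codons, and its function names and default threshold of 300 bp are chosen only to mirror the cut-off mentioned above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 300):
    """Return (strand, frame, start, end) for ATG-to-stop ORFs of at least min_len bp.

    start and end are 0-based, half-open coordinates on the scanned strand,
    and the reported length includes the stop codon.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    if i + 3 - orf_start >= min_len:
                        results.append((strand, frame, orf_start, i + 3))
                    orf_start = None
    return results


if __name__ == "__main__":
    short_orf = "ATG" + "GCT" * 10 + "TAA"  # hypothetical 36 bp ORF
    print(find_orfs(short_orf, min_len=300))  # the short ORF is filtered out
    print(find_orfs(short_orf, min_len=30))   # the short ORF is reported
```

Lowering `min_len` recovers short ORFs but would, on real genomic sequence, also admit many spurious ones, which is why independent evidence such as proteome data is valuable for deciding which short ORFs are genuine.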

The genomes of Kinetoplastida
The order Kinetoplastida includes three parasite complexes of medical importance: Leishmania (several species), Trypanosoma cruzi and Trypanosoma brucei (several species). The species Leishmania major and L. donovani are widespread in South and Central America, the Mediterranean region, Africa, the Middle East and the Indian subcontinent (Herwaldt, 1999). L. major is the cause of cutaneous leishmaniasis, usually a mild and often self-healing disease, and the L. donovani species complex causes visceral leishmaniasis (Kala-Azar), a much more severe and life-threatening disease that affects internal organs such as the spleen or liver. The parasites are transmitted by about 20 different species of sandflies. The second group of kinetoplastid parasites comprises Trypanosoma cruzi, the causative agent of Chagas disease (Rodriques Coura and de Castro, 2002). Its geographical distribution is restricted to South and Central America. The parasite is mainly transmitted by triatomine bugs via faecal contamination of bite sites. The third group includes the Trypanosoma brucei complex (Hide, 1999). This parasite causes sleeping sickness and is restricted to sub-Saharan Africa. It is transmitted via the bite of tsetse flies. Two subspecies, T. b. rhodesiense and T. b. gambiense, are responsible for an acute and a chronic form of the disease, respectively. For disease control it is important to note that, in contrast to malaria, transmission of Leishmania spp. and Trypanosoma spp. is zoonotic, with considerable host reservoirs amongst wild and domestic animals. Although kinetoplastid parasites share many aspects of their basic biology, their parasitic lifestyles and the associated biology related to pathogenicity are fundamentally different. The most obvious difference is that Leishmania and T. cruzi are intracellular parasites, whereas T. brucei is extracellular. Therefore, I will deal with their respective genome biology separately and point out common features where relevant. Because of the current state of progress of the respective genome projects, only the Leishmania and T. brucei projects will be covered in this review.

The Leishmania major genome
Leishmania major (strain 'Friedlin') was chosen as the genome reference strain for technical and biological reasons: although a laboratory strain, it can still be passaged through the sandfly vector, it has served as a model for immunological aspects of infection, and some stages of its lifecycle can be reproduced in vitro (Ravel et al., 1998). Throughout its lifecycle it is diploid, and sexual stages and meiosis or genetic exchange have not been observed. The nuclear (haploid) genome size is approximately 34 Mb and the genome is composed of 36 chromosomes, ranging from 0.3 Mb to 2.5 Mb (Wincker et al., 1996). To sequence the entire genome, large-insert genomic cosmid libraries were constructed, mapped to individual chromosomes separated by pulsed-field gel electrophoresis, and finally sheared into shorter fragments and sequenced. The complete sequence of the smallest chromosome (chromosome I) was published in 1999 and several more (chromosomes 3, 4, 5 and 24) have been completed since (Myler et al., 1999). The estimated total number of genes is in the region of 8000. In January 2003 about 7500 genes were deposited in the central Leishmania major database GeneDB (http://www.genedb.org/genedb/leish/index.jsp). Of these genes, 53% are currently unclassified in terms of potential function, a proportion similar to that found in Plasmodium falciparum.
The overall sequence and organisation of the published chromosome I data revealed interesting features congruent with our understanding of some molecular mechanisms of Leishmania and other kinetoplastids. One of the unique features of the genome of trypanosomatids is the way genes are transcribed (Johnson et al., 1987; Lee and Van der Ploeg, 1997). Unlike in other eukaryotes, genes are transcribed as polycistronic pre-mRNAs that are co-transcriptionally spliced into gene-sized units. In contrast to bacterial operons, co-transcribed genes are rarely functionally related, and no promoters have as yet been identified that drive transcription of polycistronic transcription units. Polycistronic transcription is reflected in the regional organisation of genes on chromosomes, as genes that are part of a particular transcription unit are located on the same coding strand. It was, however, surprising to see that chromosome I has a strictly bidirectional structure: from an inversion point located about 70 kb from the left telomere, 29 genes are encoded on the same strand facing towards the left telomere and the remaining 50 genes are encoded by the opposite strand towards the right telomere (Donelson et al., 1999). Transcription is directed away from the point of inversion (divergent). It is tempting to speculate that we are looking at two massive polycistronic transcription units. Nuclear run-on analysis suggests that this is indeed the case (Myler et al., 2001). Analysis of the sequences of other chromosomes has revealed a similar organisation, although transcription can also be directed away from the telomeres (convergent). On the largest of these chromosomes, chromosome 35, several inversion points appear to exist, but the sequence of this chromosome is incomplete and the final product may change the current picture. Interest has focused on the role of the inversion points. It was hypothesised that they might function as origins of replication, as promoters for RNA polymerase II based transcription, or as centromeres (Donelson et al., 1999; McDonagh et al., 2000). In a recent study deletion mutants were created that lacked the inversion point on chromosome I (Dubessay et al., 2002a). Analysis of the cell lines revealed that mitotic stability of the truncated chromosomes was not affected, and neither was the ability to express marker genes introduced in close proximity to the inversion point. This argues against a function as centromere, origin of replication or initiation point of transcription. It was, however, remarkable that it was impossible to create a null mutant of this region. The attempt to delete this region on all alleles of chromosome I was countered by the cell by producing an extra copy of the entire chromosome that contains the inversion point. The conclusion drawn from this result was that the deleted region is essential for parasite survival. The same group also created cell clones with truncations of chromosome I that delete all coding DNA sequences from this chromosome (Dubessay et al., 2002b). This truncated chromosome was mitotically stable, and it was speculated that telomeric or subtelomeric repeat elements may act as centromeres. Unusually, this chromosome also contained a core region, stretching over 105 kb, made up of tandemly repeated copies of the ectopic plasmid vector sequence that was used for creating the deletion and selecting for transformants. Whether this additional repetitive DNA sequence has an influence on stability is not yet established.
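The strand organisation described for chromosome I, with one block of genes on each strand meeting at a single inversion point, can be detected from annotation alone by looking for adjacent genes on opposite strands. The following Python sketch illustrates the idea under simplifying assumptions. The tuple layout, the function name and the evenly spaced toy coordinates are hypothetical; only the counts of 29 and 50 genes come from the text above.

```python
def strand_switches(genes):
    """Find points where the coding strand changes between neighbouring genes.

    genes: iterable of (name, start, strand) with strand "+" or "-".
    Returns a list of (left_gene, right_gene, kind), where kind is
    "divergent" (left gene on "-", right gene on "+", so transcription
    runs away from the switch point) or "convergent" (the reverse).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    switches = []
    for left, right in zip(ordered, ordered[1:]):
        if left[2] == right[2]:
            continue
        kind = "divergent" if (left[2], right[2]) == ("-", "+") else "convergent"
        switches.append((left[0], right[0], kind))
    return switches


if __name__ == "__main__":
    # Toy model of chromosome I: 29 genes on the "-" strand followed by
    # 50 genes on the "+" strand; coordinates are invented for illustration.
    toy = [(f"g{i}", i * 1000, "-" if i < 29 else "+") for i in range(79)]
    print(strand_switches(toy))
```

Applied to a chromosome with several inversion points, as suggested above for chromosome 35, the same function would return one entry per switch and label each as divergent or convergent.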
One of the driving forces for this, as for any other, genome project is the prospect that the analysis of the genetic information will identify new drug targets. Experience from Leishmania, however, demonstrates that this route can be very tricky indeed. The surface of Leishmania cells is covered with a dense coat of glycoconjugates, amongst them membrane-bound lipophosphoglycan (LPG). Their extracellular location and abundance have led to the suggestion that these molecules are major virulence factors and may be important in establishing an infection and in parasite survival inside the host (Turco et al., 2001). Using biochemical approaches and aided by available genome sequence information, the biosynthetic pathways of LPG assembly have been established. Knowing the key enzymes, it was possible to test the virulence hypothesis experimentally by knocking out an essential enzyme of the biosynthetic pathway and thereby preventing the synthesis of the final LPG. When this experiment was done in L. major, the genome strain, the hypothesis was confirmed as the parasite lost its virulence (Späth et al., 2000). However, when the equivalent enzyme was knocked out in L. mexicana, a closely related species also causing cutaneous leishmaniasis, no effect on virulence was observed (Ilg, 2000). These data demonstrate clearly the need for very detailed comparative studies, even between closely related parasites, to identify common biological traits of pharmacological interest. New drugs against the various clinical manifestations of leishmaniasis are urgently needed. Only three classes of drugs are available: pentavalent antimony compounds and, as a second line of treatment, amphotericin B and pentamidine. All these compounds have disadvantages, such as difficult administration or toxicity (Guerin et al., 2002b). In recent years only one compound, miltefosine, has been added to the list of promising anti-leishmaniasis drugs (licensed for use only in India so far) (Sundar et al., 2002). The history of miltefosine, an alkylphosphocholine that probably interferes with lipid metabolism (Lux et al., 2000), demonstrates another paradigm in the development of antiparasitic drugs. Miltefosine was first developed as an antitumour agent but turned out to be clinically ineffective. Only much later was its effectiveness against leishmaniasis discovered (Croft et al., 1996; Croft et al., 2003). As the development of new substances from synthesis to clinical evaluation is long and extremely expensive, it is difficult to see a commitment from the private sector to initiate such programmes in order to develop drugs specifically directed against parasitic diseases prevalent mainly in poor countries. Therefore, the 'hijacking' of already existing compounds for the treatment of parasitic diseases is a perfectly reasonable approach to reduce development time and costs. As we have seen in the case of drugs effective against malaria (e.g. fosmidomycin), this 'piggy-back' approach to increasing the number of potential drugs appears to be the most promising option to date (Gelb and Hol, 2002). Genomic information will be of immense use in identifying cellular components of the parasite (individual molecules, biochemical pathways, or compartments such as membranes) that might interact with already existing pharmacological substances.

The Trypanosoma brucei genome
Parasites belonging to the Trypanosoma brucei species complex are the causative agents of trypanosomiasis or African sleeping sickness (T. b. rhodesiense and T. b. gambiense) and nagana, a wasting disease of domestic cattle of major economic impact (T. b. brucei). The geographical distribution of these disease-causing trypanosome species is restricted to sub-Saharan West and East Africa (Hide, 1999). Sixty million people are currently at risk from the disease and, according to WHO estimates, between 300,000 and 500,000 are infected at any one time (http://www.who.int/emc/diseases/tryp/). The parasite is transmitted through the bite of the tsetse fly. The life cycle of the parasite, probably with the exception of T. b. gambiense, is zoonotic with large wild animal reservoirs, making control measures difficult. Recent years have seen a resurgence of the disease, even in areas that were previously unaffected (Fevre et al., 2001; Hutchinson et al., 2003). Clinically, the disease caused by the two human-infective forms is classified into a chronic form, which can remain symptom-free for several years (caused by T. b. gambiense), and an acute form (T. b.
rhodesiense). During stage I of the disease, the parasite circulates in the blood and other body fluids or tissues. Only in stage II, after the parasite has penetrated the blood-brain barrier, does the disease have an inevitably fatal outcome if not treated.
There is a limited choice of drugs available for the treatment of trypanosomiasis (Legros et al., 2002). Pentamidine and suramin are the first-choice drugs against stage I of the disease and have been in use for more than 60 years; melarsoprol, eflornithine and nifurtimox are used against stage II. Melarsoprol, the first choice for stage II treatment since its first use in 1949, is an organo-arsenical compound and highly toxic. Up to 5% of the fatality rate among treated trypanosomiasis patients can be attributed to toxic side effects of this drug. Moreover, recent therapeutic failures indicate the emergence of melarsoprol-resistant parasites (Matovu et al., 2001a; Matovu et al., 2001b). Eflornithine, first used in 1981, is the only "new" compound registered for trypanosomiasis treatment. The drug is difficult to administer and is less effective against T. b. rhodesiense. Nifurtimox is still under assessment: although it is licensed for treating Chagas disease, caused by Trypanosoma cruzi in South America, large-scale controlled trials have not yet been undertaken to assess its usefulness against African sleeping sickness (Burchmore et al., 2002; Legros et al., 2002). Furthermore, the key drugs to treat the disease (pentamidine, melarsoprol, eflornithine) are currently produced and supplied free of charge as a gesture of goodwill by a pharmaceutical company (Aventis), and the long-term supply is not guaranteed.
The biology of kinetoplastids is distinguished by a number of biological processes that are either unique to these parasites or were first discovered in these organisms. The extensive post-transcriptional editing of mitochondrial mRNA by insertion or deletion of uridylate residues and the compartmentalisation of glycolysis in a specific organelle, the glycosome, are examples of unique features (Clayton and Michels, 1996; Guerra-Giraldez et al., 2002; Madison-Antenucci et al., 2002). Trypanosoma brucei has, however, made its entry into general molecular biology textbooks through the phenomenon of antigenic variation (Vickerman, 1985; 1989; Cross, 1996). In contrast to other eukaryotic parasites, African trypanosomes are extracellular parasites and at no stage during their life cycle hide inside host cells as Leishmania or Plasmodium do. In order to evade the host immune system the parasite has evolved a strategy that is now considered a general paradigm for monoallelic expression of a multigene family. In the bloodstream of the mammalian host the surface of the parasite is covered by a dense coat of about 10^7 identical molecules of a glycoprotein (variant surface glycoprotein, VSG). This molecule is highly immunogenic, and within about seven days post-infection the host has mounted a strong immune response and is able to eliminate the parasite. However, within the population of parasites circulating in the blood a few have changed their surface coat to a different isotype of VSG, a process called VSG switching (Vanhamme et al., 2001). This population is unrecognisable to the immune system until a new immune response is developed. By then, another subpopulation with yet another VSG coat will have emerged. The molecular mechanisms of antigenic variation are complex and not fully understood. The overwhelming importance of antigenic variation for parasite survival is, however, reflected in the genome organisation of trypanosomes and appears to be one of the
important forces that shape the overall organisation of the genome. In order to maintain the ability to switch and use new VSG variants the parasite maintains a reservoir of approximately one thousand VSG genes. The distribution of these genes is non-random. A relatively small set of genes is located within so-called expression sites (ES). About 20 ES are located, one per site, in subtelomeric positions on most (but not all) of the 11 pairs of diploid chromosomes and are, as the name suggests, the sites from which VSG genes are transcribed. However, only one ES is active at a given time within a single cell and all other ES are silenced by a hitherto unknown mechanism. In addition to the few VSG genes harboured in ES, the vast majority of VSG genes are located at internal sites of chromosomes, often in small clusters. The chromosomes referred to so far are also called megabase chromosomes because of their size range of 1-6 Mbp (Melville et al., 1998). They contain, in addition to VSG-ES and internal VSG genes, all other "housekeeping" genes of the cell. In order to expand their VSG reservoir even further, trypanosomes maintain two extra classes of chromosomes that are, as far as is known, entirely devoted to VSG maintenance. Approximately 5 intermediate chromosomes (ca. 200 kilobase pairs in size) also harbour subtelomeric expression sites, and about one hundred minichromosomes (30-150 kilobase pairs) also contain subtelomeric VSG genes, although not in the context of an ES (Ersfeld et al., 1999; El-Sayed et al., 2000). Minichromosomal VSG genes have been shown to be a preferred source for a genetic recombination mechanism that leads to the transposition of silent VSG genes into ES on megabase chromosomes, from where they can potentially be expressed (Robinson et al., 1999). Therefore, minichromosomes appear to function exclusively as a VSG gene reservoir and are stably maintained during cell division (Van der Ploeg et al., 1984; Ersfeld and Gull, 1997). To make things even more
complex, two different types of ES have been described: bloodstream ES, which are potentially active while the parasite resides in the mammalian host, and metacyclic ES, which are active in the tsetse fly during the life cycle stage that is infective to the mammalian host and seem to serve to prime the parasite for life inside the bloodstream after transmission (Barry et al., 1998; Bringaud et al., 2001). Although both ES types are subtelomeric, their internal architecture is very different (Ginger et al., 2002). Taken together, trypanosomes devote more than 10% of their genome, which is about 35 Mbp in size, to antigenic variation. In a recent publication several different bloodstream ES were characterised at the DNA sequence level (Berriman et al., 2002). The theme that emerged from this and previous studies is one of variation and conservation. ES are heterogeneous with respect to their gene content. In addition to a single VSG gene that is located close to the telomeric end of the ES, ES contain numerous other genes collectively called ES-associated genes (ESAGs). A function for most of these genes has, with one exception, not yet been established. Up to eight different ESAGs can be harboured in one ES, flanked by a promoter and the telomere-proximal VSG gene. These genes, including the VSG gene, are under the control of an RNA polymerase I-type promoter, an unusual situation as normally all protein-coding genes in eukaryotes are transcribed by RNA pol II (Zomerdijk et al., 1990). A possible explanation for pol I transcription of VSG genes in trypanosomes is that the VSG constitutes about 10% of total cellular protein and that only a highly active polymerase, such as pol I, permits transcription at such high levels. Transcription of all ES described so far runs towards the telomere. Analysis of different ES for their content of ESAGs shows that the number of these genes can vary considerably. None of the ES harbours all
genes known as ESAGs; some harbour up to seven, some as few as three. The only ESAGs that are always present are ESAGs 6 and 7. These are the only ESAGs with an assigned function and they code for the dimeric transferrin receptor of trypanosomes (Salmon et al., 1994). Except for these two essential genes, all other ESAGs seem to be non-essential. It cannot be excluded that they confer some sort of advantage in the natural environment of the parasite that is not obvious under in vitro culture conditions. It is also remarkable that, in the ES analysed, ESAGs 6 and 7 are always located immediately downstream of the pol I promoter. Given a certain 'leakiness' of the pol I promoter, transcription of ESAGs 6 and 7 might also occur on silent ES, where transcription of the VSG gene, located up to 60 kbp downstream of the promoter, is completely shut down. The role of differential expression of transferrin receptor variants from different ES in host adaptation has been discussed elsewhere (Bitter et al., 1998). This diversity of ES is counterbalanced by a significant conservation in other sequence elements (Berriman et al., 2002). A characteristic 50 bp repeat is located immediately upstream of the ES promoter. The VSG genes are always preceded by a 70 bp repeat element. These elements are probably required for efficient recombination by duplicative transposition between silent VSG genes and expression sites. It has, however, also been shown that the absence of the 70 bp repeat does not prevent recombination-based switching in cultured trypanosomes (McCulloch et al., 1997). Towards the telomeres, VSG genes are associated with other repetitive elements (Aline and Stuart, 1989). In addition to the conservation of repetitive, non-coding DNA elements, it was also shown that some genes and pseudogenes show a remarkable conservation between different ES within a single cell. It has been hypothesised that this conservation could indicate that expression sites have only recently arisen by duplication events
from a single ES (Berriman et al., 2002). However, especially when considering a similarly high degree of conservation between subtelomeric regions in the Plasmodium genome (see above), it seems equally likely that this conservation is the result of sequence homogenisation due to the recombinational activities that take place in subtelomeric regions of chromosomes. Such homogenisation of sequences will especially affect genes or pseudogenes with no inherent requirement for diversity and will be less obvious in genes where selective pressure drives genes towards sequence diversity (e.g. VSG genes, transferrin receptors).
A unique feature of subtelomeric regions in the trypanosome genome is the presence of a recently discovered family of genes, termed retrotransposon hot spot (RHS) genes (Bringaud et al., 2002). Based on sequence similarities the RHS genes have been divided into six subfamilies (RHS 1-6). In addition to intact genes, many RHS pseudogenes coexist within the RHS clusters. These pseudogenes arise either by frameshifts and stop codons within the open reading frame, or by the insertion of retrotransposon sequences. These insertions occur at exactly the same relative position in all members of this family. The total copy number of RHS genes/pseudogenes has been estimated to be about 130. The RHS sequences show no significant similarities to other proteins and their function is unknown. Antibodies raised against recombinant RHS proteins of each subfamily have shown that, with the possible exception of RHS2, the proteins are located in the nucleus. RHS-related sequences have also been identified in Trypanosoma cruzi. RHS genes/pseudogenes are clustered in the genome and frequently occur as tandem repeats. The clusters analysed so far are found upstream of bloodstream expression sites, but also in subtelomeric locations on chromosomes not carrying a bloodstream ES. On the publicly accessible sequence (acc. no. AL929603) of chromosome I of T. brucei stock TREU927, the genome reference strain, the RHS cluster stretches over a length of about 190 kbp from immediately upstream of the VSG promoter towards the centre of the left-hand side of this chromosome.
The overall architecture of chromosomes of T. brucei is very similar to that of Leishmania: many genes are located on a single continuous coding strand, with few inversion points on one chromosome. No centromeres have been identified yet. In contrast to T. brucei, Leishmania lacks retroelements within its genome. As retroelements are also found in the kinetoplastids T. cruzi and Crithidia fasciculata, their absence from Leishmania is remarkable and could hint at some fundamental differences in genome maintenance and dynamics (Bhattacharya et al., 2002).
What will be the potential impact of the T. brucei genome project on combating the disease? As the project is not yet completed, only a few studies employing a genome-wide analysis to identify novel features of trypanosome biology have been published. When searching the databases for proteins that are located in the glycosome, a kinetoplastid-specific organelle related to peroxisomes that contains most glycolytic enzymes, Hannaert et al. identified several genes encoding homologs of plant enzymes (Guerra-Giraldez et al., 2002; Hannaert et al., 2003). A total of 16 different genes were identified coding for proteins involved in a variety of cellular functions ranging from carbohydrate metabolism to fatty acid synthesis. Interestingly, most of these proteins contain a targeting signal typical of glycosomal proteins. In contrast to Plasmodium and related organisms, which possess the apicoplast, trypanosomes do not contain the evolutionary remnants of a plant organelle. The authors therefore speculate that an extensive gene transfer occurred from an endosymbiotic organelle genome to the nuclear genome before this endosymbiont of algal origin was lost from the precursor of today's trypanosomes. Whatever the mechanism of acquisition of these plant-like genes was, it now offers, analogous to malaria parasites, the possibility that herbicide-type substances could be used to target the enzymes encoded by these genes. Orthologous genes have also been identified in Leishmania (Hannaert et al., 2003). In addition to such genome-wide analyses in search of novel drug targets, the genome information will also greatly aid gene discovery within other projects, particularly in combination with mass spectrometry technologies (Oliver, 2002; Beverley, 2003). Examples are the elucidation of the pathways of GPI-anchor biosynthesis and fatty acid biosynthesis (Ferguson, 2000; Morita et al., 2000a; Morita et al., 2000b; Chang et al., 2002) or the identification of components of the RNA-editing
machinery (Schnaufer et al., 2001; Stuart et al., 2002).
Using Plasmodium falciparum, Leishmania and Trypanosoma brucei, I have tried to review the status of current protozoan parasite genome projects and to give examples of read-outs of the sequence information at various levels of analysis. Many parallels can be drawn when comparing parasite genomes. Metabolic pathways have been identified that are unique to the parasites and offer potential drug targets. The organisation of the genetic information on individual chromosomes strongly reflects the necessities imposed on the parasite by host/parasite interactions. Genes involved in antigenic variation/immune evasion are located near the telomeric ends of chromosomes, where they are exposed to the desired effects of an active recombination machinery to maintain as much diversity as possible (Cano, 2001; Barry et al., 2003). Without doubt the availability of genome information will have a major impact on basic research in parasitology, particularly in molecular science and biochemistry. However, the main justification for funding parasite genome projects was the prospect of developing new drugs or vaccines against infectious diseases. How likely is it that this aim will be realised? There are certainly voices that doubt the impact of genomics on parasite control (Curtis, 2000) and on drug discovery in general (Coghlan, 2002). There are indeed data that indicate an overall decrease in productivity in pharmaceutical research (Taylor, 2003). The number of drugs containing new active ingredients has steadily dropped over the previous two decades, and it is hard to see that the development of drugs against parasitic diseases largely confined to poor countries would run contrary to this trend. It has been estimated that 20 to 30 new drugs will be needed for a sustainable control of the most important protozoan diseases (Gelb and Hol, 2002). It appears very unlikely that private initiative by companies will raise the necessary funds to develop new drugs against tropical diseases. The costs of developing a
new drug are put at around $500 million and, unless different safety standards and regulatory procedures are applied, these costs will be of a similar magnitude for drugs against tropical diseases. If only for ethical reasons, a scenario of "cutting corners" is not acceptable. On the other hand, the objective of drug development against non-infectious diseases is much more complex than that of developing a drug against a parasite. In the former case modulation (activation/inhibition/substitution) of endogenous gene products (proteins or metabolic products) is usually required, whereas drugs against infectious diseases are designed with only one (simple) aim: to kill the parasite with minimal side-effects on the patient. Screening of compound libraries is therefore more straightforward. Another feasible way of reducing costs has already been mentioned: the exploitation of already existing substances. Drugs that show good efficacy against malaria (dapsone, fosmidomycin) and visceral leishmaniasis (miltefosine) were originally developed for other applications but are now 'hijacked' and being evaluated to treat infectious diseases (Guerin et al., 2002a; Sundar et al., 2002). The identification of a relatively large number of gene products closely related to plant enzymes, both in malaria and kinetoplastids, opens possibilities to test a relatively defined set of pharmacological substances (e.g. herbicides) for their effect on these target molecules. In addition it will be necessary to redefine the role of academia in applied research. More development leading towards the clinical application of basic science will need to be done within academic institutions. Closer links between public and private institutions will be necessary to translate research into applications.
Finally, a note of caution concerning the informative value of genomic information. As extensively discussed for the human genome project, a genome project is not completed by sequencing a single genome. This is
particularly true for the genomes of parasite strains that are adapted to in vitro culture. Single genomes can only serve as a reference for detailed comparative studies (Kissinger et al., 2002). Although conservation of syntenic groups of genes has been demonstrated for various related parasites (Bringaud et al., 1998; Gull, 2001; Carlton et al., 2002), recent publications have revealed a surprising genetic variety between various Plasmodium falciparum isolates (Volkman et al., 2002; Wootton et al., 2002; Duraisingh et al., 2003). In one study it was shown that receptor proteins involved in red blood cell invasion are variantly expressed in different lines. It was hypothesised that these expression patterns are an adaptive response to the heterogeneity of erythrocyte surface proteins, maximising the parasites' opportunities to survive inside the host (Duraisingh et al., 2003). In a second study it was demonstrated that mutations in a gene associated with chloroquine resistance spread through the parasite population in a relatively short period of time (Wootton et al., 2002). Such genetic microheterogeneity has direct implications for drug and vaccine development (McConnell, 2002; Wongsrichanalai et al., 2002). Population analysis will be necessary on a genome-wide scale to avoid focusing on targets that will turn out to be too heterogeneous in wild-type populations to be of use as drug targets.