Caister Academic Press

Molecular Approaches in Microbiology

Adapted from Meesbah Jiwaji, Gwynneth F. Matcher and Rosemary A. Dorrington writing in Bishop 2014


Application of molecular approaches to the study of microorganisms

Up until the 1980s, microorganisms were identified by culturing individual isolates in the laboratory, viewing the organisms microscopically and subjecting the cultures to a variety of biochemical tests. While much valuable data has been generated by this approach, it has several major limitations. Firstly, replicating in a laboratory setting the complex environment in which microorganisms exist is often extremely difficult, and it is estimated that less than 0.1% of microorganisms are currently culturable (Bishop 2014). Furthermore, these approaches are time-consuming, labour-intensive and often subjective. Added to this, microorganisms that are abundant and/or can be cultured under some environmental conditions may change into dormant or possibly unculturable forms under other conditions (Bishop 2014). The severity of the problem of relying on culture-dependent studies has been highlighted by cultivation-independent surveys, in which a major discrepancy is observed between viable plate counts and direct methods such as epifluorescence microscopic counts and small-subunit rRNA (SSU rRNA) phylogenetic analysis. For example, the observation that marine bacterioplankton are infected by huge numbers of viruses (tens of billions of phage per litre), and that this phage predation is remarkably specific, went undetected primarily because of a lack of representative pure cultures.

The application of molecular methods bypasses the drawbacks of culture-dependent methodologies and has led to an improved and deeper understanding of the microbial components of ecosystems. These new approaches have made apparent our lack of knowledge about environmental biodiversity. It is estimated that ca. 1.5 million taxa have been described at the species level (Bishop 2014), yet this represents only a small proportion of the estimated diversity. Currently, there is also a large discrepancy between the microbial diversity that we detect in biological samples and what is actually present. Molecular ecology studies suggest that only ca. 1-5% of microbial species have been isolated (Bishop 2014). Similarly, mycologists estimate that there are 1.5 million species of fungi, although only 72,000 species have been isolated or described.

Molecular approaches to investigating microbial processes can focus either on individual isolates or on the microbial population as a whole within an ecosystem. Either way, a key aim in the molecular biological study of microorganisms is to determine the genetic information present within the cell followed by correlation of the encoding genetic material to the complex biological processes which occur within the cell. Bioinformatics analyses are a crucial requirement, not only in curating and analysing large scale sequence data but also in bridging the gap between genetic code and the encoded functionality of genes.

The genomes of living organisms are essentially barcodes and contain sufficient information both to identify the microorganism and to outline its physiological functionality (Bishop 2014). Genomes contain areas of high and low identity, with the differences between corresponding genes typically clustered in sections such as the third (wobble) bases of codons, introns and intergenic DNA. As significant stretches of the genome are maintained by selection to be identical or near-identical between members of a taxon, but vary between taxa, these segments can be applied to both identification and taxonomy. In addition, as these sequences evolve, they represent both specific and systematic data. This makes sequence-based methods incredibly powerful, and this field has revolutionized the ways that we both classify and study microorganisms in their ecosystems, as well as how we screen for novel products and processes.
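The clustering of differences at third (wobble) codon positions can be illustrated with a minimal sketch. It assumes two aligned, in-frame, gap-free coding sequences of equal length; the function name and the two short sequences are hypothetical and were constructed for illustration only (both encode Met-Ala-Leu-Gly-Lys).

```python
def differences_by_codon_position(seq_a, seq_b):
    """Count mismatches at codon positions 1, 2 and 3 of two aligned,
    in-frame coding sequences of equal length (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    counts = {1: 0, 2: 0, 3: 0}
    pairs = zip(seq_a.upper(), seq_b.upper())
    for index, (base_a, base_b) in enumerate(pairs):
        if base_a != base_b:
            # index % 3 is 0, 1 or 2; shift to codon positions 1, 2 or 3
            counts[index % 3 + 1] += 1
    return counts


# Hypothetical homologous gene fragments from two taxa (illustrative only).
gene_taxon_1 = "ATGGCTCTGGGTAAA"
gene_taxon_2 = "ATGGCCCTAGGCAAG"

print(differences_by_codon_position(gene_taxon_1, gene_taxon_2))
# {1: 0, 2: 0, 3: 4}
```

In this toy pair every difference falls on a third codon position, so the encoded protein is unchanged while the nucleotide sequence still distinguishes the two taxa.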

Whole genome sequencing

Knowledge of the genome of a given organism provides tremendous biological insight into cellular processes that may not be evident when using classical culturing/assay techniques, which are limited to the study of phenotypic characteristics under the culture/assay conditions (Xu 2014). Genome data allow for the identification of genes encoding potentially economically useful metabolites or proteins which may not be produced or expressed under current culture/assay conditions. Furthermore, by increasing our understanding of cellular processes and mechanisms of gene regulation within target organisms, the ability to optimize microbial metabolism for enhanced application in industry is increased. In addition to the information generated when sequencing a single genome, the large number of genome sequences which are currently publicly available in databases allows for comparative studies between genomes of different organisms. Such comparative studies can provide valuable information about the encoded function of genes, particularly when genomes of closely related strains with differing phenotypes are compared against one another. In addition to novel species, multiple isolates of the same species are also being sequenced. This is because even well-known species such as Escherichia coli show large levels of heterogeneity between strains. For example, comparison of E. coli genomes sequenced to completion shows a range in size from 4.6 to 5.5 Mbp. This means that close to one million nucleotides' worth of sequence data can be present in one strain but absent in another (Bishop 2014; Xu 2014).

Currently, the most widely utilized approach to whole genome sequencing is termed 'shotgun sequencing'. In this technique, the genome of a chosen microbe is randomly sheared into millions of DNA fragments which are then sequenced. Owing to the random nature of the DNA shearing, many of these fragments will overlap in terms of sequence data. By aligning these overlapping fragments against one another, it is possible to assemble a larger contiguous sequence (contig). If there are regions of the genome which are not represented in the sequenced fragment library, the result is a set of contigs which do not overlap and therefore cannot be joined together to form a full-length sequence of the genome. In this instance, targeted re-sequencing of the missing region is done by amplifying this region of the genome using primers specific to the terminal sequence of the contig. Once the genome has been assembled, annotation begins, in which the structural and functional features of the genome are identified using bioinformatics tools.
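The idea of joining overlapping fragments into contigs can be sketched with a toy greedy overlap merger. This is an illustration under strong assumptions (error-free reads, a single strand, no repeats) and not how production assemblers work; the function names, the `min_overlap` parameter and the reads are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of
    `right`, or 0 if no overlap of at least `min_overlap` bases exists."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap.
    Returns the list of contigs left when no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are wholly contained within another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads covering a 14-base toy "genome".
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']

# Replacing the last read with an unrelated one leaves a gap, so two
# contigs remain that cannot be joined.
print(len(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CCCGGGAA"])))  # 2
```

The second call mirrors the situation described above: when part of the genome is missing from the fragment library, the contigs stay separate until the gap is closed by targeted re-sequencing.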

Microbial species diversity

When selecting specific genetic regions for analysis of microbial populations, factors such as ubiquity (the target gene must be present in all species), species sequence conservation (to allow for species identification) and evolution-induced interspecies variability (to allow for differentiation between species as well as to infer taxonomic relatedness) need to be considered.


Metagenomics provides a gene-based exploration of the microbial community as a whole on the basis of genetic material (DNA or RNA), and it returns high-resolution, data-rich information (Bishop 2014). The value of this information has been enhanced by the availability of sequence data on a large number of genomes (Marco 2011).

Metagenomic analysis involves isolating DNA from an environmental sample and analysing the totality of the DNA. As a consequence, the DNA libraries contain the genetic information of all organisms present at a specific location at the sampling time. Previously, this DNA would be cloned into suitable vectors, the clones transformed into an appropriate host bacterium, and the resulting transformants screened. The clones could be screened for phylogenetic markers, for conserved genes, or for expression of specific traits such as enzyme activity or antibiotic production, or they could be sequenced via Sanger sequencing. In the case of sequencing data, the sequences can be used to query databases, allowing for the inference of phylogeny or the identification of putative functional genes. With the availability of next generation sequencing technologies, metagenomes can be analysed without the need for time-consuming and labour-intensive cloning steps. Instead, DNA isolated from environmental samples can be sequenced directly. This approach has been successfully applied to terrestrial and aquatic environments, resulting in the discovery of genes for antibiotics, antibiotic resistance and industrial enzymes. Examples of enzymes isolated from microorganisms using a functional metagenomics approach include lipases, esterases, amylases, amidases and chitinases. Thus the application of metagenomics has paved the way for the discovery of new genes, proteins and biochemical pathways. This technology has also been very important for the identification of new biocatalysts that have been developed by nature, isolated by bioprospecting and optimized by directed evolution.

Viral metagenomics studies have shown that up to 60% of the sequences in a viral preparation are unique. These virus sequences represent unknown viral species that would be missed by traditional Sanger sequencing approaches but are detected by the application of next generation sequencing technologies (Bishop 2014; Marco 2011).
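Querying sequence data against a database can be sketched, very loosely, as a k-mer comparison between a read and a set of reference sequences. The example below is a toy stand-in for real database search tools, assuming exact k-mer matching on a single strand; the function names, the `k` and `min_score` parameters and the reference sequences are all hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in `sequence`."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def containment(read_kmers, reference_kmers):
    """Fraction of the read's k-mers that also occur in the reference."""
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


def best_reference(read, references, k=4, min_score=0.5):
    """Return (name, score) of the best-matching reference, or
    (None, score) if the best score is below `min_score`."""
    read_kmers = kmers(read, k)
    scores = {name: containment(read_kmers, kmers(seq, k))
              for name, seq in references.items()}
    name = max(scores, key=scores.get)
    score = scores[name]
    return (name, score) if score >= min_score else (None, score)


# Hypothetical reference "database" (illustrative sequences only).
references = {
    "taxon_A": "ATGGCTCTGGGTAAACGT",
    "taxon_B": "ATGTTTCCCGGGAAATTT",
}

print(best_reference("CTCTGGGTAA", references))  # ('taxon_A', 1.0)
print(best_reference("ACACACACAC", references))  # (None, 0.0)
```

The second read matches nothing in the reference set and is left unassigned, which is the toy analogue of metagenomic sequences that have no counterpart in existing databases.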

Metatranscriptomics and metaproteomics

Adaptive responses are driven by changing levels of transcription in the cell as well as changes in the levels of translation. Metatranscriptomics, and by extension metaproteomics, focuses on microbial gene expression within complex natural habitats, allowing for culture-independent whole-genome expression profiling of complex microbial communities. However, mining the 'transcriptome' and the 'proteome', which represent the collection of transcribed sequences and the translated proteins respectively, poses a significant challenge, particularly when it comes to comparing data generated on different 'omics' platforms. There are both technical and biological hurdles to overcome. Efficient techniques to isolate total environmental RNA and protein are still being developed; however, the inherent complexities of RNA and proteins mean that these techniques lag behind those that have been developed for DNA. The relationship between RNA and protein is complex; thus, it is important to be aware of biases in the techniques, for example the differential lifetimes of mRNA and protein. Accounting for these biases requires the analysis of temporal changes in transcript and protein levels. The challenge with transcriptomic and proteomic datasets remains the identification of true mRNA-protein concordance and discordance. For this reason, metatranscriptomics and metaproteomics are fields that are still being developed, particularly the ability to study gene expression and protein translation in natural environments, which holds special promise for studying microorganism function in ecosystems (Bishop 2014).
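One simple way to look at mRNA-protein concordance across a time series is to correlate transcript and protein abundances, optionally shifting the protein series by a lag to allow for the delay between transcription and protein accumulation. The sketch below assumes evenly spaced time points and uses a plain Pearson correlation; the function names and the abundance values are hypothetical and chosen only to show the effect of the lag.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (>= 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(transcript, protein, lag):
    """Correlate transcript level at time t with protein level at t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(transcript, protein)
    return pearson(transcript[:-lag], protein[lag:])


# Hypothetical abundances at six evenly spaced time points.
transcript = [1, 4, 9, 4, 1, 1]
protein = [2, 2, 5, 10, 5, 2]

for lag in (0, 1):
    print(lag, round(lagged_correlation(transcript, protein, lag), 3))
```

In this constructed series the protein profile tracks the transcript profile one time point later, so the correlation at a lag of one is higher than at a lag of zero. Real datasets are noisier, and an apparent discordance at a single time point may simply reflect such a delay.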


Further reading