Published: 14 November 2019
A School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia
B Division of Obstetrics and Gynaecology, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, WA, Australia
C Tel: +61 6488 3200, Fax: +61 6488 7086, Email: firstname.lastname@example.org
Bacterial 16S rRNA gene sequencing studies are popular across many fields of biology. This technique has allowed us to study bacterial communities like never before, leading to significant insights into microbial ecology and host–microbe interactions. However, 16S rRNA gene-based workflows are vulnerable to confounding and bias at every step. Many studies are plagued by entrenched methodological errors, producing data riddled with experimental artefacts. These issues are amplified in the study of low bacterial biomass samples, such as forensic and ancient samples, blood, meconium, ice and the built environment. It is, therefore, necessary to define the pitfalls of low biomass 16S rRNA gene-based workflows and to identify methods that may allow more accurate characterisation of bacterial communities in such samples.
While the term ‘microbiome’ specifically relates to a microbial ecosystem present in a particular environment, and encompasses all microbes present (including bacteria, archaea, viruses, fungi, yeasts and protozoa), microbiome research remains heavily dominated by bacterial 16S rRNA gene sequencing studies. For this reason, the focus of this mini-review will be on the technical considerations associated with bacterial 16S rRNA gene profiling in low biomass environments (Figure 1).
Low biomass samples are intrinsically vulnerable to contamination. The proportion of contaminating bacterial DNA increases with decreasing starting bacterial biomass1. Ultra-clean facilities, ultra-pure water and even certified DNA-free reagents harbour low levels of bacterial DNA2–7. It is, therefore, not possible to perform a microbiome experiment devoid of environmental contamination. External contamination may be introduced from reagents, laboratory surfaces and instruments or users. Sample-to-sample cross-contamination may be introduced from neighbouring wells or tubes when particles are aerosolised during pipetting, tube cap removal, plate seal removal or from centrifugation of open spin columns8. Further, contamination can occur on sequencing machines where residual barcodes or amplicons may be carried over from previous runs9,10. It is, therefore, crucial to take blank controls at sample collection, DNA extraction and PCR amplification. These controls should be sequenced alongside samples, even if no apparent amplification is present post-PCR, and taken into account during data interpretation.
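The relationship between starting biomass and the contaminant proportion can be illustrated with simple arithmetic. The sketch below assumes a fixed background of contaminating 16S rRNA gene copies per reaction; the figure of 100 copies and the function name are illustrative assumptions only, not measured values.

```python
def contaminant_fraction(sample_copies: float, contaminant_copies: float) -> float:
    """Fraction of 16S rRNA gene copies in a reaction that derive from contamination."""
    total = sample_copies + contaminant_copies
    if total <= 0:
        raise ValueError("total template must be positive")
    return contaminant_copies / total


# Assumed, purely illustrative background: 100 contaminating copies per reaction.
BACKGROUND_COPIES = 100.0

for sample_copies in (1e6, 1e4, 1e2, 1e1):
    fraction = contaminant_fraction(sample_copies, BACKGROUND_COPIES)
    print(f"{sample_copies:>9.0f} sample copies: {fraction:.1%} of template is contaminant")
```

Under this assumption the same background that is negligible in a high biomass sample accounts for half of the template once the sample itself contributes only 100 copies.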
Bacterial DNA introduced from DNA extraction kits (the ‘kitome’) and PCR master mixes (the ‘mixome’) has been shown to contribute significantly to contamination11,12. The extent to which one or the other will dominate the contamination profile will vary between labs. We have recently shown that the mixome was the major source of contamination in our own workflow12. Importantly, we have been able to reduce the mixome by 99% using a commercially available dsDNase kit12. Enzymatic decontamination of PCR reagents should be incorporated into all 16S rRNA gene sequencing workflows, especially those of low biomass, to minimise reagent-based contamination.
Numerous studies have shown that different batches of the same reagents will introduce different contamination profiles3,11,13,14. Batch numbers should therefore be recorded for each step, and batch effects should be planned for when processing samples. For example, if blood samples from patients with a particular disease are compared to those without that disease, the two groups should be randomised within each batch. If they are processed in separate batches, any resulting differences between their bacterial DNA profiles may be mere artefacts of the batch effect. Batch-specific contamination should be identifiable in negative controls, again reiterating the importance of prudent use of controls in low biomass work. While bioinformatic tools are available for detecting batch effects14, the simple use of appropriate negative controls and sample randomisation is sufficient.
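One way to randomise groups within batches is sketched below. This is a minimal illustration rather than a prescribed protocol: the function name, the sample identifiers and the group labels are hypothetical, and each group is simply shuffled and dealt across batches in turn so that no batch is dominated by one group.

```python
import random
from collections import defaultdict


def randomise_into_batches(samples, n_batches, seed=None):
    """Spread each group evenly across batches, in random order.

    samples: list of (sample_id, group) tuples.
    Returns a list of batches, each a list of (sample_id, group) tuples.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        # Deal this group's samples round-robin so per-batch counts differ by at most one.
        for sample_id in ids:
            batches[slot % n_batches].append((sample_id, group))
            slot += 1

    for batch in batches:
        rng.shuffle(batch)  # randomise processing order within each batch
    return batches


# Hypothetical example: six disease and six control samples over two extraction batches.
example = [(f"D{i}", "disease") for i in range(6)] + [(f"C{i}", "control") for i in range(6)]
for number, batch in enumerate(randomise_into_batches(example, n_batches=2, seed=1), start=1):
    print(number, batch)
```

Recording the seed alongside the reagent batch numbers allows the allocation to be reproduced later.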
DNA-based analysis of bacterial communities is broadly useful for characterising the taxa present; however, it is unable to differentiate live, metabolically active cells from dead cells or cell-free DNA. We have previously demonstrated the importance of excluding non-viable bacteria when analysing bacterial profiles in low biomass samples15. The presence of cell-free DNA and non-viable bacteria in low biomass samples can significantly confound the biological interpretation of sequence data. Exclusion of DNA from non-viable bacteria should, therefore, be integrated into low biomass 16S rRNA gene sequencing workflows. While there are numerous approaches for distinguishing live bacteria from dead16, the use of viability dyes is the most compatible with 16S rRNA gene sequencing work. Viability dyes, such as ethidium monoazide (EMA) and propidium monoazide (PMA), are cell membrane-impermeable DNA intercalating agents17. These dyes are able to pass through the compromised cell membranes of dead bacteria, where they intercalate with the DNA inside, and upon photo-activation become covalently cross-linked to it. This strongly inhibits PCR amplification of affected DNA molecules, resulting in the amplification of DNA derived from intact cells only. Viability dyes have previously been used to exclude non-viable bacteria in the analysis of low biomass environments such as meconium, cleanrooms and even the International Space Station15,18–20. However, viability dyes are imperfect, as their effectiveness varies with different bacterial species and in different sample types17. Their use requires optimisation and validation for the sample type at hand.
In the analysis of clinical samples, host DNA is a double-edged sword. On the one hand, 16S rRNA gene primers can mismatch with host DNA, creating a false positive result on a bacterial DNA presence/absence level and consuming valuable reagents in downstream sequencing applications, thus greatly limiting interpretation of resultant data21. On the other hand, host DNA may act as a PCR inhibitor, creating a false negative result22. A number of commercial kits (such as MolYsis) and bespoke methods (osmotic lysis, saponin) have been trialled for depletion of host DNA with varying levels of success23,24. Host depletion generally consists of selective (mammalian) cell lysis followed by DNase treatment. However, such methods tend to rely on high microbial load and high initial sample volume and may therefore be unsuitable for low biomass samples23,25,26. Furthermore, pre-lysis steps may inadvertently lyse some bacterial cells (such as Ureaplasma spp.) and thereby distort the resulting bacterial DNA profiles. In the context of low biomass clinical samples, it is worth measuring host DNA levels using targets such as human β-globin to ensure that a negative result is not a consequence of sample insufficiency.
The presence of PCR inhibitors in low biomass samples may create a false negative result. Common PCR inhibitors include urea, haemoglobin, and bile in clinical samples, and humic acids, polysaccharides and minerals in environmental samples27. A frequently used method for reducing the effects of PCR inhibitors is dilution of the sample. However, when the quantity of template DNA is low to begin with, dilution may reduce the template to undetectable levels. It is, therefore, important to select appropriate methods for DNA extraction and PCR amplification. In our own experience, we have found that increasing the standard ‘inhibitor removal’ steps in commercial extraction kits can reduce the co-extraction of PCR inhibitors28. Further, different DNA polymerases have varying abilities to perform in the presence of PCR inhibitors29. PCR reagents should therefore be selected based on the inhibitors expected to be present in the sample type being processed. In general, the effect of PCR inhibition in low biomass samples can be assessed by spiking a known concentration of pure bacterial DNA from a species that is unlikely to be found in the sample type (positive control) into the extracted sample DNA. This allows more accurate interpretation of ostensibly ‘negative’ results in low biomass settings.
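The spike-in check described above can be expressed as a simple comparison. The sketch below assumes the spike is read out by qPCR and compares the quantification cycle (Cq) of the spike in extracted sample DNA with the same spike in water; the function names and the one-cycle threshold are illustrative assumptions and would need to be validated for each assay.

```python
from typing import Optional


def spike_delta_cq(cq_in_sample: Optional[float], cq_in_water: float) -> Optional[float]:
    """Cq shift of the spike-in caused by the sample matrix (None if the spike failed to amplify)."""
    if cq_in_sample is None:
        return None
    return cq_in_sample - cq_in_water


def is_inhibited(cq_in_sample: Optional[float], cq_in_water: float,
                 max_delta_cq: float = 1.0) -> bool:
    """Flag a sample as inhibited when the spike is delayed beyond an assumed threshold.

    max_delta_cq is a hypothetical cut-off, not a published standard.
    """
    delta = spike_delta_cq(cq_in_sample, cq_in_water)
    return delta is None or delta > max_delta_cq


# Hypothetical readings: the spike gives Cq 25.0 in water.
print(is_inhibited(25.3, 25.0))  # small shift in sample DNA
print(is_inhibited(29.0, 25.0))  # spike delayed by four cycles
print(is_inhibited(None, 25.0))  # spike did not amplify at all
```

A sample with no detectable bacterial signal but a clearly delayed or absent spike should be treated as uninformative rather than negative.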
Data produced from low biomass samples should be interpreted very cautiously, particularly where interpretations of sterility are being made. One advantage of low biomass work is that samples often contain a very low diversity of reads, which makes manual examination of sequences feasible. Often, common sense identification of ‘blue whales in the Himalayas or African elephants in Antarctica’14 is useful in recognising potential contaminants. For instance, in our recent publication examining fetal bacterial communities, we identified two OTUs belonging to the thermophilic taxa Thermothrix azorensis and Thermus scotoductus, which were unlikely human microbiome candidates30. The inclusion of contaminating sequences in bacterial analysis of low biomass samples can distort the apparent composition of a sample and inflate diversity measures1. Karstens et al. compared four methods for filtering contaminating reads from low biomass data sets (removing all sequences present in a negative control, removing low abundance sequences, removing sequences that have an inverse correlation with DNA concentration31, and using SourceTracker to predict sequences arising from defined contaminant sources)1. These methods varied in their ability to delineate contaminants from true sequences. Notably, removing all sequences present in negative controls erroneously removed >20% of expected sequences. SourceTracker was able to remove contaminating sequences with a high level of accuracy when the experimental environment was well defined, but performed very poorly when the experimental environment was unknown. The most successful method (identifying sequences that have an inverse correlation with DNA concentration using the R package Decontam31) was still only able to identify 70–90% of contaminants. It is, therefore, vital to decontaminate reagents and to adopt working habits that minimise user- or environmentally introduced contamination.
As the golden rule of microbiome research states: rubbish in, rubbish out.
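The reasoning behind filtering on an inverse correlation with DNA concentration can be sketched in a few lines. The code below is a simplified illustration of the idea only and is not the Decontam implementation: it assumes that a contaminant's relative abundance falls in proportion to total DNA concentration (slope of -1 on a log-log scale) while a true sequence's relative abundance is independent of it (slope of 0), and asks which of the two models fits better. The function names and example values are hypothetical.

```python
import math


def _rss_fixed_slope(xs, ys, slope):
    """Residual sum of squares for y = slope * x + b, with b fitted by least squares."""
    offsets = [y - slope * x for x, y in zip(xs, ys)]
    intercept = sum(offsets) / len(offsets)
    return sum((o - intercept) ** 2 for o in offsets)


def looks_like_contaminant(rel_abundances, dna_concentrations):
    """True if log abundance tracks log concentration with slope -1 better than slope 0."""
    pairs = [
        (math.log(conc), math.log(freq))
        for freq, conc in zip(rel_abundances, dna_concentrations)
        if freq > 0 and conc > 0  # zero values carry no information on a log scale
    ]
    if len(pairs) < 3:
        raise ValueError("need at least three samples in which the sequence is present")
    xs, ys = zip(*pairs)
    return _rss_fixed_slope(xs, ys, -1.0) < _rss_fixed_slope(xs, ys, 0.0)


# Hypothetical data: total DNA concentration of five samples (arbitrary units).
concentrations = [1.0, 2.0, 4.0, 8.0, 16.0]
reagent_like = [0.4 / c for c in concentrations]  # abundance falls as biomass rises
sample_like = [0.30, 0.31, 0.29, 0.30, 0.30]      # abundance independent of biomass
print(looks_like_contaminant(reagent_like, concentrations))
print(looks_like_contaminant(sample_like, concentrations))
```

A sketch like this only ranks sequences by how contaminant-like they appear; as noted above, even the best-performing published method missed a proportion of contaminants, so such filtering complements rather than replaces clean laboratory practice.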
Here we will put forward a series of suggested methods that will minimise the pitfalls associated with low biomass 16S rRNA gene sequencing workflows (Figure 2).
Blank controls should be taken at the point of sample collection and taken through to sequencing to characterise any contamination introduced by the sampling tubes/swabs/environment. Samples and tubes should only ever be handled with gloves to avoid contamination from skin bacteria. It is important to note that specialised laboratories are optimal for low biomass work. These may include specialised clean room facilities, or even simple measures such as having separate labs for DNA extraction (raw samples only), template preparation (raw DNA only, no amplicons) and PCR (amplicons). Regardless of the facilities being used, personnel should always wear disposable gloves and lab coats and work within laminar flow cabinets. All work surfaces should be decontaminated before and after use with 10% bleach and UV irradiation. All plastic-ware used should be certified DNA-free, and pipette tips should be filtered. These basic steps are crucial for minimising lab-ware- and user-introduced contamination.
Depending on the sample type, host DNA depletion may be necessary prior to DNA extraction. Viability dyes may also be used prior to DNA extraction to differentiate viable and non-viable bacteria.
It is important to take negative extraction controls with each batch of extractions and to record where each control sits in relation to other samples. Extraction methods should be optimised for the sample type being used. Some sample types may require optimisation of lysis or inhibitor removal. For example, many commercially available kits are unable to extract DNA from meconium without further optimisation28. These samples may, therefore, be incorrectly deemed sterile if not properly extracted. Internal extraction controls can be spiked into samples to quantify extraction efficiency13. However, it should be noted that the use of exogenous bacteria as an internal control cannot provide information on the efficiency of an extraction method for lysing/purifying bacteria that are endogenous to a sample. Importantly, spiking bacterial cells into a low biomass sample introduces the risk that the internal control will outnumber the bacteria that are truly present, thereby sequestering the majority of the reads in a sequencing run.
Enzymatic decontamination of PCR reagents is a simple and effective method for reducing contamination12. Negative PCR controls should be used and sequenced to characterise any contamination introduced during this step. Again, the position of these controls on the plate in relation to other samples should be recorded to control for well-to-well contamination. Internal positive controls may be used at this stage to detect any PCR inhibition.
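Recording plate positions makes it straightforward to check which samples sit next to a given control. The sketch below assumes a standard 96-well plate (rows A to H, columns 1 to 12); the function names and the example layout are hypothetical.

```python
ROWS = "ABCDEFGH"  # assumed 96-well layout: 8 rows
N_COLS = 12        # and 12 columns


def neighbouring_wells(well: str) -> list:
    """Return the wells directly or diagonally adjacent to `well` (e.g. 'B3')."""
    row = ROWS.index(well[0].upper())  # raises ValueError for an unknown row letter
    col = int(well[1:]) - 1
    if not 0 <= col < N_COLS:
        raise ValueError(f"column out of range in well {well!r}")
    neighbours = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the well itself
            r, c = row + d_row, col + d_col
            if 0 <= r < len(ROWS) and 0 <= c < N_COLS:
                neighbours.append(f"{ROWS[r]}{c + 1}")
    return neighbours


def samples_next_to(control_well: str, layout: dict) -> dict:
    """Map each occupied neighbouring well of a control to the sample it holds."""
    return {w: layout[w] for w in neighbouring_wells(control_well) if w in layout}


# Hypothetical layout: a negative PCR control in B2 surrounded by three samples.
layout = {"A1": "sample_01", "A2": "sample_02", "B1": "sample_03", "B2": "PCR_negative"}
print(samples_next_to("B2", layout))
```

If a negative control shares sequences mainly with its physical neighbours, well-to-well carry-over is a more likely explanation than reagent contamination.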
Data should be interpreted with caution. Potential contaminants should be openly reported and post-hoc methods to remove contaminating reads should be utilised to increase the accuracy of downstream analysis. However, care must be taken in selecting methods for removing contaminating sequences, as overzealous filtering of potential contaminants can lead to spurious conclusions32.
16S rRNA gene sequencing technologies have allowed extensive characterisation of bacterial communities across numerous environments. These technologies are far more sensitive than previous culture-based methods of bacterial profiling, allowing the detection of very low titres of bacterial DNA from complex samples. However, findings from low biomass 16S rRNA gene sequencing studies can be undermined by reagent-based contamination and other methodological issues. There is, therefore, a need to adopt robust and conservative approaches to study low biomass bacterial communities.
The authors declare no conflicts of interest.
This work did not receive any specific funding.
Lisa Stinson is a Research Fellow in the Hartmann Human Lactation Research Group at the University of Western Australia. She recently completed her PhD studying the human fetal microbiome. Her research now focuses on the human milk microbiome.
Jeffrey Keelan has 30 years’ experience in pregnancy research, focusing on placental inflammation, intrauterine infection and preterm birth. He is Head of Laboratories at the Division of Obstetrics and Gynaecology and Head of the School of Biomedical Sciences at the University of Western Australia.
Matthew Payne is a Senior Research Fellow within the School of Medicine at the University of Western Australia, and a highly experienced molecular microbiologist with expertise in perinatal microbiology.