This research was led by Tao Ma at the Institute of Feed Research, Chinese Academy of Agricultural Sciences in Beijing, China, in collaboration with Morteza H. Ghaffari from the University of Bonn, Germany. The study aimed to systematically evaluate the influence of different reference databases and confidence scores (CS) on the classification performance of Kraken2, a widely used metagenomic taxonomic classifier.
Using simulated metagenomic datasets, the authors evaluated how the choice of reference database (from the compact Minikraken v1 to the comprehensive nt and GTDB r202) and the CS setting (from 0 to 1.0) affected Kraken2's key performance metrics: classification rate, precision, recall, F1 score, and accuracy of relative abundance estimation.
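For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from read-level counts on a simulated dataset. The definitions are an assumption rather than the paper's stated formulas; in particular, recall here counts unclassified reads as misses. The `ReadCounts` type and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Hypothetical read-level tallies for one classifier run on simulated data."""
    total: int      # all simulated reads
    correct: int    # classified reads whose label matches the true taxon
    incorrect: int  # classified reads whose label does not match the true taxon


def summarize(counts: ReadCounts) -> dict:
    """Compute classification rate, precision, recall, and F1.

    Assumed definitions (not taken from the paper):
      classification rate = classified / total
      precision           = correct / classified
      recall              = correct / total
      F1                  = harmonic mean of precision and recall
    """
    classified = counts.correct + counts.incorrect
    rate = classified / counts.total if counts.total else 0.0
    precision = counts.correct / classified if classified else 0.0
    recall = counts.correct / counts.total if counts.total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "classification_rate": rate,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(summarize(ReadCounts(total=1000, correct=600, incorrect=200)))
```

Under these definitions, a stricter CS can raise precision (fewer wrong labels among classified reads) while lowering recall (more reads left unclassified), which is why F1 is reported alongside both.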
The study found that higher CS values generally lowered classification rates, and the effect was most pronounced in smaller databases such as Minikraken and Standard-16, where no reads were classified above a CS of 0.4. Larger databases such as Standard, nt, and GTDB r202 maintained higher classification rates even at stringent CS settings, and their precision and F1 scores improved significantly as CS increased, which points to their robustness under stringent conditions. In smaller databases, by contrast, these metrics dropped significantly at higher CS levels. The difference between the calculated and true relative abundance of bacterial taxa also grew with CS across all databases, most markedly in the smaller ones, underscoring the importance of comprehensive reference databases for accurate relative abundance estimation.
The authors emphasize the importance of careful selection of reference databases and CS parameters, customized to specific scientific questions and computational resources. Their findings suggest that combining a comprehensive reference database with a moderate CS (0.2 to 0.4) significantly enhances the accuracy and sensitivity of metagenomic taxonomic classification.
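As a rough illustration of how such a parameter comparison might be set up, the sketch below builds one Kraken2 command line per CS value using Kraken2's `--db`, `--confidence`, `--report`, and `--output` options. The directory names, file names, and the `kraken2_commands` helper are hypothetical, and this is not the authors' pipeline; it only prints the commands and does not run them.

```python
import shlex
from pathlib import Path


def kraken2_commands(db_dir, reads, out_dir, confidences):
    """Build one kraken2 argument list per confidence score (CS).

    All paths are placeholders supplied by the caller; nothing is executed.
    """
    commands = []
    for cs in confidences:
        if not 0.0 <= cs <= 1.0:
            raise ValueError(f"confidence score must be in [0, 1], got {cs}")
        tag = f"cs{cs:.1f}"
        commands.append([
            "kraken2",
            "--db", str(db_dir),
            "--confidence", f"{cs:.1f}",
            "--report", str(Path(out_dir) / f"{tag}.report.txt"),
            "--output", str(Path(out_dir) / f"{tag}.kraken.txt"),
            str(reads),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical paths; the CS values span the moderate range the
    # authors recommend (0.2 to 0.4) plus the default of 0 for comparison.
    for cmd in kraken2_commands("db/standard", "reads.fq", "results", [0.0, 0.2, 0.4]):
        print(shlex.join(cmd))
```

Keeping one report per CS value makes it straightforward to compare classification rate and abundance estimates across settings for the same database.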
This study provides valuable insights for researchers using Kraken2 in metagenomic analyses, guiding them in optimizing their parameter choices to achieve more accurate and sensitive results.
See the article:
Investigating the impact of database choice and confidence score on the performance of metagenomic taxonomic classification
https://doi.org/10.1007/s42994-024-00178-0
Journal: aBIOTECH
Article Title: Impact of database choice and confidence score on the performance of taxonomic classification using Kraken2
Article Publication Date: 31-Jul-2024