News Release

Comprehensive evaluation of large language models in mining gene relations and pathway knowledge

Peer-Reviewed Publication

Higher Education Press

Figure 1: An assessment workflow for LLMs in predicting gene relationships and biological pathway components.

Credit: Azam, M., Chen, Y., Arowolo, M. O., Liu, H., Popescu, M., & Xu, D.

Understanding complex biological pathways, including gene-gene interactions and gene regulatory networks, is crucial for exploring disease mechanisms and advancing drug development. However, manual curation of these pathways from the literature cannot keep pace with the exponential growth of discoveries. Large language models (LLMs) trained on extensive text corpora contain rich biological information and can be used as a biological knowledge graph for pathway curation.

Recently, Quantitative Biology published a study titled "A Comprehensive Evaluation of Large Language Models in Mining Gene Relations and Pathway Knowledge." The study assesses the ability of 21 LLMs, including both API-based and open-source models, to retrieve biological knowledge. The evaluation covers two tasks: predicting gene regulatory relations (activation, inhibition, and phosphorylation) and identifying the gene components of pathways. Both tasks use Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways as the ground truth, as illustrated in Figure 1.
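To make the evaluation idea concrete, the following is a minimal sketch of how predicted gene relations could be scored against a reference set using precision and recall. It is an illustration only, not the study's actual scoring procedure, which this release does not describe. The function name `precision_recall`, the triple representation of a relation, and the placeholder gene names are all assumptions made for the example.

```python
from typing import Set, Tuple

# Assumed representation: (source gene, relation type, target gene).
Relation = Tuple[str, str, str]


def precision_recall(predicted: Set[Relation],
                     reference: Set[Relation]) -> Tuple[float, float]:
    """Score predicted relations against a reference set by exact match."""
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data; these are placeholders, not real KEGG entries.
reference = {
    ("GENE_A", "activation", "GENE_B"),
    ("GENE_B", "inhibition", "GENE_C"),
    ("GENE_C", "phosphorylation", "GENE_D"),
    ("GENE_A", "inhibition", "GENE_D"),
}
predicted = {
    ("GENE_A", "activation", "GENE_B"),
    ("GENE_B", "inhibition", "GENE_C"),
    ("GENE_B", "activation", "GENE_D"),  # not in the reference set
}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, two of the three predicted relations appear in the reference set, and two of the four reference relations are recovered. Exact matching on the full triple is a deliberately strict choice; a real evaluation might also need to normalize gene symbols before comparing.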

The results reveal a substantial disparity in model performance, with API-based models outperforming their open-source counterparts. GPT-4 and Claude-Pro emerged as top performers in predicting gene regulatory relations, achieving higher precision and recall than other models. The findings suggest that while LLMs are informative for gene network analysis and pathway mapping, their effectiveness varies, so models must be selected carefully. The study underscores the importance of choosing appropriate computational tools for specific tasks in biological research, and it provides a case study of using LLMs as knowledge graphs for data mining in general.

