Feature Story | 21-Aug-2025

AI-generated genomes could accelerate precision medicine without compromising patient confidentiality

OncoGAN generates simulated genomes that can be used to train genomic analysis tools without the confidentiality concerns associated with real genomes.

Ontario Institute for Cancer Research

A new AI system that creates simulated cancer genomes could reshape the tools used to analyze tumours, helping bring about more accurate cancer diagnosis and ultimately more effective treatments.

OncoGAN was developed by researchers at the Ontario Institute for Cancer Research (OICR) and the University of Toronto and is described in a new Cell Genomics paper.

It uses generative AI to simulate realistic tumour genomes across eight different types of cancer, including breast, prostate and pancreatic cancers. These synthetic genomes reproduce realistic patterns of genetic alterations and can be used to benchmark genomic testing and improve the algorithms that make ‘precision oncology’ possible.

Analyzing tumour genomes and the variations within their DNA has enabled new discoveries about how cancer develops, leading to a surge of cutting-edge tests and medicines. It is the cornerstone of precision oncology, where cancer treatment is personalized to the unique biology of a patient’s tumour.

But the algorithms used to analyze genomes are constrained because they have been trained on a small set of cancer genomes, relatively few of which are publicly available. The most commonly used tools were trained on a few dozen legacy genomes and can’t fully capture the necessary biological diversity. While more recent genome sequencing data exist, access is often restricted due to concerns about the confidentiality of the patients who were sampled.

“With OncoGAN, we are creating realistic genomes out of nothing, with no connection to any real person, yet a huge amount of value scientifically,” says Dr. Lincoln Stein, Scientific Director (Acting) at OICR, Professor of Molecular Genetics at the University of Toronto, and senior author of the paper. “These synthetic genomes don’t contain any personal health information, and so they can be shared without limitation.”

Beyond privacy, another advantage of OncoGAN’s synthetic genomes is that their exact ‘ground truth’ is known. A genome’s ground truth is its full, error-free DNA sequence with all genomic variants identified. It is nearly impossible to know the ground truth of real-life genomes because they are so complex and sequencing technology is limited. This means that current genome analysis tools could be flawed, because they may have been trained on flawed data.

By generating genomes from scratch, OncoGAN gives researchers fully known, verified DNA sequences that can enable better, more precise genomic testing and analysis.

“Knowing the ‘ground truth’ of the genomes means they can be used to benchmark new algorithms with full knowledge of what the correct answer is,” says Ander Díaz-Navarro, Postdoctoral Fellow at OICR and first author of the paper.
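
As a rough illustration of what benchmarking against a known ground truth can look like, the sketch below scores a set of variant calls against the set of variants known to be in a simulated genome. It is not taken from OncoGAN or the paper: the `Variant` record, the `benchmark` function and the example coordinates are all hypothetical, and real benchmarks would typically also handle variant normalization and structural variants.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not an OncoGAN data type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(truth: set, called: set) -> dict:
    """Compare called variants with the known ground truth by exact match."""
    true_positives = len(truth & called)    # real variants that were found
    false_positives = len(called - truth)   # calls that are not in the genome
    false_negatives = len(truth - called)   # real variants that were missed
    n_called = true_positives + false_positives
    n_truth = true_positives + false_negatives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Precision: share of calls that are correct.
        "precision": true_positives / n_called if n_called else 0.0,
        # Recall: share of real variants that were detected.
        "recall": true_positives / n_truth if n_truth else 0.0,
    }


# Made-up example: three variants in the simulated genome,
# three calls from a tool under test, two of which are correct.
truth = {
    Variant("chr1", 1000, "A", "T"),
    Variant("chr2", 2500, "G", "C"),
    Variant("chr7", 4200, "C", "A"),
}
called = {
    Variant("chr1", 1000, "A", "T"),
    Variant("chr2", 2500, "G", "C"),
    Variant("chr9", 9100, "T", "G"),
}

result = benchmark(truth, called)
print(result)
```

With a real tumour genome the `truth` set is itself an estimate, so a tool's errors and the reference's errors cannot be cleanly separated. With a simulated genome the set is exact by construction.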

With better, more accurately trained tools to analyze cancer genomes, Stein says scientists could unlock more critical insights with the potential to transform cancer care.

“The more we know about the biological factors that drive cancer, the better equipped we are to detect it as early as possible, treat it more effectively, and even prevent it altogether,” Stein says.

OncoGAN is publicly available for download. Stein, Díaz-Navarro and colleagues have also generated 800 simulated genomes, which are available with open access and are already being used to train analysis tools in Stein’s lab.
