MorPhiC: Inside the ambitious project to understand the function of every human gene
Memorial Sloan Kettering Cancer Center
image: The MSK MorPhiC team, left to right: Drs. Lorenz Studer, Danwei Huangfu, Ting Zhou, and Thomas Vierbuchen.
Credit: Memorial Sloan Kettering Cancer Center
Imagine you inherit a house only to discover that there are no labels on any of the breakers in the circuit box. How would you go about figuring out which switch did what?
One approach: Turn on all the lights and appliances, then flip each breaker off one at a time to see what changes.
That’s essentially the project that scientists involved in the MorPhiC Consortium are working on — but instead of electrical circuits, they’re trying to determine the biological functions of the roughly 20,000 protein-coding genes in the human genome.
And this work is challenging not just because of the thousands of genes involved, but also because unlike a circuit breaker that always controls, say, the living room lights, each gene can have multiple functions and play different roles in different contexts. Take the gene PAX6: In the eye it is critical for building the lens and retina, while in the brain it’s important to the development of our sense of smell.
Memorial Sloan Kettering Cancer Center (MSK) is one of 12 institutions in the U.S. and abroad that make up MorPhiC, which stands for Molecular Phenotypes of Null Alleles in Cells. In plain English, this means silencing genes one by one and looking at which cellular processes break down and in what ways, especially during early development.
“Doing this for one gene is pretty straightforward,” says MSK developmental biologist Danwei Huangfu, PhD, one of four researchers leading the effort at MSK. “Doing this efficiently in 20,000 genes across multiple cell types is quite the challenge — one that requires a whole network of scientists to knock out the genes, run experiments, and then analyze the resulting data.”
The first phase of MorPhiC
Launched by the National Institutes of Health in 2022, the program’s first phase was funded with a $42.5 million grant. The goal for this initial phase is to study 1,000 protein-coding genes across a variety of tissue types — giving the scientists an opportunity to develop and refine their methods in the process.
“Of those first 1,000 genes, MSK is responsible for 250,” says Dr. Huangfu, who is working alongside MSK researchers Lorenz Studer, MD, Thomas Vierbuchen, PhD, and Ting Zhou, PhD. “We’re making steady progress toward that goal and gradually making these gene knockout cell lines available to the broader research community.”
Ultimately, the MorPhiC project will help scientists and doctors by transforming the human genome from a list of parts into something more like an instruction manual, one that not only catalogs our genes but also explains what they do.
The work promises to help interpret rare genetic mutations, identify new drug targets, and shed new light on cancer’s ability to co-opt our own developmental blueprint.
Half the genome has never been studied
It’s worth pausing to appreciate just how quickly our understanding of human genetics has evolved. The double-helix structure of DNA was first described in 1953, less than 75 years ago and within a single human lifetime. The Human Genome Project, which gave us the first complete base map of the human DNA sequence, was completed in 2003. That was just over two decades ago, about half the length of an individual scientific career.
Yet today we’re living in an era when patients routinely have the genetic code of their tumors analyzed so that cancer-driving mutations can be identified and precision-matched to available treatments. Words like “genes,” “genomes,” and “mutations” are now staples of mainstream discussions of health and disease. But in the sweep of scientific history, most of this knowledge is brand-new.
Up until now, 75% of all research into protein-coding genes has focused on fewer than 10% of them, according to a perspective about the MorPhiC Consortium published in Nature.
And this makes sense. Scientists have overwhelmingly focused their time and resources on figuring out the underpinnings of our most challenging diseases, including cancer.
Take the famous p53 gene, which is mutated in more than half of all cancers. It has been the subject of some 13,000 research articles — more than any other gene. (And even this well-studied gene isn’t fully understood — the lab of MSK cancer biologist Scott Lowe, PhD, for example, continues to publish new insights about p53 regularly.)
At the same time, more than half of all human genes have received almost no scientific attention. Yet these unsung genes are important to understand, the researchers say.
“In truth, we won’t know which genes will turn out to be important until we actually study them,” says Dr. Zhou, who directs MSK’s Stem Cell Research Facility. “Medical history is full of examples where something dismissed as unimportant later turned out to be fundamental.”
Building a “knockout village”
These days, it’s routine for scientists to “knock out” individual genes in model systems — in everything from cells to fruit flies, zebrafish, and mice — deleting or disabling the gene so it can’t produce a functional protein.
But doing so at scale is another matter. It means knocking out hundreds of genes and studying each one across multiple cell types.
“This would be painfully slow, not to mention labor- and resource-intensive to try to do one at a time,” Dr. Zhou says.
So instead, the MSK team collaboratively developed an approach they call a “knockout village,” a sophisticated technique they describe in a recent preprint (a study shared publicly before formal peer review). Essentially, it’s a barcoding system that lets researchers study dozens of gene knockouts simultaneously.
“We insert a piece of DNA into the gene to disrupt the gene’s function,” Dr. Zhou explains. “We also insert a short DNA sequence that serves as a barcode that’s unique to each gene.”
This identifying sequence can later be detected using single-cell RNA sequencing, a technology that measures gene activity in thousands of individual cells simultaneously. This allows the team to know which cells have which genes knocked out — even when a variety of different knockout cells are pooled together in the same experiment.
Rather than running hundreds of separate experiments, the team can combine dozens of knockouts, grow them together with normal cells in realistic 3D tissue models called organoids, and analyze them in large batches.
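The barcode lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the MSK team's actual pipeline: the barcode sequences, gene names, thresholds, and the `assign_knockouts` function are all hypothetical, and real single-cell data would need additional steps such as error correction and quality filtering.

```python
from collections import Counter

# Hypothetical barcode-to-gene table; sequences and gene names are invented.
BARCODE_TO_GENE = {
    "ACGTACGT": "GENE_A",
    "TTGCAAGC": "GENE_B",
    "GGATCCTA": "GENE_C",
}


def assign_knockouts(barcode_reads_per_cell, min_reads=2, min_fraction=0.8):
    """Assign each cell to one knockout gene based on its barcode reads.

    barcode_reads_per_cell maps a cell ID to the list of barcode sequences
    detected in that cell. A cell is assigned only when one known barcode
    clearly dominates; otherwise it is labeled "unassigned" (for example,
    a normal cell with no barcode, or a cell with mixed barcodes).
    """
    assignments = {}
    for cell_id, reads in barcode_reads_per_cell.items():
        # Count only reads that match a known barcode.
        counts = Counter(r for r in reads if r in BARCODE_TO_GENE)
        total = sum(counts.values())
        if total == 0 or total < min_reads:
            assignments[cell_id] = "unassigned"
            continue
        barcode, top_count = counts.most_common(1)[0]
        if top_count / total >= min_fraction:
            assignments[cell_id] = BARCODE_TO_GENE[barcode]
        else:
            assignments[cell_id] = "unassigned"
    return assignments


# Toy pooled experiment (invented data).
pooled = {
    "cell_1": ["ACGTACGT"] * 3,                 # clear GENE_A knockout
    "cell_2": ["TTGCAAGC"] * 2 + ["GGATCCTA"] * 2,  # mixed barcodes
    "cell_3": [],                               # no barcode detected
    "cell_4": ["GGATCCTA"] * 5 + ["ACGTACGT"],  # mostly GENE_C
}
assignments = assign_knockouts(pooled)
for cell_id, gene in sorted(assignments.items()):
    print(cell_id, gene)
```

Requiring one barcode to dominate is a conservative design choice in this sketch: it leaves ambiguous cells out of the analysis rather than guessing which gene they are missing.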
From stem cells to mini-organs
MSK’s approach centers on human pluripotent stem cells. These uniquely flexible cells start to differentiate during early development, ultimately giving rise to some 200 specialized cell types throughout the body.
For the MorPhiC project, Dr. Huangfu’s team is using stem cells to study pancreatic cell lineages. Meanwhile, the labs of Dr. Studer and Dr. Vierbuchen are investigating neural development. All the labs are using the same underlying stem cells, which ensures genetic consistency across experiments. (Down the road, however, it would be helpful to study cells that cover the broader genetic variation found in humanity as a whole, the researchers note.)
And the team isn’t just growing simple cell cultures. They coax the stem cells to develop into sophisticated mini-organs that act a lot more like real human tissue than cells in a flat, plastic Petri dish.
For the pancreatic work, the organoids resemble the insulin-producing cell clusters in the pancreas. The neural work, meanwhile, is being conducted in “assembloids,” a fusion of cell types that model the developing brain.
“We can take those knockout pluripotent stem cells and differentiate them into specific, complex multicellular model systems,” Dr. Vierbuchen says. “In our case, modeling the developing cerebral cortex so that we can evaluate the impact of those gene knockouts. This part of the brain is expanded in humans compared to other animals and critical to our cognitive capacity.”
By studying gene knockouts as cells differentiate from the blank canvas of a stem cell into specialized cell types, the researchers can capture what Dr. Huangfu calls “developmental vulnerabilities”: critical points where losing a particular gene disrupts normal human development.
Tracking changes
The knockout village approach allows the researchers to track these changes across multiple genes simultaneously as cells progress through different developmental stages.
Neurons, for example, develop more slowly in humans than in most animals and can take months to reach maturity in the lab. Dr. Studer’s lab previously discovered that a small set of genes, including EZH2, EHMT1, EHMT2, and DOT1L, act as a kind of molecular brake that slows the pace at which neurons mature. By removing those brakes in early nerve cell development, the team was able to speed up maturation in the lab and, with it, the pace of research into neurological diseases like Parkinson’s, Alzheimer’s, and autism.
“Knowing how the biology is supposed to work in healthy cells can tell us a lot about how and why things go wrong in disease,” Dr. Studer says. “But an effort at this scale does something more. By looking across every gene without preconceived notions of function, you uncover new biology you weren’t looking for — biology that may open up whole new research directions.”
MorPhiC’s early findings
Even though the team is only halfway through its initial 250-gene target, intriguing patterns are already emerging. In a preprint describing their pancreatic differentiation work, the researchers found that some cells respond to the loss of a gene by changing identities. That is, when cells can’t follow their normal developmental path, they often switch to an alternative cell type, or “lineage,” rather than dying off or staying undifferentiated.
“What we see is not only the loss of a certain lineage — we are actually seeing at the same time the knockout cells gaining a competing lineage, becoming a different cell type instead,” says Dr. Huangfu, who was a co-corresponding author on the study.
While not yet peer-reviewed, the work suggests several mutations in genes linked to diabetes can cause a surprising shift: Cells that should have become insulin-producing cells instead developed into serotonin-producing cells with nerve-like characteristics. This suggests key genes act as guardrails during development, keeping cells on the right path and preventing them from becoming the wrong type of cell, Dr. Huangfu says.
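One simple way to picture this kind of lineage shift is to compare the share of each cell type among knockout cells with the share among normal cells from the same pooled experiment. The sketch below assumes each cell has already been labeled with a genotype and a cell type; the gene name, cell counts, and function names are invented for illustration and do not come from the preprint, whose actual analysis may differ.

```python
from collections import Counter, defaultdict


def lineage_fractions(cells):
    """Return {genotype: {cell_type: fraction of that genotype's cells}}.

    cells is an iterable of (genotype, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for genotype, cell_type in cells:
        counts[genotype][cell_type] += 1
    fractions = {}
    for genotype, type_counts in counts.items():
        total = sum(type_counts.values())
        fractions[genotype] = {t: n / total for t, n in type_counts.items()}
    return fractions


def lineage_shift(fractions, knockout, control="control"):
    """Change in each cell type's share for a knockout relative to control.

    Negative values mean the lineage was lost; positive values mean it was gained.
    """
    cell_types = set(fractions[knockout]) | set(fractions[control])
    return {
        t: fractions[knockout].get(t, 0.0) - fractions[control].get(t, 0.0)
        for t in cell_types
    }


# Toy data (invented): 10 control cells and 10 cells lacking hypothetical GENE_X.
cells = (
    [("control", "insulin-producing")] * 8
    + [("control", "other")] * 2
    + [("GENE_X", "insulin-producing")] * 2
    + [("GENE_X", "serotonin-producing")] * 6
    + [("GENE_X", "other")] * 2
)
shift = lineage_shift(lineage_fractions(cells), "GENE_X")
for cell_type, delta in sorted(shift.items()):
    print(f"{cell_type}: {delta:+.2f}")
```

In this toy example the knockout cells lose most of the insulin-producing lineage and gain a serotonin-producing one, which is the pattern of simultaneous loss and gain that Dr. Huangfu describes.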
Big ambitions generate big data
The knockout village approach creates a vast raft of data. Each single-cell sequencing experiment produces a trove of information about gene expression across thousands of individual cells in complex tissue environments.
This is where the consortium model becomes essential. Beyond the four centers like MSK that are running experiments in cells, MorPhiC includes a data coordination layer based at the University of Miami, the University of Washington Tacoma, European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), and Queen Mary University of London, plus three specialized data analysis and validation centers at the Fred Hutchinson Cancer Center, Stanford University, and the Jackson Laboratory.
“It takes a village to understand a ‘knockout village,’ ” Dr. Zhou says. “No individual institution, working in isolation, could accomplish it.”
The consortium has committed to releasing its data publicly — making it freely accessible to any researcher in the world who wants to use it.
What MorPhiC means for patients and for cancer
While the scientific goals behind MorPhiC are fundamental, the implications are also practical. Scientists are systematically building a more complete picture of what each of our genes does — and that knowledge, the researchers say, could change medicine in several ways.
To start, genetic testing of children with developmental conditions is now routine. But there’s often limited benefit in knowing that the function of certain genes has been lost if we don’t know what those genes do.
Dr. Vierbuchen describes MorPhiC as a way of getting ahead of the problem.
“There are many genes in the genome whose functions we simply don’t know,” he says. “This is a way of prospectively studying what happens when those genes are missing, so that when we identify people who carry those mutations, we’ll be able to better understand the implications and provide them with better treatments.”
The implications for drug discovery could be equally significant. The MorPhiC data could serve as a roadmap for therapeutic development, identifying which genes are worth targeting and anticipating what the consequences might be.
The approach is especially relevant to cancer.
“Cancers often hijack programs that normally occur only in early human development,” Dr. Vierbuchen says. “At MSK, we have robust research programs in developmental biology, pediatric oncology, and cancer biology all living under one roof and interfacing with each other. This makes MSK a natural home for this work and the advances it will spark.”