Big Data Analysis Identifies New Cancer Risk Genes
There are many genetic causes of cancer: some mutations are inherited from your parents, while others are acquired throughout your life because of external factors or mistakes in copying DNA. Large-scale genome sequencing has revolutionised the identification of cancers driven by the latter group of mutations – somatic mutations – but it has been less effective at identifying the inherited genetic variants that predispose to cancer. The main source for identifying these inherited mutations is still family studies.
Now, three researchers at the Centre for Genomic Regulation (CRG) in Barcelona, led by the ICREA Research Professor Ben Lehner, have developed a new statistical method to identify cancer predisposition genes from tumour sequencing data.
“Our computational method uses an old idea that cancer genes often require ‘two hits’ before they cause cancer. We developed a method that allows us to systematically identify these genes from existing cancer genome datasets,” explained Solip Park, first author of the study and Juan de la Cierva postdoctoral researcher at the CRG.
The method allows researchers to find risk variants without a control sample, meaning that they do not need to compare cancer patients to groups of healthy people.
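To illustrate how a "two hits" signal could be looked for using tumours alone, here is a minimal sketch. It is not the published method: the function names, the toy data and the choice of a one-sided Fisher's exact test are all assumptions made for illustration. The idea is that, for a given gene, tumours from patients who carry a rare inherited variant (the first hit) are compared with tumours from patients who do not, asking whether the carriers have lost the other copy of the gene (the second hit) more often than expected by chance. Both groups are cancer patients, so no healthy control group is involved.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption, not taken from the study):
      a = carriers whose tumour lost the other gene copy
      b = carriers whose tumour kept it
      c = non-carriers whose tumour lost a copy
      d = non-carriers whose tumour kept both copies
    Returns the probability of seeing `a` or more in the top-left cell
    if carrying the variant and losing a copy were unrelated.
    """
    carriers = a + b
    with_loss = a + c
    n = a + b + c + d
    upper = min(carriers, with_loss)
    # Sum the hypergeometric tail from the observed count upwards.
    favourable = sum(
        comb(carriers, k) * comb(n - carriers, with_loss - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, with_loss)


def two_hit_p_value(tumours):
    """tumours: iterable of (carries_inherited_variant, lost_other_copy) booleans."""
    a = b = c = d = 0
    for carrier, loss in tumours:
        if carrier and loss:
            a += 1
        elif carrier:
            b += 1
        elif loss:
            c += 1
        else:
            d += 1
    return fisher_one_sided(a, b, c, d)


if __name__ == "__main__":
    # Hypothetical toy data for one gene: not real patients.
    toy = [(True, True)] * 8 + [(True, False)] * 2 \
        + [(False, True)] * 20 + [(False, False)] * 70
    print(f"p = {two_hit_p_value(toy):.3g}")
```

A small p-value for a gene would flag it as a candidate worth closer inspection; a real analysis would also need to correct for testing many genes at once and for differences between tumour types.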
“Now we have a powerful tool to detect new cancer predisposition genes and, consequently, to contribute to improving cancer diagnosis and prevention in the future,” added Park.
The work, which is published in Nature Communications, presents their statistical method ALFRED and identifies 13 candidate cancer predisposition genes, of which 10 are new.
“We applied our method to the genome sequences of more than 10,000 cancer patients with 30 different tumour types and identified known and new possible cancer predisposition genes that have the potential to contribute substantially to cancer risk,” said Ben Lehner, principal investigator of the study.
“Our results show that the new cancer predisposition genes may have an important role in many types of cancer. For example, they were associated with 14% of ovarian tumours, 7% of breast tumours and about 1 in 50 of all cancers. For example, inherited variants in one of the newly-proposed risk genes – NSD1 – may be implicated in at least 3 out of 1,000 cancer patients,” explained Fran Supek, CRG alumnus and currently group leader of the Genome Data Science laboratory at the Institute for Research in Biomedicine (IRB Barcelona).
When sharing is key to advancing knowledge
The researchers worked with genome data from several cancer studies from around the world, including The Cancer Genome Atlas (TCGA) project, as well as from several projects unrelated to cancer research.
“We managed to develop and test a new method that hopefully will improve our understanding of cancer genomics and will contribute to cancer research, diagnostics and prevention just by using public data,” said Solip Park.
Ben Lehner added, “Our work highlights how important it is to share genomic data. It is a success story for how being open is far more efficient and has a multiplier effect. We combined data from many different projects and by applying a new computational method were able to identify important cancer genes that were not identified by the original studies. Many patient groups lobby for better sharing of genomic data because it is only by comparing data across hospitals, countries and diseases that we can obtain a deep understanding of many rare and common diseases. Unfortunately, many researchers still do not share their data and this is something we need to actively change as a society.”