Genomics England exploits big data analytics to personalise cancer treatment

In August 2014, UK prime minister David Cameron announced £300m government funding for a four-year project to map 100,000 people’s genomes by 2017. This will be undertaken through a partnership between a government-owned organisation, Genomics England, set up in 2013, and US biotechnology firm Illumina.

The aim of the project is to use big data and genetics to develop personalised medicine. This will not, at least for the time being, mean the development of bespoke treatments for individuals. Instead, it will mean that patients with the same condition, who would previously have received the same therapy, will receive the most appropriate of a range of treatments. For this reason, the approach is also described as “stratified medicine” or “precision medicine”, particularly in the project’s current early stages.

Researchers believe it has enormous potential: “Personalised medicine is the most exciting change in cancer treatment since the invention of chemotherapy,” says Peter Johnson, chief clinician at charity Cancer Research UK.

Anthea Martin, science communications manager at Cancer Research UK, says standard treatments based on type and stage of cancer fail many: “We know it doesn’t work for everyone, because not everyone survives. Every cancer is different.”

As a result, a few cancers are already treated using basic stratified medicine. Tests showing a harmful mutation in BRCA1 or BRCA2 genes – which produce proteins that suppress tumours – lead some women to have mastectomies to avoid breast cancer. The technique is also used for those who develop breast cancer. The drug Herceptin is only useful for patients with a high level of the HER2 receptor, while Tamoxifen is used for those with high levels of oestrogen or progesterone.

Sequencing cancerous DNA

But cancer researchers are aiming to use the DNA of an individual’s cancer cells, rather than the “germline” – that is, the patient’s original, inherited DNA – for the stratification. This requires the genetic sequencing of the three billion components of the cancer’s DNA, an IT-dependent process which has got much faster and cheaper since the first draft sequence of the human genome was produced in 2001.

“The computer power has advanced, allowing us to analyse that massive pile of data,” says Martin. “All that has combined to let us look at the genetic nature of cancers.”

Cancer Research UK is taking part in the National Lung Matrix Trial, which will see National Health Service (NHS) patients given one of a number of drugs, based on DNA tests of cancerous cells. This will be funded by the charity and pharmaceutical companies AstraZeneca and Pfizer, which have developed the early-stage medicines that will be used in the trial. So far, work has focused on setting up the logistics – including the ability to run the complex tests quickly, send the data back to hospitals and also store it for future research – with patient treatments due to start towards the end of 2014.

Martin says IT is central to testing and research, and the substantial amounts of data involved present their own problems – for example, some of the charity’s researchers send and receive hard drives by post, because computer networks cannot handle transfers fast enough. Such problems are familiar to Jim Davies, professor of software engineering at the University of Oxford and chief technology officer of Genomics England, where he is working to specify the ICT requirements of the organisation.

Working with human genomic sequences of three billion items generates specific challenges. “You’re dealing with files of 150Gbytes each,” says Davies, adding that files of this size bring demands for data transfer capacity only needed in a few other industries, such as film and video production. Furthermore, he says: “The application programming interfaces (APIs) and abstractions of that data are still under development – there is a lot of work going on with global standards.”
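To give a sense of scale, the figures Davies quotes can be turned into a rough calculation. The Python sketch below is a back-of-envelope estimate only: the link speeds are illustrative assumptions, it assumes one 150Gbyte file per genome, and it uses decimal units while ignoring protocol overhead. The function names are invented for this example.

```python
# Back-of-envelope sketch. Link speeds and "one file per genome" are
# assumptions made for illustration, not figures from Genomics England.

FILE_SIZE_GB = 150      # per-file size quoted by Davies
GENOMES = 100_000       # the project's stated target


def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours to move size_gb gigabytes over a link of link_mbps megabits
    per second, ignoring overhead (decimal units: 1 GB = 8,000 megabits)."""
    megabits = size_gb * 8_000
    return megabits / link_mbps / 3600


def total_petabytes(size_gb: float, count: int) -> float:
    """Total storage in petabytes for count files of size_gb gigabytes."""
    return size_gb * count / 1_000_000


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        hours = transfer_hours(FILE_SIZE_GB, mbps)
        print(f"{mbps:>6} Mbit/s: {hours:.2f} hours per file")
    print(f"All genomes: {total_petabytes(FILE_SIZE_GB, GENOMES):.0f} petabytes")
```

On these assumptions, a single file would take more than three hours to move over a 100Mbit/s link, and 100,000 such files would come to about 15 petabytes.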

This work is taking place through the Global Alliance for Genomics and Health but, until a standard way of abstracting data from a genomic sequence has been decided, “we don’t yet have a definitive stable data management architecture,” he says. There is a similar issue with the annotation of genetic data.

Privacy and data security

Aside from the size of data and the lack of standards, Genomics England has to establish systems that allow researchers to work effectively while protecting the security of people’s genomic and medical data.

“The idea we’d be giving people a download is almost unthinkable,” says Davies. “We have to keep it in a managed environment where access is recorded.” This will take the form of a virtual desktop, where users do not have programmatic access and everything they do is tracked: “The data stays in the datacentre, you take the results.”
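The principle Davies describes, in which the data stays put, access is recorded and only results leave, can be illustrated with a minimal Python sketch. This is not Genomics England’s design: the class, the list of approved analyses and the log format are all hypothetical, chosen only to show the pattern.

```python
from datetime import datetime, timezone


def gc_fraction(sequence: str) -> float:
    """Toy analysis: fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


# Hypothetical fixed menu of analyses; users choose by name only.
APPROVED_ANALYSES = {"gc_fraction": gc_fraction}


class ManagedEnvironment:
    """Hypothetical stand-in for a datacentre-side research environment."""

    def __init__(self, genomes: dict):
        self._genomes = genomes   # sample id -> sequence, never returned
        self.audit_log = []       # one entry per access attempt

    def run(self, user: str, sample_id: str, analysis_name: str):
        """Record the access, run an approved analysis, return only the result."""
        self.audit_log.append({
            "user": user,
            "sample": sample_id,
            "analysis": analysis_name,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        analysis = APPROVED_ANALYSES[analysis_name]  # KeyError if not approved
        return analysis(self._genomes[sample_id])


if __name__ == "__main__":
    env = ManagedEnvironment({"sample-1": "GATTACA"})
    print(env.run("researcher-a", "sample-1", "gc_fraction"))
    print(env.audit_log)
```

The request is logged before the analysis runs, so failed or unapproved attempts also leave a trace.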

This means a challenging procurement process for Genomics England, although it hopes to have contracts in place by next spring. It may use accredited G-Cloud providers for the datacentres – these are already cleared to hold sensitive personal information – but the quantities of data mean services will need to be specifically configured. “We couldn’t stick it on Amazon’s cloud or Azure, as the default configurations of their machines will not match the requirements of the bigger genomic data,” says Davies.

However, he remains confident it will be possible to buy the hosting facility and configure the management of the hardware and lower levels of software for Genomics England through a normal government procurement process, which will start soon.

“At the higher levels of the software stack, we are just going to have to do it with the people who do it at the moment,” he says, referring to those who already serve research institutions including Oxford and Cambridge universities, US institutions and the European Bioinformatics Institute, based at the Wellcome Trust Genome Campus near Cambridge.

Genomics England will also support – although not necessarily pay for – the development of the higher-level software needed, and hopes this could help people working in science research programmes who launch startups for their software.

The organisation is also working on how it will handle patient data from NHS trusts, which will be accredited to participate in the work by NHS England, the management body for the health service in England (UK devolution means that Scotland, Wales and Northern Ireland organise their own health services).

A quarter of the scoring as to whether a trust can take part is based on data informatics, so only trusts with good IT are likely to be involved. NHS England is currently part-funding improvements through its £500m Integrated Digital Care Technology fund, which matches spending by trusts.

Patient data and consent

But that still leaves issues over record standards, Davies says. Many NHS pathology reports are dictated verbally by staff: “While formulaic, the data hasn’t been captured in a structured form,” he notes. And parts of the NHS still have legacy communications technology, meaning data recorded in a computerised form can get lost. Davies knows of the output of one multi-disciplinary NHS cancer team meeting which is recorded in Microsoft Word, faxed to another department, scanned then added to the patient’s record as an image. However, he adds: “There’s a push from NHS England to raise your game if we’re going to accredit you as a genomic medicine centre.”

Privacy campaigners have concerns about such work, which they believe are shared by many. “I think people are generally nervous about this shift to genomic data,” says Phil Booth, co-ordinator of MedConfidential. Genetic data can be used for a wide range of purposes and cover more people than just the individual: “It’s not just your genes, it’s your family’s,” points out Booth, giving the example of the potential of DNA records to reveal previously unknown fathers.

Booth believes combining genetic and patient record information “blows out any anonymisation”, with the full set of data making it possible to work out the patient’s identity, even if names and full addresses are removed. “It’s going to need to be managed very carefully indeed,” he says.
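Booth’s point can be illustrated with a toy example. The Python sketch below uses entirely invented records with names and full addresses already removed, and counts how many become unique once a genetic field is combined with the remaining details. The field names and values are assumptions made for illustration, not real data.

```python
from collections import Counter

# Invented toy records: names and full addresses have already been removed.
RECORDS = [
    {"birth_year": 1970, "postcode_area": "OX1", "variant": "A"},
    {"birth_year": 1970, "postcode_area": "OX1", "variant": "B"},
    {"birth_year": 1970, "postcode_area": "OX1", "variant": "A"},
    {"birth_year": 1982, "postcode_area": "CB2", "variant": "A"},
]


def unique_count(rows, fields):
    """Number of rows whose combination of `fields` values appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


if __name__ == "__main__":
    print(unique_count(RECORDS, ["birth_year", "postcode_area"]))
    print(unique_count(RECORDS, ["birth_year", "postcode_area", "variant"]))
```

In this toy set, adding the genetic field raises the number of one-of-a-kind records from one to two. A record that is unique in this way could in principle be linked back to a person by anyone who knows those details.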

While Genomics England intends to securely store data, Booth says there are issues over how consent will be sought. “Absolute clarity is necessary and absolute rock solid processes have to be in place. It has to be accepted that this stuff cannot be anonymised. It should be consensual, safe and transparent. If you have those three together, people can make an informed choice,” he says.

Genomics England’s Davies agrees the security of patient data is vital: “There is as great an expectation on us to manage the confidentiality and integrity of this data as with patient data in the NHS. We have an extra obligation, as we’re taking it out of the care pathway. There is potential controversy.”

However, he says patients involved so far have been supportive, with particular enthusiasm from parents of ill children wanting to enrol them in the work: “The only negative feedback we’ve had from patients is: ‘Why is this taking so long?’”

First published by ComputerWeekly.com, October 2014