Big Data and Acute Leukemia, or how large data sets are helping to ask and answer previously unknown questions in research

February 19, 2021

Big Data – the term has been a topic of controversial debate for several years.

Why, despite all the criticism, should we gather and analyze large amounts of data? Why does it hold great potential for health research?

How does the HARMONY project launched by the EU’s Innovative Medicines Initiative make use of Big Data, and to what extent do patient organizations have a central role to play in this?

And what role does it play in acute leukemia?

Big Data – a word that has been circulating through the media, politics and public discourse for several years now. Critics usually use the term “big data” to refer to alleged surveillance measures by governmental institutions, such as data retention, or to the obscure business models of large social media platforms. But it’s worth taking a closer look: where critics see great dangers in collecting and analyzing large amounts of data, proponents emphasize the opportunities that big data brings, for example for health research.

More opportunity than risk

“Choosing the optimal therapy is vital for patients.” – Jan Geissler (ALAN Steering Committee member)

What potential does Big Data hold for health research? In light of the evolution from an organ-based to a personalized medical approach based on measurable biological attributes – for example, protein molecules in the blood that indicate a specific cancer – it becomes apparent that Big Data can be an important tool in establishing more fine-tuned research approaches and treatment methods. These can be more specifically tailored to a disease pattern and a patient group – especially in the case of rare diseases.

By generating anonymized personal data sets, such as clinical, imaging or molecular genetic data, and analyzing them (data mining), new insights can be gained into disease development and prevention, diagnosis and therapy. In concrete terms, this means using artificial intelligence to record and interpret the data sets of as many patients as possible in order to identify previously unknown correlations and relationships and thus, for instance, to be able to prefer certain therapeutic methods or exclude other ineffective measures from the outset. 

Learn more about the HARMONY Alliance and Big Data in Blood Cancer

Project HARMONY – seven blood cancers in focus

Although diagnosis and treatment methods for blood cancers have greatly improved in recent years, many are still incurable. For this reason, HARMONY gathers genomic data from thousands of patients affected by seven blood cancers: AML, ALL, CLL, MM, MDS, NHL, and pediatric and adolescent blood cancers. It does so with the help of 94 partners and members, including seven patient organizations such as LeukaNET as well as pharmaceutical companies and university hospitals. HARMONY then evaluates this information with respect to guiding research questions, for example: How does the body function under pathological change? What mechanisms lead to these changes? What molecular characteristics do cancer cells have?

Patient organizations have been involved from the very beginning and have played an essential role in all of these aspects, in the review and evaluation of Big Data research proposals, and in the ethics review of the project. The project has already generated insights by evaluating genomic data from nearly 5,000 AML patients and more than 7,000 patients with multiple myeloma.

Another central goal of the project is to develop core outcome sets for these diseases. A core outcome set defines what researchers should measure in a disease and how, and helps them to define meaningful endpoints, i.e., the goals of a clinical trial, for future studies. The project has defined Core Outcome Sets in Acute Myeloid Leukemia, and is currently running a Delphi-based process.

Requirements on the data

However, in order to carry out a Big Data project of this kind, a number of requirements have to be met, including requirements on the quality of the data. This is because the data are highly complex, largely unstructured and generated at high speed, and a large amount of data alone does not automatically provide new insights. For this reason, the source data must be as structured and consistent as possible and have to be standardized with the help of analytical and statistical evaluation tools.

Moreover, tools are needed that either anonymize personal data, i.e., delete all identifying characteristics such as the patient’s name, date of birth or location, or replace this information with pseudonyms. In the latter case, the identifying characteristics must be stored by a data trustee, separately from the research data. This step is particularly necessary with regard to the General Data Protection Regulation (GDPR), which has applied since 2018. This matters especially in the healthcare sector: fully anonymized data are no longer covered by the GDPR, while pseudonymized data remain personal data but can be used for research purposes under appropriate safeguards.
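To make the idea of pseudonymization with a separate data trustee more concrete, here is a minimal sketch in Python. It is an illustration only and not HARMONY’s actual procedure: the field names, the `pseudonymize` function and the sample record are all hypothetical. Each record receives a random pseudonym, the identifying characteristics go into a lookup table that only the trustee would hold, and the research data keep only the pseudonym.

```python
import secrets

# Hypothetical list of identifying fields; a real project would define its own.
IDENTIFYING_FIELDS = ("name", "date_of_birth", "location")


def pseudonymize(records):
    """Split records into research rows and a separate trustee lookup table.

    Returns (research_rows, trustee_table). The trustee table maps each
    random pseudonym to the identifying fields removed from the record.
    """
    research_rows = []
    trustee_table = {}
    for record in records:
        pseudonym = secrets.token_hex(8)  # random, not derived from the identity
        trustee_table[pseudonym] = {
            field: record[field] for field in IDENTIFYING_FIELDS if field in record
        }
        row = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        row["pseudonym"] = pseudonym
        research_rows.append(row)
    return research_rows, trustee_table


# Fictional example record, for illustration only.
patients = [
    {"name": "Jane Doe", "date_of_birth": "1970-01-01",
     "location": "Exampletown", "diagnosis": "AML"},
]
research_rows, trustee_table = pseudonymize(patients)
```

In this sketch, researchers would only ever see `research_rows`; linking a row back to a person requires the `trustee_table`, which is kept elsewhere.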

In the field of health research in particular, the pseudonymization method is often preferred because genomic datasets are highly complex, and anonymizing them would distort them and thus make them worthless for research. In order to comply with data protection requirements, HARMONY renders all datasets unrecognizable using a two-step pseudonymization process, adds further protection through access mechanisms, and stores the datasets on a specially created data platform that complies with EU directives. To render them usable, the data records, which stem from different clinical trial databases and differ in structure and composition, are also harmonized, i.e., standardized. Currently, the HARMONY database contains information from 45,000 patients with one of these seven blood cancers.
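What harmonization means in practice can be shown with a small Python sketch. It assumes two imaginary source databases, `trial_a` and `trial_b`, that store the same information under different column names and codes. The mappings, field names and the `harmonize` function are hypothetical and do not reflect the data model HARMONY actually uses.

```python
# Hypothetical column mappings: source field name -> common field name.
FIELD_MAPS = {
    "trial_a": {"pat_age": "age_years", "dx": "diagnosis", "sex": "sex"},
    "trial_b": {"AgeAtDiagnosis": "age_years", "Disease": "diagnosis", "Gender": "sex"},
}

# Hypothetical code list for bringing different spellings to one value.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def harmonize(record, source):
    """Map one source record onto the common structure and normalize values."""
    mapping = FIELD_MAPS[source]
    out = {target: record.get(src) for src, target in mapping.items()}
    if out["age_years"] is not None:
        out["age_years"] = int(out["age_years"])
    if out["diagnosis"] is not None:
        out["diagnosis"] = str(out["diagnosis"]).strip().upper()
    if out["sex"] is not None:
        # Unknown spellings become None rather than being guessed.
        out["sex"] = SEX_CODES.get(str(out["sex"]).strip().lower())
    out["source"] = source  # keep track of where the record came from
    return out


# Two fictional records that describe the same kind of patient differently.
row_a = harmonize({"pat_age": "63", "dx": " aml ", "sex": "F"}, "trial_a")
row_b = harmonize({"AgeAtDiagnosis": 63, "Disease": "AML", "Gender": "female"}, "trial_b")
```

After this step both records share the same field names and value formats, so they can be analyzed together even though they came from differently structured databases.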


We are happy that HARMONY PLUS, a second project launched in October last year as part of a public-private partnership, also builds on HARMONY’s structures by focusing on blood cancers not covered by HARMONY – namely chronic myeloid leukemia (CML), polycythemia vera (PV), essential thrombocythemia (ET), myelofibrosis, Hodgkin’s lymphoma, Waldenström’s disease and other rare blood cancers.

More information: