Diving into the Life Science Data Lake
Mar 21, 2016 | by Daniel Pelke

Word has gotten around that big data is a new and promising IT discipline. There are many examples of successful big data applications, including connected cars, mobile advertising and security analytics, to name just a few. Not much is heard from the life science industry, however, even though the lines between medical devices and lifestyle products such as fitness trackers are gradually blurring.

Bitkom surveyed 102 companies from the pharmaceuticals sector about the digital transformation in 2015. According to this survey, 97 per cent of pharmaceutical companies believe that lifestyle products are going to make a significant contribution to operating results in the future. These companies would therefore hold an important component of the new value chain: the sensor that records a variety of vital functions.

The data is stored in a backend system, either directly or via smartphone. Professor Michael E. Porter from Harvard Business School calls this system a ‘product cloud’ in his article ‘How Smart, Connected Products Are Transforming Competition’, which appeared in the Harvard Business Manager in November 2014. The term expresses that this is a dedicated, cloud-based backend system.

Furthermore, he provides an overview of the sub-components required for the product cloud. They include:

  1. Product Data Database
    A big data database that supports the aggregation, normalisation and management of real-time and historical product data
  2. Application Platform
    An environment for application development and execution that supports the fast creation of smart, networked business applications with data access and visualisation
  3. Rules/Analytics Engine
    The rules, business logic and comprehensive data analysis functions that populate the algorithms and offer new insights into the product and its use
  4. Smart Product Applications
    Software applications on remote servers that handle monitoring, controlling, optimisation and the autonomous operation of product functions
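
As a rough illustration of how the first three sub-components could interact, the following Python sketch stores sensor readings, normalises their units and applies a single rule to the aggregated data. All names (`Reading`, `ProductDataStore`, `high_heart_rate_rule`), the unit conversion and the 100 bpm threshold are assumptions made for this example. They are not taken from Porter's article and are not medical guidance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One sensor value sent by a device (hypothetical structure)."""
    device_id: str
    metric: str   # e.g. "heart_rate"
    value: float
    unit: str     # e.g. "bpm" or "bps"


class ProductDataStore:
    """Stands in for the product data database: normalises and aggregates."""

    def __init__(self):
        self._readings = []

    def ingest(self, reading):
        # Normalise: convert beats per second to beats per minute.
        if reading.metric == "heart_rate" and reading.unit == "bps":
            reading = Reading(reading.device_id, reading.metric,
                              reading.value * 60, "bpm")
        self._readings.append(reading)

    def average(self, device_id, metric):
        # Aggregate: mean of all stored values for one device and metric.
        values = [r.value for r in self._readings
                  if r.device_id == device_id and r.metric == metric]
        return mean(values) if values else None


def high_heart_rate_rule(store, device_id, threshold_bpm=100.0):
    """Stands in for the rules/analytics engine: one illustrative rule."""
    average = store.average(device_id, "heart_rate")
    return average is not None and average > threshold_bpm


store = ProductDataStore()
store.ingest(Reading("tracker-1", "heart_rate", 1.5, "bps"))    # stored as 90 bpm
store.ingest(Reading("tracker-1", "heart_rate", 120.0, "bpm"))
print(store.average("tracker-1", "heart_rate"))
print(high_heart_rate_rule(store, "tracker-1"))
```

A smart product application, the fourth sub-component, would sit on top of such a rule and decide what to do with the result, for example by notifying the user.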

In the meantime, the term ‘data lake’ has been coined for the product database, application platform and rules/analytics engine together with the required IT infrastructure. The data lake contains not only data generated by the product but also data from other company sources, as well as freely accessible or commercially available information such as weather data, social media data or data contributed by partner companies.

The correlation of this data supports new and deeper insights. It helps develop an understanding of how vital signs change in different life phases and when they are perceived as comfortable, normal or uncomfortable. One can predict how the values will change over time and therefore make recommendations for when a medication should be taken and in what dosage.
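
To make the idea of correlating product data with external data more concrete, here is a minimal Python sketch. It joins daily values from a device with daily values from an external source by date and computes the Pearson correlation coefficient. The function names, the date keys and the sample figures are invented for illustration, and a real analysis would need far more data and proper statistical care.

```python
from math import sqrt


def join_by_date(device_values, external_values):
    """Keep only the dates present in both sources, in date order."""
    shared_dates = sorted(device_values.keys() & external_values.keys())
    return ([device_values[d] for d in shared_dates],
            [external_values[d] for d in shared_dates])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Invented sample data: resting heart rate (bpm) and outside temperature (°C).
heart_rate = {"2016-03-01": 60.0, "2016-03-02": 62.0, "2016-03-03": 64.0}
temperature = {"2016-03-01": 10.0, "2016-03-02": 12.0,
               "2016-03-03": 14.0, "2016-03-04": 20.0}

xs, ys = join_by_date(heart_rate, temperature)
print(round(pearson(xs, ys), 3))
```

A coefficient close to 1 or -1 only indicates a linear relationship in the sample. It does not by itself show that one value causes the other.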

One characteristic of the data lake is that it keeps growing. It also accepts new data sources from other parts of the company and makes it possible for them to use the existing data for their own purposes. R&D, as an area that traditionally produces large data volumes, obviously also benefits from the data lake concept.

In development, much of this data is subject to restrictive regulations and follows highly structured processes. In research, however, unstructured data is produced in huge amounts over the course of years. While old research data is not deleted, it is often not correlated with the results of current research. Even data from experiments and projects that were not successful represents a valuable source of information in a new context. New data may help in developing an understanding of why a project did not succeed.

Here too, the characteristic of the data lake is clearly revealed: The more data is stored in the data lake, the richer the insights that can be derived from it. It is like diving into the data lake. The longer and deeper you dive, the more surprising the insights that are gained.

‘Think big, but start small’ is the project approach that convinces the specialist departments and IT equally and launches data lake projects quickly. The time has come to implement this new concept.

We look forward to finding out what will be presented at this year’s DIA EuroMeeting in Hamburg on the topics of big data and the data lake.