From Giga to Terabytes: The Dutch National Data Plan
Mirjam van Reisen
22 November 2018
Being fit for the new era of big linked data: the Netherlands is working on a National Data Plan. As we move from gigabytes to terabytes of data, a group of experts discussed what the essence of such a plan could be at a meeting on Data Science for Societal Challenges in The Hague. The participants included the leaders of the Data Science Centers of all Dutch universities. The question: how can data science be used responsibly for science, government and the private sector?
There are major concerns. The exponential growth of the five largest tech companies, Facebook, Apple, Amazon, Microsoft and Google (the FAAMG group), has hijacked the possibilities of the internet. Tim Berners-Lee, the inventor of the World Wide Web, is so concerned that he has concluded that the internet needs to be returned to the data users. To enable this, he developed SOLID (Social Linked Data), which he presented this month at the Web Summit in Lisbon, held on 5-8 November. The essence of SOLID is that it gives data ownership back to data subjects by decoupling data content from the application. With data decoupled from the application, the data subject remains fully in charge of how the data relating to him or her is used.
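The decoupling idea can be illustrated with a minimal sketch. It is not the SOLID API: the `Pod` class, its methods and the application name are hypothetical, chosen only to show data living in a store that the data subject controls while an application reads it under a grant that can be withdrawn.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical personal data store controlled by one data subject."""
    owner: str
    data: dict = field(default_factory=dict)
    # Maps an application name to the set of keys it may read.
    grants: dict = field(default_factory=dict)

    def grant(self, app: str, keys: set) -> None:
        """The data subject allows an application to read the given keys."""
        self.grants.setdefault(app, set()).update(keys)

    def revoke(self, app: str) -> None:
        """The data subject withdraws all access for an application."""
        self.grants.pop(app, None)

    def read(self, app: str, key: str):
        """An application reads a value, but only under a current grant."""
        if key not in self.grants.get(app, set()):
            raise PermissionError(f"{app} may not read {key!r}")
        return self.data[key]


# The data stays in the pod; the application only ever holds a grant.
pod = Pod(owner="alice", data={"email": "alice@example.org", "city": "The Hague"})
pod.grant("newsletter_app", {"email"})
email = pod.read("newsletter_app", "email")
pod.revoke("newsletter_app")  # further reads by this app now fail
```

Because the application never owns a copy of the store, switching to another application does not require moving the data, which is the point of the decoupling described above.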
SOLID follows the logic established by the European General Data Protection Regulation (GDPR), which has been applicable since 25 May 2018. The GDPR established the norm that processing of personal data is prohibited unless the data subject has given their consent or the processing is expressly allowed under the law. The GDPR establishes that data belongs to the data subject. This is a crucial step for open data, as the GDPR prevents vendor lock-in of data.
In establishing this regulation, the European Union set itself well apart from other continents. In the United States, tech innovation is commercially driven; it is accepted that data is the property of the companies that produced or bought it. Data subjects' consent for the use of their data is not required. Innovation is driven by the private sector and has increasingly developed in the context of a few key monopolies, without meaningful government regulation.
In China, the government maintains a monopoly over the use of data, and its intensive development of extensive data-driven policies can come across as a system for controlling civic space. The Social Credit System is intended to facilitate the distribution of public services based on a credit rating of citizens derived from a range of linked open data. Citizens have no control over how this rating of their performance is established. Negative ratings will decrease their ability to function and participate fully in society. Africa, on the other hand, is mostly a greenfield, with much space for new experimentation and competition over digital influence on the continent.
So where is Europe? The European approach is based on the norm that data subjects are entitled to exercise control over their data. To make this possible, the data need to be segmented. Such segmentation allows data to be re-used, with or without personal information attached, provided that the data subject has given their explicit consent. This can enhance innovation, as developers can use existing data, consent permitting.
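A minimal sketch of what such segmentation could look like follows. The `SegmentedRecord` type, the `reusable_view` function and the consent labels `"anonymous"` and `"identified"` are illustrative assumptions, not part of the GDPR or of any existing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentedRecord:
    """Hypothetical record split into an identifying and a non-identifying part."""
    personal: dict  # segment that identifies the data subject
    content: dict   # segment that can be re-used without identification


def reusable_view(record: SegmentedRecord, consent: Optional[str]) -> Optional[dict]:
    """Return the part of the record a developer may re-use under the given consent.

    consent is an assumed label: "identified", "anonymous", or None (no consent).
    """
    if consent == "identified":
        # Re-use with personal information attached.
        return {**record.personal, **record.content}
    if consent == "anonymous":
        # Re-use without personal information attached.
        return dict(record.content)
    # No explicit consent: nothing may be re-used.
    return None


record = SegmentedRecord(
    personal={"name": "A. Jansen"},
    content={"age_band": "30-39", "city": "Leiden"},
)
anonymous = reusable_view(record, "anonymous")
```

Keeping the two segments separate in the record itself means the consent decision selects which segments are released, rather than requiring the data to be scrubbed after the fact.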
The European Union has come forward as a leader in setting up an infrastructure that regulates data. This will have a direct impact on science and EU-funded research. In the next generation of Horizon 2020, data will have to be managed according to a segmented format in which data are Findable, Accessible (under well-defined conditions), Interoperable and Re-usable, which together form the acronym FAIR. The adoption of FAIR for all research data produced with European funding will be a giant leap forward for science and big data, and will set a new standard for hypothesis-driven investigation. The repository for FAIR data is called the European Open Science Cloud, which in reality will be a distributed linked-data architecture: the internet of data. FAIR implementation is already taking place on all continents.
The Netherlands is a leader in the use of FAIR, which was developed at the Lorentz Center in Leiden in 2014; Go-FAIR was recently established as the implementation node in the Netherlands. In the panel “Agenda-setting: shaping the National Data Agenda”, representatives of the Dutch Data Centers of Amsterdam, Leiden, Groningen and Wageningen agreed that government policy should continue to promote FAIR as the basic public architecture for open data. The Netherlands already did so in 2016, during the Dutch Presidency of the EU.
Go-FAIR will not just provide a set of principles for science. In a letter to the Dutch Parliament, Minister Bruins of Health and Sports reports that the use of FAIR for health policy has been successful. The introduction of the FAIR principles allowed hospitals to share data without compromising the privacy of personal health information, while personal health records were simultaneously open to patients through unique identifiers assigned to each patient's data. The FAIR principles have enormous potential to simultaneously advance science, aggregate data analysis, and the protection of, and access to, personal data by data subjects.
FAIR is flexible. It can integrate other protocols, such as FACT (Fairness, Accuracy, Confidentiality and Transparency), developed as Responsible Data principles. It can equally integrate SHARED, a set of principles for the use of data in big cities that stresses the importance of contextualization and localization, ensuring that data do good by recognizing diversity. SHARED stands for Sustainable, Harmonious, Affective, Relevant, Empowering and Diverse.
Without vendor lock-in, the internet of data can grow rapidly. This would give the Netherlands and Europe a unique advantage, allowing the private sector to develop within a space where the public is protected and trust in the use of data is built. Sir Nigel Shadbolt, a close collaborator of Tim Berners-Lee, agreed. He suggested that data trusts can support the public by providing advice on data consent, and that this should enhance understanding of the algorithms and applications that request consent to access a data subject's data.
The Dutch National Data Plan will recognize that today we are all data subjects. The policy should embrace the notion that the internet should be given back to the data subjects. It should recognize that machine-readable, distributed FAIR data will provide a great leap forward, enabling open, publicly and privately led data innovation to flourish in Europe. This will make the Netherlands a leader in data innovation for societal challenges in Europe and across the world.
Mark D. Wilkinson, Michel Dumontier […], Barend Mons. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Nature. Scientific Data, volume 3, Article number: 160018 (2016). Available at: https://www.nature.com/articles/sdata201618#auth-2
 Mons, B. et al. 2017. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use, vol. 37, no. 1, pp. 49-56, 2017. Available at: https://content.iospress.com/articles/information-services-and-use/isu824