Beyond Open Access: The Discovery of Knowledge

The aim of The Hague Declaration on Knowledge Discovery in the Digital Age is to improve access to knowledge by removing barriers.

Bridge number 4 open, Welland Ship Canal, Canada, 1910. Source: Toronto Public Library.

Over the last decade, much progress has been made in open access to scientific publications. But beyond access to publications, there is a growing demand for the data and processes generated during research, so that results can be reproduced. To this end, a pilot project has been set up within the Horizon 2020 programme. Its aim is to make public the reusable data generated or compiled by research projects. The European Parliament is currently reviewing the 2001 European directive governing intellectual property law. Among the proposals presented is the introduction of a new exception that would allow a text to be analysed and data extracted from it. The Hague Declaration on Knowledge Discovery in the Digital Age aims to improve access to knowledge by eliminating barriers that may prevent research and analysis.

Over the last decade, the concept of open access to scholarly journals has been widely discussed in the academic world. More than ten years have passed since the publication of the Budapest Open Access Initiative, which laid the foundations for this movement that promotes open access to research results without technological, economic or legal barriers. The goal set in late 2001 was to achieve this aim for publications for which researchers do not receive direct financial compensation, mainly journal articles or working papers. The recommendations of the Initiative set out two strategies for achieving the final goal, which have become the two types of open access that we know today: green and gold.

On the one hand, it encouraged authors to self-archive their articles, in other words, to keep a copy of all contributions that had been reviewed and published and to disseminate them through open archives known as repositories. Self-archiving allows anybody to access the published results without having to pay, although publishers usually demand an embargo period, lasting from six to forty-six months, before the text is made publicly accessible. They also stipulate that the reviewed version may be released, but not with the final layout or presentation (the author’s version rather than the final published version). This is what is known as “green” open access.

If we had a repository containing everything that has been published, would there be any point in maintaining the current scientific publication system? This is where the second strategy comes in. “Gold” open access is based on changing the journals themselves, and advocates that copyright be used to allow free dissemination rather than to set up impediments. The gold option relies on so-called open access journals, which, as the name suggests, are freely accessible journals that publish content under a licence that allows others to reuse it freely, requiring only that authorship be acknowledged and that the integrity of the work be preserved. Given that access to this content is free, it becomes necessary to seek other means of generating income as alternatives to pay-for-access. One of the systems is pay-to-publish, but there are others [1].

One of the goals of open access is to allow the general public to access research results without barriers of any kind, which is the idea underpinning the different open access (or perhaps “public access” would be a better name for them) policies that currently exist. At the institutional or academic level, in Spain and internationally, there are increasing demands that funding bodies provide public access to research results [2]. In many cases, research is financed with public funds and this is a way of ensuring that there is a public return.

We have repositories, we have open access journals, we have policies, we even have laws [3] that impose open access. But what is the situation now? In summer 2013, the European Commission published the results of a study it had commissioned, which found that we had reached the tipping point, given that around 50% of publications dating from 2011 could be accessed free of charge. It would be impossible to reach this figure with the texts in repositories alone, because the percentage of articles that are accessible to the public there is considerably lower. Nonetheless, the introduction of stronger mandates, such as the obligation to deposit and allow public access to any article resulting from projects funded by the European research framework, has increased researchers’ interest in and concern over open access.

But perhaps we are at a point where we have to look beyond publications. Research results can take very diverse forms, and as such we need to transfer this idea of openness in publications to these other objects. The first steps are being taken in the field of data. Some publications now require the data generated or used in a research project to be published along with the texts. It is not enough to read how the results were attained; authors are required to submit the data so that the results can be reproduced. To this end, a pilot project has been launched within the Horizon 2020 European research funding programme. The aim of this project is to publicly release the data generated or compiled in research projects so that they can be used extensively, and to promote their management and preservation.

We can go a little further still and consider openness in other stages of research, not just when a set of results has been produced. In fact, some researchers already share everything they do on a daily basis [4] or publish every line of code that they generate through public access servers, allowing it to be reused. All of these practices come under what is known as “open science” or “open research”. For the time being they are minority practices, but they are emerging strongly and the European Commission has set open science as one of its priorities for the next few years. Last year, it organised a public consultation to discover the opinion of all interested citizens and, particularly, the parties directly involved: researchers, institutions, publishers, associations, and so on. After the consultation, four meetings were held to validate the responses received, and the report on this validation process was released a few weeks ago.

The Hague Declaration

It seems clear that we are in the midst of a transition in which traditional dissemination of research results coexists alongside new ways of sharing the entire research process with the general public. We have to make it possible for all researchers to find ways of disseminating their work, without penalising any option, rewarding them by acknowledging their work, and, as far as possible, allowing everybody to know the work that is carried out with public funds.

But while we try to ensure that research remains open and accessible to everybody, new obstacles continue to emerge and hinder the discovery of knowledge. And in many cases the obstacles are not technological but legal, and they generate uncertainty and insecurity above all. Most of these obstacles stem from a restrictive application of copyright.

The European Parliament is currently reviewing the 2001 European Directive that recommended a range of exceptions or limits to state copyright laws in order to meet the challenges of the new information society. This system of limits or exceptions sets out everything that can be done with a work without seeking permission from the copyright holders, such as making a copy for private use, quoting, or parodying. The first draft of the report prepared by the rapporteur Julia Reda suggests that the directive is hindering knowledge exchange rather than promoting it, as was the original aim. The final report should have been voted on in committee by now and submitted to the plenary of the Parliament, but the vote has been postponed. The plenary vote will have to approve or reject the amendments presented by members of parliament, which are interesting to read as they reveal a disparity of opinions.

Proposed changes include the introduction of a new exception that will allow anybody to analyse a text and extract ideas or data from it. This exception may appear superfluous, given that most of us would agree that whenever we read a text we can extract ideas or data from it without asking permission. But many analysis and extraction processes are now carried out by machines, and these involve copying the text, an action that leads copyright owners to claim that authorisation is required [5]. Most European laws have a legal vacuum in regard to this type of exploitation of works, and this situation provokes legal uncertainty and doubt that need to be resolved. Meanwhile, countries such as the USA have made it clear that no kind of authorisation is required in their jurisdiction. In Europe, there is one example of such an exception being introduced into intellectual property law: the United Kingdom legislation was modified in summer 2014 to include a series of exceptions or limits. These new limits include allowing any person to make a copy of a work that he or she has accessed legally, and to carry out computational analysis for the purposes of non-commercial research. It is a step in the right direction but it may not be enough, given that the restriction to non-commercial purposes leaves some uncertainty, particularly in cases where the research may have some commercial interest, which is a concept that is hard to pin down.

But why are we interested in this process of analysis and extraction in the first place? We currently have a flood of information in all spheres, and research is no exception. Researchers cannot read the huge amount of content that is published, which is why they have created these automated mechanisms to process it and extract the most relevant facts or ideas. These processes have been given the generic name of text and data mining, or TDM. The analysis of texts to extract ideas, facts, data, and patterns in general can help to advance research in any field. For this reason, a new declaration was published last week, which positions itself in favour of more equal access to knowledge, and demands changes to intellectual property legislation in order to protect the free circulation of ideas, data, and facts. This declaration was drafted in 2014 at the headquarters of LIBER, the Association of European Research Libraries, in The Hague, and is named after this city. The aim of The Hague Declaration on Knowledge Discovery in the Digital Age is to improve access to knowledge by removing barriers that hinder research and analysis.

At the start of the 21st century, Budapest became the benchmark to start moving towards open access. It remains to be seen whether The Hague will take over from the Hungarian city and become one of the benchmarks in the consolidation of open research. I invite you to read the declaration and support it if you agree with the text.


[1] In the field of high-energy physics, for example, an international consortium was founded in 2014 to centralise the costs involved in open publication.

[2] You can see some of these policies on this website that compiles them.

[3] An example is Spain’s Science Act.

[4] For example, Carl Boettiger.

[5] For example, see the conditions set by some publishers in regard to published texts: Elsevier, Springer.
