Wikidata: The New Rosetta Stone

Wikidata is a new step forward in the democratisation of access to information.

Archive of marine geological samples of the Alfred Wegener Institute for Polar and Marine Research (AWI), Germany, 2007. Source: Wikipedia

With more than fifteen million items compiled in the space of just three years, Wikidata is set to become the main open data repository worldwide. The eagerly awaited promise of linked open data seems to have finally arrived: a multilingual, totally open database in the public domain, which can be read and edited by both humans and machines. A lot more free information, accessible to many more people, in their own language. The structure of the Wikidata information system and its open format allow us to make complex, dynamic queries, such as which are the largest cities in the world with a female mayor, or how many ministers are themselves the children of ministers, to name just two of innumerable examples. Wikidata is a new step forward in the democratisation of access to information, which is why the most important things right now are the questions we ask ourselves: what information do we want to compile? How can we contextualise it? How does this new tool affect knowledge management?
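To make the idea of a structured query more concrete, here is a minimal sketch in Python of the "largest cities with a female mayor" question. It is a toy, in-memory model and not Wikidata's actual interface: the `Item` class, the `largest_cities_with_female_mayor` function, the sample places and every `Q9000xx` identifier are invented for illustration. The property and class identifiers (`P31`, `P6`, `P21`, `P1082`, `Q515`, `Q6581072`, `Q6581097`) are assumed to match the ones Wikidata uses for these concepts.

```python
from dataclasses import dataclass, field

# Identifiers assumed to match those used on Wikidata for these concepts.
INSTANCE_OF = "P31"
HEAD_OF_GOVERNMENT = "P6"
SEX_OR_GENDER = "P21"
POPULATION = "P1082"
CITY = "Q515"
FEMALE = "Q6581072"
MALE = "Q6581097"


@dataclass
class Item:
    """Toy stand-in for an item: multilingual labels plus simple statements."""
    qid: str
    labels: dict                                # language code -> label
    claims: dict = field(default_factory=dict)  # property id -> value


def largest_cities_with_female_mayor(items, language="en", limit=10):
    """Return (label, population) pairs for matching cities, largest first."""
    by_id = {item.qid: item for item in items}
    matches = []
    for item in items:
        if item.claims.get(INSTANCE_OF) != CITY:
            continue
        # Follow the link from the city to the item describing its mayor.
        mayor = by_id.get(item.claims.get(HEAD_OF_GOVERNMENT))
        if mayor is None or mayor.claims.get(SEX_OR_GENDER) != FEMALE:
            continue
        label = item.labels.get(language, item.qid)
        matches.append((label, item.claims.get(POPULATION, 0)))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:limit]


# Entirely made-up sample data with hypothetical identifiers.
SAMPLE = [
    Item("Q900001", {"en": "Mayor A"}, {SEX_OR_GENDER: FEMALE}),
    Item("Q900002", {"en": "Mayor B"}, {SEX_OR_GENDER: MALE}),
    Item("Q900003", {"en": "Mayor C"}, {SEX_OR_GENDER: FEMALE}),
    Item("Q900010", {"en": "Northport", "ca": "Portnord"},
         {INSTANCE_OF: CITY, POPULATION: 500_000, HEAD_OF_GOVERNMENT: "Q900001"}),
    Item("Q900011", {"en": "Southfield", "ca": "Campsud"},
         {INSTANCE_OF: CITY, POPULATION: 3_000_000, HEAD_OF_GOVERNMENT: "Q900002"}),
    Item("Q900012", {"en": "Eastville", "ca": "Vilaest"},
         {INSTANCE_OF: CITY, POPULATION: 2_000_000, HEAD_OF_GOVERNMENT: "Q900003"}),
]

if __name__ == "__main__":
    for label, population in largest_cities_with_female_mayor(SAMPLE, language="ca"):
        print(label, population)
```

Because the labels are stored per language while the statements are shared, the same question can be answered in English or in Catalan without duplicating any of the underlying data.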

With the introduction of the Internet, we now assume that information is just a click away. Thousands of people around the world post their creations online without expecting anything in return: guide books, manuals, photos, videos, tutorials, encyclopaedias and databases. All of it information at our fingertips. To ensure that the sum of all this knowledge reaches all human beings in their own language, free of charge, the Wikimedia Foundation runs many projects, one of the most successful being Wikipedia. The English version of Wikipedia reached five million entries in October 2015. But this version is culturally biased, with an over-representation of Western culture. In fact, it only includes 30% of the articles that exist in the other 287 languages that form part of the Wikipedia project, which now has a total of more than 34 million articles. Many of the articles that refer to a particular culture only exist in the language of that culture, as can be seen just by looking at the maps of geolocated items. There is a lot of work to be done: it is estimated that in order to cover all human knowledge, an encyclopaedia today should have over 100 million articles. Now that we know that it is possible and that everything is just a click away, we want to have the biographies of all the Hungarian writers available in a language that we understand, and we want it now. Local wiki communities around the world try to compile their own culture in their own language as best they can, but they often have limited capacity to influence the main body of the overall project. There are thousands of articles about Catalans in the Catalan version of Wikipedia, but there are fewer in the Spanish version, fewer still in the French, and far fewer in the English version. How can we disseminate our culture internationally if we’re still trying to compile it in our own language? How can we access information that is not written in any of the languages that we are fluent in? The defence of online multilingualism entails as many challenges as opportunities.

Map of geolocated items on Wikidata, October 2015

Data is beautiful. Data is information.

For this reason, among many others, in 2012 the Wikimedia Foundation created Wikidata: a collaborative, multilingual database that aims to provide a common source for certain types of data such as dates of birth, coordinates, names, and authority records, managed by volunteers around the world. This means that when a change of government occurs, for example, simply updating the corresponding item on Wikidata will automatically update all the applications that are linked to it, whether Wikipedia or any third-party application. It means that we do not have to constantly reinvent the wheel. This collaborative model helps to reduce the effects of the existing cultural diglossia, given that small communities can have a greater global impact in a more efficient manner. In the medium term, all Wikidata queries will include data from all over the world, not just from the cultures or historical communities with greater power to influence. A search for “doctors who graduated before they turned 20”, for example, will not only display French and English doctors, but also doctors from Taiwan and Andorra.
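The "update once, see it everywhere" idea can be sketched in a few lines of Python. This is only an illustration of the principle, under the assumption that each application reads from the shared source at display time. The `shared_store`, `render_infobox` and `update_claim` names, the `Q900100` identifier and the country itself are hypothetical, not part of any Wikidata interface.

```python
# One shared store, keyed by a hypothetical item identifier.
shared_store = {
    "Q900100": {
        "labels": {"en": "Exampleland", "ca": "Exemplelandia"},
        "head_of_government": "Prime Minister A",
    }
}

# Each language edition keeps only its own presentation, never its own copy of the data.
TEMPLATES = {
    "en": "{country}: head of government is {head}",
    "ca": "{country}: cap de govern, {head}",
}


def render_infobox(store, qid, language):
    """Build a line of text by reading the shared store at display time."""
    item = store[qid]
    return TEMPLATES[language].format(
        country=item["labels"][language],
        head=item["head_of_government"],
    )


def update_claim(store, qid, prop, value):
    """Change a single statement in the shared store."""
    store[qid][prop] = value


if __name__ == "__main__":
    print(render_infobox(shared_store, "Q900100", "en"))
    # A change of government is recorded once...
    update_claim(shared_store, "Q900100", "head_of_government", "Prime Minister B")
    # ...and every consumer shows it the next time it renders.
    print(render_infobox(shared_store, "Q900100", "en"))
    print(render_infobox(shared_store, "Q900100", "ca"))
```

The design choice that matters here is that the consumers hold templates and identifiers rather than copies of the facts, so there is nothing left to fall out of date.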

This project opens up a whole new world of possibilities, both for collaboration and for using the data: the Wikidata Game allows users to make thousands of small contributions while playing, even from a mobile phone while waiting for a bus. Inventaire allows people to share their favourite books, and Histropedia offers a new way of visualising history through timelines. Meanwhile, scientists from around the world are uploading their research databases, and the cultural sector is building a database of paintings from all over the world. All of these projects run on the Wikidata engine, which is becoming a new international standard.

And why Wikidata and not some other project? Internet standards do not necessarily become accepted because of their ability to generate authority, but because of their capacity to generate traffic, or their capacity to be updated. The winner is not the best, but the one that can assemble the greatest number of users and be updated most quickly. This is one of the strengths of the Wikidata project, given that thousands of volunteers are constantly updating the information. As a result, any application or project based on big data can take advantage of all of this structured knowledge, and do so free of charge. All of this means that we have to reconsider the role that traditional agents of knowledge (universities, research centres, cultural institutions) want to play, and the role, or possible role, of authority repositories around the world, now that new tools are mixing and matching and creating a new centrality.

Cultural institutions, for example, have to deal with the lack of standard criteria for documenting artworks in their catalogues: dimensions with or without frame, with or without passe-partout, descriptions in text format, number fields… Institutions have to bring order to their own data at home before opening up to the world. Being open means being interoperable. Many institutions are already adapting: authority file managers such as VIAF are openly collaborating with Wikidata, and MoMA has also started using it in its catalogue. In Catalonia, the University of Barcelona, in collaboration with Amical Wikimedia, is behind one of the groundbreaking projects in this field, which aims to create an open database of all works of Catalan Modernism.

Data is not knowledge. Data is not objective.

Data in itself is not knowledge. It is information. With the emergence of a new, very dense ecology of data that is accessible to everybody, we run the risk of trying to over-simplify the world: a description, no matter how detailed, will not necessarily make us understand something. Knowing that Dostoyevsky was born in 1821 and died in 1881 and that he was an existentialist is not the same as understanding Dostoyevsky or existentialism. Now more than ever, we need tools that will help us to contextualise information, to develop our own point of view, and to generate knowledge based on this information, in order to promote a society with a strong critical spirit. And we shouldn’t forget that data in itself is not objective either, even though it purports to be neutral. Data selection is a bias in itself. The decision of whether or not to analyse the gender, origin, religion, height, eye colour, political position, or nationality of a human group can condition the subsequent analysis. Codifying or failing to codify a particular item of information within a data set can either reveal or disguise a particular reality. Data is useless without interpretation.

The impact of the emergence of Wikipedia on traditional print encyclopaedias is common knowledge. What will be the impact of Wikidata? In line with the wiki philosophy, the work is done collaboratively in an asymmetric but ongoing process. We can all collaborate in the creation and maintenance of the content, but also of the vocabulary, of the properties of different items, and of the taxonomies used to classify the information. We are deciding how to organise existing information about the world, and we are doing it in an open, participatory manner, as an example of the potential of technology. We know that human knowledge evolves cumulatively, and that Western culture is essentially inherited. Our reality is determined, in a sense, through the technological, social, political, and philosophical advances of those who came before us. This means that today’s generations don’t have to discover electricity all over again, for example. We enjoy the fruits of the efforts of our ancestors. But the Internet, for the first time, allows us to be involved in a phenomenon that will mark human history: we are defining and generating a new information ecosystem that will become the foundation for a possible cognitive revolution. And we are lucky to be able to participate, question, and improve it as it evolves. Together, we can participate in a historic project on a par with humanity’s greatest advances. We can create a new Rosetta Stone that can serve as an open, transparent key to unlock the secrets of today’s world, and perhaps as a documentary source for future generations or civilisations. Let us take responsibility for it.
