Monday, October 7, 2013

Wikidata

Many Wikipedia articles contain language-independent factlets, which have to be kept up to date manually in each language edition. If these factlets are stored in one central place, much like the graphics used in Wikipedia articles, which usually come from the separate Commons wiki, then they can easily be updated without speaking any of the languages of the Wikipedia articles in which they are used. Additionally, factlets organized more strictly than the human-readable text of the articles make other kinds of automatic processing possible. The Wikidata project does exactly this, and more and more of the factlets in Wikipedia articles can be imported automatically from the Wikidata wiki.

The administrative subdivisions are one of the prime examples where this concept can be used: facts like the area, the population data, the country or province to which a subdivision belongs, or the list of its subdivisions already make up a good part of a decent Wikipedia stub article. In particular, most of the data displayed in the so-called infoboxes can already be included from Wikidata. One major thing which is not yet possible in Wikidata is lists, and not all of the data I am collecting has corresponding data categories yet (called properties in Wikidata). It is also quite tedious to add more than one factlet at a time manually, so to really get the Thai subdivisions well covered there I would have to learn how to use a bot for automatic editing.

One thing which has already been imported completely into Wikidata is the language links to the Wikipedia articles in the various languages. In fact, every article which is available in more than one Wikipedia now has a corresponding page in Wikidata. Thus I have now added one more data item to my XML files, which can link every subdivision to the corresponding Wikidata page, and I am now slowly adding all the provinces and districts. And since OpenStreetMap and Wikimapia are similar wiki websites which also have specific IDs for geographical entities (though one has to be careful to separate the office from the full entity), these are defined in the XSD as well. As an example, the province of Surat Thani now has these two links within the XML, which easily translate to URLs on Wikidata and OpenStreetMap.
<entity type="Changwat" name="สุราษฎร์ธานี" english="Surat Thani" geocode="84">
  <wiki wikidata="Q240463" openstreetmap="1908825" /> 
</entity>
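To illustrate how these IDs translate to URLs, here is a minimal Python sketch which reads the entity above and builds the two links. The function name wiki_urls is made up for this example, and the URL patterns are assumptions: I take the Wikidata ID to map to https://www.wikidata.org/wiki/<ID> and the OpenStreetMap ID to refer to a relation, which is the usual way administrative boundaries are mapped there. The sketch also assumes the element has no XML namespace, as in the snippet; the real files may need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# The entity from the example above, embedded as a string for this sketch.
SAMPLE = """<entity type="Changwat" name="สุราษฎร์ธานี" english="Surat Thani" geocode="84">
  <wiki wikidata="Q240463" openstreetmap="1908825" />
</entity>"""

# Assumed URL patterns (not taken from the XSD); the OpenStreetMap ID is
# assumed to identify a relation rather than a node or way.
URL_PATTERNS = {
    "wikidata": "https://www.wikidata.org/wiki/{id}",
    "openstreetmap": "https://www.openstreetmap.org/relation/{id}",
}


def wiki_urls(entity_xml):
    """Return a dict mapping each known wiki attribute to its URL."""
    entity = ET.fromstring(entity_xml)
    wiki = entity.find("wiki")
    if wiki is None:
        # Entity has no <wiki> child yet, so there is nothing to link.
        return {}
    urls = {}
    for site, pattern in URL_PATTERNS.items():
        site_id = wiki.get(site)
        if site_id:
            urls[site] = pattern.format(id=site_id)
    return urls


if __name__ == "__main__":
    for site, url in wiki_urls(SAMPLE).items():
        print(site, url)
```

Attributes which are missing from the wiki element are simply skipped, so the same function works for entities which so far only have a Wikidata ID.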
 
