Friday, March 21, 2014

Wikidata lists and categories

Wikipedia Infobox
On his blog "Words and what not", Gerard Meijssen shows and advocates the many possible uses of Wikidata beyond what most Wikipedia users know so far. Because Wikidata collects the actual data in a structured way, it allows queries that would be very difficult to answer by extracting data from the infoboxes in the Wikipedias, or simply impossible because the data on Wikipedia isn't complete.

If you are familiar with the articles on the administrative units on the English Wikipedia, you might have encountered the corresponding categories, which help to group related items. For example, all the districts of Surat Thani province are within the category Amphoe of Surat Thani Province, and this category, like (almost) all Wikipedia pages, has a corresponding Wikidata item. Gerard's idea is that a category is almost the same as a list, so the property "is a list of" can be used on category items as well. Once it is set, Reasonator, the smart viewer of Wikidata items, is able to create a list of all the items which should belong to the category - in this case, not surprisingly, the same 19 entries. Well, almost: the English Wikipedia articles on Ko Samui and Ko Pha Ngan are mainly about the islands and thus not linked to the data items of the districts. Someone still has to split each of them into two separate articles, one for the island and one for the district, as the German Wikipedia already does.

The categories for the districts already exist on several language editions. For the other types of entities (Tambon, Thesaban, TAO) the Thai Wikipedia has the best coverage, but it is still far from a complete category tree. And of course, to use the auto-created list the Wikidata item of the category must be set up accordingly, something I have done only for a handful of categories so far. For example, the Tambon of Surat Thani right now show 98 entries, simply because I haven't added items for all 131 subdistricts yet. When done, the list will in fact show 132 entries, because it will also include Tambon Kraison, which was dissolved in 1986 after most of its area was submerged by Chao Lan lake. Another interesting category is the Thesaban of Nakhon Ratchasima, because an anonymous editor at the Thai Wikipedia adds articles on them from time to time. So unless I create the remaining items first, you can watch that list slowly grow from its current value of 75.

While the above is done behind the scenes by Reasonator, which creates a database query from the property, one can query Wikidata directly as well. The above Tambon of Surat Thani visualized with the query maker is a start, but the query maker only supports a subset of the query API. Somehow I did not succeed in building a working query that shows only those Tambon which have a Wikipedia article; maybe this is not yet fully implemented. But of course the basis of all this is to have complete and good data in Wikidata, and I continue to work on that. Just recently I was able to add the first population data with my bot...
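As a rough illustration of the kind of query I was after, here is a minimal Python sketch that only builds the query text and the request URL. Note that it assumes the SPARQL-based Wikidata Query Service, which is a different interface from the query API and query maker mentioned above. P31 ("instance of") and P131 ("located in the administrative territorial entity") are real Wikidata properties, but the function names and the two item IDs in the usage note are placeholders of my own, so treat this as a sketch rather than a tested recipe.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (assumed here;
# it is not the query API referred to in the text above).
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid: str, area_qid: str,
                wiki: str = "https://en.wikipedia.org/") -> str:
    """Build a SPARQL query for items of one class inside one area
    that have an article on the given Wikipedia."""
    return f"""
SELECT ?item ?article WHERE {{
  ?item wdt:P31 wd:{class_qid} ;       # instance of the given class
        wdt:P131+ wd:{area_qid} .      # located in the area, directly or via parent units
  ?article schema:about ?item ;        # a sitelink that points at the item ...
           schema:isPartOf <{wiki}> .  # ... on the chosen Wikipedia
}}
""".strip()


def query_url(query: str) -> str:
    """Return the GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Calling build_query("Q_TAMBON", "Q_SURAT_THANI") shows the shape of the query; both IDs are hypothetical stand-ins and have to be replaced with the real item IDs of the class and the province before the URL from query_url returns anything useful.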
