Friday, March 21, 2014

Wikidata lists and categories

Wikipedia Infobox
On his blog "Words and what not", Gerard Meijssen is showing and advocating the many possible uses of Wikidata beyond what most Wikipedia users know so far. By collecting the actual data in a structured way, it allows queries which would be very difficult to do by extracting data from the infoboxes in the Wikipedias, or simply impossible since the data isn't complete on Wikipedia.

If you are familiar with the articles on the administrative units on the English Wikipedia, you might have encountered the corresponding categories, which help to group related items. For example, all the districts of Surat Thani province are within the category Amphoe of Surat Thani Province, and this category, like (almost) all Wikipedia pages, has a corresponding Wikidata item. Gerard's idea is that a category is almost the same as a list, so the property "is a list of" can be used on category items as well. Reasonator, the smart viewer of Wikidata items, is then able to create a list of all the items which should belong to the category - in this case, not surprisingly, the same 19 entries. Well, almost: the English Wikipedia articles on Ko Samui and Ko Pha Ngan are mainly about the islands and thus not linked to the data items of the districts. Someone still has to split each of them into separate articles for island and district, as has already been done in German.

The categories for the districts exist on several language editions already. For the other types of entities (Tambon, Thesaban, TAO) the Thai Wikipedia has the best coverage, but it is still far from a complete category tree. And of course, to use the auto-created list the Wikidata item must be set accordingly, something I have done only for a handful of categories so far. For example, the Tambon of Surat Thani right now show 98 entries, as I simply haven't added items for all 131 subdistricts yet. When done, it will in fact show 132 entries, because Tambon Kraison was dissolved in 1986 after most of its area was submerged by Chao Lan lake. Another interesting category is the Thesaban of Nakhon Ratchasima, because an anonymous editor at the Thai Wikipedia adds articles on them from time to time, so unless I create the items first, you can watch that list slowly grow from its current 75 entries.

While the above is done behind the scenes by Reasonator, which creates a database query from the property, one can query Wikidata directly as well. The above Tambon of Surat Thani visualized with the query maker is a start, but the query maker only supports a subset of the query API. Somehow I did not succeed in building a working query that shows only those Tambon having a Wikipedia article; maybe this is not yet fully implemented. But of course the basis of all this is to have complete and good data in Wikidata, and I continue to work on that; just recently I could add the first population data with my bot...
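Until such a query works, the same filter can be done on the client side. The following is only a minimal sketch, assuming the items have already been fetched as entity JSON (the shape returned by the Wikidata `wbgetentities` API, where each entity carries a `sitelinks` dictionary keyed by site ID such as `enwiki` or `thwiki`). The function name and the sample data are made up for illustration, and the item IDs are placeholders, not the real Tambon items.

```python
from typing import Dict, List


def items_with_sitelink(entities: Dict[str, dict], wiki: str = "enwiki") -> List[str]:
    """Return the IDs of all entities that have an article on the given wiki.

    `entities` is assumed to map item IDs to entity JSON objects in which
    "sitelinks" is a dict keyed by site ID (e.g. "enwiki", "thwiki").
    """
    return sorted(
        item_id
        for item_id, entity in entities.items()
        if wiki in entity.get("sitelinks", {})
    )


# Hypothetical sample data: IDs and titles are placeholders only.
sample = {
    "Q900001": {"sitelinks": {"enwiki": {"title": "Example A"},
                              "thwiki": {"title": "Example A (th)"}}},
    "Q900002": {"sitelinks": {"thwiki": {"title": "Example B (th)"}}},
    "Q900003": {"sitelinks": {}},
}

with_english_article = items_with_sitelink(sample, "enwiki")
with_thai_article = items_with_sitelink(sample, "thwiki")
```

Items without any sitelinks, like most of the Tambon items created by a bot, simply drop out of the result.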

Monday, March 10, 2014

Population census data since 1960

While working on adding the census 2010 data into my spreadsheet, I noticed that for many provinces I already had the exact population numbers since 1960, however not for all yet. Some of the PDF files with the census results for each province published after the 2000 census had the population history listed. So I tried the same trick which helped me to find the 2010 results - simply put a few of the numbers into Google together with the Thai province name and see if it has some hits.

What I found were some data pages from a Chulalongkorn University study, which listed data on elderly people from the censuses, including the total population. For example, the page on Surat Thani - one of the provinces where I did not have the exact data before - showed that the total population was 324,784 in the 1960 census and grew to 747,049 in 1990. The more recent data points were taken from the registration data. Once I had filled all the holes, however, the sum of the provinces did not match the national total. It took some time to finally spot those provinces where I had entered a number wrongly in the past - or where the PDF had a mistake, as I did not check back against it.
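The check described above, comparing the sum of the province values with the reported total, can be sketched in a few lines. This is only an illustration: the function name is made up, and the numbers in the example are small placeholder values, not census data.

```python
from typing import Dict


def total_mismatch(per_province: Dict[str, int], reported_total: int) -> int:
    """Return the sum of the province values minus the reported total.

    A result of 0 means the figures are consistent; anything else hints
    at a typo in one of the entries (or in the source itself).
    """
    return sum(per_province.values()) - reported_total


# Placeholder values for illustration only, not real census figures.
entered = {"Province A": 100, "Province B": 200, "Province C": 300}

consistent = total_mismatch(entered, 600)

# Simulate a typing error: 200 entered as 220.
entered_with_typo = dict(entered, **{"Province B": 220})
off_by = total_mismatch(entered_with_typo, 600)
```

The size of the difference often helps to find the culprit, since a swapped or mistyped digit produces a characteristic offset.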

Just one thing did not add up correctly - in 1960 and 1970 there were still the two provinces Phra Nakhon and Thonburi, which were merged into the special administrative area Bangkok in the early 1970s. The Chulalongkorn data had separate values for Thonburi and Phra Nakhon, but the numbers don't add up to the value listed at the NSO as the regional sum.
(Table columns: Year, Phra Nakhon, Thonburi, Sum, NSO)
It is just a difference of 56 and 25 respectively, so the numbers are really close. Sadly, I wasn't able to resolve this final step on the net, so I have to wait until I can visit the National Library in Bangkok to check the corresponding issues of the NSO publication - I only have the 1960 Chiang Mai issue depicted above. If anyone has access to these publications and can look it up for me, it'd be nice. But since Thonburi had no TIS1099 geocode, I cannot get this into my XML structure anyway.

For the older censuses I wasn't able to find the data for each province. While Statoids has numbers listed, the sum of 17,256,840 given there is not the same as the one in the NSO table, according to which the total population in the 1947 census was 17,442,689.

Thursday, March 6, 2014

Population data 2013

Though I try to check the caption of each Royal Gazette publication, somehow the announcement listing the population for each province as of December 31st 2013 nearly slipped through. Luckily I spotted it being added as a reference to the population data of the province articles on Thai Wikipedia, just one day after it was published in the Royal Gazette.

According to this announcement, the registered population grew from 64,456,695 one year ago to 64,785,909, an increase of 329,214 or 0.5%. Ranong had the biggest population decrease with 4.5%, while the biggest gain, 2.3%, was in Phuket. As the publication is in Thai with only Thai numerals, I have put the data into a little spreadsheet embedded below, including the 2012 data for comparison. In case anyone prefers it in XML, it's within my Tambon project now as well.
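For anyone who wants to redo the conversion, both steps (turning Thai numerals into numbers and computing the yearly change) are short in Python. This is just a sketch: the function names are made up, and it assumes the numbers use a comma as thousands separator as in the example string.

```python
# Map the Thai digits ๐-๙ to their ASCII counterparts 0-9.
THAI_TO_ASCII = str.maketrans("๐๑๒๓๔๕๖๗๘๙", "0123456789")


def thai_to_int(text: str) -> int:
    """Convert a number written in Thai numerals, e.g. '๖๔,๗๘๕,๙๐๙', to an int."""
    return int(text.translate(THAI_TO_ASCII).replace(",", "").strip())


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# The two national totals quoted above.
population_2012 = 64_456_695
population_2013 = thai_to_int("๖๔,๗๘๕,๙๐๙")

increase = population_2013 - population_2012
growth = round(percent_change(population_2012, population_2013), 1)
```

With the two totals from the announcement this reproduces the increase of 329,214 and the rounded 0.5% mentioned above.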

At the time of writing this posting, the population data is not yet available on the DOPA website, but I expect it to show up there soon. The full population data down to subdistrict level, including the municipalities, will also show up soon I guess.

In January the new datatype number was added to Wikidata, and I have already added some population data points to a few entities manually to try it out, of course most notably for my favourite province Surat Thani. And since I am also currently working on transferring the census data into my XML structure, I now really have to start to program the automatic editing of this kind of data in the TambonBot.

Wednesday, March 5, 2014

Phimon Rat town created

Yesterday, the upgrade of the TAO Phimon Rat (องค์การบริหารส่วนตำบลพิมลราช) to a town municipality (เทศบาลเมืองพิมลราช) was announced in the Royal Gazette. As the upgrade became effective on February 22, this announcement came rather fast compared to other municipal upgrades - many of them are still not published though they have been effective for many years already. Today, the constituencies were announced as well, so the election for the municipal council and the mayor won't take long.

The most interesting part of this new municipality is however its boundary, as one can see on the last page of the announcement. The municipality was originally a TAO, which covers the area of the Tambon not covered by any other municipality, and in this case Tambon Phimon Rat was already partly covered by the town Bang Bua Thong. And not just were parts of the Tambon cut away, in fact the remaining area consists of two chunks without any connection, approximately one third east of Bang Bua Thong and two thirds west of it.
Also interesting is the population development of the Tambon and the TAO. I have put the registration data from DOPA into a little sheet to visualize how much the population grew from 1993 (6,946 in the whole Tambon) to 2012 (39,620 in the TAO). Quite a clear indication of how much the suburbs of Bangkok are growing.