Thursday, February 12, 2009

How to read the DOPA population statistics

I was asked by email to explain how to read the population statistics from the Department of Provincial Administration (DOPA). A full explanation here may be helpful for others as well, since there seems to be no explanation on the DOPA website, not even in Thai.

To begin with, the URL for each year is formed in the same way: http://www.dopa.go.th/xstat/pop51_1.html is the first page for BE 2551 (i.e. 2008). The data is available back to BE 2536 (1993). To get to a specific province directly, the last part of the URL changes to pop51xx_01.html, where xx is replaced with the geocode (ISO 3166-2:TH) of the province. For example, for Surat Thani it is pop5184_01.html. The following pages are at _02.html and so on.
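The URL scheme can be sketched in a few lines of Python. This is only an illustration of the pattern described above; the function names are my own, and it assumes that later pages of a province use the same .html extension as the first one.

```python
BASE_URL = "http://www.dopa.go.th/xstat/"

def be_short_year(ce_year: int) -> int:
    """Last two digits of the Buddhist Era year (CE + 543), e.g. 2008 -> 51."""
    return (ce_year + 543) % 100

def first_page_url(ce_year: int) -> str:
    """First page of the statistics for the given year."""
    return f"{BASE_URL}pop{be_short_year(ce_year):02d}_1.html"

def province_page_url(ce_year: int, geocode: int, page: int = 1) -> str:
    """Page of one province; geocode is the two-digit province code."""
    yy = be_short_year(ce_year)
    return f"{BASE_URL}pop{yy:02d}{geocode:02d}_{page:02d}.html"

print(province_page_url(2008, 84))  # Surat Thani, first page
```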

Now you should see a page like the one in the screenshot. If you only see garbled characters, then your browser did not use the Thai encoding to display the website (the web designer forgot to set this in the header of the webpage), and you have to change the display encoding manually to Windows-874 (or ISO 8859-11). The table columns are, from left to right:
  • The name of the entity, including the type (Changwat, Amphoe, Tambon, Thesaban)
  • The male population
  • The female population
  • The total population
  • The number of households
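When reading the pages with a program instead of a browser, the same two steps apply: decode the bytes as Windows-874, then take the five columns in order. The following is a minimal sketch, not the actual parser; it assumes the cells of one table row have already been extracted as strings and that the numbers use a comma as thousands separator. The sample row contains made-up values.

```python
from dataclasses import dataclass

def decode_page(raw: bytes) -> str:
    """Decode a downloaded page as Windows-874 (Python codec name: cp874)."""
    return raw.decode("cp874", errors="replace")

@dataclass
class PopulationRow:
    name: str        # entity name including its type
    male: int
    female: int
    total: int
    households: int

def parse_row(cells: list[str]) -> PopulationRow:
    """Turn the five cell strings of one row into a PopulationRow."""
    name, *numbers = (cell.strip() for cell in cells)
    male, female, total, households = (int(n.replace(",", "")) for n in numbers)
    return PopulationRow(name, male, female, total, households)

# Made-up example row, not real DOPA figures.
row = parse_row(["ตำบลตัวอย่าง", "1,000", "1,100", "2,100", "700"])
print(row.total)
```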
Now comes the complicated part. In the above table one only sees 8 subdistricts of Mueang Surat Thani district, even though that district actually contains 11 subdistricts. The reason is that only the non-municipal population is shown at this step. For this district the table shows a population of 30,841, but the real figure is 171,387 when the municipal populations are included. And as the district Ko Samui completely forms one municipality, this district is not listed in the table at all.
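To see how much is missed by reading only the district rows, the two figures for Mueang Surat Thani can be compared directly. This is just the arithmetic on the numbers quoted above:

```python
non_municipal = 30_841    # shown in the district row
district_total = 171_387  # including the municipal populations

municipal = district_total - non_municipal
print(municipal)                                # 140546
print(f"{non_municipal / district_total:.0%}")  # 18%
```

So the district row covers less than a fifth of the people actually living in the district.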

After the districts with their non-municipal population come the municipalities, for Surat Thani 2008 beginning at page 5. Just like for the districts, the municipalities are listed in bold, each followed by all the subdistricts it covers. Except for those municipalities which were recently upgraded from a TAO (Tambon Administrative Organization), most municipalities cover subdistricts only partially, so for example the subdistrict Bang Bai Mai (ตำบลบางใบไม้) is listed both under the Mueang district (2,521 citizens) and under the city of Surat Thani (603 citizens).

This structure makes it a bit difficult to get the full population numbers of a district or subdistrict, especially as the subdistrict names within a province are often not unique, so to find the right entry in the municipal data one has to know to which district the municipality belongs. It was in fact this tedious work, which I had to do for each of the 877 district articles on Wikipedia, that made me use my programming skills to build a parser for these statistics. The second incarnation of this parser is part of my Tambon coding project. And even though I can now read the Thai letters, the translation of the Thai names into the easier-to-read Latin alphabet is also useful.
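The merging step can be sketched as follows. This is not the code of the parser mentioned above, only an illustration of the idea: entries are keyed by district and subdistrict together, so that equal subdistrict names in different districts stay apart. The lookup table from municipality to district is not part of the DOPA pages and is a hypothetical helper that has to be filled in by hand; the sketch also assumes each municipality lies within a single district.

```python
from collections import defaultdict

# Hypothetical helper table: the district each municipality belongs to.
MUNICIPALITY_DISTRICT = {
    "Surat Thani city": "Mueang Surat Thani",
}

def subdistrict_totals(district_rows, municipal_rows):
    """Sum the partial entries per (district, subdistrict) pair.

    district_rows:  (district, subdistrict, population) tuples
    municipal_rows: (municipality, subdistrict, population) tuples
    """
    totals = defaultdict(int)
    for district, subdistrict, population in district_rows:
        totals[(district, subdistrict)] += population
    for municipality, subdistrict, population in municipal_rows:
        district = MUNICIPALITY_DISTRICT[municipality]
        totals[(district, subdistrict)] += population
    return dict(totals)

# The two Bang Bai Mai entries quoted above.
totals = subdistrict_totals(
    [("Mueang Surat Thani", "Bang Bai Mai", 2521)],
    [("Surat Thani city", "Bang Bai Mai", 603)],
)
print(totals[("Mueang Surat Thani", "Bang Bai Mai")])  # 3124
```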

I think this should be enough for an overview. I will leave some special cases in these statistics, as well as a few errors which only showed up with the automatic parsing, for another post.
