Monday, April 27, 2009

Mistakes in the DOPA population tables

As I already mentioned when I explained the population statistics from the Department of Provincial Administration (DOPA), these statistics contain some relatively obvious errors. I only found them because my parser for the data stumbled over them, and I had to create workarounds to make sure the affected year and province still got parsed completely. Of course, errors are bound to occur in 15 years of data with 76 provinces each or, put more descriptively, in more than 110,000 data entries.

The following list is not complete; it is only meant to illustrate the three different kinds of errors that occur. If someone needs a full list, I can of course work through my files and compile one. Maybe someone at DOPA wants to fix their data?
  • The most obvious mistakes are misspellings. One example is the 1993 and 1994 data of Samut Songkhram, where the subdistrict Si Sa Chorakhe Yai is spelled ศรีษะจรเข้ใหญ่ instead of ศีรษะจรเข้ใหญ่, with the vowel "i" placed on top of the second character instead of the first.
  • In the 1993 data of Phatthalung, the district Pa Bon shows 297 citizens in an unnamed subdistrict, listed only as "ตำบล*** 93080500 ***". This geocode was probably already assigned to a planned subdistrict which was then never created.
  • The trickiest mistakes are subdistricts placed under the wrong municipality; I did not come across a subdistrict placed in the wrong district. The example is the 1993 data for Prachuap Khiri Khan: the town Prachuap Khiri Khan includes the subdistrict Nong Khae (ตำบลหนองแก), which is actually part of the town Hua Hin, while the town Hua Hin contains the subdistrict Prachuap Khiri Khan (ตำบลประจวบคีรีขันธ์), which of course belongs to the town Prachuap Khiri Khan.
It's no coincidence that all of the above examples are from 1993: the oldest data has the most mistakes, whereas in the latest data I cannot find any such errors anymore.
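
Workarounds for the first two kinds of errors could look roughly like the following Python sketch. This is only an illustration and not the actual parser code: the function name `clean_subdistrict_name`, the correction table and the shape of the return value are assumptions made up for this example, and only the two Thai strings quoted in the list come from the DOPA data.

```python
import re

# Hypothetical correction table: misspelled name -> correct name.
# Only this one entry is taken from the examples in the list.
KNOWN_MISSPELLINGS = {
    "ศรีษะจรเข้ใหญ่": "ศีรษะจรเข้ใหญ่",
}

# Matches unnamed placeholder entries such as "ตำบล*** 93080500 ***"
# and captures the 8-digit geocode.
PLACEHOLDER = re.compile(r"^ตำบล\*+\s*(\d{8})\s*\*+$")

TAMBON_PREFIX = "ตำบล"


def clean_subdistrict_name(raw):
    """Return (name, geocode, note) for a raw subdistrict label.

    name is None for unnamed placeholders; geocode is only set when it
    could be read from a placeholder; note describes any fix applied.
    """
    label = raw.strip()

    match = PLACEHOLDER.match(label)
    if match:
        return None, match.group(1), "unnamed placeholder"

    # Drop the leading "ตำบล" (subdistrict) prefix if present.
    if label.startswith(TAMBON_PREFIX):
        label = label[len(TAMBON_PREFIX):]

    if label in KNOWN_MISSPELLINGS:
        return KNOWN_MISSPELLINGS[label], None, "corrected misspelling"

    return label, None, None
```

The third kind of error cannot be caught by looking at a single name; it would need a lookup of which subdistricts belong to which municipality, which is not shown here.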

Another thing which looks strange but is probably correct is municipalities that include subdistricts with only a handful of citizens. This may happen when the boundary of the municipality includes a small part of a neighboring subdistrict.
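
To review such cases by hand instead of treating them as errors, one could simply list them. The sketch below assumes the parsed data is available as `(municipality, subdistrict, population)` tuples; that layout, the function name `flag_tiny_subdistricts` and the threshold of 50 citizens are all arbitrary choices for illustration, not values from the DOPA tables.

```python
def flag_tiny_subdistricts(rows, threshold=50):
    """Return the rows whose population is positive but below threshold.

    rows: iterable of (municipality, subdistrict, population) tuples
          (assumed layout, not the DOPA format).
    threshold: arbitrary cut-off for "only a handful of citizens".
    """
    return [row for row in rows if 0 < row[2] < threshold]


# Usage with invented placeholder rows (not real DOPA figures):
sample = [
    ("Municipality A", "Subdistrict X", 12),
    ("Municipality A", "Subdistrict Y", 5400),
]
suspicious = flag_tiny_subdistricts(sample)
```

Anything returned this way is only a candidate for a closer look, since a municipal boundary that clips a neighboring subdistrict produces exactly such small numbers.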

