Friday, December 4, 2020

GADM subdivisions geocodes

A month ago I noticed that a new external identifier had been added to some of the Thai provinces, taken from the GADM subdivision maps. From the rudimentary info on the GADM website - which does not even explain what the acronym is supposed to mean - this seems to be a project to provide maps for the country subdivisions down to the 2nd level. For Thailand it even goes down to the 3rd level, the subdistricts (Tambon). In addition to the maps, it also defines a unique code for each subdivision. At first glance an interesting project.

However - the website does not name any author of these maps, nor does it give any sources. And the maps are only free for non-commercial use, so not really free. But it got even worse.

In order to avoid wrong code assignments in Wikidata, I had a look at the subdivision codes for Thailand - adding them to my XML files so I can easily add them by bot later. Starting with the alphabetically first province, Amnat Charoen, I noticed a big mess.

  • Chanuman district: all four subdistricts listed in GADM are in fact subdistricts of Hua Taphan district, none of the real five subdistricts has any code
  • Hua Taphan district: all Tambon correct in GADM
  • Lue Amnat district: all five subdistricts listed in GADM are in fact from Mueang Amnat Charoen district, none of the real seven subdistricts has any code
  • Mueang Amnat Charoen: only 11 of the 19 subdistricts have a code
  • Pathum Ratchawong district: 1 subdistrict correct, 4 non-existing subdistricts in GADM, and 6 real subdistricts missing
  • Phana district: 2 subdistricts correct, the other two merged into one code
  • Senangkanikhom: all Tambon correct in GADM
Not mentioned above - the romanization in GADM does not follow the recommended RTGS transcription. Sometimes it is the outdated old RTGS, like Muang instead of Mueang, sometimes it is totally random. All this would have made me ignore these geocodes as they look totally unusable, but in order to avoid wrong data being inserted into Wikidata I picked up the task, worked through all provinces and added the codes to my XMLs. In fact Amnat Charoen was one of the worst provinces; in many others there were just codes missing. Most often the missing codes are those of subdistricts created after around 1990, but other recent ones are present, so it's not just very outdated data. Another hint of outdated data - the minor districts (King Amphoe), which were all upgraded in 2007, are still present as minor districts in GADM. On the other hand, the newly created province Bueng Kan is present.
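
The kind of per-district check described above can be sketched in Python as a simple set comparison. This is only a minimal illustration, not the tooling actually used for the XML files: the function name `compare_district` and the placeholder subdistrict names are made up, and it assumes the names from both sources were already normalized to the same romanization.

```python
def compare_district(reference, gadm):
    """Classify the subdistrict names of one district.

    reference: names from the authoritative list (hypothetical input)
    gadm:      names GADM lists for the same district (hypothetical input)
    """
    ref, listed = set(reference), set(gadm)
    return {
        "correct": sorted(ref & listed),   # present in both sources
        "missing": sorted(ref - listed),   # real subdistricts without a GADM code
        "bogus": sorted(listed - ref),     # GADM entries that do not belong here
    }


# Placeholder names, not real Tambon
result = compare_district(["A", "B", "C"], ["B", "C", "X"])
print(result)
```

A district like Hua Taphan would end up with empty "missing" and "bogus" lists, while one like Chanuman would have nothing under "correct".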

In total, out of the 7256 Tambon, 1772 are missing in GADM. Since there are a total of 5927 subdistrict codes in GADM, this means 443 entries are either totally bogus like the ones in Amnat Charoen, or dummy entries indicating that a district has no subdistrict codes or only an incomplete list. Only nine of the 77 provinces had no problems.
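
The arithmetic behind the 443 can be spelled out in a few lines of Python. The variable names are just for illustration; the three input numbers are the ones given above.

```python
total_tambon = 7256     # all Tambon in Thailand
missing_in_gadm = 1772  # real Tambon without any GADM code
gadm_codes = 5927       # subdistrict codes defined by GADM

# Real Tambon that do have a GADM code
matched = total_tambon - missing_in_gadm

# GADM codes that do not correspond to any real Tambon
unmatched_gadm = gadm_codes - matched

print(matched, unmatched_gadm)
```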

One month later I have completed them in my XML and can now start the bot to add all these codes to Wikidata, and then probably forget about these codes. My attempt to contact the GADM team has not been answered yet, and the new version announced on the website for April 2020 has not shown up yet either. And if the maps are of the same quality as the codes, I can only assume they are sadly totally unusable.