Friday 28 October 2011

ArcGIS 9.3 -> MySQL

At the core of HALOGEN sits a MySQL database that stores several different geospatial data sets. Each data set is generally made up of several tables and has a coordinate for each data point. Now most of the geo-folk here like to use ArcGIS to do their analysis and since we have it (v9.3) installed system-wide I thought I would plug it into our database. Simple.

As it happens the two don’t like to play nicely at all.

To get the ball rolling I installed the MySQL ODBC so they could communicate. That worked pretty well with ArcGIS being able to see the database and the tables in it. However, trying to do anything with the data was close to impossible. Taking the most simple data set that consisted of one table I could not get it to plot as a map. The problem was the way ArcGIS was interpreting the data types from MySQL; each and every one was being interpreted as a text field. This meant that it couldn’t use the coordinates to plot the data. I would have thought that the ODBC would have given ArcGIS something it could understand, but I guess not. The work around I used for this was to change the data types at the database level to INTs (they were stored as MEDIUMINTs on account of being BNG coordinates). I know this is overkill, and a poor use of storage etc, but as a first attempt at a fix it worked.

Then I moved on to the more complex data sets made up of several tables with rather complex JOINs needed to properly describe the data. This posed a new problem, since I couldn’t work out how to JOIN the data ArcGIS side to a satisfactory level. So the solution I implemented here was to create a VIEW in the database that fully denormalized the data set. This gave ArcGIS all the data it needed in one table (well, not a real table, but you get the idea).

If we take a step back and look at the two ‘fixes’ so far, you can see that they can be easily combined in to one ‘fix’. By recasting the different integers in the original data in the VIEW, I can keep the data types I want in the source data and make ArcGIS think it is seeing what it wants.

And then steps in the final of the little annoyances that got in my way. ArcGIS likes to have an index on the source data. When you create a VIEW there is no index information cascaded through, so again ArcGIS just croaks and you can’t do anything with the data. The rather ugly hack I made to fix this (and if anyone has a better idea I will be glad to hear it) was to create a new table that has the same data types as those presented by the VIEW and do an

INSERT INTO new_table SELECT * FROM the_view

That leaves me with a fully denormalised real table with data types that ArcGIS can understand. Albeit at the price of having a lot of duplicate data hanging around.

Ultimately, if I can’t find a better solution, I will probably have a trigger of some description that copies the data into the new real table when the source data is edited. This would give the researchers real-time access to the most up-to-date data as it is updated by others. Let’s face it, it’s a million times better than the many different Excel spreadsheets that were floating around campus!