Last week I gave a GIS tutorial to Turi, Jayne, Phillip, Dave and Mark W., using some prototype data we cleaned up from the original KEPN, PAS and GUL data sets. The object was to help empower these blokes by showing them how to load the data into a GIS environment and chop it up with some simple querying methods, thus stimulating the construction of new research questions.
Talk about the wow factor - they were really chuffed to see a spatial plot of what they used to know as rows and rows of tabulated data. After filtering the data, suddenly their hypotheses were mapped out in front of them, e.g. place names with Cornish elements did gravitate to the county of Cornwall, and place names with Norse language elements did gravitate to the North and East of England. When ancillary data such as roads and rivers were plotted as background layers, I think the cogs and wheels started spinning, and ways to answer research questions were suddenly looking so much easier for the researchers.
The data was questioned though, and quite rightly so: it should be standard procedure for any researcher to be sure of the origins and quality of their data. Two issues came up.
(i) Some of the grid references were slipping through our padding procedure (by this I mean our rounding up of grid references to 0.5km) and looking too accurate. We pad the references to ensure privacy of data and to maintain a consistent resolution between datasets. This is a small technical issue we need to address.
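To make the padding idea concrete, here is a minimal sketch in Python. It is not our actual procedure: it assumes grid references are held as eastings and northings in metres, and it snaps to the nearest 500m, whereas the real rule may round differently. The function names, field names and sample records are all made up for illustration.

```python
RESOLUTION_M = 500  # 0.5km, the resolution mentioned above


def pad_coordinate(value_m, resolution_m=RESOLUTION_M):
    """Snap a coordinate in metres to the nearest multiple of resolution_m.

    Assumption: nearest-multiple rounding, with exact halves going upwards.
    """
    # Adding half a cell before floor division rounds to the nearest cell.
    return int((value_m + resolution_m / 2) // resolution_m) * resolution_m


def is_padded(easting_m, northing_m, resolution_m=RESOLUTION_M):
    """Return True if both coordinates already sit on the padding grid."""
    return easting_m % resolution_m == 0 and northing_m % resolution_m == 0


# Hypothetical records: the second one has "slipped through" unpadded.
records = [
    {"name": "Example A", "easting": 185500, "northing": 43000},
    {"name": "Example B", "easting": 185273, "northing": 43118},
]

# Find references that are still more precise than 0.5km.
slipped = [r for r in records if not is_padded(r["easting"], r["northing"])]

# Pad them so every record shares the same resolution.
for r in slipped:
    r["easting"] = pad_coordinate(r["easting"])
    r["northing"] = pad_coordinate(r["northing"])
```

A check like `is_padded` run over the whole table before it is loaded into the GIS would be one way to catch references that look too accurate.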
(ii) Cornish place name elements were detected in Herefordshire and way up in Lancashire. Dave and I examined the original data source a few days later and found that these results did match what was in the source. It was the original data that was throwing up the anomalies; technically the HALOGEN team appeared to get things right.
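The kind of query that surfaces these oddities can be sketched in a few lines of Python. This is only an illustration, assuming each record carries a county and a language element; the field names, the lookup of "expected" counties and the sample records are hypothetical and not taken from the real data sets.

```python
from collections import Counter

# Hypothetical records standing in for rows of the place-name table.
records = [
    {"place": "Example A", "county": "Cornwall", "language": "Cornish"},
    {"place": "Example B", "county": "Herefordshire", "language": "Cornish"},
    {"place": "Example C", "county": "Lancashire", "language": "Cornish"},
    {"place": "Example D", "county": "Yorkshire", "language": "Norse"},
]

# Assumption: a hand-made lookup of where each language element is expected.
EXPECTED_COUNTIES = {"Cornish": {"Cornwall"}}


def flag_outliers(rows, expected=EXPECTED_COUNTIES):
    """Return rows whose language element falls outside its expected counties."""
    flagged = []
    for row in rows:
        allowed = expected.get(row["language"])
        # Languages without an entry in the lookup are not checked at all.
        if allowed is not None and row["county"] not in allowed:
            flagged.append(row)
    return flagged


outliers = flag_outliers(records)

# Count outliers per county to see where they cluster.
counts = Counter(row["county"] for row in outliers)
```

Flagged rows are not necessarily wrong; they are simply the ones worth tracing back to the original source, which is what Dave and I ended up doing by hand.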
What do we learn from this? Firstly, all the hard work is paying off and the researchers find this a really useful tool. Secondly, we can only deal with the data we receive. We did our own quality check to be sure we had it right; if the source data is wrong, HALOGEN cannot 'make up' data that fits. A strategy of quality control on the original data is required.