Halogen 2: November 2011

This final blog post describes the primary product of the HALOGEN 2 project: HALO-view, a geospatial visualisation tool for researchers.

Key Elements of HALOGEN 2

The HALOGEN 2 project set out with two key objectives which should provide some context for the HALO-view tool we have produced.

Firstly, we wanted to develop and deliver better tools to researchers, so that people with different levels of technical skill could access and get value from the HALOGEN datasets. HALOGEN (History, Archaeology, Linguistics, Onomastics and GENetics) is a cross-disciplinary spatial research database established in 2010 (www.le.ac.uk/halogen). The database holds data for England from the British Museum’s Portable Antiquities Scheme, place-name data from the University of Nottingham’s Institute for Name Studies and genetics data from the University of Leicester.

To do this we set out to create a geospatial visualisation tool. We also developed a prototype data extraction tool using Business Objects (http://www.sap.com/uk/solutions/sapbusinessobjects/index.epx) to give researchers an easy way of accessing HALOGEN data and extracting subsets of it for statistical analysis with tools they already know, such as SPSS, SAS and Excel.

Secondly, we wanted both to extend the coverage of existing data sources and to add new data sources to the HALOGEN database. This involved acquiring, cleaning, formatting and ingesting two new sources of data. The first is data on surname distributions based on the 1881 census [1], and the second is additional genetics data from a previously published study [2]. In addition, the existing Portable Antiquities Scheme data was extended to cover the whole of England.

HALO-view Visualisation Tool

HALO-view - a spatial visualisation tool: simplifying and improving access to spatial data

The tool can be accessed at http://halogen.le.ac.uk/. Please email dpc15@le.ac.uk with any feedback.

The HALO-view tool allows non-technical users to query HALOGEN datasets through a simple web-based interface, with the results displayed on a Google map.

When you begin using the visualisation tool, it automatically detects your location and displays the place-name data associated with it. From this first screen you can run a query to:

Discover the meaning of your own chosen place name.
View a whole county of place names.
Interrogate the entire country by language type and language element.
Search for the ‘treasure’ of England in the Portable Antiquities database using county or historical period.
Query parish, county or country-wide surname data from the Victorian census in 1881.
Explore summarised genetics data by county or counties and, for the specialist, by genetic haplogroup.
Add soil and Roman road map overlays to stimulate thinking on patterns and coincidences between the HALOGEN datasets.

For each data set there are a variety of ways to navigate the data: you can choose to explore either exact matches or use “fuzzy searches”; you can view the output either in mapping form or as a flexible tabular output; and you can also zoom in and out of mapped data and switch between satellite and map views.
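The difference between the two search modes can be illustrated with a minimal Python sketch. This is not HALO-view's own code: the sample names, the function names and the similarity cutoff are all assumptions made for illustration, and the fuzzy matching here uses the standard library's `difflib` rather than whatever method the tool itself applies.

```python
from difflib import get_close_matches

# Hypothetical sample of place names; not taken from the HALOGEN database.
PLACE_NAMES = ["Leicester", "Leominster", "Lancaster", "Loughborough", "Worcester"]


def exact_search(query, names):
    """Return the names that equal the query, ignoring letter case."""
    q = query.casefold()
    return [name for name in names if name.casefold() == q]


def fuzzy_search(query, names, cutoff=0.7, limit=5):
    """Return names similar to the query, most similar first.

    The cutoff (0 to 1) is an assumed threshold: higher values demand
    a closer match before a name is returned.
    """
    # Map lower-cased names back to their original spelling.
    lookup = {name.casefold(): name for name in names}
    hits = get_close_matches(query.casefold(), list(lookup), n=limit, cutoff=cutoff)
    return [lookup[hit] for hit in hits]
```

With this sketch, a misspelt query such as `exact_search("Lecester", PLACE_NAMES)` returns nothing, while `fuzzy_search("Lecester", PLACE_NAMES)` ranks "Leicester" first, which is the kind of tolerance a fuzzy search offers when a spelling is uncertain.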

Searching for English Place-Names

This screenshot of HALO-view shows a search for English Place-Names.

If you zoom in you can see a finer-grained distribution of points, and clicking on a point displays its name and data on its derivation.

Looking for Treasure!

This screenshot shows a view of the ‘treasure’ of England in the Portable Antiquities database using county or historical period.

Zooming in and clicking on a find displays further information.

Querying parish, county or country-wide surname data from the Victorian census in 1881

This screenshot shows the 1881 Census Surname overview.

This screenshot shows an example of tabular output relating to the distribution of the Butters surname in Northamptonshire, Nottinghamshire, Leicestershire and Derbyshire.

Soil and Roman road map
The example below shows the distribution of Roman finds against the Roman road overlay.

The only way to get a feel for the tool is to have a go using it.
GO ON GIVE IT A SPIN!

Who Is HALOGEN For?

Initially the project was targeted at two specific groups of researchers at the University of Leicester. The first group is the cross-disciplinary ‘Roots of the British’ collaboration (www2.le.ac.uk/projects/roots-of-the-british), a group of scholars grounded in humanities and genetics. Their mission is to interrogate evidence for the migration and/or continuity of human populations in the British Isles. The second group is from the ‘Impact of Diasporas’ Project (http://www2.le.ac.uk/projects/impact-of-diasporas), who plan to analyse and model relevant migration data. This tool is seen as a key facility for their data analysis work.

Quote from Professor Mark Jobling (Roots of the British Collaboration): "This project has been a very positive experience for us. It was efficiently managed, flexible (accommodating the introduction of new expertise as it went along), and has come up with a useful product that we will continue to develop".

"The novelty and multidisciplinary nature of the project has contributed to the success of other multidisciplinary grant applications, and in turn these will feed back into further developing, and sustaining the HALOGEN resource."

In addition, the project's IT expertise, together with the modernisation and quality control of old database structures, has opened HALOGEN up to further user groups. For example, the project will be replacing the existing web enquiry facilities for the Institute for Name Studies' ‘Key to English Place Names’ (KEPN) database, run from the University of Nottingham. The new facilities will be used by researchers working at and with the Institute, and by members of the general public who are interested in place-name etymology and access the website out of general interest.

Quote from Jayne Carroll, Director of the Institute for Name Studies: ‘HALOGEN has not only incorporated KEPN into a larger dataset with interesting and potentially significant results, it has added functionality to KEPN as a stand-alone research tool, allowing finer-grained searches and a range of map interfaces, altogether improving on the original.’

What Are Our Future Plans?
The project team believes, and the quotes from our researchers above bear out, that the HALOGEN and HALO-view systems have huge potential. There are many ideas for improving the products, including:

Increasing the number of datasets included in the database and available through HALO-view. Discussions are underway at Leicester to consider adding further genetics data that could support genome and population researchers.
Improving HALO-view so that it allows users to run queries across multiple datasets at the same time as an aid to easily exploring relationships and patterns between different types of data.
Developing and offering a ‘service’ to researchers who do not have access to the level of technical expertise available at Leicester. The idea is that, in return for a researcher sharing their data, we would clean it, apply spatial data, ingest it into HALOGEN and make it available back to them through the HALO-view tool. They would then be able to visualise their data against other datasets, and in turn another source of data would become available to our community.

To address the above, funding is required. We feel HALOGEN and HALO-view provide a model for researchers looking to explore multiple complementary, geographically referenced datasets. There is an opportunity for other researchers to reuse our approach and tools, and to learn lessons from our work as they pursue projects in their own institutions and disciplines.

Licensing

The outputs from the project will be backed up, managed and supported by IT Services at the University of Leicester for at least 3 years.

All project documentation, whatever its form, is governed by a licence agreement complying with Creative Commons Attribution-NonCommercial-ShareAlike 3.0, and all code is licensed under terms compliant with the GNU General Public License version 3.0. The code will be made publicly available at: https://svn.rcs.le.ac.uk/public/halogen/HALOview

References and Documentation

These links provide access to information on the HALOGEN 2 datasets and how to use the HALO-view tool:

[1] Schürer, K. and Woollard, M., 1881 Census for England and Wales, the Channel Islands and the Isle of Man (Enhanced Version) [computer file]. Genealogical Society of Utah, Federation of Family History Societies, [original data producer(s)]. Colchester, Essex: UK Data Archive [distributor], November 2000. SN: 4177, http://dx.doi.org/10.5255/UKDA-SN-4177-1
[2] Capelli, C., Redhead, N., Abernethy, J. K., Gratrix, F., Wilson, J. F., Moen, T., Hervig, T., Richards, M., Stumpf, M. P. H., Underhill, P. A., Bradshaw, P., Shaha, A., Thomas, M. G., Bradman, N. & Goldstein, D. B., A Y Chromosome census of the British Isles (2003). Current Biology 13, 979-984.
User guide for visualisation tool: http://halogen.le.ac.uk/guide/
Data Glossary for HALOGEN database: http://www2.le.ac.uk/offices/itservices/resources/cs/pso/project-websites/halogen/documents/Data-glossary-V2.3.pdf
Technical Guide for Visualisation Tool: (In preparation - link to be added)

Acknowledgements
The team gratefully acknowledge the support and funding provided by JISC, without which this work would not have been possible.

We would also like to thank our partners and data providers, a full list of whom can be found here: http://halogen.le.ac.uk/partners/

Thursday 10 November 2011

HALOGEN 2 Project - Final Product Post