Wednesday 9 March 2011

"Small WIN(s)" & "FAIL(s)" - Progress Update for February 2011

February Update & Issues

The project has now formally started and an internal project proposal has been approved.

Resources have been allocated, and the contract of our GIS specialist has been extended to provide the specialist GIS support this project requires.

Good progress has been made on the Data Sources Work Package (WP2). The HALOGEN system has been extended to include a full download of Portable Antiquities Scheme (PAS) data for all English counties. The increase in the volume of PAS data has created problems with the ArcGIS tool used by the team (see issue below).

The project team have changed the scope of the Data Sources work slightly and agreed that 2 new sources of data will be added. These will be additional Genetics data from the published datasets of Capelli and the 1881 Surname/Census data from Kevin Schurer’s work with the Essex Data Archive at the University of Essex.

This latter source is a replacement for the surname distribution data from the Archer Surname Atlas originally referenced in the project proposal. Initial investigations identified issues with the use of this data which have been addressed by the Essex Data Archive.

Requirements relating to the Capelli data have been agreed and the source data obtained. The data has been enhanced by the addition of relevant spatial data. Design work to include this in the HALOGEN database is in progress.

An initial meeting has been held to review the format and content of the surname census data from the Essex Data Archive and further sessions are planned.

Extension of PAS Data – Impact on ArcGIS


The addition of extra PAS data has caused problems with the use of shapefiles in ArcGIS. As a temporary workaround, PAS data is being extracted as 3 separate files for use with ArcGIS. The team continues to investigate a permanent fix to the problem.
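
The idea of extracting the data in several smaller files can be sketched as below. This is an illustration only, not the team's actual extraction procedure: the function name `split_into_parts` and the `find-N` identifiers are hypothetical, and the sketch assumes the extract is available as an in-memory sequence of records that can be cut into contiguous parts.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(records: Sequence[T], parts: int) -> List[List[T]]:
    """Split records into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # Every chunk gets `base` records; the first `extra` chunks get one more.
    base, extra = divmod(len(records), parts)
    chunks: List[List[T]] = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


# Hypothetical record identifiers standing in for PAS finds.
finds = [f"find-{n}" for n in range(10)]
chunks = split_into_parts(finds, 3)
print([len(chunk) for chunk in chunks])  # [4, 3, 3]
```

Each chunk could then be written to its own file. Splitting by county or region instead of by position may be more convenient for mapping, but that depends on how the data is used.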

Any hints or tips welcome!

Project Plan Post 2 of 7: Wider Benefits to Sector & Achievements for Host Institution

The benefits from specific project deliverables are listed in the table below.


Deliverable: 1. Enhanced Data Sources

- Extension of existing PAS datasets to provide national coverage. This is a relatively simple exercise as the Key to English Place Names and Genetics data are already present in the database at national level, but some additional cleaning is required. A new ‘national’ extract of PAS data would be needed.
- Addition of 2 new data sources relating to the geographical distribution of surnames and further genetics data to the database.

Benefits and Outcomes:

- A review of existing HALOGEN data extraction, cleaning and load procedures to cope with ingestion of ‘national’/larger data sources.
- A prioritised list of data source related requirements that can be used to guide the future development of the service.
- Data extraction, cleaning and load procedures for new datasets. Updated data glossary for researchers.

Deliverable: 2. A Revised Data Management Plan

Additional requirements, policies and practice recommendations covering new features will be documented.

Benefits and Outcomes:

- An assessment of the effectiveness of the DCC’s draft DMP as an aid to research data management.
- Information and lessons learned to source a JISC case study and input to community synthesis project.

Deliverable: 3. Evaluation and Selection of Data Extraction Tool

Requirements for an appropriate tool will be documented. These will initially be used to assess the feasibility of using the existing tools supported by IT Services. If this is not appropriate a market evaluation will take place.

Benefits and Outcomes:

- A contribution to the wider JISC community to help develop awareness of good practice in terms of the selection and availability of similar tools.

Deliverable: 4. An Implementation Plan for the Data Extraction Tool

An implementation plan covering the timescales, costs, risks and issues relating to the deployment of the selected tools will be documented. If feasible within the 9-month project window then the preferred product will be implemented.

Deliverable: 5. Feasibility Study for the Provision of HALOGEN Database Enquiry Facilities for Institute of Place Names Website Users

Benefits and Outcomes:

- A contribution to the wider JISC community to help develop awareness of good practice in how to deal with similar requirements and problems.

Deliverable: 6. Interim project reports

Benefits and Outcomes:

- Compliance with JISC requirements for project control.

Deliverable: 7. A project blog and updated Halogen Project website

Benefits and Outcomes:

- Sector-wide dissemination of findings and engagement with key stakeholder communities.

Project Plan Post 7 of 7: Budget

Budget Summary

The total cost of the project was c.£353,000, of which the JISC award of £85,000 represents 24%. A summary of the total project budget (covering both JISC and institutional contributions) is given below.

Category                      %

Directly Incurred Staff       5
Directly Incurred Other       7
Directly Allocated           50
Indirect                     38

Total                       100
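
As a quick check of the figures above, the sketch below recomputes the JISC share and converts the percentage breakdown into indicative amounts. The amounts are derived here purely for illustration and are not figures from the project budget; they assume the approximate total of £353,000 and the rounded percentages shown, so the real category totals may differ.

```python
TOTAL_COST = 353_000  # approximate total project cost in GBP ("c.£353,000")
JISC_AWARD = 85_000   # JISC award in GBP

# Percentage breakdown as given in the budget summary.
CATEGORY_PERCENT = {
    "Directly Incurred Staff": 5,
    "Directly Incurred Other": 7,
    "Directly Allocated": 50,
    "Indirect": 38,
}


def jisc_share_percent(award: int, total: int) -> int:
    """Return the award as a whole-number percentage of the total."""
    return round(100 * award / total)


def indicative_amounts(total: int, percents: dict) -> dict:
    """Convert a percentage breakdown into indicative amounts in GBP."""
    return {name: total * pct // 100 for name, pct in percents.items()}


print(jisc_share_percent(JISC_AWARD, TOTAL_COST))  # 24
for name, amount in indicative_amounts(TOTAL_COST, CATEGORY_PERCENT).items():
    print(f"{name}: £{amount:,}")
```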

The largest forecast costs relate to staffing. It is estimated that a 'virtual team' of 3.27 FTEs will work on the project for its duration.


Budget Management

The project manager will be responsible for managing and monitoring the project budget on a day-to-day basis. The project manager will be accountable to the Project Board and will report any significant variances to the Board for discussion and authorisation.

The Project Board is chaired by Professor Annette Cashmore (Sub-Dean for Medicine and Biological Sciences, Director of CETL (GENIE)). Its membership includes David Flanders (JISC Programme Manager), Professor Mark Jobling (Department of Genetics), Dr Jayne Carroll (Director, Institute for Name Studies, University of Nottingham), Mary Visser (Director of IT Services) and Dr Nick Tate (Senior Lecturer, Department of Geography).

Project Plan Post 6 of 7: Projected Timeline, Workplan & Overall Project Methodology

Work Plan

The high-level workplan is outlined below. Listed against each workpackage are the initials of the principal team member(s) responsible for its delivery.

Months: 02/11, 03/11, 04/11, 05/11, 06/11, 07/11, 08/11, 09/11, 10/11

WP1 - Set-up and governance (DC)
- Project set-up, induction and PID
- Steering and project group meetings
- JISC Programme level activity and reporting to funder

WP2 - Data sources (AB, OB, LG)
- Extend coverage of PAS data
- Additional data sources: establish detailed requirements and investigate feasibility for 2 new sources
- Extract, clean, transform and load new data sources
- Update data glossary and system documentation

WP3 - Update HALOGEN data management plan (DC)
- Review and update DMP

WP4 - Data Extraction Tool (AG, MW)
- Establish and document requirements
- Evaluate tools and select preferred supplier
- Procure and produce implementation plan for preferred tool/supplier

WP5 - Web Enquiry Facility (OB)
- Establish requirements
- Investigate options and feasibility of delivery
- Document and publish findings

WP6 - Develop & Disseminate HALOGEN Case Study (DC, OB, AB)
- Prepare case study re: community synthesis input
- Disseminate to stakeholders

WP7 - Project Evaluation (DC)
- Develop interim and final project reports (see evaluation milestones below)

WP8 - Dissemination (DC, OB, AB)
- Maintain project blog and HALOGEN website, and run briefing sessions

Project Approach

When identifying and evaluating specific tools and technologies, open source solutions will be considered alongside those which are already licensed by the University, the aim being to reduce the ‘barrier to entry’ for other institutions wishing to adopt the approaches used by UoL.

The evaluation will involve some desk-based research but will be heavily biased towards the building of 'prototypes' using different tools.