Data Architecture Update – February 2021

It’s been over a month since I shared progress in NHS Digital Data Architecture. During that time we’ve made good progress, with the Terminology Server moving into private beta, our metadata in the HDR UK gateway seeing a significant uplift in quality and our Mauro Data Mapper getting close to implementation.

Terminology Server

The Terminology Server rollout has just passed an important milestone as it entered the private beta phase. We have also added some additional identity providers to the system, so users will be able to register and log in using their NHS Mail, Microsoft 365, GitHub, Google and LinkedIn accounts. The private beta phase will run for at least 3 weeks and will include use of the system by colleagues in NHS Digital and other areas of the NHS, as well as some external organisations such as Clinical Architecture, Facts and Dimensions, Clinisys, Graphnet and Oxford University.

The start of the private beta followed a “Learnathon” hosted by INTEROPen, which showed the UK Health IT community how a terminology server can take away the complexities of codes and maps. Around 70 people joined us for the day. The “Learnathon” was an educational event to prepare developers wishing to take part in the INTEROPen Terminology Server Hackathon on 24 & 25 February.

Michael Lawley gave a delightful session showing how Dr West could maintain a record of Dorothy Gale’s “accident caused by a tornado” and the clinical finding that she “has imaginary friend”. This demonstrated how codes could be quickly found and validated. He then demonstrated how this history can be translated into a classification for commissioning or epidemiology.
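For readers who want to try the find, validate and translate steps from that session, here is a minimal sketch of how the requests might be built against a FHIR R4 terminology server. The base URL is a placeholder, not the real NHS Digital endpoint, the function names are my own, and the code 22298006 (myocardial infarction) is just a well-known SNOMED CT concept used for illustration rather than anything from the demo. The sketch only builds the request URLs, so it runs without network access.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real server address is not given in this post.
BASE_URL = "https://terminology.example.org/fhir"
SNOMED = "http://snomed.info/sct"
ICD10 = "http://hl7.org/fhir/sid/icd-10"


def lookup_url(code, system=SNOMED, base=BASE_URL):
    """Build a CodeSystem/$lookup request to find details of a code."""
    return f"{base}/CodeSystem/$lookup?" + urlencode({"system": system, "code": code})


def validate_code_url(code, display, system=SNOMED, base=BASE_URL):
    """Build a CodeSystem/$validate-code request to check a code and its display text."""
    params = {"url": system, "code": code, "display": display}
    return f"{base}/CodeSystem/$validate-code?" + urlencode(params)


def translate_url(code, system=SNOMED, target=ICD10, base=BASE_URL):
    """Build a ConceptMap/$translate request (FHIR R4 parameter names)."""
    params = {"system": system, "code": code, "targetsystem": target}
    return f"{base}/ConceptMap/$translate?" + urlencode(params)


if __name__ == "__main__":
    print(lookup_url("22298006"))
    print(validate_code_url("22298006", "Myocardial infarction"))
    print(translate_url("22298006"))
```

Each URL can be sent as an HTTP GET with an Accept header of application/fhir+json. Whether a given map is available for translation depends on what the server has loaded.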

Tracey Francis then shared the experience in NHS Wales providing a direct care interface for use in outpatient clinics. Tracey showed how NHS Wales were using their Terminology Server and the journey they went on to get an easy-to-use clinical interface, speeding up the time to record clinical circumstances. One benefit highlighted was the low level of training required for users of these outpatient forms.

Richard Kavanagh of Graphnet shared his experience using a terminology server to improve the quality of the diverse data received. He highlighted the variety of information received (for example a guardian’s consent sent as a record of an immunisation) and showed how such data could be curated to a canonical form by exploiting the power of SNOMED CT using a terminology server.

Richard’s talk was followed by Charles Gutteridge, who demonstrated the analytical power of SNOMED CT and shared his experiences helping interventional radiologists in Chelsea and Westminster to identify patients who needed closer follow-up using SNOMED CT expressions.

The “Learnathon” completed with three breakout sessions covering simple UI design using JavaScript, validation and translation of codes and descriptions, and the use of SNOMED CT expression constraint language.
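As a flavour of the expression constraint language covered in the last breakout, here is a small sketch of how an ECL expression might be turned into a ValueSet/$expand request using the FHIR convention for implicit SNOMED CT value sets. The base URL and function names are my own placeholders, and the expression (all descendants of 404684003, Clinical finding) is only an illustration, not something taken from the session.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL: the real server address is not given in this post.
BASE_URL = "https://terminology.example.org/fhir"


def ecl_value_set_uri(ecl):
    """Implicit SNOMED CT value set URI for an ECL expression (FHIR convention)."""
    return "http://snomed.info/sct?fhir_vs=ecl/" + ecl


def expand_url(ecl, count=10, base=BASE_URL):
    """Build a ValueSet/$expand request returning at most `count` matching concepts."""
    params = {"url": ecl_value_set_uri(ecl), "count": count}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base}/ValueSet/$expand?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # "<" means "descendants of" in ECL.
    print(expand_url("< 404684003"))
```

The value set URI is percent-encoded once, as a query parameter. Some servers may expect a different encoding, so treat this as a starting point.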

There’s still time to join the two-day follow-on INTEROPen “hackathon” – see Terminology Server Hackathon – INTEROPen.

Platinum and gold medallions for our Research Data Sets

I’m delighted that we have increased the quality of the metadata we share through the HDR UK Innovation Gateway (healthdatagateway.org). For our National Core Studies data sets, the majority are rated in the Platinum or Gold categories. This makes discovering our data sets much easier. The areas assessed are metadata completeness and validity.

This has led us to a challenge around Digital Object Identifiers (DOI), one approach to implementing the HM Government policy to use persistent resolvable http URLs as identifiers for resources. In NHS Digital we had no clear use case for DOI, but following consultation with a range of stakeholders including the British Library we are looking at how we could include DOIs in our standards stack for metadata, including the NHS Data Dictionary. The DOIs will be created and maintained for data set specifications, data models, data classes / groups and for data elements but not for individual codes or descriptions. An implementation pattern will be reviewed at the next Data Design Authority meeting on 2nd March.

Mauro Data Mapper

We are now in the last stages of refining the Mauro Data Mapper solution (see metadata-catalogue.org) that we are using to publish the Data Dictionary. These developments will allow us to author the Data Dictionary in Mauro, and I expect them to be complete by the end of March. Reviewing the list of new capabilities, I was impressed by the scale of the progress:

Open API for all functionality
Joining up the key models – UK FHIR Core, PRSB Core Information Standard and derivatives, SNOMED CT and ValueSets based on SNOMED CT, ICD-10, OPCS 4, dm+d (in SNOMED CT and XML forms), NHS Data Dictionary including all its data sets
SPARQL endpoint
Ability to ingest and interpret models and reference data from a wide range of sources including SQL databases (SQL Server, Oracle, MySQL, PostgreSQL etc), XML and JSON, CSV, any JDBC connection, AWS Glue
FHIR Terminology server client and authoring for simple code lists and value sets
OAuth for access delegation (sign on)
Flexible publication, including the Data Dictionary
Flexible metadata profiles for models, classes, data elements and rules, including the HDR UK data set profile
Analytics through Power BI
Federation with other Mauro catalogues, allowing open and transparent sharing between health and care bodies

I anticipate we will be able to make our models available over the Mauro API and SPARQL endpoint in the spring.