Human studies are among the most important sources of evidence for the advancement of medicine. Human studies data and metadata are dispersed across institutional review board (IRB), clinical trial management, trial registry, and other systems. Sharing such data and metadata would facilitate large-scale data mining and synthesis. The most feasible data sharing approach is to “federate” queries over locally controlled databases whose metadata are standardized to a common model of clinical research. The Human Studies Database (HSDB) Project is a consortium of Clinical and Translational Science Award (CTSA) institutions that has been developing semantic and data sharing technologies to federate descriptions of human study design, using a generalizable approach to data access based on the Ontology of Clinical Research (OCRe), which provides the reference semantics. We describe the use of an XML Schema Definition (XSD) interchange format with embedded references to OCRe as a practical application of OCRe for data acquisition and federation, supporting the scientific analysis of human studies data by clinical researchers.
Simona Carini, MA1*, Samson W. Tu, MS2, Landon T. Detwiler, MS3, Karl Burke, MS4, Jim Brinkley, MD, PhD3, Ida Sim, MD, PhD1, for the Human Studies Database Project
1University of California, San Francisco, CA;
2Stanford University, Stanford, CA;
3University of Washington, Seattle, WA;
4Johns Hopkins University, Baltimore, MD
The current goal of the HSDB project is to federate administrative and design information about human studies, using OCRe as the reference formalization of the necessary concepts and relationships. OCRe is an OWL 2 ontology organized as a set of modular components. OCRe models the entities and relationships of human studies; OCRe_ext defines new classes and properties in terms of OCRe constructs to serve specific needs of HSDB. To add the annotations used to generate HSDB_XSD (see below), we defined HSDB_OCRe, which imports OCRe_ext. After exploring various approaches to manual and automated instantiation of OCRe-compliant study data (including UML, XSD, and RDF), we developed an XML Schema. XML is easy to use and widely understood, editing tools are readily available, and data from relational databases are easily exported to XML. These properties keep the overhead of adoption reasonably low for participating institutions, none of which use OWL or RDF in production systems.
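To illustrate why exporting relational data to XML adds little overhead, the sketch below serializes rows of a small in-memory SQLite table into XML using only the Python standard library. The table name `study`, its columns, and the output element names are hypothetical and are not the HSDB_XSD vocabulary or any participating institution's schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_studies_to_xml(conn: sqlite3.Connection) -> str:
    """Serialize each row of a hypothetical `study` table as a <study> element."""
    root = ET.Element("studies")
    rows = conn.execute("SELECT id, title, study_type FROM study ORDER BY id")
    for study_id, title, study_type in rows:
        study = ET.SubElement(root, "study", attrib={"id": str(study_id)})
        # Column values become child element text; names are illustrative only.
        ET.SubElement(study, "title").text = title
        ET.SubElement(study, "studyType").text = study_type
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE study (id INTEGER PRIMARY KEY, title TEXT, study_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO study (id, title, study_type) VALUES (?, ?, ?)",
        [(1, "Statin adherence pilot", "interventional"),
         (2, "Asthma registry follow-up", "observational")],
    )
    print(export_studies_to_xml(conn))
```

In practice, most relational database systems offer their own XML export facilities; the point of the sketch is only that a flat row-to-element mapping is straightforward.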
We identified a set of administrative and study design data elements for HSDB federation and defined an XML schema in which data elements and types are indexed, through their IRIs, to classes, properties, and value sets in OCRe. We developed a data model extractor that automatically derives the schema, HSDB_XSD, from HSDB_OCRe in OWL. From HSDB_XSD, we used the oXygen XML editor to generate a sample XML file. We then used this sample to guide the manual instantiation of studies based on protocols at two institutions (Johns Hopkins and UCSF), and to create a mapping from the SQL database of another institution's (Rockefeller) e-IRB system so that its data could be acquired automatically. We used Extensible Stylesheet Language Transformations (XSLT) to map and transform selected data elements from ClinicalTrials.gov study records (in XML) into HSDB_XML files.
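The sketch below illustrates two of the steps above under stated assumptions: reading element-to-IRI links out of an annotated schema, and mapping fields of a ClinicalTrials.gov XML record into schema-declared elements. It assumes, hypothetically, that each IRI is stored in an `xs:appinfo` annotation; the element names `studyTitle` and `studyPhase`, the `example.org` IRIs, and the NCT identifier are placeholders, not the actual HSDB_XSD, OCRe, or registry content. The HSDB project performed the ClinicalTrials.gov mapping with XSLT; Python's ElementTree is used here only as a stand-in because the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema fragment: element names and IRIs are placeholders, not the
# actual HSDB_XSD declarations or OCRe identifiers.
HSDB_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="studyTitle" type="xs:string">
    <xs:annotation><xs:appinfo>http://example.org/ocre#StudyTitle</xs:appinfo></xs:annotation>
  </xs:element>
  <xs:element name="studyPhase" type="xs:string">
    <xs:annotation><xs:appinfo>http://example.org/ocre#StudyPhase</xs:appinfo></xs:annotation>
  </xs:element>
</xs:schema>"""

# Placeholder ClinicalTrials.gov record using legacy XML element names.
SAMPLE_CTGOV = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example trial</brief_title>
  <phase>Phase 2</phase>
</clinical_study>"""

# Source path in the ClinicalTrials.gov record -> target element in the schema.
CTGOV_TO_HSDB = {"brief_title": "studyTitle", "phase": "studyPhase"}


def element_iris(xsd_text: str) -> dict[str, str]:
    """Map each top-level element name to the IRI stored in its xs:appinfo."""
    schema = ET.fromstring(xsd_text)
    iris = {}
    for decl in schema.findall(f"{XS}element"):
        appinfo = decl.find(f"{XS}annotation/{XS}appinfo")
        if appinfo is not None and appinfo.text:
            iris[decl.get("name")] = appinfo.text.strip()
    return iris


def ctgov_to_hsdb(ctgov_xml: str, iris: dict[str, str]) -> ET.Element:
    """Copy mapped fields into a <study>, refusing targets the schema lacks."""
    record = ET.fromstring(ctgov_xml)
    study = ET.Element("study", attrib={"id": record.findtext("id_info/nct_id", "")})
    for src_path, target in CTGOV_TO_HSDB.items():
        if target not in iris:
            raise ValueError(f"{target} has no OCRe IRI in the schema")
        value = record.findtext(src_path)
        if value is not None:
            ET.SubElement(study, target).text = value.strip()
    return study


if __name__ == "__main__":
    iris = element_iris(HSDB_XSD)
    print(iris["studyPhase"])
    print(ET.tostring(ctgov_to_hsdb(SAMPLE_CTGOV, iris), encoding="unicode"))
```

Rejecting any target element without an IRI is one way to keep every transformed value traceable to an ontology term; the original pipeline may enforce this differently, for example through schema validation of the generated HSDB_XML files.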
After posting HSDB_XML study instances to the web, we issued federated queries using Query Integrator (Brinkley and Detwiler, 2012), a web-based application that supports distributed queries in various languages (e.g., XQuery, SPARQL) over multiple web-accessible sources and resources (e.g., OCRe and SNOMED CT on BioPortal). We built and chained complex queries over the federated instance data that exploit OCRe's logical structure and SNOMED CT's taxonomic hierarchies, yielding more methodologically and clinically precise queries over institutional and ClinicalTrials.gov data (for examples, see https://hsdbwiki.org/index.php/Queries#Demo_QI_queries).
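The following sketch shows, in miniature, what chaining a hierarchy expansion with a federated filter means. It is not Query Integrator and uses neither XQuery nor SPARQL: the is-a hierarchy is a toy with plain-text labels (not actual SNOMED CT content), and the two in-memory "sources" and their study records are hypothetical stand-ins for institutional and ClinicalTrials.gov data.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents); labels stand in for SNOMED CT concepts.
IS_A = {
    "type 2 diabetes mellitus": ["diabetes mellitus"],
    "type 1 diabetes mellitus": ["diabetes mellitus"],
    "gestational diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["disorder of endocrine system"],
    "asthma": ["disorder of respiratory system"],
}

# Hypothetical sources standing in for institutional and registry study data.
SOURCES = {
    "institution_a": [
        {"id": "A-1", "condition": "type 2 diabetes mellitus", "design": "randomized"},
        {"id": "A-2", "condition": "asthma", "design": "randomized"},
    ],
    "ctgov": [
        {"id": "NCT-X", "condition": "gestational diabetes", "design": "observational"},
        {"id": "NCT-Y", "condition": "type 1 diabetes mellitus", "design": "randomized"},
    ],
}


def descendants(concept: str) -> set[str]:
    """Return the concept plus every concept below it in the is-a hierarchy."""
    children: dict[str, list[str]] = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {concept}, deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def federated_query(predicate) -> list[tuple[str, str]]:
    """Apply one predicate to every source; return (source, study id) pairs."""
    return [(name, study["id"])
            for name, studies in SOURCES.items()
            for study in studies if predicate(study)]


# Chained query: step 1 expands the concept; step 2 filters on condition and design.
diabetes_terms = descendants("diabetes mellitus")
hits = federated_query(
    lambda s: s["condition"] in diabetes_terms and s["design"] == "randomized"
)
print(hits)  # [('institution_a', 'A-1'), ('ctgov', 'NCT-Y')]
```

A plain string match on "diabetes mellitus" would miss every study above, since none is coded with that exact label; expanding the concept through the hierarchy first is what lets the second query find the subtype-coded studies.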
Using OCRe as reference semantics, together with XML technologies, we developed an informatics infrastructure that supports the acquisition of human study data into HSDB and federated querying of those data.
This publication was made possible by Grant Numbers RR026040, and UL1RR025005 (Johns Hopkins), UL1RR024131 (UCSF), and UL1 RR 025014 (UW) from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH.
Sim,I. et al. (2010) The Human Studies Database Project: federating human studies design data using the ontology of clinical research. AMIA Summits Transl Sci Proc., 1, 51-5.
Brinkley,J.F. and Detwiler,L.T. (2012) A Query Integrator and Manager for the Query Web. J Biomed Inform., in press.