IUSSP-CODATA Working Group on FAIR Vocabularies

Description

This joint IUSSP/CODATA panel aims to help make data “Findable, Accessible, Interoperable, and Reusable” (FAIR) in the area of population research. Developed in collaboration with Simon Hodson (Executive Director of the Committee on Data (CODATA) of the International Science Council), the panel targets the development of machine-actionable vocabularies, which will greatly simplify the task of merging or combining information across data sets, i.e., making them easily interoperable.

Programme of activities

Launch event for the working group's final report - Monday 12 June, 12:00-13:30 UTC (5:00 Los Angeles / 8:00 New York / 9:00 Rio de Janeiro / 14:00 Paris / 14:00 Cape Town / 17:30 New Delhi / 20:00 Shanghai / 22:00 Canberra)

Read the report on FAIR Vocabularies in Population Research

Rationale and work envisioned

A growing movement aims to make data “Findable, Accessible, Interoperable, and Reusable” (FAIR). Population research is an empirically focussed field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies. This Panel will focus on the development of FAIR Vocabularies for demographic data, which is an essential step in making data reusable and interoperable.

FAIR vocabularies yield benefits when data from different sources must be combined. Consider the most basic variable in demographic analysis: age. The OECD has a list of 643 age categories, and the UN Population Division copes with more than 1100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, such as a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, age groupings can be harmonized by software. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved. As a consequence, combining information across data sets becomes significantly more feasible, greatly enhancing their comparability and reuse.
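As a rough illustration of how software might harmonize age groupings once they are linked to persistent identifiers, consider the minimal Python sketch below. Everything in it is an assumption made for illustration: the identifier URIs under example.org, the providers' local codes, the function names, and the population counts are all hypothetical and do not come from any actual OECD or UN vocabulary.

```python
from collections import defaultdict

# Hypothetical persistent identifiers for two age groups (not real URIs).
AGE_0_4 = "https://example.org/vocab/age/0-4"
AGE_5_9 = "https://example.org/vocab/age/5-9"

# Hypothetical machine-actionable metadata: each provider maps its own
# local age codes to the shared persistent identifiers.
PROVIDER_A_CODES = {"Y0T4": AGE_0_4, "Y5T9": AGE_5_9}
PROVIDER_B_CODES = {"0-4 years": AGE_0_4, "5-9 years": AGE_5_9}


def harmonize(rows, code_map):
    """Re-key (local_code, value) rows by the shared identifier."""
    return {code_map[code]: value for code, value in rows}


def combine(*datasets):
    """Merge harmonized datasets into {identifier: {provider: value}}."""
    merged = defaultdict(dict)
    for provider, data in datasets:
        for identifier, value in data.items():
            merged[identifier][provider] = value
    return dict(merged)


# Made-up counts; the two providers use different labels and row order.
provider_a_rows = [("Y0T4", 1200), ("Y5T9", 1100)]
provider_b_rows = [("5-9 years", 1085), ("0-4 years", 1210)]

combined = combine(
    ("A", harmonize(provider_a_rows, PROVIDER_A_CODES)),
    ("B", harmonize(provider_b_rows, PROVIDER_B_CODES)),
)

for identifier, values in sorted(combined.items()):
    print(identifier, values)
```

Because both providers point to the same identifiers, no one has to read documentation to discover that "Y0T4" and "0-4 years" denote the same group. In practice the mappings would be retrieved from published vocabularies rather than hard-coded as they are here.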

In cooperation with CODATA, this IUSSP scientific panel will build upon the work of the FAIR Vocabularies Group, which recently released “Ten Simple Rules for making a vocabulary FAIR”. Most of its guidance is straightforward, like "Determine the governance arrangements and custodian responsible for the legacy vocabulary." But some steps require specialized expertise in standards like the Simple Knowledge Organisation System (SKOS) or the Web Ontology Language (OWL). In the longer term, FAIR vocabularies also need to be maintained, which requires sustainable institutions with the capacity to support the necessary technologies. The Panel will seek advice from members of the FAIR Vocabularies Group and experts from other scientific domains to evaluate alternative strategies (e.g. centralized versus federated) and software.

The ultimate goal of this initiative is to make demographic data more interoperable by publishing controlled vocabularies that can be found and acted upon by software. This will reduce the costs of merging data from multiple sources, especially for researchers from other disciplines who want to use population data. We aim to advance this effort by working with three to five partners in international organizations and academia to bring their existing vocabularies into line with the FAIR principles. These case studies will highlight the value of this approach and identify the difficulties. We will learn where additional technical development is needed and when community involvement through IUSSP and other organizations is beneficial.

The panel began meeting in May 2021 and completed its work in May 2023 with the delivery of its final report.

Chair

Members

IUSSP Secretariat

Council Liaison