The FAIRsFAIR task group is looking for feedback and comments from the scientific community on the initial recommendations tabled in report D2.2 FAIR Semantics: First recommendations. The full report is available on Zenodo at https://zenodo.org/record/3707985. Comments and suggestions can be added directly as comments on this page.

NB: you need to log in to leave a comment.

 

P-Rec. 1: Use Globally Unique, Persistent and Resolvable Identifiers for Semantic Artefacts, their content and their versions

Page 20 in the report

F1

Description

Semantic artefacts are typically structured text files. They are de facto digital objects and should be unambiguously identified by globally unique, persistent and resolvable identifiers (GUPRIs). In the context of a web of FAIR data, resolving these identifiers should give access to both the semantic artefact itself and its metadata (see Rec. 2 regarding metadata).

As shown in Fig. 1 of the report, semantic artefacts are composite digital objects requiring at least three levels of identifiers: one for the semantic artefact itself, one for its content and one for its metadata (including both the global metadata and the metadata associated with the content). Metadata is addressed in the following recommendation (Rec. 2). Finally, semantic artefacts are by nature living digital objects that evolve over time. A further GUPRI should therefore be assigned to each version of a semantic artefact, so that users can retrieve the latest version and still access previous versions that are in use in existing information systems.
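
To make these levels of identification concrete, the following Python sketch models them as a simple record. The class `SemanticArtefactIds`, its fields and the `example.org` URIs are hypothetical illustrations, not part of the report. The check only verifies that no two levels or elements of one artefact share an identifier, which is a necessary (not sufficient) condition for global uniqueness.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticArtefactIds:
    """Hypothetical record of the GUPRIs attached to one semantic artefact."""

    artefact: str  # identifies the semantic artefact as a whole
    metadata: str  # identifies its metadata (global and content-level)
    content: dict[str, str] = field(default_factory=dict)  # element name -> GUPRI
    versions: dict[str, str] = field(default_factory=dict)  # version label -> GUPRI

    def all_ids(self) -> list[str]:
        """Every identifier attached to the artefact, at every level."""
        return [self.artefact, self.metadata, *self.content.values(), *self.versions.values()]

    def is_unambiguous(self) -> bool:
        """True if no two levels or elements share the same identifier."""
        ids = self.all_ids()
        return len(ids) == len(set(ids))


# Hypothetical example artefact with two concepts and two versions.
ids = SemanticArtefactIds(
    artefact="https://example.org/onto",
    metadata="https://example.org/onto/metadata",
    content={
        "Sample": "https://example.org/onto/Sample",
        "hasPart": "https://example.org/onto/hasPart",
    },
    versions={
        "1.0": "https://example.org/onto/1.0",
        "1.1": "https://example.org/onto/1.1",
    },
)
print(ids.is_unambiguous())  # True
```

Keeping all levels in one record makes it easy to see that reusing, say, the artefact URI as the metadata URI would break unambiguous identification.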

As Web-based documents, semantic artefacts are usually identified by globally unique (i.e. no two different files can have the same identifier) and resolvable identifiers. On the Web, a semantic artefact is usually represented by two key URIs: the URI pointing to the file and the namespace URI of the semantic artefact. Consider, for example, a semantic artefact whose file is hosted on GitHub while its namespace is a separate URI that points to the artefact's content: the same artefact then has two identifiers, which goes against the principle of uniqueness. To solve this issue, the namespace URI and the file URI can be linked through HTTP redirects. This does not, however, address persistence: if the file or the server hosting the redirect moves or disappears, the identifier no longer resolves. To cope with this, the Web community developed the concept of Persistent URLs (PURLs) [30] and implemented dedicated servers guaranteeing the persistence of the URL and of any associated HTTP redirects. The value of this approach has been recognised in the biomedical domain by the OBO Foundry, which explicitly recommends PURLs for identifying semantic artefacts in its ID policy (see Existing recommendations). However, the approach showed a limitation in 2016, when the central PURL server was stopped for lack of funding. Fortunately, the service was taken over by a more durable organisation, the Internet Archive [31]. Finally, the Industrial Ontology Foundry recommends using IRIs (Internationalized Resource Identifiers, which allow Unicode characters in web addresses) registered in its system.
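
The redirect mechanism behind PURLs can be sketched as a lookup from a stable path to the file's current location. This is a minimal illustration under stated assumptions, not the PURL server software: the host `purl.example.org`, the table `REDIRECTS` and the GitHub path are hypothetical, and a real server would send these responses over HTTP rather than return them from a function.

```python
from urllib.parse import urlparse

# Hypothetical redirect table of a PURL-style server:
# persistent path -> current location of the semantic artefact file.
REDIRECTS: dict[str, str] = {
    "/onto/example": "https://raw.githubusercontent.com/example-org/example-onto/main/example.ttl",
}


def resolve(persistent_url: str) -> tuple[int, dict[str, str]]:
    """Return the HTTP status and headers a redirect server would send for persistent_url."""
    path = urlparse(persistent_url).path.rstrip("/")
    target = REDIRECTS.get(path)
    if target is None:
        return 404, {}
    # 302 (temporary redirect): clients keep using the persistent URL as the
    # identifier, so the file can move without the identifier changing.
    return 302, {"Location": target}


print(resolve("https://purl.example.org/onto/example"))

# The file moves: only the server-side table changes, the persistent URL stays the same.
REDIRECTS["/onto/example"] = "https://data.example.org/ontologies/example.ttl"
print(resolve("https://purl.example.org/onto/example"))
```

The design choice is that only the redirect table is updated when the file moves; anyone who recorded the persistent URL keeps a working identifier, as long as the redirect server itself survives.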

Another way to implement GUPRIs is to use Persistent Identifiers (PIDs) based on the Handle System [32]. A handle consists of a prefix, which identifies a "naming authority", and a suffix, which gives the "local name" of a resource. A handle is resolved through a handle server, which provides access to the associated metadata by redirecting to a landing page that presents the record for human consumption. This approach is currently being investigated and promoted by the scientific data community (RDA, EOSC, ...). A particular kind of handle, the DOI, could be used to identify a semantic artefact so that it can be cited (see Rec. 17). However, a limitation of the DOI is that it resolves to a landing page, which is a dead end for machines. Another limitation of PIDs compared with URLs/URIs is that practitioners have less control over them: PIDs are allocated by international organisations that charge a fee for minting new identifiers. In a sense, this business model helps ensure the persistence of the identifiers; however, it requires a dedicated service to mint and assign new PIDs.
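
The prefix/suffix structure of a handle, and the fact that a DOI is a handle whose prefix starts with `10.`, can be illustrated as follows. The function names and the example handles are hypothetical. `https://hdl.handle.net/` and `https://doi.org/` are the public HTTP proxies of the Handle System and of the DOI system; this sketch only builds URLs and does not contact them.

```python
def parse_handle(handle: str) -> tuple[str, str]:
    """Split a handle into its naming-authority prefix and its local-name suffix."""
    # The prefix ends at the first "/"; the suffix may itself contain "/".
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def resolver_url(handle: str) -> str:
    """HTTP URL of a handle via a public proxy (doi.org for DOIs, hdl.handle.net otherwise)."""
    prefix, suffix = parse_handle(handle)
    # DOIs are handles whose prefix starts with "10.".
    proxy = "https://doi.org/" if prefix.startswith("10.") else "https://hdl.handle.net/"
    # Simplification: special characters in the suffix are not percent-encoded.
    return f"{proxy}{prefix}/{suffix}"


print(parse_handle("20.500.12345/onto/v1"))  # ('20.500.12345', 'onto/v1')
print(resolver_url("10.1234/example-onto"))  # https://doi.org/10.1234/example-onto
```

Note that resolving such a URL normally leads to a landing page, which is exactly the machine-actionability limitation discussed above.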

As discussed in the introduction, these identification systems should apply not only to the semantic artefact but also to its content. Semantic artefacts can indeed be considered as datasets of concepts and relations, so each element of a semantic artefact should also have its own GUPRI. Both the OBO Foundry and the Industrial Ontology Foundry propose specific conventions for defining URI-based identifiers (see BP-Rec. 1 and BP-Rec. 2).
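
For content-level identifiers, the OBO Foundry convention builds each term IRI from a shared PURL base, an ID space and a numeric local ID (for example, `GO:0008150` corresponds to `http://purl.obolibrary.org/obo/GO_0008150`). The sketch below converts between the compact and full forms. The function names are hypothetical, and the regular expressions are a simplification: real ID spaces and local IDs may follow stricter or looser rules.

```python
import re

# OBO-style term IRI: PURL base + ID space + "_" + numeric local ID.
OBO_BASE = "http://purl.obolibrary.org/obo/"
_CURIE = re.compile(r"([A-Za-z][A-Za-z0-9]*):([0-9]+)")
_IRI = re.compile(re.escape(OBO_BASE) + r"([A-Za-z][A-Za-z0-9]*)_([0-9]+)")


def curie_to_iri(curie: str) -> str:
    """Expand a compact identifier such as 'GO:0008150' into its term IRI."""
    m = _CURIE.fullmatch(curie)
    if m is None:
        raise ValueError(f"not an OBO-style CURIE: {curie!r}")
    idspace, local_id = m.groups()
    return f"{OBO_BASE}{idspace}_{local_id}"


def iri_to_curie(iri: str) -> str:
    """Contract a term IRI back into its compact 'IDSPACE:LOCALID' form."""
    m = _IRI.fullmatch(iri)
    if m is None:
        raise ValueError(f"not an OBO-style term IRI: {iri!r}")
    idspace, local_id = m.groups()
    return f"{idspace}:{local_id}"


print(curie_to_iri("GO:0008150"))  # http://purl.obolibrary.org/obo/GO_0008150
```

Because every term IRI shares the persistent base, each concept gets its own resolvable GUPRI while the whole artefact stays under one redirect service.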

Finally, a unified identifier scheme should be used to identify each version of a semantic artefact. This can be done using versioned URIs, as proposed by the OBO Foundry. Assigning GUPRIs to the different versions allows information systems to retrieve automatically both the latest version and older versions of the semantic artefact.
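
Versioned URIs can be sketched in the same spirit: one unversioned IRI that is meant to resolve to the latest release, plus one date-stamped IRI per release. The path layout `{idspace}/{YYYY-MM-DD}/{idspace}.owl`, the ID space `exo` and the release dates are assumptions chosen for illustration; OBO Foundry Principle 4 gives the exact form it recommends.

```python
from datetime import date

OBO_BASE = "http://purl.obolibrary.org/obo/"


def ontology_iri(idspace: str) -> str:
    """Unversioned IRI: meant to resolve to the latest release."""
    return f"{OBO_BASE}{idspace}.owl"


def version_iri(idspace: str, release: date) -> str:
    """Date-stamped IRI identifying one specific release (assumed path layout)."""
    return f"{OBO_BASE}{idspace}/{release.isoformat()}/{idspace}.owl"


def version_history(idspace: str, releases: list[date]) -> list[str]:
    """Version IRIs of all releases, oldest first; the last one is the latest."""
    return [version_iri(idspace, r) for r in sorted(releases)]


# Hypothetical ontology "exo" with two releases.
releases = [date(2020, 3, 1), date(2019, 6, 1)]
for iri in version_history("exo", releases):
    print(iri)
print(ontology_iri("exo"))
```

An information system that recorded a date-stamped IRI keeps pointing at the exact release it was built against, while the unversioned IRI lets other clients follow the latest one.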

This recommendation emphasizes the need for reliable and persistent identification systems, without imposing any specific technical solution.


30 PURL https://www.oclc.org/research/themes/data-science/purl.html

31 Internet Archive https://archive.org/

32 Handle System https://en.wikipedia.org/wiki/Handle_System

Related recommendations

  • W3C Data on the Web - Best Practice 9: Use persistent URIs as identifiers of datasets [33]
  • OBO Foundry - Principle 3 [34]
  • OBO Foundry - Identifier Policy [35]
  • OBO Foundry - Principle 4 [36]
  • Industrial Ontology Foundry - Principle 11: IRI and identifier space
  • Industrial Ontology Foundry - Principle 12: Identifier and naming conventions
  • EOSC PID policy recommendation (Hellström et al., 2019)

33 W3C Data on the Web - Best Practice 9 https://www.w3.org/TR/dwbp/#DataIdentifiers

34 OBO Foundry principle 3 http://www.obofoundry.org/principles/fp-003-uris.html

35 OBO Foundry ID policy http://www.obofoundry.org/id-policy

36 OBO Foundry principle 4 http://www.obofoundry.org/principles/fp-004-versioning.html

Stakeholders

Practitioner and Repository

 
