Linked Open Data and Persistent IDs

Axiell has implemented a solution for the management of Persistent Identifiers (PIDs) in Axiell Collections. A PID consists of two parts: a unique URI (Uniform Resource Identifier), which is a persistent, external identifier for your resource (usually registered with a third party), and a publicly accessible URL (Uniform Resource Locator) that points to your internal digital resource, such as a database record, a digital media file or a digital document in Axiell Collections. A PID permanently identifies a digital resource, never an actual physical object.
A database record in Axiell Collections consists of metadata about a physical catalogue object or a more abstract concept like a thesaurus term, a creator's name or an exhibition, for example. Registering a PID to get a publishable URL for your digital resource is only recommended if the resource is worth sharing because it contains information that is useful to others outside your organisation. A created PID will be saved in the record to which it refers. PIDs from external sources can be registered in your records too.

Support for PIDs forms the basis for making the data of an organisation “Linked Open Data” (LOD). Each record that is made “Linked Open” will have its own unique PID that can be used to refer to it. This implementation covers three components that we’ll describe in this document:

1. A URI data type.
2. PID fields to store URIs and associated URLs.
3. Management of registration of PIDs with an external broker: Handle/SurfSARA - https://www.surf.nl/en/data-persistent-identifier-data-always-findable-by-permanent-references

The application changes for LOD (PID fields and such) have been implemented from Collections application version 5.0. The core software functionality (the ability to use fields of the URI type at all) in Axiell Collections has been introduced in version 1.6. The application changes could potentially also be ported back into customer specific and older application versions if needed. Do note that the application implementation has changed in model application 5.2 (more about this later).

While this implementation focuses on creating and managing PIDs, making your metadata “Linked Open” also requires exposing records (their data) via their PIDs in a format that other systems (usually software rather than people) can understand. Exposing records is usually done in an RDF format (Resource Description Framework), for which there are various standards. An RDF implementation has not yet been done as part of this implementation.

Making data “Linked Open” is not only relevant for objects in a collection, but also for descriptors and standardised terminology that is used to describe these objects. When open vocabulary resources are used to describe an object, such as the Getty AAT, the PIDs of the used terms/concepts can be stored in the Axiell Collections database. When publishing object records as Linked Open Data, these records will then contain references to Getty AAT concepts. This not only means that it’s clear what the meaning and context of each term/concept is, but it also allows for the creation of links with data from other datasets that are using the same identifiable concepts/terms as descriptors. E.g. if a keyword that is used to indicate the creation location is London, that would be an ambiguous descriptor as there are multiple places in the world that are called London. But if besides (or even instead of) the term London its source ID is listed (e.g. http://vocab.getty.edu/tgn/7011781), then it’s clear which London is meant. Therefore it’s relevant to be able to manage PIDs in all entities in the system. In some cases, the PIDs are the IDs from your organisation, in other cases PIDs from external parties are stored in the system.

The URI field data type

A field in an Axiell .inf (database configuration) file can be assigned the URI data type. This type of field is used to register both URIs and URLs for PIDs. The URI data type covers the following:

The content of a URI type field will be validated according to the .NET URI object rules as described in https://docs.microsoft.com/en-us/dotnet/api/system.uri.-ctor?view=netframework-4.8#System_Uri__ctor_System_String. URIs usually have the form of a URL (even though they're not meant for normal public use) so if the URL is valid, it’s also a valid URI.
A field of the URI data type can be associated with another field of the URI data type. One will be defined as the URL and the other will be the URI. This is especially useful if PIDs are managed through an external broker like Handle/SurfSARA. The external URI will point to the broker, which in turn will refer to your internal URL. The idea behind this is to keep the URI truly persistent, as it does not include an internal domain name of your organisation. So if the internal URL has to change because of, for instance, a change in the name of the organisation, the new internal URL can be associated with the external URI (by entering the new URL in the relevant records and using the Persistent ID submission task with the Update action for those records). The internal URL could e.g. link to the Axiell WebAPI to present the record in a technical format (maybe RDF) which can be read/harvested by third-party software or even in an HTML presentation.
The URI data type also allows for the definition of an Initial value format which will be entered (and completed) in the field automatically upon the creation of a record.
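
As a rough illustration of what such validation involves, the following Python sketch checks whether a value looks like an absolute URI. It is not the .NET validation used by Axiell Collections: the function name looks_like_absolute_uri and the exact rules are assumptions made for this example, and the .NET rules are more extensive.

```python
from urllib.parse import urlsplit


def looks_like_absolute_uri(value: str) -> bool:
    """Rough check only: a scheme must be present, and http(s) needs a host.

    Hypothetical helper for illustration; it does not reproduce the
    .NET System.Uri rules that the URI data type actually applies.
    """
    # Reject empty values and values with leading/trailing whitespace.
    if not value or value != value.strip():
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    if not parts.scheme:
        return False
    if parts.scheme in ("http", "https"):
        return bool(parts.netloc)
    # Other schemes (e.g. urn:) only need something after the scheme.
    return bool(parts.netloc or parts.path)
```

A value such as http://vocab.getty.edu/tgn/7011781 passes this check, while a bare suffix such as collect.123 does not, because it has no scheme.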

A field definition of the URI data type gets a URI field properties tab where you can enter the relevant properties. See the screenshots below for an example of two associated PID fields. The PID_data.URI field, for example, could have a setup similar to the following (if field tag ZZ in the record contains a unique record identifier like a GUID, as is the case from model application version 5.0):

URIfieldPropertiesTab

From 5.0, the mentioned field tag ZZ is filled by a storage adapl (or actually by code in force.inc) with a GUID (a globally unique number).

The PID_data.URI field could also get an alternative setup like the following (if instead of a GUID the current .inf name plus the record number - priref - would create a unique record ID, since just a priref wouldn't be enough for uniqueness across all your database tables):

URIfieldPropertiesTab2
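
The two setups described above differ only in which value makes the URI unique. The Python sketch below shows the idea, assuming the example base URI used elsewhere in this document; the function names are hypothetical and the actual initial value format syntax in Axiell Designer is not reproduced here.

```python
import uuid

# Example base URI taken from this document; yours will differ.
BASE_URI = "https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/"


def uri_from_guid(guid: str, base: str = BASE_URI) -> str:
    """Variant 1: the GUID stored in field tag ZZ makes the URI unique."""
    return f"{base}{guid}"


def uri_from_priref(inf_name: str, priref: int, base: str = BASE_URI) -> str:
    """Variant 2: .inf name plus priref, since a priref alone is only
    unique within one database table."""
    return f"{base}{inf_name}.{priref}"


# A GUID as a storage procedure might generate it (illustrative only).
example_guid = str(uuid.uuid4())
example_uri = uri_from_guid(example_guid)
```

With the second variant, record 42 in collect and record 42 in another table still yield different URIs because the table name is part of the suffix.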

For the associated PID_data.URL field on the other hand, the setup could be similar to the following:

URIfieldPropertiesTab3

Note that the URI type property refers to the type of the currently defined field, not to the Related URI field.

The PID fields in Axiell Collections

In the standard Axiell Collections application (version 5.0 and up), PID fields have been implemented for all entities, so not only for the objects database but also for e.g. the Thesaurus and Persons & Institutions databases. Versions 5.0 and 5.1 differ from version 5.2 and up.

Model application 5.0/5.1

For the objects database we have implemented three groups of PID fields: Work PID, Data PID and Other PID’s. See the screenshot further down in this paragraph. The PID fields have been configured on a screen tab called PID, so it’s easy to switch off for users that don’t need to see these fields. The URL and URI fields are of the URI data type and are inter-linked.

A URI and URL are generated automatically by Axiell Collections only for newly created records, and only when the PID fields have been configured properly (which may not be the case for all types of PID fields, depending on your application customization). In existing records you can add a URI and URL manually. (To add URIs and URLs to existing records in bulk, an ADAPL procedure would have to be written first, maybe in the form of a task set up per data source.)

Setup without Designer - While this automatic generation of URI and URL is core software functionality when the PID fields have been configured as described in the previous paragraphs, it can (optionally) also be implemented via a special after-storage adapl that reads a text file which contains default URI/URL settings and only runs if the relevant fields are still empty. This way, this functionality is also available for customers that do not have access to Axiell Designer. In the screenshot below the base URI for the Work-PID is:

https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/

The first part of the configured suffix (in the text file) is collect. while the second part is the record number/priref (tag %0), which together with the base URI make a unique URI. This is what a sample text file looks like (lines which start with * are comments):

* PIDtext.txt
* settings for PID creation
* use no space between a line number and the text behind it
*1 base uri, including institution specific code
1https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/
*
* table specific settings
*
* collect:
*
*2 suffix for uri for records in collect table, e.g. collect.
2collect.
*3 field, unique part of uri of collect table
3%0
* settings for URL creation in collect table
4https://ais.axiell.com/collect.
*5 field for unique part of url
5%0
*

The base-URI on line 1 has been provided by the Handle service of SurfSARA. For this example, the organisation code/prefix provided by Handle.net is: 21.T12995. This prefix was provided by Handle.net as part of a one-off registration process.
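
To make the structure of the settings file concrete, here is a Python sketch that reads such a file and builds a URI and URL for one record. The real processing is done by an after-storage adapl; the parsing rules below (leading digits are the line number, the rest is the value) and the function names are assumptions for illustration only.

```python
import re

SAMPLE = """\
* PIDtext.txt
*1 base uri, including institution specific code
1https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/
*2 suffix for uri for records in collect table, e.g. collect.
2collect.
*3 field, unique part of uri of collect table
3%0
* settings for URL creation in collect table
4https://ais.axiell.com/collect.
*5 field for unique part of url
5%0
"""


def parse_settings(text: str) -> dict:
    """Map each setting number to its value, skipping * comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        # Assumption: the leading digits are the setting number.
        match = re.match(r"(\d+)(.*)", line)
        if match:
            settings[int(match.group(1))] = match.group(2)
    return settings


def build_pid(settings: dict, record: dict) -> tuple:
    """Return (uri, url) for a record given as {field tag: value}."""
    uri = settings[1] + settings[2] + str(record[settings[3]])
    url = settings[4] + str(record[settings[5]])
    return uri, url


settings = parse_settings(SAMPLE)
uri, url = build_pid(settings, {"%0": 123})
```

For a record with priref 123, this combines the base URI, the suffix collect. and the priref into one URI, and the URL prefix and the priref into the matching URL.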

The settings in this text file must be entered by the customer (or by Axiell) before the PID implementation is taken into production.

An example of a filled-in PID screen tab in a model 5.0 application:

PIDscreen

It's up to you, as an application manager, to decide which PID types should be filled in automatically upon record creation (in which case the relevant PID fields must have been properly configured) and which should remain empty to be filled in optionally and manually by the user when appropriate.

The PID work and PID data field groups are non-repeatable, while the PID other field group is repeatable. The PID data fields should be used to identify the current record containing metadata about an object or descriptor. In the object catalogue, the PID work fields could be used to identify a similar metadata record for the same object in the database of another organisation that maybe has the object on loan indefinitely, for example. In the Multimedia documentation data source on the other hand, you could use the PID work fields to register a PID for the digital media file itself. PIDs from a previous system that are still current should be entered into either the PID work field or the PID data fields.
In the PID other fields one can record URIs from a different organisation describing the same object (such as those from WikiData if they describe the same object). To clarify the URL/URI you can enter a short comment in the Qualifier field.

One may notice some ambiguity in the purpose of the work PID, so that's why in model application 5.2 and up the work PID fields have been removed in all data sources except for the Multimedia documentation data source where they have been renamed to URI/URL for the digital media.

Model application 5.2 and up

Because of some ambiguity in the purpose of the "work PID" in model application versions 5.0 and 5.1, the available PID types per data source have changed in model application 5.2 and up: work PIDs are no longer present in 5.2. Now, on the PIDs and other identifiers screen tab, all data sources have a non-repeatable URI/URL for the metadata in this record field group for the PID which identifies the current record containing metadata about an object or descriptor. PIDs from a previous system that are still current should be entered here too.

All data sources also have a repeatable Other identifiers field group for all other PIDs and non-URI IDs:

Any URIs here are always third-party URIs, so there's no matching URL field because you won't have an internal URL to that resource. In all data sources, this other URI field can be used to identify a similar metadata record for the same object or descriptor in the database of another organisation, or the information source from which the content in this record was (partially) derived, for example.
The Non-URI ID can be used for all IDs (your own or third-party) which are not URIs, but which identify (or used to identify) this record too. This can be a record ID from a now deprecated system, an old record number or object number, for example.

For both URIs and Non-URI IDs you can also indicate the Source of the URI or ID, like the AAT or WikiData for example. This is a linked field to the Thesaurus, with the domain PID source.

To clarify the URI or ID you can enter a short comment in the Qualifier field. The Publishable checkbox is for your own information only, to indicate if an ID or URI is (potentially) publishable: it actually does nothing.

PIDscreen52-1

Because the PID screen in all data sources (including the Thesaurus, the Geographical thesaurus and Persons and institutions) now contains the Source and Non-URI ID fields, the old Source, Number/URI, Date and Time fields which were present in records in these three data sources before have been removed from there in application version 5.2 because of redundancy.

Only in the Multimedia documentation data source there is yet another PID field group, the URI/URL for the digital media field group, to register a PID for the digital media file itself, if you want. So you can have a PID for this record (the resource for the metadata of the media file) and a separate PID for the digital resource that is the media file.

PIDscreen52-2

A URI and URL are generated automatically by Axiell Collections only for newly created records, and only when the PID fields have been configured properly (which may not be the case for all types of PID fields, depending on your application customization). In existing records you can add a URI and URL manually or via the Assign PIDs task.

To add URIs and URLs to existing records in bulk for marked records (without ever having to configure the relevant fields in Designer), an Assign PIDs task is available in all data sources with PIDs. Prior to using this task, appropriate URL and URI formats need to be set in the TaskAssignPIDs.txt file in the \texts sub folder of your Axiell system. Please read the comments in that text file for more information about the formats you need to enter if you’d like to use the Assign PIDs task.

The task fills empty URI fields and empty URL fields with the configured values. If the Replace any existing URLs checkbox on the task screen is marked, URL fields that are already filled will be overwritten too. This is handy when placeholder URLs were used at first. Existing URIs won't be replaced, as URIs are permanent and should never need to be changed after you've submitted them to the handler. (Before submission to the handler you could still change them with a search-and-replace action though.)

In most data sources the task screen will look like this:

DSAssignPIDsTaskScreen1

Only in Multimedia documentation the task screen will look as follows, offering the extra choice to select the PID types you’d like to process with this task (since media records have two types of PIDs). You can mark both PID type checkboxes or just one of them, but don’t leave them both empty.

DSAssignPIDsTaskScreen2

After running the Assign PIDs task for records which did not have a URI yet, you must run the Persistent ID submission task and use the Create option to create new submission records.
After running the Assign PIDs task for records which already had a URI, you must run the Persistent ID submission task and use the Update option to have the handler associate the new URL with the existing URI. (URIs can't be "updated".)
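
The behaviour of the Assign PIDs task and the choice of follow-up action can be summarised in a short Python sketch. The record layout (a dictionary with uri and url keys) and the function name are hypothetical; the sketch only mirrors the rules described above and is not the task's actual implementation.

```python
def assign_pids(record: dict, new_uri: str, new_url: str,
                replace_existing_urls: bool = False) -> str:
    """Fill the PID fields of one record and return the submission action
    to use afterwards ('Create' or 'Update').

    Hypothetical sketch: field names and structure are assumptions.
    """
    had_uri = bool(record.get("uri"))

    # An existing URI is never replaced.
    if not had_uri:
        record["uri"] = new_uri

    # An empty URL is always filled; a filled URL only when the
    # 'Replace any existing URLs' option is marked.
    if not record.get("url") or replace_existing_urls:
        record["url"] = new_url

    # Records without a URI need a Create submission; records that
    # already had one need an Update submission.
    return "Update" if had_uri else "Create"
```

A record that had no URI ends up with both fields filled and needs the Create action, whereas a record that already had a URI keeps that URI and needs the Update action.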

It's up to you, as an application manager, to decide which PID types should be filled in automatically upon record creation (in which case the relevant PID fields must have been properly configured) and which should remain empty to be filled in optionally and manually by the user when appropriate.

The PIDs and other identifiers screen tab can of course be hidden or switched off for users that don’t need to see these fields.

Management of PIDs via Handle/SurfSARA

To ensure PIDs are truly PIDs, there are service providers that offer unique URLs that can be used as URIs. As these URIs don’t contain the domain name of the organisation, they don’t need to change if the organisation changes its name. It’s these external URIs that are exposed/published. However, these URIs need to be associated with ‘real’ URLs to make sure that when the external URI is called, there is a valid response. This is where service providers like SurfSARA come in. They can host the URI and its association with the real URL. This association can be managed via the SurfSARA API. It’s this API implementation that we have done in Axiell Collections. When the URI is called, SurfSARA automatically refers to the internal URL.
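
The indirection described above can be pictured as a lookup table kept by the broker. The following Python sketch is a toy model only: the class and method names are hypothetical and it does not represent the SurfSARA service or its API.

```python
class ToyBroker:
    """Toy model of a PID broker: it maps persistent URIs to current URLs."""

    def __init__(self) -> None:
        self._registry = {}

    def create(self, uri: str, url: str) -> None:
        """Register a new URI together with the URL it refers to."""
        if uri in self._registry:
            raise ValueError(f"URI already registered: {uri}")
        self._registry[uri] = url

    def update(self, uri: str, url: str) -> None:
        """Point an existing URI at a new URL; the URI itself never changes."""
        if uri not in self._registry:
            raise KeyError(f"unknown URI: {uri}")
        self._registry[uri] = url

    def resolve(self, uri: str) -> str:
        """Return the URL that the URI currently refers to."""
        return self._registry[uri]


broker = ToyBroker()
pid = "https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/collect.123"
broker.create(pid, "https://ais.axiell.com/collect.123")
# The organisation changes its domain: only the mapping is updated.
# The new domain below is a made-up example.
broker.update(pid, "https://new-name.example.org/collect.123")
```

Anyone who published or bookmarked the URI keeps a working reference, because only the broker's mapping changed.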

To manage PIDs via SurfSARA, the organisation first needs to get a so-called ‘prefix’. This is a unique code for the organisation that is added to the URI and is the identifier of the organisation for SurfSARA. The one-off registration process at SurfSARA is quite complex and involves multiple steps. Most Axiell customers will not be able to do this by themselves. After this one-off process, the SurfSARA API can be used.

The organisation prefix can be obtained through Handle.net for a one-off fee. The services of SurfSARA are charged annually.

The PID implementation in Axiell Collections includes features that enable the collections manager to register any automatically or manually entered URI/URL combination at SurfSARA. For this, a task Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) is present in all data sources. This task is run by the user on a selection of records that is ready for publication (i.e. containing a valid URI/URL combination). The task creates PID submission records in a data source called Registration/Synchronisation Queue Permanent IDs; for each URI that needs to be registered a record is created or updated in this table. Possible actions to choose from in the task screen are: Create, Update, Delete. After a Create, these registration queue records get an initial status ‘to be synchronised’. They are automatically linked to the catalogue records so that the status is visible in those records.

A so-called trigger service (the Axiell Change Tracking Service) has been implemented to run directly on the SQL table. The plug-in that Axiell has developed for this trigger service detects that records are present in the Registration/Synchronisation Queue Permanent IDs data source that need to be registered at SurfSARA. This plug-in communicates with the SurfSARA API for each record and performs the action as defined (Create, Update, Delete). When the registration is successful, the status in the registration record is updated accordingly. If the action fails, this is also recorded in the registration record.
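
The following Python sketch illustrates how one queue record could be turned into a request and how its status could be updated afterwards. It is an assumption-heavy illustration: the request shape is modelled on the general form of the Handle.net REST API (a PUT or DELETE on the handle URL with a JSON list of values), the status names synchronised and failed are invented for this example, and the calls actually made by the Axiell plug-in are not documented here. Nothing is sent over the network; the send function is supplied by the caller.

```python
from typing import Callable, Optional


def build_request(action: str, uri: str, url: Optional[str]) -> dict:
    """Describe the HTTP request for one queue record (nothing is sent).

    Assumption: request shape modelled on the Handle.net REST API.
    """
    if action == "Delete":
        return {"method": "DELETE", "url": uri, "json": None}
    if action in ("Create", "Update"):
        body = {
            "values": [
                {
                    "index": 1,
                    "type": "URL",
                    "data": {"format": "string", "value": url},
                }
            ]
        }
        return {"method": "PUT", "url": uri, "json": body}
    raise ValueError(f"unknown action: {action}")


def process(record: dict, send: Callable[[dict], bool]) -> None:
    """Perform the action of one queue record and store the outcome.

    'send' is a caller-supplied function that performs the request and
    returns True on success. Status names are hypothetical.
    """
    request = build_request(record["action"], record["uri"], record.get("url"))
    try:
        ok = send(request)
    except Exception:
        ok = False
    record["status"] = "synchronised" if ok else "failed"
```

Passing the send function in as a parameter keeps the sketch testable without a broker account: a stub that returns True or False is enough to exercise both outcomes.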

The most common action would be to Create a URI/URL combination. If URLs change, e.g. because of a domain name change as a result of a change in the name of the organisation, the existing URI/URL mapping can be Updated to replace the old URL with the new URL.

The Delete action (to delete both the URI and associated URL from the broker's registry) would normally not be used as this is rather contradictory to the idea of persistency. In a test environment, this action may have a use case though.

Actions performed via the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task affect the URIs and/or URLs at the broker and change the submission records in the Registration/Synchronisation Queue Permanent IDs data source, but they don't change the URIs or URLs registered in the catalogue records themselves. So an update of any PID fields in catalogue records must be done manually, by search-and-replace or by some custom task. Only a link to a newly created submission record will be added to the processed catalogue records.
Actions chosen in the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task can only actually be performed when certain conditions have been met: for a Create there should not yet be an action history other than a Delete action; for an Update, the history should be either Create or Update; for Delete the history should be either Create or Update. Both a URI and URL should always be filled in before an action is possible. An error message will be displayed if these conditions are not being met.
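
The conditions above amount to a small state check. The Python sketch below expresses them, assuming that "history" means the most recent action performed for the record; that interpretation, and the function name, are assumptions for illustration.

```python
from typing import List


def can_perform(action: str, history: List[str], uri: str, url: str) -> bool:
    """Return True if the chosen action is allowed for a record.

    Assumption: only the most recent action in 'history' is decisive.
    """
    # Both a URI and a URL must be filled in before any action.
    if not uri or not url:
        return False

    last = history[-1] if history else None

    if action == "Create":
        # Allowed if nothing was done yet, or the PID was deleted before.
        return last in (None, "Delete")
    if action in ("Update", "Delete"):
        # Only something that currently exists can be updated or deleted.
        return last in ("Create", "Update")
    raise ValueError(f"unknown action: {action}")
```

For example, a Create is accepted for a record without history, but an Update is refused until a Create has taken place.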

An example of a PID synchronisation record:

PIDsynchronisationRecord

Workflow

From the user perspective and considering that the initial configuration has been done, the workflow to register new PIDs would be as follows:

1. The user creates catalogue records. The URI and URL fields (and possibly any GUID field) are filled automatically when the record is saved.
2. When records are ready for the PIDs to be published, the user selects these records and marks them in the result set.
3. In the result set, the user starts the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task from the Features drop-down in the Result set context toolbar.
4. This task opens a new window enabling the user to choose the action that needs to take place for the selected records: Create, Update or Delete. The default and most used action would be Create.
PIDregistrationTask2
5. After confirmation with OK, a message will appear confirming that the selected records have been put in the registration queue. From model application 5.2, any skipped records and the reason for being skipped will be reported too.
6. The trigger service immediately starts processing the submissions with SurfSARA and updates the registration/submission status which can be seen at the top of the PIDs and other identifiers screen. If you don't see the changed status in the current record directly, try reloading it by switching to a different record and back.
7. The URIs are now ready to be used and they will refer to the associated URLs. Simply click the URI in the Collections record and the associated URL will be retrieved from SurfSARA and opened in a new browser tab. (You can also copy the URI and paste it in the address bar of a browser tab manually to get the same result.)

Notes

It’s no use entering a URI or URL in a new record manually, as they will be overwritten when you save the record. (In an already existing record you can edit these fields though.)
The automatic assigning of any GUID doesn’t depend on whether the record is new or existing but on whether the field is empty: only if it is empty will it be assigned upon saving the record. The automatic URI and URL do depend on whether the record is new.
When you copy a record which already contains PIDs and possibly a GUID, none of these fields and other fields on the Permanent ID’s screen tab will be copied along because the GUID and PIDs need to be unique. (These fields are or should be non-exchangeable when it comes to copying records.) Upon saving the copied record, these fields will be filled with new values.
When you derive (move) a record from the Internal object catalogue to the External object catalogue or vice versa, initially all data from the Permanent ID’s screen tab will be moved along (because it’s a move, not a copy), but since Collections considers it to be a new record, the record will get new URIs and URLs. Any GUID will remain the same.
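
The first two notes can be condensed into a short Python sketch of what happens when a record is saved. The record layout, the function name and the example base addresses are hypothetical; the sketch assumes the GUID-based URI setup and is not the actual storage code.

```python
import uuid


def save(record: dict, is_new: bool,
         base_uri: str = "https://pid.example.org/",
         base_url: str = "https://collections.example.org/") -> dict:
    """Mimic the automatic filling of GUID, URI and URL on save.

    Hypothetical sketch: base addresses and field names are assumptions.
    """
    # The GUID depends only on the field being empty, not on record age.
    if not record.get("guid"):
        record["guid"] = str(uuid.uuid4())

    # URI and URL are generated for new records only, overwriting
    # anything that was typed in manually before the first save.
    if is_new:
        record["uri"] = base_uri + record["guid"]
        record["url"] = base_url + record["guid"]
    return record
```

This also matches the note about deriving a record: because the derived record counts as new, it receives a new URI and URL, while a GUID that is already filled in is kept.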