Linked Open Data and Persistent IDs
Axiell has implemented a solution for the management of Persistent Identifiers (PIDs) in Axiell Collections. A PID permanently identifies one of your internal digital resources, such as a database record, a digital media file or a digital document in Axiell Collections (so never an actual physical object). It consists of a unique URI (Uniform Resource Identifier), which is essentially a persistent, external identifier for your resource (usually, although not necessarily, registered with a third party), and a publicly accessible URL (Uniform Resource Locator) to that resource; the URL is not required when using ARKs.
A database record in Axiell Collections consists of metadata about a physical catalogue object or a more abstract concept like a thesaurus term, a creator's name or an exhibition, for example. Registering a PID to get a publishable URL to your digital resource is only recommended if the resource is worth sharing because it contains information that is useful to others outside your organisation. A created PID will be saved in the record to which it refers. PIDs from external sources can be registered in your records too.
The support for PIDs forms the basis of making the data of an organisation “Linked Open Data” (LOD). Each record that is made “Linked Open” will have its own unique PID that can be used to refer to it. This implementation covers three components that we’ll describe in this document:
1. | A URI data type. |
2. | PID fields to store URIs and possibly associated URLs. |
3. | Management of registration of PIDs with an external broker, Handle/SurfSARA (https://www.surf.nl/en/data-persistent-identifier-data-always-findable-by-permanent-references), or without third-party PID registration, using ARKs. We'll first discuss the Handle/SurfSARA implementation, as it is what the original implementation in Collections model applications 5.0 up to at least 6.0 is meant for. Using ARKs instead has its advantages though, and if you're starting out with PID registration, this method is certainly worth considering and can be applied in current Collections model applications too. For existing Handle/SurfSARA users, a transfer to ARKs might be a more complicated process. We'll discuss the implementation of ARKs further down. |
The application changes for Handle/SurfSARA LOD (PID fields and such) have been implemented from Collections application version 5.0. The core software functionality (the ability to use fields of the URI type at all) in Axiell Collections has been introduced in version 1.6. The application changes could potentially also be ported back into customer specific and older application versions if needed. Do note that the Handle/SurfSARA application implementation has changed in model application 5.2 (more about this further down).
While this implementation focuses on creating and managing PIDs, making your metadata “Linked Open” also requires exposing records (their data) via their PIDs in a format that other parties, usually systems, can understand. Exposing records is usually done in an RDF format (Resource Description Framework) for which there are various standards. However, an RDF implementation has not yet been done as part of this implementation.
Making data “Linked Open” is not only relevant for objects in a collection, but also for descriptors and standardised terminology that is used to describe these objects. When open vocabulary resources are used to describe an object, such as the Getty AAT, the PIDs of the used terms/concepts can be stored in the Axiell Collections database. When publishing object records as Linked Open Data, these records will then contain references to Getty AAT concepts. This not only means that it’s clear what the meaning and context of each term/concept is, but it also allows for the creation of links with data from other datasets that are using the same identifiable concepts/terms as descriptors. E.g. if a keyword that is used to indicate the creation location is London, that would be an ambiguous descriptor as there are multiple places in the world that are called London. But if besides (or even instead of) the term London its source ID were listed (e.g. http://vocab.getty.edu/tgn/7011781), then it’s clear which London is meant. Therefore it’s relevant to be able to manage PIDs in all entities in the system. In some cases, the PIDs are the IDs from your organisation, in other cases PIDs from external parties are stored in the system.
The URI field data type
A field in an Axiell .inf (database configuration) file can be assigned the URI data type. This type of field is used to register both URIs and URLs for PIDs. The URI data type covers the following:
• | The content of a URI type field will be validated according to the .NET URI object rules as described in https://docs.microsoft.com/en-us/dotnet/api/system.uri.-ctor?view=netframework-4.8#System_Uri__ctor_System_String. URIs usually have the form of a URL (even though they're not meant for normal public use) so if the URL is valid, it’s also a valid URI. |
• | A field of the URI data type has the option to be associated with another field of the URI data type. One will be defined as the URL and the other will be the URI. This is especially useful if PIDs are managed through an external broker like Handle/SurfSARA. The external URI will point to the broker, which in turn will refer to your internal URL. The idea behind this is to keep the URI truly persistent, as it does not include an internal domain name of your organisation. So if the internal URL has to change because of, for instance, a change in the name of the organisation, the new internal URL can be associated with the external URI (by entering the new URL in the relevant records and using the Persistent ID submission task with the Update action for those records). The internal URL could e.g. link to the Axiell WebAPI to present the record in a technical format (maybe RDF) which can be read/harvested by third-party software, or even in an HTML presentation. |
• | The URI data type also allows for the definition of an Initial value format which will be entered (and completed) in the field automatically upon the creation of a record. |
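As a rough illustration of the kind of syntax check described in the first point above, the following Python sketch tests whether a value parses as an absolute URI. It uses Python's urllib.parse rather than the .NET Uri class that Collections relies on, so the rules are only an approximation, and the function name looks_like_absolute_uri is hypothetical.

```python
from urllib.parse import urlsplit


def looks_like_absolute_uri(value: str) -> bool:
    """Approximate check: the value must have a scheme plus a host or a path.

    This is NOT the .NET Uri validation used by Collections, only a sketch.
    """
    try:
        parts = urlsplit(value.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs
        return False
    if not parts.scheme:
        return False
    if parts.scheme in ("http", "https"):
        # Web addresses need a host name
        return bool(parts.netloc)
    # Other schemes (for example "ark:") need at least a path or host
    return bool(parts.netloc or parts.path)
```

A plain term such as London is rejected by this check, while http://vocab.getty.edu/tgn/7011781 is accepted.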
A field definition of the URI data type gets a URI field properties tab where you can enter the relevant properties. See below screenshots for an example of two associated PID fields. The PID_data.URI field for example, could have a setup similar to the following (if field tag ZZ in the record contains a unique record identifier like a GUID, as is the case from model application version 5.0):
From 5.0, the mentioned field tag ZZ is filled by a storage adapl (or actually by code in force.inc) with a GUID (a globally unique number).
The PID_data.URI field could also get an alternative setup like the following (if instead of a GUID the current .inf name plus the record number - priref - would create a unique record ID, since just a priref wouldn't be enough for uniqueness across all your database tables):
For associated PID_data.URL field on the other hand, the setup could be similar to the following:
Note that the URI type property refers to the type of the currently defined field, not to the Related URI field.
The PID fields in Axiell Collections
In the standard Axiell Collections application (version 5.0 and up), PID fields have been implemented for all entities, so not only for the objects database table but also for e.g. the Thesaurus and Persons & Institutions database tables. Versions 5.0 and 5.1 differ from version 5.2 and up.
Model application 5.0/5.1
For the objects database table we have implemented three groups of PID fields: Work PID, Data PID and Other PID’s. See the screenshot further down in this paragraph. The PID fields have been configured on a screen tab called PID, so it’s easy to switch off for users that don’t need to see these fields. The URL and URI fields are of the URI data type and are inter-linked.
A URI and URL are generated automatically by Axiell Collections only for newly created records, and only when the PID fields have been configured properly (which may not be the case for all types of PID fields, depending on your application customization). In existing records you could add a URI and URL manually. (To add URIs and URLs to existing records in bulk, an ADAPL procedure would have to be written first, maybe in the form of a task set up per data source.)
Setup without Designer - While this automatic generation of URI and URL is core software functionality when the PID fields have been configured as described in the previous paragraphs, it can (optionally) also be implemented via a special after-storage adapl that reads a text file which contains default URI/URL settings and only runs if the relevant fields are still empty. This way, this functionality is also available for customers that do not have access to Axiell Designer. In the screenshot below the base URI for the Work-PID is:
https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/
The first part of the configured suffix (in the text file) is collect. while the second part is the record number/priref (tag %0), which together with the base URI make a unique URI. This is what a sample text file looks like (lines which start with * are comments):
* PIDtext.txt
* settings for PID creation
* use no space between a line number and the text behind it
*1 base uri, including institution specific code
1https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/
*
* table specific settings
*
* collect:
*
*2 suffix for uri for records in collect table, e.g. collect.
2collect.
*3 field, unique part of uri of collect table
3%0
* settings for URL creation in collect table
4https://ais.axiell.com/collect.
*5 field for unique part of url
5%0
*
The base-URI on line 1 has been provided by the Handle service of SurfSARA. For this example, the organisation code/prefix provided by Handle.net is: 21.T12995. This prefix was provided by Handle.net as part of a one-off registration process.
The settings of this text file must be done by the customer (or by Axiell) before the PID-implementation is taken into production.
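To make the structure of the settings file more concrete, here is a minimal Python sketch that reads the sample lines and combines them into a URI and URL for one record. It is not the after-storage adapl itself. The function names, the assumption that each setting number is a single leading digit, and the assumption that the URL is simply setting 4 followed by the field named in setting 5 are ours.

```python
SAMPLE = """\
* PIDtext.txt
*1 base uri, including institution specific code
1https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/
*2 suffix for uri for records in collect table, e.g. collect.
2collect.
*3 field, unique part of uri of collect table
3%0
* settings for URL creation in collect table
4https://ais.axiell.com/collect.
*5 field for unique part of url
5%0
"""


def parse_settings(text):
    """Map setting numbers to values, skipping comment lines (starting with *)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        if not line[0].isdigit():
            continue  # not a numbered setting line
        # Assumption: the setting number is one digit, directly followed by the value
        settings[int(line[0])] = line[1:]
    return settings


def build_pid(settings, record):
    """Return (uri, url) for a record given as a dict of field tag -> value."""
    uri = settings[1] + settings[2] + record[settings[3]]
    url = settings[4] + record[settings[5]]
    return uri, url


settings = parse_settings(SAMPLE)
uri, url = build_pid(settings, {"%0": "23"})  # record with priref 23
print(uri)
print(url)
```

For a record with priref 23, the base URI, the suffix collect. and the priref are concatenated into one URI, which shows why the suffix is needed: the priref alone would not be unique across database tables.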
An example of a filled-in PID screen tab in a model 5.0 application:
It's up to you, as an application manager, to decide which PID types should be filled in automatically upon record creation (in which case the relevant PID fields must have been configured properly) and which should remain empty, to be filled in optionally and manually by the user when appropriate.
The PID work and PID data field groups are non-repeatable, while the PID other field group is repeatable. The PID data fields should be used to identify the current record containing metadata about an object or descriptor. In the object catalogue, the PID work fields could be used to identify a similar metadata record for the same object in the database of another organisation that maybe has the object on loan indefinitely, for example. In the Multimedia documentation data source on the other hand, you could use the PID work fields to register a PID for the digital media file itself. PIDs from a previous system that are still current should be entered into either the PID work fields or the PID data fields.
In the PID other fields one can record URIs from a different organisation describing the same object (such as those from WikiData if they describe the same object). To clarify the URL/URI you can enter a short comment in the Qualifier field.
One may notice some ambiguity in the purpose of the work PID, so that's why in model application 5.2 and up the work PID fields have been removed in all data sources except for the Multimedia documentation data source where they have been renamed to URI/URL for the digital media.
Model application 5.2 and up
Because of some ambiguity in the purpose of the "work PID" in model application versions 5.0 and 5.1, the available PID types per data source have changed in model application 5.2 and up: work PIDs are no longer present in 5.2. Now, on the PIDs and other identifiers screen tab, all data sources have a non-repeatable URI/URL for the metadata in this record field group for the PID which identifies the current record containing metadata about an object or descriptor. PIDs from a previous system that are still current should be entered here too.
All data sources also have a repeatable Other identifiers field group for all other PIDs and non-URI IDs:
• | Any URIs here are always third-party URIs, so there's no matching URL field because you won't have an internal URL to that resource. In all data sources, this other URI field can be used to identify, for example, a similar metadata record for the same object or descriptor in the database of another organisation, or the information source from which the content in this record was (partially) derived. |
• | The Non-URI ID can be used for all IDs (your own or third-party) which are not URIs, but identify (or used to identify) this record too. This can be a record ID from a now deprecated system, an old record number or object number for example. |
For both URIs and Non-URI IDs you can also indicate the Source of the URI or ID, like the AAT or WikiData for example. This is a linked field to the Thesaurus, with the domain PID source.
To clarify the URI or ID you can enter a short comment in the Qualifier field. The Publishable checkbox is for your own information only, to indicate if an ID or URI is (potentially) publishable: it actually does nothing.
Because the PID screen in all data sources (including the Thesaurus, the Geographical thesaurus and Persons and institutions) now contains the Source and Non-URI ID fields, the old Source, Number/URI, Date and Time fields which were present in records in these three data sources before have been removed from there in application version 5.2 because of redundancy.
Only in the Multimedia documentation data source there is yet another PID field group, the URI/URL for the digital media field group, to register a PID for the digital media file itself, if you want. So you can have a PID for this record (the resource for the metadata of the media file) and a separate PID for the digital resource that is the media file.
A URI and URL are generated automatically by Axiell Collections only for newly created records, and only when the PID fields have been configured properly (which may not be the case for all types of PID fields, depending on your application customization). In existing records you could add a URI and URL manually or via the Assign PIDs task.
To add URIs and URLs to existing records in bulk for marked records (without ever having to configure the relevant fields in Designer), an Assign PIDs task is available in all data sources with PIDs. Prior to using this task, appropriate URL and URI formats need to be set in the TaskAssignPIDs.txt file in the \texts sub folder of your Axiell system. Please read the comments in that text file for more information about the formats you need to enter if you’d like to use the Assign PIDs task.
The task fills empty URI fields and empty URL fields with the configured values. If the Replace any existing URLs checkbox on the task screen is marked, existing URLs will be replaced too. This is handy when placeholder URLs were used at first. Existing URIs won't be replaced, as URIs are permanent and should never need to be changed after you've submitted them to the handler. (Before submission to the handler you could still change them with a search-and-replace action though.)
In most data sources the task screen will look like this:
Only in Multimedia documentation the task screen will look as follows, offering the extra choice to select the PID types you’d like to process with this task (since media records have two types of PIDs). You can mark both PID type checkboxes or just one of them, but don’t leave them both empty.
After running the Assign PIDs task for records which did not have a URI yet, you must run the Persistent ID submission task and use the Create option to create new submission records.
After running the Assign PIDs task for records which already had a URI, you must run the Persistent ID submission task and use the Update option to have the handler associate the new URL with the existing URI. (URIs can't be "updated".)
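The rules for the Assign PIDs task and the follow-up submission action can be summarised in a short Python sketch. This is only a model of the behaviour described above, not the task's actual code. The PidFields class, the function name, and the choice to return None when nothing changes are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidFields:
    uri: str = ""
    url: str = ""


def assign_pids(fields: PidFields, new_uri: str, new_url: str,
                replace_existing_urls: bool = False) -> Optional[str]:
    """Fill the PID fields and return the submission action to run afterwards.

    Returns "Create", "Update", or None when nothing was changed.
    """
    had_uri = bool(fields.uri)
    changed = False
    if not fields.uri:
        # Existing URIs are never replaced: they are meant to be permanent
        fields.uri = new_uri
        changed = True
    if not fields.url or (replace_existing_urls and fields.url != new_url):
        # URLs are filled when empty, or replaced when the checkbox is marked
        fields.url = new_url
        changed = True
    if not changed:
        return None
    # Records that already had a URI need an Update, all others a Create
    return "Update" if had_uri else "Create"
```

With replace_existing_urls left at False, a record that already has both fields filled is left untouched, which matches the default state of the Replace any existing URLs checkbox.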
It's up to you, as an application manager, to decide which PID types should be filled in automatically upon record creation (in which case the relevant PID fields must have been configured properly) and which should remain empty, to be filled in optionally and manually by the user when appropriate.
The PIDs and other identifiers screen tab can of course be hidden or switched off for users that don’t need to see these fields.
Management of PIDs via Handle/SurfSARA
To ensure PIDs are truly PIDs, there are service providers that offer unique URLs that can be used as URIs. As these URIs don’t contain the domain name of the organisation, they don’t need to change if the organisation changes its name. It’s these external URIs that are exposed/published. However, these URIs need to be associated with ‘real’ URLs to make sure that when the external URI is called, there is a valid response. This is where service providers like SurfSARA come in. They can host the URI and its association with the real URL. This association can be managed via the SurfSARA API. It’s this API implementation that we have done in Axiell Collections. When the URI is called, SurfSARA automatically refers to the internal URL.
To manage PIDs via SurfSARA, the organisation first needs to get a so called ‘prefix’. This is a unique code for the organisation that is added to the URI and is the identifier of the organisation for SurfSARA. The one-off registration process at SurfSARA is quite complex and involves multiple steps. Most Axiell customers will not be able to do this by themselves. After this one-off process, the SurfSARA API can be used.
The organisation prefix can be obtained through Handle.net for a one-off fee. The services of SurfSARA are charged annually.
The PID implementation in Axiell Collections includes features that enable the collections manager to register any automatically or manually entered URI/URL combination at SurfSARA. For this, a task Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (in 5.2) is present in all data sources. This task is run by the user on a selection of records that is ready for publication (i.e. containing a valid URI/URL combination). The task creates PID submission records in a data source called Registration/Synchronisation Queue Permanent IDs; for each URI that needs to be registered a record is created or updated in this table. Possible actions to choose from in the task screen are: Create, Update, Delete. After a Create, these registration queue records get an initial status ‘to be synchronised’. They are automatically linked to the catalogue records so that the status is visible in those records.
A so-called trigger service (the Axiell Change Tracking Service) has been implemented to run directly on the SQL table. The plug-in that Axiell has developed for this trigger service, detects that records are present in the Registration/Synchronisation Queue Permanent IDs data source that need to be registered at SurfSARA. This plug-in communicates with the SurfSARA API for each record and performs the action as defined (Create, Update, Delete). When the registration is successful, the status in the registration record is updated accordingly. If the action fails, this is also recorded in the registration record.
The most common action would be to Create a URI/URL combination. If URLs change, e.g. because of a domain name change as a result of a change in the name of the organisation, the existing URI/URL mapping can be Updated to replace the old URL with the new URL.
The Delete action (to delete both the URI and associated URL from the broker's registry) would normally not be used as this is rather contradictory to the idea of persistency. In a test environment, this action may have a use case though.
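The role of the broker can be pictured as a simple lookup table from persistent URIs to current URLs. The Python sketch below models that idea in memory. It does not call the SurfSARA API, and the class name, method names and the example.org URL are hypothetical.

```python
class PidRegistry:
    """In-memory stand-in for a broker that maps persistent URIs to URLs."""

    def __init__(self):
        self._urls = {}

    def create(self, uri: str, url: str) -> None:
        if uri in self._urls:
            raise ValueError("URI is already registered: " + uri)
        self._urls[uri] = url

    def update(self, uri: str, url: str) -> None:
        if uri not in self._urls:
            raise KeyError(uri)
        # Only the URL changes; the URI itself stays the same forever
        self._urls[uri] = url

    def delete(self, uri: str) -> None:
        # Contradicts persistency; mainly useful in a test environment
        del self._urls[uri]

    def resolve(self, uri: str) -> str:
        return self._urls[uri]


registry = PidRegistry()
pid = "https://epic5.storage.surfsara.nl:8003/api/handles/21.T12995/collect.23"
registry.create(pid, "https://ais.axiell.com/collect.23")
# The organisation changes its domain name: the PID stays, the target moves
registry.update(pid, "https://collections.example.org/collect.23")
print(registry.resolve(pid))
```

Anyone who stored or published the PID before the domain change still reaches the record afterwards, because only the mapping behind the PID was updated.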
Actions performed via the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task affect the URIs and/or URLs at the broker and change the submission records in the Registration/Synchronisation Queue Permanent IDs data source, but they don't change the URIs or URLs registered in the catalogue records themselves. So an update of any PID fields in catalogue records must be done manually, by search-and-replace or by some custom task. Only a link to a newly created submission record will be added to the processed catalogue records.
Actions chosen in the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task can only actually be performed when certain conditions have been met: for a Create there should not yet be an action history other than a Delete action; for an Update, the history should be either Create or Update; for Delete the history should be either Create or Update. Both a URI and URL should always be filled in before an action is possible. An error message will be displayed if these conditions are not being met.
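The conditions above can be expressed as a small lookup table. The following Python sketch assumes that "history" means the most recent action recorded for the URI (None if there is none yet). The function name and the error texts are illustrative and not taken from the application.

```python
from typing import Optional

# Allowed previous action per requested action (None = no history yet)
ALLOWED_PREVIOUS = {
    "Create": {None, "Delete"},
    "Update": {"Create", "Update"},
    "Delete": {"Create", "Update"},
}


def check_action(action: str, last_action: Optional[str],
                 uri: str, url: str) -> Optional[str]:
    """Return None if the action may be queued, otherwise an error message."""
    if action not in ALLOWED_PREVIOUS:
        return "Unknown action: " + action
    if not uri or not url:
        return "Both a URI and a URL must be filled in."
    if last_action not in ALLOWED_PREVIOUS[action]:
        previous = last_action or "no previous action"
        return action + " is not allowed after " + previous + "."
    return None
```

For example, an Update requested for a record that was never submitted is refused, while a Create after an earlier Delete is accepted.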
An example of a PID synchronisation record:
Workflow
From the user perspective and considering that the initial configuration has been done, the workflow to register new PIDs would be as follows:
1. | The user creates catalogue records. The URI and URL fields (and possibly any GUID field) are automatically filled when you save the record. |
2. | When records are ready for the PIDs to be published, the user selects these records and marks them in the result set. |
3. | In the result set, the user starts the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (from 5.2) task from the Features drop-down in the Result set context toolbar. |
4. | This task opens a new window enabling the user to choose the action that needs to take place for the selected records: Create, Update or Delete. The default and most used action would be Create. |
5. | After confirmation with OK, a message will appear confirming that the selected records have been put in the registration queue. From model application 5.2, any skipped records and the reason for being skipped will be reported too. |
6. | The trigger service immediately starts processing the submissions with SurfSARA and updates the registration/submission status which can be seen at the top of the PIDs and other identifiers screen. If you don't see the changed status in the current record directly, try reloading it by switching to a different record and back. |
7. | The URIs are now ready to be used and they will refer to the associated URLs. Simply click the URI in the Collections record and the associated URL will be retrieved from SurfSARA and opened in a new browser tab. (You can also copy the URI and paste it in the address bar of a browser tab manually to get the same result.) |
Notes
• | It’s no use entering a URI or URL in a new record manually, as they will be overwritten when you save the record. (In an already existing record you can edit these fields though.) The automatic assigning of any GUID doesn’t depend on whether the record is new or existing but on whether the field is empty: only if it is empty will it be assigned upon saving the record. The automatic URI and URL do depend on whether the record is new. |
• | When you copy a record which already contains PIDs and possibly a GUID, none of these fields and other fields on the Permanent ID’s screen tab will be copied along because the GUID and PIDs need to be unique. (These fields are or should be non-exchangeable when it comes to copying records.) Upon saving the copied record, these fields will be filled with new values. |
• | When you derive (move) a record from the Internal object catalogue to the External object catalogue or vice versa, initially all data from the Permanent ID’s screen tab will be moved along (because it’s a move, not a copy), but since Collections considers it to be a new record, the record will get new URIs and URLs. Any GUID will remain the same. |
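The difference between GUID assignment (depends on the field being empty) and URI/URL assignment (depends on the record being new) can be sketched as follows. This Python fragment only models the rules from the notes above. The record layout, the function name and the {guid} placeholder syntax are assumptions and do not reflect the Initial value format syntax of Collections.

```python
import uuid


def on_save(record: dict, is_new: bool, uri_format: str, url_format: str) -> dict:
    """Apply the automatic GUID and PID rules when a record is saved."""
    # The GUID is assigned whenever the field is empty, for new and existing records
    if not record.get("guid"):
        record["guid"] = str(uuid.uuid4())
    # URI and URL are (re)generated only for new records, overwriting manual input
    if is_new:
        record["uri"] = uri_format.format(guid=record["guid"])
        record["url"] = url_format.format(guid=record["guid"])
    return record
```

A copied record arrives here as a new record with empty GUID and PID fields, so it receives fresh values. A derived (moved) record arrives as a new record with its GUID still filled in, so only the URI and URL are regenerated.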
ARKs (Archival Resource Keys) are just PIDs which can easily be converted automatically into an internal URL where the referenced metadata is published in some way. This automatic conversion (called "resolving" in this context) is possible because an ARK is a URI in the shape of a URL which should be similar to the internal URL already and because the ARK already contains the target database table name and record number from Collections.
The main difference, however, is in how PIDs are registered and redirected. In the Handle/SurfSARA implementation, the PID (URI) and internal URL must be submitted to SurfSARA (and kept up to date if the URL were ever to change) so that SurfSARA can keep a register of them, allowing each PID to redirect to your internal URL via their service. If you use ARKs as your PIDs, no third-party register needs to be kept, because the uniqueness of your PIDs is ensured in a different way and the redirection to your internal URL is handled by a so-called "resolver" service. Resolver services are fairly easily interchangeable, so you're not dependent on one particular PID registry. Moreover, for SurfSARA the required organisation prefix must be paid for and the ongoing services of SurfSARA are charged annually, while the equivalent organisation code for ARKs (a so-called NAAN) is free of charge and the global ARK resolver N2T can be used freely. You also still have the option to use a resolver of your own or the resolver at https://data.axiell.com (any fee for the latter is still unknown).
Especially if you're starting out with PID registration, this method is certainly worth considering and can be applied in current Collections model applications (from version 5.0) too. For existing Handle/SurfSARA users, a transfer to ARKs might be a more complicated process because if your current PIDs contain a GUID, a resolver will have to be built to match those PIDs to internal URLs containing a dataset name and record number. We'll only discuss the setup and use of ARKs for PID registration starters here.
Requesting a NAAN or not
A NAAN (Name Assigning Authority Number) is just a unique 5-digit number representing your organisation as a publisher of ARKs, assigned to you by the ARK Alliance once. This ensures the uniqueness of your future PIDs because it will have to be included in all your ARK PIDs. If you use https://data.axiell.com to publish your record metadata, which for you as a customer is the easiest method, you can opt to have your ARKs published under Axiell's NAAN (and the WikiData Q number for your organisation): then you don't have to request a NAAN yourself, but your ARKs which are supposed to be persistent identifiers will then always contain our NAAN, even if you were to transfer to a different Collections management system someday. On the other hand, you could request a NAAN for your own organisation, making your ARKs more closely tied in with your organisation (and still use https://data.axiell.com to publish your record metadata if you want, while not being tied down to our data publishing service).
If you'd like to request your own NAAN, you can do that via the official form. Most questions on the form speak for themselves, but:
• | Only answer Yes to the question if you are a service provider if you are planning to manage the ARKs of other organisations too: then each of those organisations can request their own NAAN too. Service providers can use so-called "shoulders" in their ARK namespaces to create sub-namespaces: click here for more information about that. |
• | In the Organization base URL question, preferably enter the base URL of a web server (possibly with its own ARK resolver, but not necessarily) chosen by your organisation which can serve up the metadata requested by a resolved ARK. If you're just starting out, you can point to https://data.axiell.com (if you have an agreement with Axiell to publish your metadata), or point to the base URL of a web server of your own on which metadata is published (regardless of whether it has an ARK resolver or not), or just point to the base URL of your organisation for now. A base URL is something like https://data.mymuseum.org. You can always change the URL you provided here by using the request form again, but then with the option To update an existing NAAN. |
Requesting a NAAN also means that your NAAN will be registered with the global ARK resolver N2T for free. Allow up to 24 hours before your NAAN is recognized by the N2T.net resolver. From then on, any ARK can be resolved (in other words: redirected to the provided organisation base URL, including the ARK URI parts that identify a single Collections record) by putting "https://n2t.net/" in front of the ARK section of the URI (starting with "ark:"). If you're using https://data.axiell.com to resolve your ARKs, you don't need to use N2T, but you still can: the ARK section of the URI is the real PID, so to speak.
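Rewriting an ARK URI so that it goes through N2T amounts to a simple string operation, sketched here in Python. The function name is hypothetical, and the sketch only rewrites the address; it does not contact N2T.

```python
import re

# The ARK section starts at "ark:" (at the start of the string or right after a slash)
_ARK_SECTION = re.compile(r"(?:^|/)(ark:.+)$")


def to_n2t(uri: str) -> str:
    """Put the N2T resolver address in front of the ARK section of a URI."""
    match = _ARK_SECTION.search(uri)
    if match is None:
        raise ValueError("No ARK section found in: " + uri)
    return "https://n2t.net/" + match.group(1)


print(to_n2t("https://data.axiell.com/ark:/49254/Q4350196/dataset/collection/23"))
```

The same function also accepts a bare ARK section, since the part starting with "ark:" is all that N2T needs.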
Anatomy of ARKs
For information about the general anatomy of an ARK, please see the official topic. However, when using https://data.axiell.com to resolve ARKs and publish your exposed record data, the Axiell-specific anatomy of an ARK PID to be registered in records is as follows:
https://data.axiell.com/ark:/49254/<customer_code>/dataset/<database_table>/<priref>
This URI has the following parts:
• | https://data.axiell.com/ - the Name Mapping Authority (NMA), the organisation publishing the ARKs (Axiell in our case). |
• | ark: - a literal, fixed text (the ARK label) identifying the ARK part of the URI, behind it. |
• | 49254 - the Name Assigning Authority Number (the NAAN), the unique identifier of Axiell with the ARK consortium. |
• | <customer_code> - the identifier uniquely identifying the customer amongst all Axiell Collections customers. This code consists of the letter Q followed by one or more digits. This so-called Q number corresponds with the object number from WikiData identifying this customer. (If it doesn't exist yet, you can request it yourself or we can do it for you.) This identifier also leads to all known metadata of your organisation, in linked data format. Museum Kranenburgh, for example, is known at WikiData under number Q4350196. |
• | dataset - a literal, fixed text indicating the URI is about a dataset in the context of Linked Data. |
• | <database_table> - name of the database table in which the record has been stored. The following names/database tables are available for use (in other words: for registering ARKs in): collection (referring to Axiell database table collect.inf), media, persons-and-organisations (referring to people.inf), thesaurus (for thesau.inf), geothesaurus (thesaugeo.inf). |
• | <priref> - the unique identifier (the record number) for a record in the Collections database table. |
An example of an ARK for the Museum Kranenburgh would be:
https://data.axiell.com/ark:/49254/Q4350196/dataset/collection/23
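The anatomy described above can be checked with a small Python sketch that splits such an ARK into its parts and builds one from its parts. The class and function names are hypothetical. The pattern assumes the https://data.axiell.com layout shown above, with a 5-digit NAAN and one of the listed table names.

```python
import re
from typing import NamedTuple

TABLE_NAMES = {"collection", "media", "persons-and-organisations",
               "thesaurus", "geothesaurus"}


class AxiellArk(NamedTuple):
    nma: str            # Name Mapping Authority, e.g. https://data.axiell.com/
    naan: str           # Name Assigning Authority Number
    customer_code: str  # WikiData Q number of the customer
    table: str          # database table name
    priref: int         # record number


_PATTERN = re.compile(
    r"^(?P<nma>https://[^/]+/)ark:/(?P<naan>\d{5})/(?P<customer>Q\d+)"
    r"/dataset/(?P<table>[a-z-]+)/(?P<priref>\d+)$"
)


def parse_ark(uri: str) -> AxiellArk:
    match = _PATTERN.match(uri)
    if match is None or match.group("table") not in TABLE_NAMES:
        raise ValueError("Not an ARK in the expected layout: " + uri)
    return AxiellArk(match.group("nma"), match.group("naan"),
                     match.group("customer"), match.group("table"),
                     int(match.group("priref")))


def build_ark(customer_code: str, table: str, priref: int,
              nma: str = "https://data.axiell.com/", naan: str = "49254") -> str:
    if table not in TABLE_NAMES:
        raise ValueError("Unknown database table name: " + table)
    return f"{nma}ark:/{naan}/{customer_code}/dataset/{table}/{priref}"


print(parse_ark("https://data.axiell.com/ark:/49254/Q4350196/dataset/collection/23"))
```

Because the table name and priref can be read straight from the ARK, a resolver needs no lookup register to find the record.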
Setup of URI fields
For ARKs, assuming URI fields are already present in your application, you can set the Initial value formats similar to the set up for Handle/SurfSARA, although you have to use database table references and priref place holders in your ARK PIDs, instead of GUIDs. So for a URI field a valid Initial value format could for example be:
https://data.axiell.com/ark:/49254/Q123456/dataset/collection/%priref%
or
https://n2t.net/ark:/12123/Q123456/dataset/collection/%priref%
or
https://myownresolver.org/ark:/12123/dataset/collection/%priref%
Further, you could complete the existing URI/URL fields configuration by entering an Initial value format for the URL field too, like for example:
https://data.axiell.com/Q123456/dataset/collection/%priref%
but you don't need to because the URL to the data presentation won't be used in any way. If you don't enter an Initial value format for the URL fields, you should break the connection between a URI field and its matching URL field by emptying the Related URI field property in both.
That's really all you have to do as far as configuration goes.
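To show what such an Initial value format produces, the Python sketch below fills in the %priref% placeholder for one record and derives the matching data URL by dropping the ARK label and NAAN. Both function names are hypothetical, the thesaurus table and priref 42 are arbitrary examples, and the URL derivation is inferred from comparing the URI and URL formats above rather than taken from a documented resolver rule.

```python
import re


def expand_initial_value(fmt: str, priref: int) -> str:
    """Replace the %priref% placeholder with the record number."""
    if "%priref%" not in fmt:
        raise ValueError("Format has no %priref% placeholder: " + fmt)
    return fmt.replace("%priref%", str(priref))


def ark_to_data_url(ark_uri: str) -> str:
    """Assumed mapping: remove the 'ark:/<NAAN>/' part to get the data URL."""
    url, count = re.subn(r"ark:/\d{5}/", "", ark_uri, count=1)
    if count == 0:
        raise ValueError("No ARK label and NAAN found in: " + ark_uri)
    return url


uri_format = "https://data.axiell.com/ark:/49254/Q123456/dataset/thesaurus/%priref%"
uri = expand_initial_value(uri_format, 42)
print(uri)
print(ark_to_data_url(uri))
```

Since the URL can be derived from the ARK in this way, storing it in a separate URL field adds no information, which is why the URL format is optional here.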
PID tasks
Because with ARKs you don't need to submit your record ARK PIDs to a third party - the original NAAN request and the resolver do the rest - you will never need to execute the Permanent ID Registration (in 5.0-5.1)/Persistent ID submission (in 5.2) task and the Registration/synchronisation queue Permanent ID's data source will become redundant.
The Assign PIDs task is currently still geared towards the Handle/SurfSARA implementation, but it can be adjusted (reprogrammed) to deal with ARKs instead. Until that has been done, you shouldn't use it for ARKs.