Import job properties: Mapping
On the Mapping tab, which is present when you have opened a new or existing import job in the Import job editor, you pair tags in the exchange file to tags in the target database table, for the current import job.
Click here for information on how to edit job properties in general.
On the current tab you'll find the following settings:
In most cases, the field names or tags in the exchange file to be imported will differ from the field names (tags) in Collections. With the field mapping you specify how the import must translate field names, by pairing source tags from the exchange file to tags from the target database table.
The naming of the fields in the exchange file depends on the exchange file format (see below for a list) and on how that file was created. If you don't know the field names in this file, try to open it in a text editor to find out what the source tags are; and if you can't open it in a text editor, use the original database program.
If you already chose a destination database on the General tab of this import job, you can easily fill in the destination tags, by clicking the ... button next to the Destination field column cell of the current tag pair. A list with available fields in the target database will open, for you to choose from; the destination field name is automatically entered in the proper table cell. You can also type a field tag directly into the field but if it doesn't change into a field name when you leave the field, you've entered a non-existing tag and you shouldn't do that, but only Designer 7.7.3 and up actually validate the Destination field mapping against the target data dictionary and report errors like this when trying to execute an import job with non-existing destination fields:
import error '1', info: 'Tag 'TI' not found in database 'thesau'; field ignored.'
import error '1', info: 'Tag 'Df' not found in database 'thesau'; field ignored.'
import error '1', info: 'Tag 'TY' not found in database 'thesau'; field ignored.'
import error '1', info: 'Import stopped on record 0, error code = 1'
If the names of the source tags (or XML field elements) in the exchange file correspond exactly with the destination tags (or the associated field names, in the case of XML import) in your Collections database table (e.g. because you exported from Collections first), you can define a one-to-one translation, by providing only one special tag pair as the mapping:
** |
** |
To exclude certain fields from being being imported if you use the mapping ** | **, map each of those fields literally to: <null> (the word "null" enclosed by sharp brackets). Fields mapped to <null> will be ignored during import.
Examples of tag pair lists for specific exchange file formats can be found in the description of the relevant format (see the list below).
Which tags should be included in your mapping depends on which data from the exchange file you actually want to import and on the field definitions in the target database table. This is especially relevant for linked fields which have associated link reference tags and possibly some merged-in fields. See the import options Process external links and Process internal links, for information about exporting and importing such fields.
If you want to import data into a field of incompatible data type, you need to process that data during import (after reading in a record from the exchange file and this field mapping has been applied, but before writing it to the target database table), via an Adapl procedure which must be written for this purpose.
You can use temporary (two-letter) destination tags only if those have been specified in the data dictionary (the .inf) of the database table in which you are working, meaning that you have to add new tags of the Temporary data type to the .inf specifically to be able in the import job to map to (these) tags in which in no actual record data appears. This allows you to process that data by means of a custom adapl without already having it mapped to an actually used field in the target database: this way you can collect data from multiple fields, for example, before you assign the result to the real destination tag in your adapl. By using temporary destination tags of the Temporary data type, you also ensure that values you assign to them in an adapl won't be saved in the record by accident, even if you do not empty them explicitly at the end of the adapl code. Also note that if you want to map multilingual data to a tag that you'll only be using in the adapl, you should define this Temporary tag as a multilingual field in the database table structure too. Note that in the past (for Adlib), it was allowed to use temporary tags without specifying them in the .inf. To apply import jobs with such (non-specified) temporary tags for Collections you will need to define those tags in the relevant .inf still as tags of the Temporary data type.
For importing (Adlib) Tagged or CSV with importtool.exe however, there's a better solution: here you can use temporary fields without the need to create those fields in the data dictionary of the target database table definition first, by using the PARAMETERS FACS construction (as was already available for Collections tasks and output formats). See the full importtool.exe topic for more information.
For importing metadata from an image directory, there is a special source tag available to include in the tag pair list in the import job: PATH. In the exchange file, in PATH, the absolute path will be stored, including the file name of the imported image. (The Windows FILE source tag on the other hand, only stores the file name, without the path.) An example of an absolute path is: C:\myimages\collection1\v1203dr.jpg
In the tag pair list you can also specify field and record separators that are different from the default separators for the current input format.
There's also the possibility to specify a role for running import jobs, which is required for environments where running in elevated administrator mode is not allowed.
Especially for formats very similar to the ASCII delimited (CSV) exchange file format, you can specify an alternative character or string which marks the beginning and ending of field data. By default, field data in this format is enclosed by double quotes, but if the format of your exchange file uses some other character, then you can specify this here in the field mapping by providing the literal character or string as the source tag, and literally <LITERAL> as destination tag. If field data is not enclosed by any characters, then literally enter as source tag: <null>. Examples:
' |
<LITERAL> |
<null> |
<LITERAL> |
Note that the default character to enclose field data is the double quote ("). You do not need to specify the default setting in the field mapping, as it is assumed implicitly.
If you are about to import DOS, ANSI, UNICODE or OCLC encoded records from an exchange file to the proper (wider) character set of the target database - Collections typically uses UTF-8 encoded Unicode - you must also convert the data before or during import explicitly. Otherwise, (special) characters from your source data may be replaced by other characters during import. So if you are about to import an existing DOS or ANSI database into your UTF-8 Unicode encoded data, you must convert the imported records to the proper encoding explicitly. Doing this before import is a good custom because you don't have to include the conversion in the import job nor on importtool.exe doing the actual conversion. If you open the exchange file in e.g. Notepad++ and check the Encoding menu, you'll see what the current encoding is and you can use the Convert... options to convert it to UTF8 if necessary.
In the current field mapping of an import job you must then use a special tag pair to execute this character set conversion of your data during import. As source tag, enter: DOS, and as destination tag, enter: <CHARSET>. (Type both terms literally as printed here.)
If you are about to import an existing ANSI encoded database into a UTF-8 encoded database, then enter as the source tag: ANSI, and as the destination tag, again: <CHARSET>.
The third possible conversion is from UNICODE to UTF-8, which requires UNICODE as source tag and <CHARSET> as destination tag.
The fourth possible conversion is from OCLC to the encoding of the target database (ANSI or UTF-8 Unicode). The relevant tag pair is: OCLC<CHARSET>.
You can document your mapping by adding comments to individual tag pairs. (These comments are also included in the Documentation that you can generate of this import job.) This is only useful if you save this import job.
From 6.5.1, this Language option and the Default language option on the Options tab provide a way to specify the language attribute (using a standard language code*) with which an imported value (originally without language attribute) has to be stored in a multilingual target field (in a SQL database only), during import. With the current option you specify this language per mapped field, while the Default language option on the Options tab sets the language for all multilingual fields in the target database table together.
These two options always apply if your exchange file is of the (Adlib) Tagged type, because it is mono-lingual: it cannot hold multi-lingual fields.
For AdlibXML exchange files, both options apply only to mono-lingual field values (values without language attributes): Adlib XML format can still hold mono-lingual fields, even if multilingual fields are present as well. So on import of mono-lingual fields (fields without language attribute) into a database that does contain multilingual fields, you’ll want to select the data language into which the imported values will be stored. If you do this in the field mapping, you can specify per field into which language a value must be imported. You can even repeat a target field in the mapping, provided that you specify a different language for all repetitions; this allows you to e.g. import tag ti into multilingual tag ti in the Dutch language, while importing tag ta (which in this example contains a translation of the title from ti) to the same multilingual tag ti in the English language.
For multilingual fields in AdlibXML type exchange files, the Default language and field mapping Language setting do not apply. This is because AdlibXML contains a language attribute per multilingual field value. When you import a record with multi-lingual fields, all multilingual values from the exchange file will automatically be copied to the target database table, thereby leaving other, existing multilingual values intact. So, if a source field were to hold three values in the Dutch, English and French languages, and the target field would hold two values in the Dutch and Greek language, then the Dutch value would be overwritten, the Greek value would remain intact, and the English and French values would be added. If multilingual fields in an AdlibXML exchange file contain a flag marking the invariant language, then this flag will be imported as well.
If you use neither option, values will be added to multi-lingual fields without any language attribute; however, when a record with a multi-lingual field is written again, for instance by editing and saving it, Collections will check if the field already has a data language attribute and if not, it will add the active data language as the attribute.
* The language code is a code put together from an abbreviation for a language and a region identifier (for more information about this, see the “Using Language Identifiers (RFC 3066)” document which you can find on the internet). The code for British English, for example, is en-GB.
From 6.5.0, you can indicate here that you do not want to replace the occurrences of a field, but wish to append any new occurrences from the exchange file to the existing occurrences of the field in the target record. This is useful if a record in the exchange file which you want to import contains new information in certain field occurrences, but not (all of) the old information is still present in your database, while after import you want the old and the new information to be stored in separate occurrences of the field. This functionality is intended for single repeatable fields, not repeatable field groups. In the Update option drop-down list you have the following possibilities:
• | Overwrite: this is the standard setting, which does nothing other than what older versions of Designer did by default: the contents of the target field occurrences must always be overwritten with the imported data.
For a multilingual field this means that existing translations will be overwritten with the imported ones from the same language, that additional translations will be imported and that existing translations which are not being imported won't be deleted. (This functionality is available from version Designer 7.3.15285.1.) For example: if an existing Thesaurus record contains three translations of a term: myEnglishTerm, myFrenchTerm and myDutchTerm, and you perform an update import on the English translation of the term with the following data for the term field: myEnglishTerm, myOtherFrenchTerm and myGermanTerm, then the resulting database record after import will contain: myEnglishTerm, myOtherFrenchTerm, myGermanTerm and myDutchTerm. Note that you cannot overwrite an existing value in an updated record, with an empty value from an exchange file: you'd have to use an adapl associated with this import job to clear certain existing fields. |
• | Append: all occurrences of this field will be appended as new occurrences to occurrences of this field already present in the target record. The Delete data in existing records option on the Options tab must be set to No, and of course Clear database must not be marked. |
• | Append if not present: occurrences of this field will only be appended as new occurrences to occurrences of this field already present in the target record if the contents of those new occurrences is different from the contents of already present occurrences. The Delete data in existing records option on the Options tab must be set to No, and of course Clear database must not be marked. |
See also
Exchange file formats:
• | Tagged |
• | XML |