
EPUB Book identifier


EPUB3 files contain a lot of metadata, and this metadata is an important part of the book. Although it is usually unseen by users, cataloging systems rely on it to catalog books automatically. The main metadata items are the book title, authors, editor and publisher. This article discusses one very important metadata item, the book identifier. The book identifier usually contains the book's ISBN, which is used by cataloging systems.
The identifier is also used internally by the Helicon Books cloud system to identify the books sent to users. This may seem obvious, so why did we dedicate an article to this metadata element?

This identifier must be unique: no other book in the world may contain the same identifier. However, there is no single predefined method for defining identifiers. There is the common ISBN (or ISSN for magazines), and there are also ISTC and DOI; all are schemes designed to achieve the same thing, a unique identifier. Some automatic EPUB generation software produces something that looks similar to a DOI. However, it is not really a DOI; it is a random number and thus may conflict with publications generated by software from other manufacturers.

The identifier metadata is marked by the dc:identifier tag.
This tag, like all other metadata tags, resides in the metadata section of the OPF file.
An EPUB file must contain at least one dc:identifier tag, but it may contain more than one, so which tag should the system use?
The dc:identifier to be used is marked with an id attribute, and the value of that id is given in the 'unique-identifier' attribute of the package tag in the OPF file.
The following example is from one of the files produced by Helicon Books:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid" dir="rtl" >
<metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="bookid">HBOT140901</dc:identifier>

Note that the unique-identifier attribute of the package tag has the value "bookid".
The dc:identifier tag has the attribute id="bookid". This tells the system that this tag should be used as the book identifier. The book identifier itself is, in this case, HBOT140901; this is an identifier used by Helicon Books instead of an ISBN. If the customer has an ISBN for this book, the ISBN is used instead.
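
The lookup described above can be sketched in Python with the standard library XML parser. This is only an illustration, not the code used by Helicon Books: the function name primary_identifier and the sample document (the excerpt above, closed off and given a second dc:identifier with the made-up id "other") are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, as declared in the OPF excerpt above.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample: the excerpt above with a second identifier added.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid" dir="rtl">
<metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="other">urn:uuid:D522D30B-2891-4E3C-A663-027BC4CFE313</dc:identifier>
<dc:identifier id="bookid">HBOT140901</dc:identifier>
</metadata>
</package>"""


def primary_identifier(opf_bytes):
    """Return the text of the dc:identifier named by unique-identifier."""
    root = ET.fromstring(opf_bytes)
    uid_ref = root.get("unique-identifier")
    if uid_ref is None:
        raise ValueError("package tag has no unique-identifier attribute")
    # Walk every dc:identifier and pick the one whose id matches.
    for elem in root.iter(f"{{{DC_NS}}}identifier"):
        if elem.get("id") == uid_ref:
            return (elem.text or "").strip()
    raise ValueError(f"no dc:identifier with id={uid_ref!r}")


if __name__ == "__main__":
    print(primary_identifier(SAMPLE_OPF))  # HBOT140901
```

The function ignores the order of the dc:identifier tags and relies only on the id match, which is what lets a file carry several identifiers safely.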

In many cases the same book has several versions, for example when we find some typos in the book and produce a corrected file. We probably do not want to change the identifier, as the new file should not be cataloged as a new book. But cataloging systems and reading applications still need to understand that this is a different file and be able to tell which file to use. This is done with another meta tag, which is mandatory in EPUB3: the dcterms:modified tag.
For example: <meta property="dcterms:modified">2014-09-15T15:51:48Z</meta>

This tag shows the exact time this file was created. We have to use this time and not the file modification time, since the file may be sent by email, which does not preserve that data.
The time is given in the following format:
CCYY-MM-DDThh:mm:ssZ
where:
CCYY - the year in four digits.
MM - the two-digit month number (01 - 12).
DD - the two-digit day of the month (01 - 31).
T - a constant separator between the date section and the time section.
hh - the two-digit hour (00 - 23).
mm - the two-digit minute (00 - 59).
ss - the two-digit second (00 - 59).
Z - a constant suffix signifying that this time is Coordinated Universal Time (UTC) (similar to GMT).
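
As a rough illustration of this format, the following Python sketch produces and checks such timestamps using only the standard library. The helper names format_modified and parse_modified are invented for this example; they are not part of any EPUB tool.

```python
import re
from datetime import datetime, timezone

# strftime/strptime equivalent of CCYY-MM-DDThh:mm:ssZ.
MODIFIED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Strict shape check: fixed digit counts, constant T and Z.
MODIFIED_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def format_modified(moment):
    """Render a timezone-aware datetime as a UTC dcterms:modified value."""
    return moment.astimezone(timezone.utc).strftime(MODIFIED_FORMAT)


def parse_modified(text):
    """Parse a dcterms:modified value; raise ValueError if it is malformed."""
    if not MODIFIED_PATTERN.fullmatch(text):
        raise ValueError(f"not in CCYY-MM-DDThh:mm:ssZ form: {text!r}")
    # strptime also rejects impossible values such as month 13.
    parsed = datetime.strptime(text, MODIFIED_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    stamp = parse_modified("2014-09-15T15:51:48Z")
    print(format_modified(stamp))  # 2014-09-15T15:51:48Z
    print(format_modified(datetime.now(timezone.utc)))
```

Converting to UTC before formatting matters because the Z suffix is constant: a local time written with a Z would silently be off by the local offset.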

A reading system or cataloging system should construct an internal identifier from both the main dc:identifier tag and the dcterms:modified metadata tag. This internal identifier should be used as the book identifier. If a book's dc:identifier is the same as that of another book but its dcterms:modified tag contains a newer time, the book should be updated.
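
A minimal sketch of this rule in Python follows. How the two values are combined is up to the system; joining them with "@" is an assumption made for this example, and the names internal_identifier and is_update are hypothetical.

```python
from datetime import datetime

# strptime equivalent of CCYY-MM-DDThh:mm:ssZ.
MODIFIED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def internal_identifier(identifier, modified):
    """Combine dc:identifier and dcterms:modified into one key.

    The "@" separator is an assumption for this sketch.
    """
    return f"{identifier}@{modified}"


def is_update(installed, candidate):
    """Decide whether candidate should replace installed.

    Both arguments are (dc:identifier, dcterms:modified) pairs.
    """
    same_book = installed[0] == candidate[0]
    newer = (datetime.strptime(candidate[1], MODIFIED_FORMAT)
             > datetime.strptime(installed[1], MODIFIED_FORMAT))
    return same_book and newer


if __name__ == "__main__":
    old = ("HBOT140901", "2014-09-15T15:51:48Z")
    new = ("HBOT140901", "2014-10-01T08:00:00Z")
    print(internal_identifier(*old))  # HBOT140901@2014-09-15T15:51:48Z
    print(is_update(old, new))        # True
```
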
Automatic systems that generate EPUB files, such as Adobe InDesign, usually set the main dc:identifier tag to something like: urn:uuid:D522D30B-2891-4E3C-A663-027BC4CFE313

The letters and digits after "urn:uuid:" are random, but they are preserved in future exports of the same book (only the dcterms:modified metadata is updated).
When sending this file to stores or other systems, it is best to change this number to an ISBN or an internal cataloging scheme, to make sure no other book contains the same identifier.
The number generated by the software is never truly random, and therefore there is a chance that two systems used by two different vendors would produce the same number for two completely different books.
When changing this number in the OPF file, make sure you also change it in the NCX file (if you use one). In the NCX file the identifier is given in a meta tag, for example:

<meta name="dtb:uid" content="urn:uuid:D522D30B-2891-4E3C-A663-027BC4CFE313" /> 
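
To catch a mismatch after editing, a small check along the following lines can compare the NCX value with the identifier used in the OPF file. It is a sketch under assumptions: the function names and the tiny sample NCX are made up for the example, and the identifier from the OPF file is passed in as a plain string.

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX documents.
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Hypothetical minimal NCX containing only the meta tag shown above.
SAMPLE_NCX = b"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<head>
<meta name="dtb:uid" content="urn:uuid:D522D30B-2891-4E3C-A663-027BC4CFE313" />
</head>
</ncx>"""


def ncx_uid(ncx_bytes):
    """Return the content of the dtb:uid meta tag, or None if absent."""
    root = ET.fromstring(ncx_bytes)
    for meta in root.iter(f"{{{NCX_NS}}}meta"):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    return None


def identifiers_match(opf_identifier, ncx_bytes):
    """True when the NCX dtb:uid equals the identifier from the OPF file."""
    return ncx_uid(ncx_bytes) == opf_identifier


if __name__ == "__main__":
    # The OPF identifier was changed but the NCX was not: mismatch.
    print(identifiers_match("HBOT140901", SAMPLE_NCX))  # False
```
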

Note: If any vendor finds that the number given in this example belongs to one of the books they produced, this shows exactly what I stated, that this number is not truly random, as I invented this number and did not take it from any book I had in my system.
