What is an OEBPS Document? What is an OEBPS Publication?

An OEBPS Document is one XML-coded piece of book text. There is no restriction on its size or on what portion of the book it represents. It could be a single chapter or an entire book.

An OEBPS Publication includes the marked-up text, all image and stylesheet files associated with it (as well as files of other types, when in use), and the OEBPS package file. In other words, the OEBPS Publication is the “whole book.”

What is an OEBPS reading system?

First, what an OEBPS reading system is not: It is not (necessarily) the device or software on which one reads. It is not any sort of file. It is not software that preprocesses and distills OEBPS Publications.

An OEBPS reading system is any combination of hardware and software that takes in an OEBPS Publication and spits out a reading experience.

The OEBPS conceives of a reading system as a big black box. The OEBPS talks about what goes into the black box, and (to a somewhat lesser extent) what has to come out of the black box. It really says very little about what happens inside.

A reading system may comprise multiple pieces of software. Different parts of a reading system may do their jobs on multiple computers or other devices. Nothing in the OEBPS forbids a completely self-contained reading system, but nothing in the OEBPS requires one, either. In fact, I have yet to see one.

Why is the OEBPS so vague on the innards of the black box? Because many different sorts of black boxes could do an equally effective job of accepting an OEBPS Publication and creating a reading experience. Moreover, different situations inevitably require different black boxes; for example, a reading system that creates a reading experience for the visually-disabled is likely to have very different innards from one that creates a reading experience for the sighted.

Probably the most conspicuous consequence of this decision is the OEBPS’s notable silence on a so-called “delivery format” for OEBPS Publications—that is, a single file into which all the text, images, and other files of an OEBPS Publication would be distilled for delivery to the reader’s device. This silence has caused a good deal of confusion, as several reading systems create binary files as an intermediate step between OEBPS Publication and reading experience.

The OEBPS allows this, but does not require it, and says nothing whatever about what such a binary file is like. One important ramification of this is that if all you have is a binary file for a given reading system, you do not have an OEBPS Publication. That binary file is not alterable; you cannot fix its typos or change its design. It may not be upgradable. It may not even be usable past the first major revision of the reading system software. You must keep (or ask your vendor for) a genuine OEBPS Publication as well as any intermediate binary formats you need.

What is an OEBPS Package File?

(Note to SGML purists: I am using angle brackets, or “wedges,” to delimit element names. I know this is not accepted SGML practice. For the less-experienced, however, it is convenient and quickly comprehensible. I am delimiting attribute names with square brackets.)

The OEBPS package file is the guide to an OEBPS Publication. It is an XML file that conforms to the OEBPS Package File DTD, part of the OEBPS. The root (that is, top-level) element for a package file is the <package> element. This element must have a [unique-identifier] attribute that points to the identifier for the OEBPS Publication: its value must match the [id] attribute of a <dc:Identifier> element (see below), and that element holds the identifier itself. (What system of identifier is used is up to each eBook author. The OEBPS mentions ISBNs and DOIs.)

The OEBPS package file consists of five main parts, two of which are optional.
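
Put together, the skeleton of a package file can be sketched as follows. This is an outline rather than a valid package: the comments stand in for the contents described in the sections below, “bookid” is a made-up identifier name, and the DOCTYPE declaration is my reading of the OEBPS 1.0.1 specification, so check it against your copy of the spec.

```xml
<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN"
  "http://openebook.org/dtds/oeb-1.0.1/oebpkg101.dtd">
<package unique-identifier="bookid">
  <metadata>
    <!-- required: information about the publication as a whole -->
  </metadata>
  <manifest>
    <!-- required: one <item> for every file in the publication -->
  </manifest>
  <spine>
    <!-- required: the linear reading order -->
  </spine>
  <tours>
    <!-- optional: guided paths through points of interest -->
  </tours>
  <guide>
    <!-- optional: pointers to tables of contents, indices, and the like -->
  </guide>
</package>
```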

The <metadata> element

The <metadata> element provides information about the publication as a whole (e.g. ISBN or other identifiers, author information, publisher information). The one element that is required inside <metadata> is <dc-metadata>, which must in turn contain a <dc:Title> element with the publication’s title, and a <dc:Identifier> element containing a unique identifier for the publication. (More than one <dc:Identifier> element is permitted, allowing publishers to register their books under several identification schemes.)

The “dc” in these element names stands for Dublin Core, an emerging standard for publication metadata. Several optional Dublin Core elements are specified in the OEBPS Package File DTD; they are described in Section 2.2 of the OEBPS.

A minimal <metadata> element might look like this:

<metadata>
  <dc-metadata>
    <dc:Title>My eBook</dc:Title>
    <dc:Identifier id="bookid" scheme="ISBN">0-0000-0000-0</dc:Identifier>
  </dc-metadata>
</metadata>
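
A somewhat fuller <metadata> element, with two identifiers and a few of the optional Dublin Core elements, might be sketched like this. All names and values are invented for illustration, including the DOI. The namespace declarations on <dc-metadata> are the ones I understand the OEBPS 1.0.1 to call for; treat them as an assumption and confirm them against Section 2.2.

```xml
<metadata>
  <dc-metadata
      xmlns:dc="http://purl.org/dc/elements/1.0/"
      xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
    <dc:Title>My eBook</dc:Title>
    <!-- The [id] here is what the <package> [unique-identifier] attribute names -->
    <dc:Identifier id="bookid" scheme="ISBN">0-0000-0000-0</dc:Identifier>
    <!-- A second identifier under another scheme (made-up value) -->
    <dc:Identifier scheme="DOI">10.0000/example</dc:Identifier>
    <!-- Optional Dublin Core elements -->
    <dc:Creator>Jane Author</dc:Creator>
    <dc:Publisher>Example Press</dc:Publisher>
    <dc:Language>en</dc:Language>
  </dc-metadata>
</metadata>
```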

The <manifest> element

The <manifest> element gives a list of all the files (text, image, stylesheet, or other) in the publication. Each file must be listed in an <item> element, in which the [id] attribute gives an identifier for the file that is unique to that file within the publication, the [href] attribute gives the filename, and the [media-type] attribute gives the MIME type for the file.

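MIME types for the core filetypes supported by the OEBPS are text/x-oeb1-document (OEBPS Documents), text/x-oeb1-css (CSS stylesheets), image/jpeg (JPEG images), and image/png (PNG images).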

If a publication file is not of one of the above types, its <item> element must also contain a [fallback] attribute, whose value is the [id] attribute of a “fallback” file.

A typical <manifest> element would look like this:

<manifest>
  <item id="chap1" href="chap1.html"
     media-type="text/x-oeb1-document" />
  <item id="chap1fig" href="chap1fig.png"
     media-type="image/png" />
  <item id="chap1movie" href="chap1movie.mpg"
     media-type="video/mpeg"
     fallback="chap1movietext" />
  <item id="chap1movietext" href="chap1movietext.html"
      media-type="text/x-oeb1-document" />
</manifest>

As you can see, the above publication has one chapter, one image, and a video file with an OEBPS Document for fallback.

Those who prefer to organize their files into separate folders must include the folder name in the [href] attribute for each file; they should also ensure that the package file lives in the top-level folder. If the HTML files are in a folder called “textfiles,” while chap1fig.png is in a folder called “artfiles” and chap1movie.mpg is in a folder called “moviefiles,” with all these folders inside a folder called “myEbook,” the <manifest> element should look like this:

<manifest>
  <item id="chap1" href="textfiles/chap1.html"
      media-type="text/x-oeb1-document" />
  <item id="chap1fig" href="artfiles/chap1fig.png"
      media-type="image/png" />
  <item id="chap1movie" href="moviefiles/chap1movie.mpg"
      media-type="video/mpeg" fallback="chap1movietext" />
  <item id="chap1movietext"
      href="textfiles/chap1movietext.html"
      media-type="text/x-oeb1-document" />
</manifest>

The package file itself should be inside the “myEbook” folder, but not inside any of its subfolders.

The <spine> element

The <spine> element gives a linear reading order for the text files. This element contains one or more <itemref> elements. Each <itemref> element has an [idref] attribute, which must be identical to the [id] attribute of the <item> element representing the Document that should occur at this point in the reading order.

While the <spine> element is required, and must contain at least one <itemref> element, it is not strictly necessary to use it to define a reading order. It is acceptable to have only one <itemref> element listed (as a jumping-off point) and to manage reading order from then on via hyperlinks.

The <spine> element for the above mini-eBook might look like this:

<spine>
  <itemref idref="chap1" />
</spine>

Be aware that <itemref> elements within the spine may only refer to OEBPS Documents (<item> elements whose media-type is "text/x-oeb1-document"), not to images or any other kind of file.

However, the spine need not refer to all the OEBPS Documents in the manifest. “Out-of-spine” content is not placed within the normal reading order of the publication by the reading system. How, then, can it be accessed? By hyperlink. How a reading system treats such hyperlinks is up to it, but typically the content of the link target appears in a “pop-up” on the screen. Out-of-spine content is very useful for floats (such as figures and tables) and footnotes or endnotes.
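
As a sketch of out-of-spine content, suppose a publication has two chapters and a separate file of endnotes (all filenames and identifiers here are made up). The notes file appears in the manifest but not in the spine, so the reader reaches it only by following a link.

```xml
<manifest>
  <item id="chap1" href="chap1.html"
      media-type="text/x-oeb1-document" />
  <item id="chap2" href="chap2.html"
      media-type="text/x-oeb1-document" />
  <!-- Listed in the manifest, but deliberately left out of the spine -->
  <item id="notes" href="notes.html"
      media-type="text/x-oeb1-document" />
</manifest>
<spine>
  <itemref idref="chap1" />
  <itemref idref="chap2" />
</spine>
<!-- Inside chap1.html, a hyperlink such as
     <a href="notes.html#note1">1</a>
     is what brings the reader to the out-of-spine notes. -->
```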

The <tours> element

The <tours> element, which is optional, is intended to allow publishers to lead readers (or potential readers) quickly through points of interest throughout the publication. Section 2.5 of the OEBPS discusses this element fully.
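
A <tours> element might be sketched as follows. The titles, filenames, and anchors are invented, and the <tour> and <site> element names and attributes reflect my understanding of Section 2.5; confirm them there before relying on this.

```xml
<tours>
  <tour id="tour1" title="Chicken Recipes">
    <!-- Each <site> is one stop on the tour -->
    <site title="Chicken Fingers" href="appetizers.html#r3" />
    <site title="Chicken a la King" href="entrees.html#r5" />
  </tour>
</tours>
```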

The <guide> element

The <guide> element, which is optional, is intended as a list of important reference elements (such as tables of contents or illustrations, indices, title pages, and so on) within the publication. Section 2.6 of the OEBPS discusses this element fully.
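
A <guide> element might be sketched as follows. The filenames are invented, and the [type] values shown are ones I understand Section 2.6 to define (“loi” meaning list of illustrations); check the full list there.

```xml
<guide>
  <reference type="title-page" title="Title Page" href="title.html" />
  <reference type="toc" title="Table of Contents" href="toc.html" />
  <reference type="loi" title="List of Illustrations" href="toc.html#figures" />
  <reference type="index" title="Index" href="index.html" />
</guide>
```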

What is the difference between Basic and Extended OEBPS Documents?

A Basic OEBPS Document sticks strictly to the XML tags and permitted structures mentioned in the OEBPS and laid out in the Basic OEBPS Document DTD. These tags and structures are drawn from the HTML 4.0 and XHTML 1.1 specifications. A Basic OEBPS document must also contain what is called a “DOCTYPE declaration” declaring itself as such. The Basic OEBPS Document DOCTYPE declaration looks like this, and appears at the top of the document:

<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN" "http://openebook.org/dtds/oeb-1.0.1/oebdoc101.dtd">

Including this declaration in a document claims that the document obeys the strictures of the Basic OEBPS Document DTD. Do not include this declaration if you are not sure your document obeys the DTD! (See the section on DTDs to learn what they are and how they work.)
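
To show where the declaration sits, here is a sketch of a very small Basic OEBPS Document. The content is invented; the point is only that the DOCTYPE declaration comes before the root <html> element and that the document is well-formed XML.

```xml
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN"
  "http://openebook.org/dtds/oeb-1.0.1/oebdoc101.dtd">
<html>
  <head>
    <title>Chapter 1</title>
  </head>
  <body>
    <h1>Chapter 1</h1>
    <p>The first paragraph of the chapter.</p>
  </body>
</html>
```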

An Extended OEBPS Document either uses tags that are not mentioned in the OEBPS, or uses the mentioned tags differently from what is allowed in the Basic OEBPS Document DTD. Extended OEBPS Documents must include a CSS stylesheet that tells the reading system how to display any unfamiliar elements. Basic OEBPS Documents do not require a CSS stylesheet (although it is certainly permissible to use one).
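
An Extended OEBPS Document might be sketched like this. The element names and the stylesheet filename are made up, and attaching the stylesheet with an xml-stylesheet processing instruction is my understanding of how the OEBPS expects it to be done; verify that against the spec.

```xml
<?xml version="1.0"?>
<?xml-stylesheet href="recipe.css" type="text/x-oeb1-css"?>
<!-- No Basic DOCTYPE declaration: this document does not claim to obey that DTD -->
<recipe>
  <dishname>Chicken Fingers</dishname>
  <step>Cut the chicken into strips.</step>
</recipe>
<!-- recipe.css (a separate file) would need rules along these lines:
       dishname { display: block; font-weight: bold }
       step     { display: block } -->
```

Both the document and recipe.css would still need their own <item> entries in the package file manifest.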

Basic OEBPS Documents enjoy one advantage over Extended OEBPS Documents: the ability to use “named character entities” (e.g. &aacute; for an a with an acute accent) for some special characters. The Basic OEBPS DTD includes the same named entities as XHTML. For somewhat complicated reasons, these entities should never be used in Extended OEBPS Documents. Instead, use Unicode: either type the character itself or use a numeric character reference (e.g. &#225; for the same accented a).

What is a fallback file, and when must I use it?

The OEBPS specifies four file types that compliant reading systems must handle properly: OEBPS Documents, CSS stylesheets, JPEG images, and PNG images.

This does not mean that other types of files are not permitted. To use a different kind of file (for example, a sound or video file), you must also include a “fallback file” that is of one of the permitted filetypes listed above. For example, the fallback for a video file might be a static image from the video in JPEG format; the fallback for a sound file might be the XML-coded text of the sound clip or a PNG image of the sheet music.

It is possible to create a chain of fallback files (one file falling back to another, which falls back to a third, and so on), as long as at least one of the fallbacks is of one of the permitted file types. For example, a video segment could have a fallback TIFF file, which in turn could have a fallback JPEG file; the JPEG file is what satisfies the requirement.

Fallback files ensure that no eBook will have parts that are completely un-displayable by any compliant device. Even if a particular device does not support video clips, someone using that device will have something to examine.

Fallbacks are listed as part of the OEBPS package file manifest.
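
The video-to-TIFF-to-JPEG chain described above might be sketched in a manifest like this. The filenames and identifiers are made up; what matters is that each [fallback] attribute names the [id] of the next file in the chain, and that the chain ends in a permitted file type.

```xml
<manifest>
  <!-- Not a core type, so it needs a fallback -->
  <item id="clip" href="clip.mpg"
      media-type="video/mpeg" fallback="clipstill-tiff" />
  <!-- Still not a core type, so the chain continues -->
  <item id="clipstill-tiff" href="clipstill.tif"
      media-type="image/tiff" fallback="clipstill-jpeg" />
  <!-- JPEG is a core type, so the chain can end here -->
  <item id="clipstill-jpeg" href="clipstill.jpg"
      media-type="image/jpeg" />
</manifest>
```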