OEBPS FAQ

What is an open standard?

The word “standard” gets tossed around a lot in computing and engineering circles. The likelihood that two people who use the word mean the same thing by it is fairly small; if the two people are an engineer and a public-relations expert, the likelihood drops to near zero. Keep that in mind while reading press releases.

(This may seem like hairsplitting, but in fact I am glossing over a morass involving several so-called “standards bodies” and a great many more words than just “standard.” Take my word for it: “standard” is semantically null except as part of a compound.)

The phrase “open standard,” however, is marginally more precise, if just as colloquial. An open standard is a specification that has been endorsed by a body that exists specifically to create and ratify such standards, without expectation of directly profiting by them. Typically, though (unfortunately) not always, an open standard can be read and implemented without incurring fees or legal liability related to intellectual property issues such as patents and copyrights.

A specification is not an open standard just because it is in wide use. PDF, for example, is not an open standard; it was originally created and is currently owned and controlled by Adobe Systems, Inc.

Why does the OEBPS use open standards?

The most obvious reason to build on open standards is the reduction of intellectual-property headaches and costs associated with their adoption and implementation. Use of open standards avoids immediate licensing fees. It also helps prevent content and content producers from being held hostage by owners of proprietary formats and proprietary software.

Do open standards reduce intellectual-property headaches to zero? No, unfortunately not. So-called “submarine patents,” not disclosed by the entity owning them until well into an open standard’s development and adoption, have threatened a few open standards. Moreover, some patent-rich corporations are trying to encourage the use of patented technologies in open standards.

Even so, the open-standards process creates an interesting safeguard: a large pool of potential victims, many of them powerful and vocal, when patents threaten an open standard. The owner of the intellectual property may not have much leverage to demand royalties because of the sheer numbers who stand to lose if the open standard dies, or if using it suddenly costs a lot more than before. In recent history, alert standards developers and implementers have shamed patent owners into negotiating over specific patents and into backing away from including patented technology in open standards.

An additional benefit of open-standard adoption is the sheer weight of brainpower often brought to bear in standards development. No single entity, be it corporate, governmental, or academic, has a monopoly on smart developers. The open standards process allows smart developers from a wide variety of organizations to sit down together and hammer out the best standard they can. Moreover, an open-standard development process typically receives a great deal of public criticism and suggestions, so smart developers not immediately involved in developing the standard can still influence it. Proprietary specifications rarely (if ever) receive such varied and copious input.

Finally, because open standards have so many implementers and adopters, building on them lets implementers and users of the OEBPS tap into a gigantic pool of existing expertise (as well as training material such as books, seminars, and conferences). This makes for faster and simpler adoption of the OEBPS than would be possible were it based on proprietary or home-grown technologies.

What open standards are employed in the OEBPS?

The OEBPS employs the following open standards:

Unicode (from the Unicode Consortium)
the Extensible Markup Language, XML (from the World Wide Web Consortium, abbreviated W3C)
XHTML (also from the W3C)
Cascading Style Sheets, CSS (also from the W3C)
Portable Network Graphics, PNG (also from the W3C)
JPEG (compression standard from the Joint Photographic Experts Group, a joint ISO/IEC and ITU-T committee; JFIF file format popularized by the Independent JPEG Group)
Dublin Core (from the Dublin Core group)

The next few sections of the FAQ discuss these specifications in more detail.

What is Unicode? How is Unicode different from ASCII?

ASCII refers to a near-universal agreement on how to encode letters, numbers, and symbols (such as those you type on your keyboard) in the binary numbers that computers understand. Unfortunately, ASCII is limited to only 128 characters, and even its common 8-bit extensions stop at 256, which is utterly inadequate for the variety of language and technical symbols used in the world. Wild and weird font and metadata tricks attempt to deal with this problem, but only really manage to mask it, and often unsuccessfully at that.

(This inadequacy is the reason SGML, HTML, and XML use “character entities,” short codes that begin with an ampersand, end with a semicolon, and stand for non-ASCII or markup-reserved characters. Commonly-employed named character entities include &amp; for an ampersand and &aacute; for an a with an acute accent.)
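As a rough illustration of the point, the Python sketch below shows a character that ASCII cannot encode, and the named entity that stands in for it using nothing but ASCII characters. The standard `html` module is only a convenient stand-in for entity expansion here; SGML and XML processors have their own machinery for it.

```python
import html

accented = "\u00e1"  # a with an acute accent

# ASCII has no code for this character, so encoding it fails.
try:
    accented.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# The named entity spells the same character using only ASCII characters.
entity = "&aacute;"
entity_is_ascii = entity.isascii()
expanded = html.unescape(entity)

print(fits_in_ascii)         # False
print(entity_is_ascii)       # True
print(expanded == accented)  # True
```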

The Unicode Consortium is responsible for a new encoding system that will enable computers to differentiate tens of thousands of characters.

XML has been designed to work natively with Unicode, and the Open eBook Publication Structure is built on XML. Therefore, the entire range of Unicode characters may be used in OEB Publications. This does not mean, however, that every Unicode character will display properly in an eBook derived from an OEB Publication. A computing device may understand that a character is part of Unicode yet not know how to display that character on a screen, just as you or I might recognize a set of strange characters as belonging to a particular alphabet or language but not be able to pronounce them.

The Open eBook Publication Structure requires that conformant OEBPS reading systems understand when they have run into a Unicode character, but it does not require that the systems display all Unicode characters correctly. (Reading systems may choose to implement as many of them as they wish; no specific characters or character ranges are required.) When a reading system runs into a Unicode character it cannot display, the OEBPS requires it to notify the user by displaying a question mark or other distinctive symbol.
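The fallback behavior might be sketched as follows. This is only an illustration in Python, not code from any actual reading system: the `render` function and the `LATIN_ONLY` character set are hypothetical names invented for the example.

```python
def render(text, displayable, placeholder="?"):
    """Return text with every undisplayable character replaced by a placeholder."""
    return "".join(ch if ch in displayable else placeholder for ch in text)


# Hypothetical device whose font covers only basic Latin letters, space, and period.
LATIN_ONLY = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

# U+03A9 (Greek capital omega) is valid Unicode but is not in the device's font.
shown = render("The sign \u03a9 means ohm.", LATIN_ONLY)
print(shown)  # The sign ? means ohm.
```

The device still recognizes the omega as a character; it simply tells the reader that it has nothing to draw for it.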

XML

What is XML?

XML is a “metalanguage,” a standardized way to create descriptions of text data that can be embedded in the texts themselves in the form of “tags.” XML can describe any sort of text data, from a scribbled sticky-note to a physics textbook.

XML describes different aspects of text than a word-processing or page-layout program does. Such programs make text intelligible to human readers using visual cues such as fonts and font size, indentation, and placement on a page or screen. Humans are accustomed to these cues and react well to them, but the cues have well-known drawbacks:

Visual cues are, of course, useless for the visually impaired.
Computers do not understand what visual cues mean to a human, even though computer programs can produce them. Visual cues are not a good foundation for making computers react in a specific way to different types of text.
There are many more distinctions possible in texts than there are visual cues available to capture them. Italic type, for example, is commonly used for emphasis, foreign or unfamiliar terminology, genus-species names, book titles, and many other things. Often, an electronic publication might make use of such distinctions (e.g., to create a glossary of unfamiliar terms without also capturing book titles, or to search for genus-species names without finding everything that has been italicized).
Conversely, a given distinction may be represented by more than one visual cue: e.g. emphasis can be shown by italics, boldface, or underlining.
Visual cues are inconsistent across books, and even across different editions of the same book. The same chapter heading can be made visually distinct in a near-infinite number of ways. If chapter headings in all books are to be treated similarly (e.g. for searching, or for automatically generating a table of contents), the visual cues are insufficient and even counterproductive.
Visual cues are “flat,” offering even humans (never mind computers) relatively few hints about the underlying structure of a document. XML allows for complex hierarchical structures to help ensure that a document belonging to a particular type of documents is logically and structurally consistent with the other documents of that type.

Where a page-layout program might use italics, then, an XML-based description of a particular type of text could use any number of distinctions; one can call a title a title, a foreign term a foreign term, and so on. With the use of stylesheet languages such as Cascading Style Sheets (CSS), these distinctions can be given visual distinctiveness as well. If desired, of course, they can be left “behind the scenes” to enrich the text for purposes of computer-aided searching or analysis without getting in the reader’s way.
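To make the idea concrete, here is a small Python sketch using the standard library's XML parser. The tag names (`title`, `taxon`, `foreign`, `emph`) are made up for the example and belong to no particular DTD; a print layout might well set all four in the same italic type.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; on paper, all four spans might simply be italic.
SAMPLE = (
    "<para>In <title>Walden</title>, Thoreau writes of the loon "
    "(<taxon>Gavia immer</taxon>) with a certain <foreign>joie de vivre</foreign> "
    "that is <emph>hard</emph> to miss.</para>"
)

para = ET.fromstring(SAMPLE)

# Collect only the foreign terms, ignoring titles, species names, and emphasis.
glossary = [el.text for el in para.iter("foreign")]
species = [el.text for el in para.iter("taxon")]

print(glossary)  # ['joie de vivre']
print(species)   # ['Gavia immer']
```

A search for “everything in italics” could not tell these four kinds of text apart; a search by tag name can.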

XML is not magic. While XML can be made to describe almost any sort of text, how accurate or useful the description is depends on those who create the description and those who use it to describe each text. For example, the markup of a document (whether done by human or computer) may use tags in a manner inconsistent with their intended use. (The colloquial term for this is “tag abuse.”) Assuming that foreign terms and emphasis can appear in the same places within a text, XML cannot prevent the description of a foreign term as emphasis.

How is XML different from HTML?

XML provides the tools to create tagging systems for texts. It is not a finite list of tags from which to choose. In contrast, the web markup language HTML is a finite list of tags. Imagine HTML as a toolbox, and its tags as tools. XML is a tool and toolbox factory, then, a way to design and build tools and toolboxes. HTML is just one set of tools that can be built using the principles of XML.

In practice, many people think of HTML tags in terms of the visual effects they produce in web browsers. To them, a <blockquote> tag may not have anything to do with a block quotation; it’s simply a convenient way to indent text on both sides. This is not, of course, how HTML was intended to be used, and the result has been rather chaotic, as HTML tags were squeezed, abused, and redefined in order to produce visual effects in browsers.

XML is different. Ideally, a language created with XML to describe a particular kind of text should pay attention to what that text and its component parts are, not what they should look like. A figure caption might be italic, bold, small caps, or some combination of those; that doesn’t mean it’s stopped being a figure caption. XML doesn’t care how you want your figure captions to look; it cares that they’re figure captions.

Once you know that, you can decide how figure captions look with a stylesheet, which tells a browser or eBook reading device “all figure captions are 10-point Garamond Bold” or similar. The separation of logical tagging from the stylesheet allows you to change your mind about the visual appearance whenever you like, without disturbing the figure caption’s basic identity as a figure caption.

The implications are legion, but a few stand out. First, text marked up in XML can have its “look” changed at a moment’s notice, without retagging the text. Second, data searches through texts marked up with XML can take advantage of the never-changing markup. (If you suddenly change your chapter heads in HTML from a <h1> tag to a <h2> tag so that the font isn’t so big, how is a poor computer going to know that it’s still a chapter head, and search it appropriately?) Third, the stricter rules on tagging in XML mean that marked-up XML text is generally cleaner and easier to work with, for both computers and humans, than is the general run of HTML. (Try “view source” in your browser on any web page—though this one is not the best example, as it’s fairly clean. What you see will probably appall you. It should.)

What is a DTD?

A Document Type Definition (DTD) is a document that describes the XML structure of a particular type of text:

the “character entities” that represent special characters in the text,
the “elements” that constitute the text,
the “attributes” that describe those elements, and
where and how often elements may appear.

DTDs are written in a special syntax, although the W3C has issued a method of structure description (XML Schema) that is itself written in XML.
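For a feel of what DTD syntax looks like, the Python sketch below embeds a tiny, entirely hypothetical “memo” document type in a document's internal DTD subset. It declares one entity, three elements, one attribute, and a rule about where and how often elements may appear. Note the assumption in play: Python's standard parser reads the entity declaration but does not validate the document against the element rules, so this shows the syntax rather than validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical "memo" document type, declared in an internal DTD subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY press "Example Press">
  <!ELEMENT memo (subject, para+)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
  <!ATTLIST memo lang CDATA #IMPLIED>
]>
<memo lang="en">
  <subject>Schedule</subject>
  <para>&press; ships in May.</para>
</memo>"""

# ElementTree expands the declared entity; it does NOT enforce the ELEMENT rules.
memo = ET.fromstring(DOCUMENT)

print(memo.get("lang"))        # en
print(memo.find("para").text)  # Example Press ships in May.
```

Here `(subject, para+)` says that a memo holds exactly one subject followed by one or more paragraphs.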

It is not necessary to be able to read or write a DTD or schema in order to code OEBPS Publications, although it can be helpful, particularly in the creation of Extended OEBPS Documents.

What is XHTML?

XHTML is no more than the reformulation of HTML 4 as XML.

XML as opposed to what? Well, XML’s parent standard is known as SGML (Standard Generalized Markup Language). The differences are not germane to this FAQ; what is important is that HTML was originally formulated using SGML, not XML, not least because XML didn’t exist yet.

XHTML is simply HTML that is built with XML instead of SGML. (Impress friends at parties with that string of acronyms.) The W3C has stated that further development will take place with XHTML; it considers HTML a dead end.
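One practical consequence is that XHTML must obey XML's stricter rules, such as closing every element. The following Python sketch (the helper name `is_well_formed_xml` is invented for the example) feeds two nearly identical fragments to a generic XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<p>First line<br>second line</p>"     # unclosed <br>, tolerated in HTML
xhtml_style = "<p>First line<br />second line</p>"  # empty element closed, as XML requires

print(is_well_formed_xml(html_style))   # False
print(is_well_formed_xml(xhtml_style))  # True
```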

What is CSS?

Cascading Style Sheets (CSS) are a method of providing layout and display information for tagged text. Designed for the World Wide Web, CSS is also employed by the Open eBook Publication Structure.

Two sets of CSS specifications (known generally as CSS1 and CSS2) now exist; CSS2 was designed to supplant CSS1, but has not been completely implemented yet by browser developers. A third specification (CSS3) is in active development by the World Wide Web Consortium, and promises to take into account the needs of specialized applications such as the OEBPS.

The OEBPS has adopted selected segments of the CSS1 and CSS2 specifications, and has created a few segments of its own (e.g. for the coding of running heads), pending the inclusion of such functionality in CSS3.

The syntax of CSS is really quite simple; those accustomed to print design specs can pick it up quickly. A CSS rule consists of three basic parts:

the selector, which identifies the markup the rule applies to;
the property, which identifies the aspect of layout or presentation to be affected; and
the value, which nails down how the property should look.

A CSS rule may apply to more than one selector, in which case the selectors are separated by commas. It may also set more than one property, in which case the property-value pairs are separated by semicolons. Some properties take more than one value, separated by spaces; other properties allow fallback values, separated by commas.

For example, if all <h1> elements in a document should be red, the CSS rule to accomplish that is h1 { color: red }. h1 is the selector, color is the property, and red is the property’s value. If all <h1> and <h2> elements should be red and bold, the CSS rule to accomplish that is h1, h2 { color: red; font-weight: bold }.
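The anatomy of a rule can be teased apart in a few lines of Python. This is a deliberately naive sketch (`parse_rule` is a hypothetical helper, not part of any CSS library) that handles only simple rules like the ones above; it ignores comments, quoted strings, and everything else a real CSS parser must cope with.

```python
def parse_rule(rule):
    """Split one simple CSS rule into its selectors and its property/value pairs."""
    selector_part, _, rest = rule.partition("{")
    body = rest.rsplit("}", 1)[0]
    # Selectors are separated by commas.
    selectors = [s.strip() for s in selector_part.split(",")]
    # Property/value pairs are separated by semicolons.
    declarations = {}
    for declaration in body.split(";"):
        if declaration.strip():
            prop, _, value = declaration.partition(":")
            declarations[prop.strip()] = value.strip()
    return selectors, declarations


print(parse_rule("h1, h2 { color: red; font-weight: bold }"))
# (['h1', 'h2'], {'color': 'red', 'font-weight': 'bold'})
```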

The current version of the OEBPS allows only three kinds of selectors:

element name selectors (such as h1 in the example above),
class attribute selectors, and
combined element-name and class-attribute selectors.

The “class” attribute is an XHTML construct that can be added to just about any XHTML tag. It serves to further distinguish tags from each other. For example, if the six heading tags XHTML provides are not sufficient, new ones can be created with the class attribute: <h1 class="maintitle"> versus <h1 class="subtitle">. In CSS, a dot signals the class attribute, so the selector h1.maintitle stands for only the h1 elements whose class attribute has the value maintitle. The selector .maintitle stands for any element whose class attribute has the value maintitle.
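The three kinds of selector can be modeled with a short Python function. This is a sketch of the matching logic only, under the assumption that a class attribute may hold several space-separated names; `selector_matches` is a made-up name, not something defined by the OEBPS or by CSS.

```python
def selector_matches(selector, element_name, class_value=None):
    """Check a selector of the form 'h1', '.maintitle', or 'h1.maintitle'."""
    name, dot, wanted_class = selector.partition(".")
    # An element name, if given, must match exactly.
    if name and name != element_name:
        return False
    # A class, if given, must appear among the element's class names.
    classes = (class_value or "").split()
    if dot and wanted_class not in classes:
        return False
    return True


print(selector_matches("h1.maintitle", "h1", "maintitle"))  # True
print(selector_matches("h1.maintitle", "h1", "subtitle"))   # False
print(selector_matches(".maintitle", "p", "maintitle"))     # True
print(selector_matches("h1", "h1", "subtitle"))             # True
```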

What is PNG? What is JPEG? Why doesn’t the OEBPS support the GIF image file format?

The PNG image file format was developed by an independent group of developers as an open replacement for the GIF format, and was then published as a Recommendation by the World Wide Web Consortium (W3C).

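JPEG is in fact a compression standard, not a file format, and as such is maintained by the Joint Photographic Experts Group, a joint committee of ISO/IEC and ITU-T. The file format usually associated with JPEG, JFIF, was popularized by the free software of the Independent JPEG Group.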

Unisys Corporation holds a patent on the LZW compression algorithm that the GIF format relies on, and in recent years has chosen to demand royalties from some web page designers and hosts who use the format. In order to avoid legal difficulties between Unisys and eBook content creators, the OEBPS does not require or recommend support of the GIF format. This does not mean that eBook reader manufacturers cannot choose to support GIFs; it does mean that to be OEBPS-conformant, readers must support the PNG format, which is open to anyone’s use without patent issues.
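All three formats announce themselves in the first few bytes of a file, so software can tell them apart without trusting the file name. The Python sketch below checks those opening bytes. Whether any given reading system works this way is an assumption, and `sniff_image_format` is a name invented for the example.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the eight bytes that open every PNG file


def sniff_image_format(data):
    """Guess an image format from the first bytes of its data."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return "unknown"


print(sniff_image_format(PNG_SIGNATURE + b"rest of file"))  # png
print(sniff_image_format(b"GIF89a" + b"rest of file"))      # gif
```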

What is Dublin Core?

Dublin Core is a metadata vocabulary developed by a group of librarians and other book professionals.

“Metadata” is information about a textual object (such as a book or document) that is not logically part of the object itself. The title and copyright pages of a print book hold most of the book’s metadata: author and other creator information, title, publisher, ISBN, and so on.

The OEBPS uses an XML expression of Dublin Core’s metadata set inside its package file.
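The flavor of such metadata can be shown with a Python sketch. Everything specific in it is an assumption made for illustration: the sample values are invented, and the element names and namespace URI are only loosely modeled on a package file, so check the OEBPS specification itself for the exact vocabulary.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI and element names; consult the specification for the real ones.
DC = "http://purl.org/dc/elements/1.0/"

PACKAGE = f"""<package unique-identifier="bookid">
  <metadata>
    <dc-metadata xmlns:dc="{DC}">
      <dc:Title>An Example Book</dc:Title>
      <dc:Creator>A. N. Author</dc:Creator>
      <dc:Identifier id="bookid">example-0001</dc:Identifier>
    </dc-metadata>
  </metadata>
</package>"""

package = ET.fromstring(PACKAGE)
ns = {"dc": DC}

# Metadata describes the book without being part of the book's own text.
title = package.find(".//dc:Title", ns).text
creator = package.find(".//dc:Creator", ns).text

print(title)    # An Example Book
print(creator)  # A. N. Author
```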