What parts of an OEBPS Publication should I archive? What kind of documentation should I have about an OEBPS Publication?

If your OEBPS Publication was derived from another SGML or XML dataset, that source material should be part of your archive. Otherwise, an OEBPS Publication can largely stand on its own as an archive. It is wise, however, to archive some documentation of the coding strategy used in the eBook. Such documentation might include:

I also like to keep a list of all tags employed and all character entities present. I wrote a short Python script that compiles this information for me; anyone with basic scripting ability could do the same.
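The author's own script is not shown here. What follows is a minimal sketch of the same idea, assuming the publication's content files are plain text on disk. It uses regular expressions rather than an XML parser, so that undeclared named entities such as `&mdash;` do not stop it. The names `inventory` and `main` are illustrative only, and the sketch will miscount markup quoted inside CDATA sections.

```python
import re
import sys
from collections import Counter

# Start tags and empty-element tags ("<p", "<br/"), but not "</p", "<!--" or "<?xml".
TAG_RE = re.compile(r"<([A-Za-z_][\w.:-]*)")
# Named (&mdash;), decimal (&#8212;) and hexadecimal (&#x2014;) references.
ENTITY_RE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_][\w.:-]*);")
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def inventory(text):
    """Return (tag_counts, entity_counts) for one document's source text."""
    text = COMMENT_RE.sub("", text)  # ignore markup that appears only inside comments
    tags = Counter(TAG_RE.findall(text))
    entities = Counter(ENTITY_RE.findall(text))
    return tags, entities


def main(paths):
    """Print combined tag and entity counts for every file named on the command line."""
    tags, entities = Counter(), Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            t, e = inventory(f.read())
        tags.update(t)
        entities.update(e)
    print("Tags:")
    for name, count in sorted(tags.items()):
        print(f"  {name}\t{count}")
    print("Character entities:")
    for name, count in sorted(entities.items()):
        print(f"  &{name};\t{count}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run against all of a publication's content files at once, it prints one alphabetized list of tags and one of entities, each with a count; rarely used tags and stray entities tend to stand out at the bottom of the counts.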

What should I find out from a vendor offering to convert to XML or an OEBPS Publication for me?

It’s a caveat-emptor world out there, particularly with XML and OEBPS so new. The more you know about both, the better off you are when shopping for conversion vendors or service bureaus. If you have read and understood this FAQ, you’re better off than many.

Asking for “XML” or “OEBPS” sight unseen is just like asking for “Quark” or “Word” or “print” sight unseen. The terms are so broad as to be meaningless by themselves, and they encompass a vast continuum of quality. If you do not see and evaluate what you will receive from your vendor, you go a long way toward ensuring that it lands at the low end of the quality spectrum.

(I have in fact seen “XML” samples created by an important conversion company, now defunct, in which every single line of a fairly complex document, with several different heads and a number of paragraph and list types, was coded as a <p> with a class attribute whose value was an utterly undocumented number, e.g. <p class="1">. Table rows were marked by XML comments rather than set off by elements. This is perfectly legal XML, perfectly legal OEBPS. It’s utterly useless to everyone but the conversion company, but it’s legal.)

You should therefore insist on seeing samples, created from your own materials if possible, before you sign any conversion contract. You should cooperate with a prospective vendor who requests typesetting files, print design specs, example books, or similar data from you, as long as you get sample output in return.

If you do not feel comfortable evaluating XML/OEBPS quality, by all means hire outside expertise to do the sample evaluation for you. Such an evaluation need not take much time or cost much, and it could save you from an unpleasant surprise or costly mistake.
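For a first pass over a vendor sample, a few lines of script can flag the kind of markup described in the anecdote above: nearly everything a <p>, class values that are bare numbers, and comments that may be standing in for structure. This is a crude heuristic sketch, not a quality test and not a substitute for expert review; the 90% threshold and the names `MarkupProfile` and `red_flags` are arbitrary assumptions. It uses Python's standard `html.parser` module, which is lenient enough to read samples containing undeclared entities.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupProfile(HTMLParser):
    """Tally element names, class values and comments in a markup sample."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.classes = Counter()
        self.comments = 0

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag also routes empty tags like <br/> here.
        self.elements[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

    def handle_comment(self, data):
        self.comments += 1


def red_flags(markup, p_share=0.9):
    """Return warnings that a sample carries little structural information."""
    profile = MarkupProfile()
    profile.feed(markup)
    profile.close()
    warnings = []
    total = sum(profile.elements.values())
    if total and profile.elements["p"] / total > p_share:
        warnings.append(f"{profile.elements['p']} of {total} elements are <p>")
    numeric = sorted(c for c in profile.classes if c.isdigit())
    if numeric:
        warnings.append("bare-number class values: " + ", ".join(numeric))
    if profile.comments:
        warnings.append(f"{profile.comments} comment(s): check whether they carry structure")
    return warnings
```

A sample shaped like the one in the anecdote trips all three checks. A clean result, on the other hand, proves little, since well-named tags can still be misapplied.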

You should also discuss deliverables with the vendors you are considering:

If you are targeting a particular reading system, ask to see samples compiled for that system as well. They should give you a reasonable idea of how much effort your vendor is putting into presentation.

Ask your vendor how much input you will have into the process of conversion. At what stages of the process will you see the work? Will you have an opportunity to proofread? Alter markup strategy? Change aspects of the presentation? How will the vendor communicate with you, and how will the vendor act on your requests?

What kind of design spec can I create for an OEBPS Publication?

As you probably know by now, OEBPS Publications are not yet as flexible as print in terms of visual design. This does not mean that you are helpless to affect the presentation of your eBook; only that you must think about your eBook a little differently from print, and concentrate on aspects you know you can change.

XML and HTML being what they are, you cannot do a great deal of design with OEBPS markup alone. What the markup can, and indeed must, do instead is give you enough richness to hang a good stylesheet from. To do this, it must distinctly mark every element that might conceivably need distinctive presentation.

If you have a print design spec or stylesheet, with all distinctively-presented elements in your data noted, that is a good start. (If you edit electronically, and have a list of styles or styletags, that is just as good.) If you do not, go through your book yourself and look for everything that looks different. Give all those things names based on the role they seem to play in the book (not “16-point Frutiger bold,” but “second-level subhead”). Do not forget to look for divisions between sections that are marked by extra space, ornaments, or rules. A list of what you’ve found, along with the CSS properties and values you wish to apply and identification of design elements that do not adapt naturally to OEBPS, will be a valuable aid to your vendor.
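If you keep that list as structured data, it can also produce a first-draft stylesheet. The sketch below assumes you have already named each role and chosen CSS declarations for it; the role names, property values, and the `note` field used to mark elements that do not adapt naturally to OEBPS are all hypothetical. Check the output against what Section 4 of the OEBPS actually supports.

```python
# Hypothetical design spec: role name -> CSS declarations, plus an optional
# note for elements that will not carry over cleanly to OEBPS.
DESIGN_SPEC = {
    "subhead2": {"css": {"font-size": "1.2em", "font-weight": "bold", "margin-top": "1em"}},
    "epigraph": {"css": {"font-style": "italic", "margin-left": "2em"}},
    "section-break": {
        "css": {"margin-top": "2em", "text-align": "center"},
        "note": "print uses an ornament; may need an image or a plain rule",
    },
}


def to_css(spec):
    """Render each role as a class rule, one rule per line."""
    rules = []
    for role, entry in spec.items():
        body = "; ".join(f"{prop}: {value}" for prop, value in entry["css"].items())
        rules.append(f".{role} {{ {body} }}")
    return "\n".join(rules)


def needs_attention(spec):
    """List the roles flagged as not adapting naturally to OEBPS."""
    return [role for role, entry in spec.items() if "note" in entry]


if __name__ == "__main__":
    print(to_css(DESIGN_SPEC))
    for role in needs_attention(DESIGN_SPEC):
        print(f"/* check {role}: {DESIGN_SPEC[role]['note']} */")
```

Keeping role names, styling, and open questions in one place means the vendor receives the same document you used to think the design through.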

Next, become familiar with what you can and cannot do with CSS. Read Section 4 of the Open eBook Publication Structure carefully; it should be quite comprehensible for anyone with a design or print-production background. (If it is not, O’Reilly Publishers offers an excellent CSS reference called Cascading Style Sheets: The Definitive Guide.) Do not, however, fall into the trap of assuming that a feature in the OEBPS will be available to you on all reading systems. For example, the OEBPS allows specification of color and background-color, but not all reading systems display color.

Be prepared to alter some of your design thinking as you see the results of your design; this is normal, and it is usually much easier to alter the look of an eBook designed with CSS than a print publication, even when the eBook production process is in progress or complete. (The exception is adding a new type of markup, which can be difficult; this is why it is so important to consider all the different design elements that lurk in your book before you produce your eBook.)

You must also plan a hyperlinking strategy for your data. What should your table of contents look like? Does it link anywhere besides the beginnings of chapters? Do you have floating elements (art, tables, footnotes) anywhere? How do you want them linked to, and from where? (If they are not called out anywhere, you have a problem, and may wish to consider some editorial changes to your data.) Do you want them inside the normal “paged” flow of the book (if so, where precisely?), or accessible only by link? If only by link, how will the reader return from them? If you are unsure about linking strategies, your vendor should be able (and willing) to explain your options and help you choose. If links must be placed by hand (and many must), expect to pay for the extra effort.
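One common answer to the return-trip question is to link each note back to its callout. The sketch below generates both halves of that round trip. It assumes you supply the file names holding the notes and the running text (empty when both live in the same file); the `ref`/`note` id scheme and the `footnote` class are hypothetical choices, not anything the OEBPS mandates. Check which anchor form your target reading systems honor before committing to one.

```python
from html import escape


def link_notes(notes, notes_doc="", text_doc=""):
    """Build footnote callouts and note paragraphs that link to each other.

    notes: footnote texts in reading order, numbered from 1.
    notes_doc / text_doc: file holding the notes / the running text
    ("" means the link target is in the same file).
    Returns (callouts, note_blocks).
    """
    callouts, blocks = [], []
    for n, text in enumerate(notes, start=1):
        # Callout in the running text: links forward, and is itself a link target.
        callouts.append(f'<a id="ref{n}" href="{notes_doc}#note{n}">{n}</a>')
        # Note: a link target for the callout, with a link back to it.
        blocks.append(
            f'<p class="footnote" id="note{n}">'
            f'<a href="{text_doc}#ref{n}">{n}.</a> {escape(text)}</p>'
        )
    return callouts, blocks
```

The same pattern applies to art and tables kept outside the paged flow: every link out gets a matching link home.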

Indexes are a special problem. OEBPS Publications do not naturally preserve print page breaks, for the simple reason that they are designed to reflow according to different reading systems, screen sizes, and font sizes. If you want print page breaks preserved, and especially if you want to reuse your print index, you must ask your vendor to do these things, and you must realize that they often require a great deal of time and effort.

Since eBook reading systems typically offer text search as well as hyperlinking, the lack of a professionally prepared index is not quite as fatal as might be expected. Still, an eBook without a real index is weaker for it, so if tagging print page breaks lets you reuse your print index, bite the bullet and pay to have it done. (Genuinely far-sighted publishers go further and pay an indexer to insert index tags directly into their data.)
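Once page breaks are tagged, reusing a print index becomes largely mechanical: each page number in an entry becomes a link to the marker where that print page began. The sketch below makes several hypothetical assumptions. Page breaks are marked with anchors whose ids follow the pattern `page123`, all of them live in one file (`book.html` is a placeholder), and print index lines look like `term, 12, 47-49`. A page range is linked to its first page. Real indexes, with subentries and cross-references, need more care.

```python
import re
from html import escape

# Assumed print index line format: "term, 12, 47-49" (the term may contain commas).
ENTRY_RE = re.compile(r"^(?P<term>.+?),\s*(?P<refs>\d.*)$")


def parse_print_entry(line):
    """Split a print index line into (term, [page numbers])."""
    m = ENTRY_RE.match(line.strip())
    if not m:
        return line.strip(), []  # e.g. a "see also" line with no page numbers
    pages = []
    for ref in m.group("refs").split(","):
        first = re.split(r"[-\u2013]", ref.strip())[0]  # a range links to its first page
        if first.isdigit():
            pages.append(int(first))
    return m.group("term"), pages


def link_index_entry(term, pages, doc="book.html"):
    """Render an index entry whose page numbers link to page-break anchors."""
    if not pages:
        return f'<p class="index-entry">{escape(term)}</p>'
    links = ", ".join(f'<a href="{doc}#page{p}">{p}</a>' for p in pages)
    return f'<p class="index-entry">{escape(term)}, {links}</p>'
```

For example, `link_index_entry(*parse_print_entry("Frutiger, 12, 47-49"))` yields an entry with links to the anchors for pages 12 and 47. The links land at the top of the print page, not at the indexed word, which is one reason page-break-based indexing is a compromise.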

Can I give an OEBPS Publication to a vendor that typesets SGML or XML, in order to typeset a print book?

You can try, and you might even have good luck with it. In general, however, OEBPS Document markup is not rich enough (or, sometimes, straightforward enough) to typeset well except for the very simplest of books. If you want a genuinely XML-based workflow, by all means implement one, but using OEBPS as your XML base is likely to be unwise. Deriving OEBPS from an XML typesetting workflow built on a richer content model, by contrast, should work nicely. You may also wish to pay your vendor to enrich your OEBPS Publication data so that it is suitable for typesetting; such enriched data may also make an excellent archive.
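Deriving OEBPS from a richer tagset is mostly a matter of mapping many specific elements onto a few generic ones, keeping the original name as a class value so the distinction survives for CSS. The sketch below uses Python's standard `xml.etree.ElementTree`. The source tag names and the mapping table are hypothetical, and it assumes the source parses as plain XML (named entities declared only in a DTD would have to be resolved first).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a richer, book-specific tagset to OEBPS-style
# elements: source tag -> (output tag, class value or None).
TO_OEBPS = {
    "chapter": ("div", "chapter"),
    "chaptitle": ("h1", "chapter-title"),
    "subhead2": ("h3", "subhead2"),
    "para": ("p", None),
    "emph": ("em", None),
    "epigraph": ("blockquote", "epigraph"),
}


def down_convert(elem):
    """Return a new element tree with rich tags mapped to OEBPS-style tags."""
    # Unmapped tags become a div carrying the original name as its class.
    tag, cls = TO_OEBPS.get(elem.tag, ("div", elem.tag))
    out = ET.Element(tag)
    if cls:
        out.set("class", cls)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(down_convert(child))
    return out
```

The mapping only runs one way. Going from the output back to the source would mean guessing which <div> or <p> was which, and that is exactly why OEBPS makes a poor base for typesetting.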

Can I typeset from an OEBPS Publication myself?

Some widely available software (free, or priced for individual users rather than for production shops) typesets from XML, and more is in development. Much of it uses some flavor of the typesetting system TeX to do the heavy lifting, and must be fed a specific XML tagset (to which, of course, your own XML must be transformed). Other typesetting software is built around specific DTDs (DocBook is a favorite, thanks to its widespread use in computer documentation). Typically, these use DSSSL (an SGML stylesheet language, much richer and more complex than CSS) and James Clark’s jade software (or a derivative thereof) to output PostScript and/or PDF. Similar software is in development based on XSL, the stylesheet language the W3C created to work with XML.

In theory, this software is equally usable with any XML document, but in practice, poorly conceived or poorly executed XML is unlikely to typeset well, no matter how sophisticated the typesetting software. OEBPS, unless it is extraordinarily well constructed, is not a good candidate for these approaches. Again, a decent OEBPS base dataset could be enhanced and/or transformed in order to be typeset; you will have to decide whether that is the most economical or practical approach.