I thought it might be interesting to walk through the process I use for creating ebooks that can be read on the iPad, Kindle, and nook (as well as the Kindle Fire, nook color, iPhone, BlackBerry, any Android phone or Android tablet such as the Xoom or Samsung Galaxy tablets, Sony Reader or other ereading device, or a conventional computer monitor). In other words, how to make an ebook that can be read on any device.
This note is about the mechanics of creating a book that works on any device (or as many devices as possible) and steps through the specific issues I encountered and the resulting choices/decisions/compromises I made to get there.
This piece was written in November 2011 and reflects the state of the market, the tools available, and the process I followed at that date (in particular, it was written after the Kindle Fire/Kindle Format 8 had been announced, but before either had been released into the wild). In addition, some of the initial research (particularly as it related to the specific requirements relating to the Kindle).
The Process to Create eBooks
As a first step I want to outline the process I use. If you just want to find a way to create ebooks, you can stop reading at the end of this section. However, I will then carry on and talk about some of my reasoning for some of the actions I took and how I addressed the challenges that arose.
What Am I Creating?
The process I am outlining in this note is the one I used to create the books in the How to Make a Noise series.
If you want to look at an example of the end results of the process I'm setting out here, then I suggest you check out How to Make a Noise: Frequency Modulation Synthesis or How to Make a Noise: Sample-Based Synthesis (as they are the most representative of the process). Both are available from Amazon, Apple's iBookstore, Barnes & Noble, and the Sony Reader Store (for the very reasonable price of $2.99 plus any sales tax which may be levied in your jurisdiction and any other charges the bookstore may raise).
The books I am producing are non-fiction instruction books which look at creating sounds with synthesizers. In terms of production and presentation, these are not simple, straightforward books like a novel might be (they're not just a bit of text with the occasional bolding, italic, or chapter heading) but instead are more like mini-wikis with the following elements:
- Text. Of course... these are books.
- An extensive table of contents to assist navigation through each book.
- Internal links to cross-refer to relevant information in other parts of the book. This is particularly important since my readers tend to use my books as a reference source, and go straight to a specific point so I can't be certain they will have read all that comes before (or after). With links, I can point them to related material. A typical book may use several hundred (up to 300 or 400) links.
- External links which are visually differentiated from internal links (not least since they will not work if there is no internet connection).
- Graphics—in color and as high quality as they can practically be. These graphics are a combination of edited screen grabs and (computer) drawn illustrations. A typical book might have up to around 100 images.
- Different fonts to indicate headings, sub-headings, figure notes, and so on.
The books are only available in electronic format. There is no index as this is unnecessary with the extensive table of contents and full-text searching.
In creating these books, I have aimed to create something that is readable on any device, irrespective of its power or screen, so these books can be read on a small screen (such as a phone), a tablet-sized color screen (such as an iPad, nook color, or Kindle Fire), an eInk screen on a low-power device (such as a Kindle, nook, or Sony Reader), and a computer monitor screen. It is important that the layout and the graphics work equally well on every format.
The process I'm outlining can be applied equally to books with more straightforward requirements, such as works of fiction where you are unlikely to have either the number of images or the heavy reliance on internal linking. With a more straightforward layout, the production process will be swifter.
For drafting text, my prime tool is Microsoft Word.
I use Word because I am very familiar with it—I have been using it for years, am comfortable with how it works (despite its quirks), and find it completely reliable. Also, for drafting text, it allows me to do everything I need it to do.
Another reason for my continued use of Word is that it produces the documents in the standard format for the publishing industry. When I work with a publisher, they expect text to be submitted in Word (.doc) format. For me, there are advantages to having all of my work in a single format.
While Word has the functionality to do more, I only use it for the creation of text. I do not use it as a layout program nor to handle graphics. Indeed, there are significant downsides to using it for these purposes.
In drafting the text, I make extensive use of styles. In particular, I am rigorous about using heading styles which I use as the basis for generating the table of contents and as the anchor points for (most) links. The table of contents and all hyperlinks—both internal links and external links—are created within the Word document.
In parallel with writing the text, I also create the graphics, but outside of and separately from Word.
My graphics are predominantly from two sources:
- screen grabs, and
- line drawings where I create the artwork myself.
In addition I also use a range of photographs and stock images.
The main tools I use to create (and manipulate) my images are:
- Snagit from TechSmith. Snagit is a screen capture program that I use when I want to grab the image of an interface or similar.
- Adobe Photoshop. I use Photoshop for cutting up and editing photos and screen grabs. As well as cropping, I will often adjust the tone, brightness, and hue of images to make them work within the context of an ebook.
- Adobe Illustrator. I use Illustrator to draw images (and add any associated text).
There's no particular magic about the programs that I use. I use them because they can produce professional results and because I understand how to use the tools (familiarity trumps cutting edge). Equally good results can be achieved with other dedicated graphics tools (including open source tools such as GIMP). The key issue with any graphics program is to ensure that the graphic images are output in a suitable format (in terms of file type, file size, image size, and image quality). I discuss these issues in the section Challenges with Graphics.
Once the graphics have been created, the individual image files remain separate—I do not import them into the Word file. Instead, the graphics are included at the HTML stage.
Convert to HTML
Having written and edited the text, and created the images, the next step is to convert the text to HTML (HTML is the file format behind web pages and can be previewed in web browsers). In many ways this might be seen as an interim step. However, from my perspective, this is a crucial step and is part of the quality control for creating a file that works as the reader would expect.
There's also another important reason for this step: ebooks are effectively web pages. By converting to HTML I am able to preview the book as it will look, but still have the flexibility to keep editing.
The first step, of course, is the conversion to HTML. For this, I use a tool called Word Cleaner. My reason for using this tool is that the HTML output from Word is virtually unusable. By contrast, Word Cleaner gives me something that is close to perfect for my needs and the tool can also automatically add a few tweaks that are useful when creating the ePub. As I say, it is close to perfect, but the text still needs some review here.
However, this step isn't simply a conversion from one format to another. This is also the place where I link up my images (with the appropriate positioning and padding). Because of the way that HTML works (explained in the section What is an HTML File?) the images remain separate, but can be previewed in their final context.
For editing the HTML file and adding images, I use a dedicated HTML editor, WeBuilder. There are other options, but I like this tool. I have been using it for many years and have found it powerful and reliable.
The final stage for me is to take the HTML file (and associated image files and style sheets) and create an ebook.
Creating the eBook
I create files in ePub format using Sigil. The conversion process requires that I open the HTML file in Sigil and save it in ePub format—that's all there is to conversion.
Once the book is converted, then it can be tweaked in Sigil. In this context there are several operations that I perform with Sigil:
- The first change I make is to split the book into chapters. When I put a book into HTML format, I automatically add chapter split markers with Word Cleaner—the chapter splitting exercise then becomes a button press in Sigil.
- Sigil allows me to add and edit a book's metadata—the data that is embedded within the ePub file and so travels with the book. This can include the ISBN, copyright information, publisher details, a summary of the book, and more.
- There are two types of tables of contents in ebooks. The first is included within the body of the text, very much like a conventional table of contents but with hyperlinks. The second is what is called an ncx file (Navigation Control file for XML). The first type of table of contents is necessary for Kindle books and is created by generating a conventional table of contents in Word. The second type is the table of contents that appears in most ereader devices (such as the Sony Reader or iBooks) when you select the table of contents option and is a far more elegant way of navigating books. Sigil will generate an ncx file from the headings within the book.
- Sigil will also perform a range of validation tests to confirm that the ePub file is valid (and will therefore be acceptable for the iBookstore and the other leading retailers).
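To give a feel for how a navigation table of contents can be derived from headings, here is a minimal Python sketch. It is not how Sigil does it internally: the class name HeadingCollector, the outline function, and the sample markup are all made up for illustration, and the sketch assumes the headings use h1 to h3 tags that carry id attributes.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1-h3 heading in an HTML document."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.headings = []      # finished (level, anchor id, text) tuples
        self._current = None    # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._current = [self.LEVELS[tag], dict(attrs).get("id"), []]

    def handle_data(self, data):
        # Text inside nested tags (such as <em>) is still part of the heading.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._current is not None:
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None


def outline(html_text):
    """Return the headings of html_text in document order."""
    collector = HeadingCollector()
    collector.feed(html_text)
    return collector.headings


# Hypothetical sample markup, not taken from a real book.
sample = """
<h1 id="ch1">Oscillators</h1>
<p>Body text.</p>
<h2 id="ch1-fm">Frequency <em>Modulation</em></h2>
"""

for level, anchor, text in outline(sample):
    # Indent each entry by its heading level to show the nesting.
    print("  " * (level - 1) + f"{text} -> #{anchor}")
```

The nesting of the entries follows the heading levels, which is one reason rigorous use of heading styles in the source document pays off at this stage.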
That's the process that I use to create ePub files. If you want to know more about my reasoning and the difficulties I encountered while creating ebooks, keep reading.
Creating the File that the Reader Sees
In reality, there is only one sensible choice for the format of the end product: ePub. This is the ebook standard format used by Apple (through the iBooks app on the iPad and the iBookstore), Barnes & Noble (on the nook range and their online store), and Sony (on their Reader devices and store), among others.
ePub is an open and widely used standard. The only major retailer/device that doesn't use the standard is Amazon on their Kindle range. However, Amazon will accept files in ePub format which they convert to their proprietary format (and you can also convert ePubs using Amazon's KindleGen tool which creates Kindle-compatible files).
Also since ePub is a very clean standard, the files will convert well with other conversion tools such as Calibre if a reader wants to read the book on another device.
What Goes Into an ePub File?
I've already talked about creating ePub files, but let's look at the elements within the ePub package. Getting over-simplistic, these elements are:
- First, and most obviously, there is the text of the book. Reading devices are simple, low-powered devices and accordingly in order for a book to render well, it is necessary to split books so that each individual chapter is a separate document. Each separate chapter is held within the ePub file.
- There are the illustrations.
- There is the table of contents for the book. This is more than simply a list of chapters (although you might want to include that separately). Instead, this is the ncx table of contents that shows up in reading devices when you hit the table of contents button. This allows readers to go to the table of contents on a button click and then to find their desired location from the table of contents. At the time of writing, the Kindle implementation of this functionality is weak.
- There is the manifest, which lists all of the elements in the ePub file. Together with the spine (the reading-order list stored alongside it), it works as a playlist to serve the content to the reader in the appropriate order.
- There is the metadata (the data about data). This contains the details of the book—such as the title, author, ISBN, publisher, copyright notice, book description, categorization, and so on. The advantage of embedding this data is that the data travels with the book and is immediately accessible.
- Lastly there is the wrapper—the thing that holds everything together. This is just a zip file—if you take an ePub file and change the file suffix from .epub to .zip, then you'll find you have a zip file which (provided the file isn't copy protected) you can open and look around. If you're going to mess around with files like this, it's probably best to work on a copy in case you make an unintended mistake.
There are other elements, but these are the key parts we need to think about.
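Because the wrapper is a zip file, the contents can also be listed with a few lines of Python rather than by renaming the file. The sketch below is only an illustration: list_epub is a made-up helper name, and the archive it builds in memory is a stand-in with illustrative folder and file names, not a valid ePub. Only the mimetype file and the META-INF/container.xml entry are fixed by the ePub standard; the rest of the layout varies from book to book.

```python
import io
import zipfile


def list_epub(epub_file):
    """Return the names of the files stored inside an ePub (a zip archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory so the sketch runs without a real book.
# The OEBPS file names below are illustrative; a real ePub chooses its own layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr("OEBPS/content.opf", "<package/>")      # manifest and metadata
    archive.writestr("OEBPS/toc.ncx", "<ncx/>")              # ncx table of contents
    archive.writestr("OEBPS/Text/chapter1.xhtml", "<html/>")  # one chapter of text
    archive.writestr("OEBPS/Images/figure1.gif", b"GIF89a")   # one illustration

buffer.seek(0)
for name in list_epub(buffer):
    print(name)
```

To look inside a real book, pass the path of a copy of the .epub file to list_epub instead of the in-memory buffer.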
It is quite possible—and in many ways very practical—to import plain text into an ePub and add the formatting in an ePub editor (such as Sigil—indeed, you can write a book in Sigil, if you want). However, if your book has a lot of formatting and other information (such as internal links and links to websites)—and my books have a lot of formatting and a large number of links—then it is more practical to create the text with the formatting and links and to import that text into the ePub editor/creator. I have already outlined the process to create the text in Word and then convert the text to HTML.
HTML is the format of most web documents. It is also the underlying format for documents within ePub files (... well, if you want to be strict, the files are held in XHTML format, but that's close enough to HTML for our purposes). Since HTML is the format underlying ePub, when an HTML document is loaded in Sigil there is no conversion to the underlying document (except for a few minor tweaks, I believe). This lack of conversion means that the document, the format, the layout, and so on will not be corrupted as the document migrates from HTML to ePub.
What is an HTML File?
An HTML file can be viewed in a browser (such as Internet Explorer, Chrome, or Firefox). One huge advantage of including an HTML step in the creation of an ePub file is that the HTML file can be previewed in a regular browser and any errors easily tweaked.
An HTML file is a plain text file (in other words, all it contains is text and it can be opened in any document editor including Notepad or Word; however, it is easier to edit in a dedicated HTML editor).
When looking at HTML, there are three elements:
- the HTML file
- the style sheet, and
- the graphic images.
Let's look at these in reverse.
The images are individual image files which are stored separately from the HTML file. Since these images will be rendered in a book reader, they need some special care which I'll talk about in the section Challenges with Graphics.
The style sheet is a (plain text) file which defines a page's layout and the fonts. In this file you can define the page margins, the body text font, the heading font, weight, color, indents, and so on.
You can include your styles within your HTML file, but that loses a key benefit of using external style sheets—the ability to link several documents to the same style sheet. By having a common style sheet (or sheets) linked from every HTML document you can change a style once and it will be consistently reflected in every document.
The HTML file performs several functions:
- First, and most importantly, it contains the text.
- That text is marked up to indicate where styling (headings, bolding, italics, and so on) should be applied.
- It includes the link to point to the appropriate style sheet(s).
- It points to the images, and places them in the appropriate location within the document (the style sheet then determines how the images are rendered).
And because an HTML file is a plain text file, every parameter is exposed and can therefore be edited/finessed.
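To make the relationship between the three elements concrete, here is a minimal Python sketch that writes a style sheet and an HTML file side by side. Everything in it is hypothetical (the file names book.css and chapter.html, the image path, and the style rules are invented for illustration), and it does not reproduce the output of Word Cleaner or any other tool mentioned here.

```python
from pathlib import Path
import tempfile

# The style sheet: layout and fonts live here, not in the HTML file.
STYLE_SHEET = """\
body { font-family: Georgia, serif; margin: 1em; }
h1   { font-weight: bold; font-size: 1.5em; }
img  { display: block; margin: 1em auto; max-width: 100%; }
"""

# The HTML file: marked-up text, a link to the style sheet, a pointer to an image.
CHAPTER = """\
<html>
<head>
  <title>Sample chapter</title>
  <link rel="stylesheet" type="text/css" href="book.css" />
</head>
<body>
  <h1 id="intro">Introduction</h1>
  <p>Body text with <b>bolding</b> and <i>italics</i>.</p>
  <img src="images/figure1.gif" alt="Figure 1" />
</body>
</html>
"""


def write_sample(folder):
    """Write the style sheet and HTML file side by side; return the HTML path."""
    folder = Path(folder)
    (folder / "book.css").write_text(STYLE_SHEET, encoding="utf-8")
    html_path = folder / "chapter.html"
    html_path.write_text(CHAPTER, encoding="utf-8")
    return html_path


html_path = write_sample(tempfile.mkdtemp())
print(html_path)  # open this file in a browser to preview it
```

The image file itself is not created by the sketch, so a browser will show the alt text in its place. Dropping a real image at images/figure1.gif next to the HTML file is all it takes to link it up, since the HTML only holds a reference.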
Advantages of HTML
There are many advantages to passing text through the HTML stage. Having converted my book into HTML format:
- I know that I have clean text with any unwanted rubbish removed. This means I can have a high level of certainty that there will be no problems with my final file since all that is included is what I want to be included—there is no extraneous information added by Word or any of the other tools I have used. (Most word processors store information within their files whenever you cut-and-paste, make changes, import files from other programs, and so on. These additions are useful when editing but can all cause problems. However, these unwanted elements are not present in HTML files.)
- My book can be previewed in a web browser. This allows me to (1) see how the book will look, (2) view my images in context, and (3) check that all of the links (internal, including the table of contents, and external) are working as intended. It is much easier to check these details in a web browser than it is to review them in an ereading device.
- Since HTML is a plain text format, I can be fairly confident that the text will be accessible in the future. A Word document (for example) requires a version of Word (or another word processor) in order to open the file. If a current word processor file format changes, then I will be relying on future software developers to include some sort of import functionality in order to access my files. HTML as a plain text format should always be readable.
- Lastly, since the HTML file is a plain text file, everything in it is visible, so it cannot conceal the kind of embedded macro virus that a word processor file can carry.
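With several hundred internal links in a book, part of the link checking described above can be automated. The following Python sketch is one possible approach and nothing more: the names LinkChecker and broken_internal_links are invented, the sample page is made up, and it only handles links of the form #anchor within a single HTML file (once a book is split into chapters, links also carry a file name, which this sketch ignores).

```python
from html.parser import HTMLParser


class LinkChecker(HTMLParser):
    """Record anchor targets plus the internal and external links in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()   # every id (or <a name>) that a link could target
        self.internal = []     # targets of links such as href="#filters"
        self.external = []     # all other links, e.g. to websites

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.anchors.add(attrs["name"])
            href = attrs.get("href")
            if href:
                if href.startswith("#"):
                    self.internal.append(href[1:])
                else:
                    self.external.append(href)


def broken_internal_links(html_text):
    """Return the internal link targets that have no matching anchor."""
    checker = LinkChecker()
    checker.feed(html_text)
    return [target for target in checker.internal if target not in checker.anchors]


# Hypothetical sample page with one good and one broken internal link.
page = """
<h1 id="top">Contents</h1>
<p><a href="#filters">Filters</a> and <a href="#envelopes">Envelopes</a></p>
<h2 id="filters">Filters</h2>
<p><a href="http://example.com/">A website</a></p>
"""

print(broken_internal_links(page))  # ['envelopes']
```

A check like this complements, rather than replaces, clicking through the book in a browser: it finds links that point nowhere, but not links that point to the wrong place.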
You Don't Get What You Specify
It might seem that all you need to do is to specify a few fonts and your book will be presented exactly as you want. Unfortunately, that is not the case.
Many ereader devices allow the reader to override the default fonts. This is a good thing—it allows the reader to choose the font they find easiest for reading and to select a font size, line spacing, and margins that are comfortable for their eyes.
As well as user-override, there is also device override. One example of this comes with the current Kindle devices, which offer the reader a choice of three fonts (regular, condensed, and sans serif).
Another example of device override comes with the range of fonts. As I have said, ePubs are effectively web pages and the way web pages work is that they use fonts that are present on the viewing machine. In other words, if you specify a font for your web page/book and that font does not exist on the viewing device, then that font cannot show. Repeat: cannot.
With a web page, the way to address the font limitations is to specify a list of fonts, so instead of simply specifying (for instance) Garamond, you may specify Garamond, Palatino, Georgia, Baskerville, Times New Roman, and Times. The web page will then use the first specified font that it finds on the viewing device.
This principle applies equally to ePubs where you can specify a list of fonts, which may or may not appear on a reader. Towards the end of your list, you can always specify generic fonts (for instance serif and sans serif) in order to try to get some form of differentiation.
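The fallback rule can be sketched in a few lines of Python. This is a simplified model of the behavior, not the code any browser or ereader actually runs: resolve_font is a made-up name, the sets of installed fonts are invented examples, and real devices may apply their own overrides on top.

```python
# Generic family names defined by CSS; a device maps each to a font it has.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_font(font_stack, installed):
    """Return the first font in the stack that the device can actually show."""
    installed_lower = {name.lower() for name in installed}
    for name in font_stack:
        if name.lower() in GENERIC_FAMILIES or name.lower() in installed_lower:
            return name
    return None  # nothing matched: the device falls back to its own default


stack = ["Garamond", "Palatino", "Georgia", "Baskerville",
         "Times New Roman", "Times", "serif"]

# Two hypothetical devices with different fonts installed.
print(resolve_font(stack, {"Georgia", "Helvetica"}))  # Georgia
print(resolve_font(stack, {"Caecilia"}))              # serif
```

The second device has none of the named fonts, so only the generic entry at the end of the list gives any say over what the reader sees.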
Technology is changing: websites can use web fonts and so bypass the viewing computer's limitations and ePubs allow you to incorporate fonts into your ePub file. However, some distributors strip out embedded fonts (and that's before we consider the copyright implications of embedding fonts). Also, even if you can include fonts, you cannot be sure the font will work on a reader's reading device.
Therefore trying to delineate different areas of text simply with different fonts—for instance using a serif font for the body text and a sans serif font for the headings—may not work. To delineate headings, figure text, and so on—and to attempt to make clear that it is different from body text—I use a combination of size, different font faces, and bolding. As far as I can tell, the font size and bolding will usually carry through, even if the font faces often get lost.
As a side issue, some styling gets partially lost. For instance, on the current iBooks software (on the iPad and iPhone) the body text can be overridden by the user. However, the differing styling that I have specified for the headings still remains. Now of course, this is what happens at the time of writing—all it needs is a software update and this behavior will be changed.
Another issue to consider—particularly with the Kindle—is that ereaders don't always present the layout as defined. For instance, the current Kindle format does not allow you to right-align images and then flow text around the images.
Challenges with Graphics
There are several guiding principles in creating graphics intended as an integral part of an ePub:
- The quality of the image should be as high as possible.
- The file size of the image should be as small as possible.
Those two factors are not mutually exclusive, but there is a compromise between the two and generally smaller file sizes equate to reduced quality. Therefore, the optimum approach is probably to set a minimum quality and accept the file size that comes with that quality.
One key quality consideration is to ensure that once created the image is not reprocessed. When an image file is compressed to extremes (which is what we're doing here), the form of compression is lossy, in other words, data is discarded. If the image is reprocessed and recompressed, then in effect you will be throwing away data from an already data compromised file. This will have the result of degrading the quality much further which means that if you have achieved the perfect balance of quality and size, by reprocessing, the quality will be lost (and the file size may be increased).
Dropping an image file into a word processor such as Word will see the image file reprocessed. Several reprocessing transformations may then be applied to the image by the word processor:
- The image file may be converted into a different format. This reduces the quality of the file and is also counter-productive. As I will explain, for my books I made a clear choice about the image file format in order to ensure that the file was in a format that could be presented to the reader. By converting the image format, all that will happen is that the image will be converted back to the original format at the end of the process (thereby necessitating yet another conversion and every conversion leads to a degradation of quality).
- The file may be renamed (which is not a problem in and of itself, but is a nuisance if you're trying to keep track of your files).
- The compression (to balance the file size against quality) will be lost, usually meaning that the quality will be wrong. Alternatively, the file size will be larger than it needs to be.
For all of these reasons, I keep my images separate from the text until I reach the HTML stage where the image is simply referenced by the HTML file (in other words, the raw image is used without any processing).
Graphics: challenge #1—format
The ePub format supports images in a range of formats: JPG, GIF, PNG, and SVG. The differences between the formats and the benefits of each are irrelevant. The key issue here is that Amazon (at the moment) only support images in two formats: JPG and GIF. Therefore, if you create your images as PNG or SVG files, they will definitely be converted at some stage of the process.
As a result, I created my graphics in GIF format.
The reason for my choice is the degree of control in compressing image files. Both formats throw data away, but in different ways: the JPG compression algorithm is lossy, while a GIF holds at most 256 colors, so data is lost when an image's colors are reduced to fit the palette (the GIF compression step itself is lossless).
With small amounts of compression, there is no noticeable visual effect following the compression. However, when large amounts of data are thrown away, the compression is very noticeable. As a personal preference, when dealing with extreme compression, I generally prefer the look of images that have been reduced to GIF. Beyond that, because a GIF shrinks as the number of colors in the image is limited, colors can be removed one at a time so you can ensure that your file is only as small as it needs to be.
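A rough way to see why fewer colors means less data is to count the bits needed to identify each pixel's color in the palette. The Python sketch below is back-of-the-envelope arithmetic only: bits_per_pixel and raw_index_bytes are made-up names, the 640 by 480 image is a hypothetical example, and the figures are sizes before the GIF compression step, so a real file will normally be smaller.

```python
import math


def bits_per_pixel(color_count):
    """Bits needed to index one pixel in a palette of the given size."""
    if not 1 <= color_count <= 256:
        raise ValueError("a GIF palette holds at most 256 colors")
    return max(1, math.ceil(math.log2(color_count)))


def raw_index_bytes(width, height, color_count):
    """Size of the pixel indexes in bytes, before any compression is applied."""
    return math.ceil(width * height * bits_per_pixel(color_count) / 8)


# Hypothetical 640 x 480 image at three palette sizes.
for colors in (256, 16, 4):
    print(colors, "colors:", raw_index_bytes(640, 480, colors), "bytes")
```

Halving the bits per pixel halves the data that goes into the compression step, which is why trimming the palette is such an effective lever.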
Graphics: challenge #2—file size
I've talked a bit about balancing the quality and the compression, and that is a very important consideration. However, there is also a hard limit that needs to be respected.
Under the guidelines applicable at the time of writing, Amazon limit the size of any image to 128 KB. That's tiny! To give you some context: a photo may be several MB; 1,024 KB = 1 MB; and the old 3.5 inch floppy disks held 1.44 MB (1,440 KB). You get an idea of the compression that may be necessary.
However, as well as meeting the Amazon requirement, there are other reasons why you should minimize your file size:
- If you are being paid the 70% ebook rate by Amazon, you don't actually receive 70%. Instead you receive 70% less a delivery charge which is related to the size of the file. Therefore, the smaller the file—and smaller images will give you smaller files—the smaller your delivery charge and the greater your income.
- You will also be doing your reader a favor. Smaller files work better on low-powered reader devices and if the reader is paying to download the file, then a smaller file size will incur lower bandwidth charges.
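With around 100 images in a book, checking each file against the limit by hand is tedious. Here is a small Python sketch of an automated check. The function name oversized_images and the demonstration files are invented, and the 128 KB figure is simply the limit quoted above, so it should be checked against whatever guidelines currently apply.

```python
from pathlib import Path
import tempfile

LIMIT_KB = 128  # per-image limit quoted in the text; check the current guidelines


def oversized_images(folder, limit_kb=LIMIT_KB, suffixes=(".gif", ".jpg", ".jpeg")):
    """Return (file name, size in KB) for every image above the limit."""
    too_big = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in suffixes:
            size_kb = path.stat().st_size / 1024
            if size_kb > limit_kb:
                too_big.append((path.name, round(size_kb, 1)))
    return too_big


# Demonstration with two dummy files standing in for real images.
demo = Path(tempfile.mkdtemp())
(demo / "small.gif").write_bytes(b"\0" * (50 * 1024))    # 50 KB
(demo / "large.gif").write_bytes(b"\0" * (200 * 1024))   # 200 KB
print(oversized_images(demo))  # [('large.gif', 200.0)]
```

Pointing the function at the real images folder gives a list of the files that still need work before the book is assembled.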
Graphics: challenge #3—image dimensions
The third graphics challenge is the image dimensions and here there are a lot of competing considerations.
The first consideration is the file size. The smaller the size of an image, in terms of pixels, the smaller the file size of the image. This is logical: fewer pixels means less data, which means smaller files.
However, simply using a tiny file is not the answer if you're looking to communicate detail to your reader.
There's also another complication to throw in here—you don't know how a reader will view an image. For instance, it could be viewed:
- on a phone-sized screen
- on a grayscale eInk screen
- on a color tablet-sized screen
- on a 4:3 ratio screen (such as an iPad) or a 16:9 ratio screen, or on something different (such as the Kindle Fire)
- on a portrait oriented screen
- on a landscape oriented screen with or without the page being split into two columns
- on a computer monitor
For smaller screen sizes, the image will usually be scaled down to fit the screen. For larger screens, the image may be expanded to fit the margins, but may not... The only way to find how your image will be displayed is to audition some test images on as many devices as possible.
I auditioned images on many devices. For me—and my experiences and conclusions may be different from yours—the conclusions I came to were:
- The maximum width for an image should be 640 pixels. Larger images need to be cropped or scaled down to 640 pixels, but smaller images can remain at their natural size.
- Images need to be simplified. For me, this meant re-drawing a lot of images so that the text and labeling were legible on any screen size. Equally, many elements had to be dropped: for instance, call-outs (in other words, text labeling specific elements on an image) do not work very well (in my opinion).
As well as giving an image size that works on all screens, the 640 pixel width also helps give a smaller file size.
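The 640 pixel rule is simple arithmetic, sketched here in Python. The function name fit_width is made up, and the sketch only works out the target dimensions when an image is scaled rather than cropped; the resizing itself would still be done in a graphics program.

```python
MAX_WIDTH = 640  # the maximum width settled on in the text


def fit_width(width, height, max_width=MAX_WIDTH):
    """Return (width, height) scaled down to max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # smaller images stay at their natural size
    scale = max_width / width
    return max_width, max(1, round(height * scale))


print(fit_width(1280, 800))  # (640, 400)
print(fit_width(500, 300))   # (500, 300)
```

Keeping the aspect ratio matters because a screen grab squeezed in one direction only makes interface text harder to read.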
The final challenge was which colors to choose, in particular for the line art. I could have gone with simple black and white images, but since many of the images were of color sources (specifically the screen grabs) it made sense to use color throughout. Added to which, with color I could easily identify the areas where I wanted the reader to look, so for instance with a graph, the axes could be gray, but the trend line could be blue or red, making it obvious where I wanted the reader to focus.
As always, there were several competing considerations to the decision:
- The first consideration was aesthetic. I wanted to choose pleasing colors, but it was also necessary to ensure that the colors I chose contrasted with each other so that each aspect was clear.
- The second consideration (which links in to the first) related to the simplicity. It was necessary to choose a strictly limited palette. Limiting the palette also had the side effect of forcing me to be very clear about what I was trying to illustrate and of keeping image sizes smaller (fewer colors = less data).
- Choosing color contrast was also important for two other reasons. First, it is important to create graphics that (as far as possible) work for the visually impaired. Second, these graphics may be viewed on eInk devices (in other words grayscale). Usually if you get the contrast right, then grayscale will work.
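One quick way to estimate whether two colors will still be distinguishable in grayscale is to convert each to an approximate gray level and compare them. The Python sketch below uses the widely used luma weights (0.299, 0.587, 0.114) for the conversion. The function names and the minimum gap of 50 are assumptions chosen purely for illustration, and how an actual eInk screen renders a color may differ, so this is no substitute for testing on a device.

```python
def gray_level(rgb):
    """Approximate gray value (0-255) of an (r, g, b) color using luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def distinguishable(color_a, color_b, minimum_gap=50):
    """True if the two colors differ by at least minimum_gap gray levels.

    The default gap of 50 is an arbitrary illustrative threshold.
    """
    return abs(gray_level(color_a) - gray_level(color_b)) >= minimum_gap


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(gray_level(RED), gray_level(GREEN), gray_level(BLUE))  # 76 150 29
print(distinguishable(RED, GREEN))  # True
print(distinguishable(RED, BLUE))   # False
```

Pure red and pure blue look very different in color but land fairly close together in gray, which is the kind of clash that is easy to miss when choosing a palette on a color monitor.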
A graphic artist might have a better solution. For me, I chose some colors, tried them, changed them in light of my trials, and kept changing until I found a combination that worked within the restricted criteria I had set.
For me it is important to keep control and to have a detailed understanding of what I have created. By working at a granular level, I:
- have far more flexibility and can easily make changes in the future (it is easy to tweak an ePub in Sigil), and
- can provide the optimum reading experience for my readers.
There are many other options for creating ebooks—in particular, there is software that will output books in the appropriate format, there is conversion software, and there are services to convert books—however, for me, this process is the only way that I could keep control and provide the optimum reading experience. Beyond that there are other reasons for my approach.
- The first—and perhaps the most compelling reason—is that what I am doing is more complicated than other services can readily cope with. As one example: I use internal links within my books—that's one thing that seems quite hard for software/conversion services.
- Equally importantly, I want to see what the reader will receive, and ensure that the layout supports the information that I am trying to communicate.
- Also, but less importantly, there is the cost aspect. It is expensive to pay someone to create a heavily cross-referenced ePub with many images, and the cost of any changes is disproportionate to the amount of work (which is not a criticism of these services, simply a reflection that changes require high levels of detail, and quality detail work costs).
This is the process that worked for me in the context of the style of books I write and the expectations of my readers, and it results in a professional quality ebook file.