The Project Gutenberg FAQ 2002
Chapter 2
Where a paragraph begins on a new page, you should put the page number at the start of the paragraph, as:
[149] With the extinction of the dinosaurs . . .
V.100. Should I keep Tables of Contents?
Yes, but just keep the contents themselves, and not the page numbers for each chapter or section, except where you have kept the page numbers in the whole text. When you have removed the page numbers from the book, it doesn't make much sense to leave them in the TOC.
Here, for example, is a typical TOC. In the original text, each chapter had a page number beside it:
THE DUKE'S CHILDREN
CONTENTS
    1  When the Duchess was Dead
    2  Lady Mary Palliser
    3  Francis Oliphant Tregear
    4  It is Impossible
    5  Major Tifto
    6  Conservative Convictions
    8  He is a Gentleman
    9  'In Media Res'
   10  Why not like Romeo if I Feel like Romeo?
   11  Cruel
   12  At Richmond
Note that I have indented the lines here, to give a sign to automatic converters that these lines should not be wrapped into one paragraph.
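If your OCRed contents list still carries the page numbers, a small script can strip them. This is only a rough sketch: the helper name strip_toc_page_number is made up for illustration, the page numbers in the sample are invented, and it assumes that every entry ends with optional dot leaders and an Arabic page number. A chapter title that itself ends in a number would be clipped too, so check the result by eye.

```python
import re

# Assumption: a TOC entry ends with optional dot leaders and then an
# Arabic page number, e.g. "  2 Lady Mary Palliser . . . . 9".
_PAGE_TAIL = re.compile(r"[ .]*\s\d+\s*$")


def strip_toc_page_number(line: str) -> str:
    """Return one TOC line without its trailing page number."""
    return _PAGE_TAIL.sub("", line).rstrip()


# Invented sample entries; the page numbers are placeholders.
toc = [
    "  1 When the Duchess was Dead . . . . 1",
    "  2 Lady Mary Palliser . . . . . . . 9",
]
cleaned = [strip_toc_page_number(entry) for entry in toc]
```

The leading indent is left alone, so the lines still signal to converters that they should not be wrapped.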
V.101. Should I keep Indexes and Glossaries?
If you are working from a pre-1923 publication, then yes.
If you are working from a modern reprint, you must be careful not to take any of the text that might have been added by the modern publisher. If you have any doubt about whether the index or glossary was part of the original printing, you should leave it out. Often with reprints, under your Clearance Line [V.37], you may see an instruction not to use indexes. In such cases, or if there is any doubt at all, don't.
V.102. How do I handle a break from one scene to another, where the book uses blank lines, or a row of asterisks?
Use a blank line, followed by a line of 3 or 5 spaced asterisks or dashes, followed by another blank line.
In a printed book, where the point of view switches from one character to another, or some other break in the narrative is made without a new chapter or headed section, the publisher will often denote the break just by a couple of blank lines. This gives the reader a cue to notice that the point of view has switched, and avoids confusion.
However, a printed book cannot be edited or changed, while an eBook will be edited and converted over its lifetime, and it is likely that if you denote this break just by a couple of blank lines, as in the book, your break may be lost. For example, in automated conversion to a PDA reader format, it is common to merge multiple blank lines into one.
In making a PG e-text, you _may_ indicate this break by a couple of additional blank lines, but, if your text is later converted into another format such as HTML, the extra blank lines may get lost in the editing or rendering. Or the person doing the conversion may simply think that the extra blank line was a mistake, and remove it. To guard against this, you should add an unambiguous visual break such as a line of spaced asterisks:
* * * * *
The exact layout of your break is not really important, and you can use whatever format you prefer: a blank line followed by five spaced asterisks followed by another blank line, or two blank lines and dashes instead of asterisks. Just make sure that future readers can be in no doubt that you intended to indicate a break that was really in the original printed text.
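If you typed the text with bare blank lines at scene breaks, something like the following could add the explicit marker afterwards. It is a sketch under a big assumption: that runs of two or more blank lines in your file are used ONLY for scene breaks. If you also use extra blank lines around chapter headings or between journal entries, those would be marked as well, so run it only on text where that assumption holds. The names BREAK_LINE and mark_scene_breaks are hypothetical.

```python
import re

# Any unambiguous marker will do; this one is five spaced asterisks.
BREAK_LINE = "     *     *     *     *     *"

# One newline followed by two or more empty (or whitespace-only) lines,
# i.e. at least two blank lines in a row.
_EXTRA_BLANKS = re.compile(r"\n(?:[ \t]*\n){2,}")


def mark_scene_breaks(text: str) -> str:
    """Replace runs of 2+ blank lines with blank line, marker, blank line."""
    return _EXTRA_BLANKS.sub("\n\n" + BREAK_LINE + "\n\n", text)
```

An ordinary paragraph break (one blank line) is left untouched.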
V.103. How should I treat footnotes?
In a printed text, the most common treatment for footnotes is to put them at the end of the page to which they refer. Sometimes, editors gather them all at the end of the book. Footnotes are a real formatting problem for an eBook without defined physical pages; there is no agreement between readers about which is the best way to render them.
There are three basic ways of rendering footnotes in an e-text:
You can insert them right into the text, in brackets, at the point in the paragraph where they occur, with or without an indication that they were originally footnotes. This is only reasonable in a text with very short footnotes.
You can insert them after the paragraph to which they refer, either contiguous with the paragraph or as a new "paragraph" of their own, as I am doing with this one. If the text contains any footnotes longer than a line, [1] you should not try to just append them to the paragraph; you should make a new "paragraph" of them, with a blank line before and after.
[1] Some footnotes can go on not only for several lines, but for several pages!
You can gather all footnotes at the end of the e-text, or to the end of the chapter to which they refer.
Of these three, gathering all footnotes to the end of the chapter or the end of the whole text is probably the friendliest option, since it preserves the original intention of allowing the reader to continue reading the main text without interruption. However, it may involve some renumbering and general note-keeping on your part, and may not be needed where there are only a few short footnotes. You can see an ideal example of this kind of footnote marking in our edition of Darwin's "The Voyage of the Beagle", file vbgle10.txt from 1997, Etext number 944, which you can get from: <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext97/vbgle10.txt>
V.104. My book leaves a space before punctuation like semicolons, question marks, exclamation marks and quotes. Should I do the same?
No.
If you look closely at these "spaces", you will see that they are not as wide as a normal space--they tend to be half to three-quarters as wide. These don't actually represent spaces as such; they were just a convention used by typesetters to make the text feel less cramped, and they did not express any specific intent on the part of the author.
OCR software tends to see them as full spaces, and one of the jobs you typically have to do when editing a text that has been OCRed is to remove them.
In some texts, this also happens following an opening quote, so your OCR might read a sentence as:
" Hello ! How are you to-day ? "
which you should correct to:
"Hello! How are you to-day?"
Samples of this can be seen in the images used for the FAQ "Why am I getting a lot of mistakes in my OCRed text?" [S.17]
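Removing these OCR pseudo-spaces by hand is tedious, and a regular expression can do most of it. The sketch below is one possible approach, not an official PG tool: the function name tighten_ocr_spacing is made up, it handles only semicolons, question marks and exclamation marks, and it touches quotes only at the very start and end of the line you pass in. Straight quotes in the middle of a line are ambiguous (opening or closing?), so those are left for you to check.

```python
import re

_BEFORE_PUNCT = re.compile(r"[ \t]+([;?!])")        # "Hello !" -> "Hello!"
_AFTER_OPEN_QUOTE = re.compile(r'^(\s*")[ \t]+')    # '" Hello' -> '"Hello'
_BEFORE_CLOSE_QUOTE = re.compile(r'[ \t]+("\s*)$')  # 'day? "' -> 'day?"'


def tighten_ocr_spacing(line: str) -> str:
    """Remove typesetter's pseudo-spaces that OCR read as full spaces."""
    line = _BEFORE_PUNCT.sub(r"\1", line)
    line = _AFTER_OPEN_QUOTE.sub(r"\1", line)
    line = _BEFORE_CLOSE_QUOTE.sub(r"\1", line)
    return line


fixed = tighten_ocr_spacing('" Hello ! How are you to-day ? "')
```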
V.105. My book leaves a space in the middle of contracted words like "do n't", "we 'll" and "he 's". Should I do the same?
Unlike the pseudo-spaces before punctuation, these really were intended as spaces indicating the break between words--that is, where we would nowadays contract two words into one, the author or editor has made the contraction, but left them as two separate words.
Since this effect was intended, it is usual to leave the spaces in. Some people who really do n't like this style of spelling do remove them, but generally volunteers want to preserve the text as printed.
V.106. How should I handle tables?
Just line up the information neatly in columns. If you use a non-proportional font [W.5] you will be able to do this reliably. You can also use the dash character "-", the underscore "_" and the pipe character "|" to make borders if you really need to, but it's usually better to omit them. It is, though, often good to indent your table a little, to set it off from the main text, and to avoid the danger of having it automatically wrapped by some converter later. For example, from "The Albert N'Yanza, Great Basin of the Nile" by Sir Samuel White Baker:
TABLE No. 1.
Table for Increased Reading of Thermometer, using 0 degrees 80 as the Result of Observations for its Error.
    Month.         1861.   1862.   1863.   1864.   1865.
    January. . .      --   0'143   0'314   0'487   0'659
    February . .      --    '157    '328    '501    '673
    March . . .    0'000    '172    '344    '516    '688
    April . . .     '014    '186    '358    '530    '702
    May . . . .     '028    '200    '372    '544    '716
    June . . . .    '043    '214    '387    '559    '730
    July . . . .    '057    '228    '401    '573    '744
    August . . .    '071    '243    '415    '587    '758
    September. .    '086    '257    '430    '602    '772
    October . .     '100    '271    '444    '616    '786
    November . .    '114    '285    '458    '630   0'800
    December . .   0'129   0'300   0'473   0'645      --
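If you have the table's cells as data rather than as typed lines, the column alignment can be done mechanically. The following is a minimal sketch, assuming every row has the same number of cells; format_table and its indent and gap parameters are names invented for this example. It left-aligns the first column, right-aligns the rest, and indents every line.

```python
def format_table(rows, indent=4, gap=2):
    """Line up rows of text cells in columns for a fixed-width font."""
    columns = len(rows[0])
    # Each column is as wide as its widest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(columns)]
    lines = []
    for row in rows:
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(width) for cell, width in zip(row[1:], widths[1:])]
        lines.append(" " * indent + (" " * gap).join(cells))
    return "\n".join(lines)


rows = [
    ["Month.", "1861.", "1862."],
    ["January", "--", "0'143"],
    ["March", "0'000", "'172"],
]
table = format_table(rows)
```

The indent does the same job as in the hand-typed table: it sets the table off and discourages later rewrapping.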
V.107. How should I format letters or journal entries?
Make them look like they are in the printed book. If the signature is indented in the book, indent it in the letter. For example:

"Sir,

No consideration would induce me to change my
resolve in this matter, but I am willing to engage
your services as my agent for a fee of 100 pounds.

                              "H. Middleton"

When a letter appears in the middle of lots of prose, using shorter lines for the letter is an effective way of making the letter stand out, without resorting to indenting the whole thing.
When the book is largely composed of letters or entries, as happens in an epistolary novel or the publication of somebody's letters or journal, you might reasonably leave two or three (but whichever you choose, keep it consistent throughout the book!) blank lines between entries to give the reader a visual cue that the next is not just a new paragraph, but a new entry, for example:
10 pm.--I have visited him again and found him sitting in a corner brooding. When I came in he threw himself on his knees before me and implored me to let him have a cat, that his salvation depended upon it.

I was firm, however, and told him that he could not have it, whereupon he went without a word, and sat down, gnawing his fingers, in the corner where I had found him. I shall see him in the morning early.


20 July.--Visited Renfield very early, before attendant went his rounds. Found him up and humming a tune. He was spreading out his sugar, which he had saved, in the window, and was manifestly beginning his fly catching again, and beginning it cheerfully and with a good grace.

I looked around for his birds, and not seeing them, asked him where they were. He replied, without turning round, that they had all flown away. There were a few feathers about the room and on his pillow a drop of blood. I said nothing, but went and told the keeper to report to me if there were anything odd about him during the day.


11 am.--The attendant has just been to see me to say that Renfield has been very sick and has disgorged a whole lot of feathers. "My belief is, doctor," he said, "that he has eaten his birds, and that he just took and ate them raw!"


11 pm.--I gave Renfield a strong opiate tonight, enough to make even him sleep, and took away his pocketbook to look at it. The thought that has been buzzing about my brain lately is complete, and the theory proved.

This is different from the case mentioned in the FAQ [V.102] "How do I handle a break from one scene to another, where the book uses blank lines, or a row of asterisks?". In that case, we added a row of asterisks because future reformatting or conversion could cause confusion about the scene break that was explicitly signalled by the blank lines on paper. In this case, each new letter or journal entry cannot be mistaken by a careful reader, so we don't need asterisks or dashes to signal that; we're just adding a bit of extra space to make it more readable.
V.108. What can I do with the British pound sign?
The British pound sign cannot be expressed in ASCII, but is very common in the works of English novelists. It evolved as a stylized version of the letter L (from the Latin "libra"), and it's entirely appropriate to represent it as such, either like:
The horse cost L8 12s. 6d.
or
The horse cost 8l. 12s. 6d.
This works particularly well where an amount is expressed in pounds, shillings and pence (librae, solidi, denarii).
Where there is a simple number of pounds, you may prefer just to use the word:
She was a handsome widow with 500 pounds a year.
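If your OCR output or typed draft still contains the pound sign, both conventions can be applied with one pass. This is only a sketch of one possible rule, not a PG standard: it writes "L" when the amount is followed by shillings (a number and "s.") and spells out "pounds" otherwise. The name replace_pound_sign is hypothetical, and the script assumes the sign always comes directly before the digits.

```python
import re

# A pound amount, optionally followed by a shillings figure such as " 12s."
_AMOUNT = re.compile(r"£(\d+(?:,\d{3})*)(\s+\d+s\.)?")


def _rewrite(match):
    number, shillings = match.group(1), match.group(2)
    if shillings:
        # Pounds, shillings and pence: keep the compact "L8 12s." form.
        return "L" + number + shillings
    unit = "pound" if number == "1" else "pounds"
    return f"{number} {unit}"


def replace_pound_sign(text: str) -> str:
    """Render pound-sign amounts using ASCII characters only."""
    return _AMOUNT.sub(_rewrite, text)
```

Whichever rendering you choose, apply it consistently through the whole book.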
V.109. What can I do with the degree symbol?
Just type out the word "degrees" or the abbreviation "deg."--for example:
By the time we reached Cairo it was 115 degrees in the shade.
Geographical degrees are more awkward, but should be handled the same way:
It was at 30 deg. 15' E, 14 deg. 45' N.
In general, any symbol can be represented in words.
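The same kind of search-and-replace works for the degree symbol. In this sketch the function name replace_degree_sign and its abbreviate flag are made up for illustration; it assumes the symbol always follows the number it belongs to, and it adds a space if a unit letter comes straight after the symbol.

```python
import re


def replace_degree_sign(text: str, abbreviate: bool = False) -> str:
    """Spell out the degree symbol as ' degrees' or ' deg.'."""
    word = " deg." if abbreviate else " degrees"
    # A letter directly after the sign (as in "72°F") needs a space added.
    text = re.sub(r"\s*°(?=[A-Za-z])", word + " ", text)
    return re.sub(r"\s*°", word, text)
```

For example, replace_degree_sign("30° 15' E", abbreviate=True) gives "30 deg. 15' E".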
V.110. How should I handle . . . ellipses?
Just as I did above . . . and here! Leave one space before and after each dot. Do not break an ellipsis over the end of a line. In principle, an ellipsis is one symbol, like an em-dash, and should not be broken at line end.
A special case arises when an ellipsis follows a sentence instead of being in the middle. . . . In this case, put the period after the last letter of the sentence, as you normally would, then follow the usual format for ellipses. You end up with four dots, with spaces everywhere except before the first.
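Both ellipsis rules can be applied mechanically to text that was typed with unspaced dots. The sketch below works on one line at a time and uses an invented name, space_ellipses. It treats a run of three dots as a mid-sentence ellipsis and a run of four as a period plus ellipsis; it does not check whether an ellipsis has been split across a line end, so that still needs a separate look.

```python
import re

# Three or four dots, with or without spaces between and around them.
_DOT_RUN = re.compile(r"[ \t]*\.(?:[ \t]*\.){2,3}[ \t]*")


def _respace(match):
    if match.group(0).count(".") == 4:
        # Sentence-ending period plus ellipsis: no space before the first dot.
        return ". . . . "
    return " . . . "


def space_ellipses(line: str) -> str:
    """Rewrite dot runs in one line as spaced ellipses."""
    return _DOT_RUN.sub(_respace, line).rstrip()
```

Lines that already follow the convention come back unchanged.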
V.111. How should I handle chapter and section headings?
For a standard novel, you can choose either four blank lines before the chapter heading and two lines after, or three lines before and one line after, but whichever you use, do try to keep it consistent throughout.
Normally, you should move chapter headings to the left rather than try to imitate the centering that is used in some books.
V.112. My book has advertisements at the end. Should I keep them?
Most people seem to think "no", and "no" is the safe choice, but opinions vary.
The typical arguments are: "The ads are not part of the author's intent, so you should remove them." vs. "They give a flavor of the original book, so you should keep them". This latter is particularly cogent when the ads are for other books by the same author.
Decide which of these statements best fits your own views in the case you're looking at; after that, it's up to you!
V.113. Can I keep Lists of Illustrations, even when producing a plain text file?
Yes. As in the case of the Table of Contents, there is no point in including page numbers when your text doesn't have them, but the list of illustrations itself may go in.
V.114. Can I include the captions of Illustrations, even when producing a plain text file?
Yes.
You can format them as short paragraphs of their own, in brackets, with the word Illustration: followed by the caption, something like:
[Frontispiece: A Flash of Light]
or
Don't interrupt a paragraph to insert one, unless the reader really needs to know that the original illustration was in the middle of the paragraph; place the note between paragraphs instead.
V.115. Can I include images with my text file?
Yes, as I have done with the zipped version of the plain-text format of this FAQ. In general, though, if you want to include images it makes much more sense to make an HTML version of the book and include them there, where they are anchored into the text in a predictable way, and to leave them out of the text version. There are exceptional cases, such as this one: I included images with this plain-text FAQ because I wanted you to be able to experiment with them using your own OCR package.
If you do include images with plain text, they will be included in the ZIP file, but will not be downloadable separately alongside the plain text file. For example, if your file gets named abcde10.txt, and you include images pic1.gif, pic2.gif and pic3.gif, then abcde10.zip will include all four files, but only abcde10.zip and abcde10.txt will be posted, so the images will be available only within the zip file. So, even if you are including images, don't assume that the reader will be able to see them.
If you do include images with plain text, be sure to mention them by filename in a note at the appropriate places in the text file; otherwise readers may not even realize they're there. For example:
If you do include images with a text file, don't make them too big. Readers downloading zip files of plain text expect them to be relatively small; don't burden them with huge downloads they don't want. Use the same kind of rules and processing that you would for an HTML file, or better still, include the images only with the HTML version.
About formatting poetry:
V.116. I'm producing a book of poetry. How should I format it?
Make it look like the original.
The only formatting change that you might consider is to limit the amount of centering. Often, in a poetry book, the title of a poem may be centered, when the body of the verse isn't. This can work on paper, particularly when the page is narrow, but "centering" the title on a 70-column line can mean that the title ends up far to the right of the body of the poem, which looks untidy. And even if you center the title correctly over the body of _this_ poem, the next poem may have longer lines, and so _its_ title may not have the same center as the first poem, and the title of one will be off-center with the title of the next!
If you have this kind of formatting in your book, you should consider moving all of the poem titles to the left margin rather than try to keep compensating for different line centers. It's more consistent, and easier to read, if you just left-align all titles. To see a not-quite-successful attempt at centering the titles over the poems, take a look at the Poems of Emily Dickinson, available from <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext00/1mlyd10a.txt>
In that case, it would have been better to left-align the numbers and titles. Centering isn't really an effective formatting choice in etexts.
V.117. I'm producing a novel with some short quotations from poems. How should I format them?
As nearly as possible like they look in the book, with the exception that you should indent the whole verse anywhere between 1 and 4 spaces from the left. This is to give a signal to automatic conversion programs that these lines should not be wrapped.
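Indenting a quoted verse is easy to do by hand, but if a book has many of them a helper saves time. A minimal sketch, with the made-up name indent_verse: it adds the same number of spaces to every non-blank line, so any extra indentation that individual lines already have is preserved.

```python
def indent_verse(lines, spaces=4):
    """Indent each verse line by 1 to 4 spaces so converters won't rewrap it."""
    if not 1 <= spaces <= 4:
        raise ValueError("indent should be between 1 and 4 spaces")
    prefix = " " * spaces
    # Blank lines (stanza breaks) stay empty rather than becoming spaces.
    return [prefix + line if line.strip() else "" for line in lines]


# Placeholder lines standing in for a quoted verse.
verse = ["First line of the verse,", "  second line, already indented;", "", "Next stanza."]
indented = indent_verse(verse, spaces=2)
```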
For an example of a novel with many differently formatted quotations embedded, see the "a" version of Clotel, file clotl10a.txt, Etext number 2046, from the year 2000, which you can find at <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext00/clotl10a.txt>
Some of these quotations touch the left-hand column; today, we would think it better to insert at least one space before every line.
About formatting plays:
V.118. How should I format Act and Scene headings?
Pretty much like chapter headings. You can use 4 blank lines between acts, and 3 blank lines between scenes, or 3 between acts and 2 between scenes. If your book has "END OF ACT/SCENE" footers, leave them in the etext.
You may center act/scene headers and footers if they are centered in the book, but it's usually best to left-align them, for the same reasons it's usually best to left-align poem titles in poetry.
V.119. How should I format stage directions?
Generally, in brackets.
In printed texts, it is common to show stage directions as italics inside brackets. You don't have the option of italics in plain text, and you shouldn't need to use _underscores_ or /slants/, and certainly not CAPITALS, to indicate italics for stage directions. Normal text within the brackets is all you need. It will be immediately clear to a reader that bracketed text consists of stage directions.
[Square brackets] are most common for stage directions, but (round) or {curly} brackets will work too, if there's a reason why they are preferable in the case of your text. Just make sure that you use the same kind of brackets consistently and only for stage directions--don't use round brackets for stage directions if characters' speeches also contain text in round brackets.
Some printed plays follow the convention of not closing brackets when the direction is at the end of a speech or scene. For example: [Exeunt.
Where the book doesn't close the bracket in a case like this, you shouldn't either.
V.120. How should I format blank verse?
Just like normal verse in poetry. Make it look like the printed book. Left-align it, and make one line of etext the same length as one line of print.
Sometimes in blank verse, a speech may start mid-line, and the print reflects that by leaving a space on the left, and starting mid-way. In a case like that, do the same in the etext.
About some typical formatting issues:
V.121. Sample 1: Typical formatting issues of a novel.
Look at the image novel.tif. It shows a page of a novel, with several typical formatting decisions to be made.
We note that there is no end-quote on the first paragraph, but that's OK, since the second paragraph is a continuation by the same speaker, so the first paragraph doesn't need a closequote. There is also an italicized "I", which will end up with underscores, but there is nothing else to give us any difficulty.
In the second paragraph, we have an ellipsis, an italicized French word with an accented letter, the British pound symbol, and an italicized "Here".
The ellipsis is simple.
Let's assume we're making this into a 7-bit text, so we're going to convert the non-ASCII character a-circumflex and the pound sign. The a-circumflex just goes to an "a", but we have several choices we can make about the pound sign.
The italicized "Here" is clearly for emphasis, so we will mark that up. The word "flaneur" is italicized because it is not English, but possibly also for emphasis . . . if the sentence had read "The Major is a _fool_", with the word "fool" italicized, it would clearly be emphasis. As it stands, we don't know whether emphasis is intended. This doesn't matter if we are just using _underscores_ or /slants/ to render italics, but if we use CAPITALS, we're going to have to impose our best guess on one side or the other.
The third paragraph shows some vaguely familiar squiggles--Greek letters! We hit the PG transliteration guide at <https://www.gutenberg.org/vol/greek.html> and spell it out . . . rough-breathing upsilon = hu; beta = b; rho = r; iota = i; final sigma = s. So the Greek word transliterates as "hubris". Since hubris is a familiar word, we don't need to make a fuss about it, though we may _italicize_ it.
We then have a note, which we will format a little differently from the main text to help it stand out, and a new chapter heading.
We should certainly indent the second line of the Byron quotation to preserve its original form, but we have the option whether or not to indent the first line a little to signal to any future automatic converter that this is not to be rewrapped.
In the first paragraph of the new chapter, we need to get rid of the hyphenation of "Wentworth" at line-end and fix the two em-dashes.
In the second paragraph of the new chapter, we have a long dash between "d" and "l", clearly meant to denote "devil", so we will fill it in with three dashes, and we see a three-em-dash after "Lord H", so we can use six, or possibly four, dashes for that.
Finally, we have a table, a list of money values against names.
Depending on the standards we've chosen to use throughout the book, we could render these details in a variety of ways. For illustration, here are two acceptable possibilities:
"I shall go down to Wokingham", said Middleton, "a few days before the election, and the Major will stay here. I understand that there will be no other candidate, and _I_ shall take the seat.
"The Major is a . . . _flaneur_. He has no interest beyond his own advancement. I can buy him for a hundred pounds. _Here_ is his answer."
Wallace wondered at the _hubris_ of his friend, and examined the note Middleton thrust upon him.

"Sir,

No consideration would induce me to change my
resolve in this matter, but I am willing to engage
your services as my agent for a fee of 100 pounds.

                              H. Middleton"