Shamus Young, the guy who brought us the DM of the Rings, the Spoiler Warning series and Stolen Pixels, is currently writing a book. Like a real, dead tree novel type thing. I’m totally stoked to read it, but it seems that he has run into word processing issues recently. This entry is, more or less, a long form response to his blog post. I hope Shamus will read it and that it will help him choose the right tool for the job. I decided to put it here, rather than in his comments section, because this stuff might be beneficial to you guys as well.
You see, the issues Shamus just discovered are something I learned to route around back in grad school. Granted, I have never written a novel, but my Masters thesis was a few hundred pages long. So, you can say that I do have some experience with editing huge documents. Here are some pointers I picked up the hard way.
Ditch WYSIWYG
Ditch it! My regular readers are probably familiar with this mantra. I keep going on about this all the time, but there is a reason for it. In my honest opinion WYSIWYG is the root of all evil. Most of the problems that Shamus described in his post stem from the use of a WYSIWYG editor. Not MS Word mind you – any WYSIWYG editor will do this to you. Shamus, for example, was using Libre Office (an open source office suite) which was developed independently from Microsoft Office. And yet, as he described it, it “duplicated” all the worst flaws and “bugs” of Microsoft Word. How can that be though?
The Libre Office team didn’t just copy these flaws from Microsoft. They have never even seen the Microsoft Office code (no one other than Microsoft employees and contractors has – it is a closed source application). They developed their product independently, simply following the conventions established long before Microsoft was the dominant force in the word processing market.
So what does it tell you when two products developed absolutely independently share a number of identical flaws? It means the problem lies with the design, and not the implementation. To put it simply, What You See is What You Get is a blatant lie. Internally all WYSIWYG editors use invisible markup elements that build up in your document – they basically insert their custom, HTML-tag-like constructs all over your text to indicate where to start paragraphs, what should be bold, etc. The problem is that you cannot see that markup. The last major office suite that let you see it was Corel WordPerfect, but alas, they were completely marginalized by good old Uncle Microsoft.
The problem with invisible markup is that you never know whether or not you just deleted some formatting tag, section break, paragraph break, or whatever. Every once in a while you apply some formatting to your entire document, and it cocks everything up. The first 6 pages look fine, but page 7 looks like something has chewed and barfed it up all over page 8. Why? Because somewhere in the middle of page 6 there is a major clusterfuck of improperly nested tags. You could fix it, but since you can’t see the problem you are forced to work blind. Most of the time, the more you try to fix it, the worse it gets. Sometimes the only way to rescue your document quickly is to copy it, paste it into notepad (to remove all the invisible formatting) and then re-paste the plain text version into a new document.
I can guarantee you that you will experience these issues sooner or later with every single WYSIWYG tool.
Use Plain Text
The best advice I can give you is to put off formatting till you are done. If you are not ready to wean yourself away from WYSIWYG editors, then don’t, but do not give in to the urge to start formatting your work. Just type text. Don’t bold chapter names, don’t put page numbers, don’t do anything. Concentrate on content. You can add all that other stuff later. Pretend it’s a plain text editor with a built in spell check…
Can you do that? Ok, good. Now that you are pretending Word is a plain text editor, ask yourself why you need that bloated piece of software anyway. Both Microsoft Word and Open/Libre Office editors take up lots of memory, and have tons upon tons of features you will never ever use. On top of that, they are unstable and prone to crashing. Their proprietary file formats are prone to corruption. You don’t need to format your work as you go – you can do that at the end, right? So why not ditch these editors?
The only useful feature these tools have going for them is the inline spell checking. You know, the stuff that makes squiggly lines under your words when you misspell them. Well, guess what? These days almost every single text editor which is not Notepad has that feature built in by default. Personally, I prefer Vim, but I admit – it is not for everyone. If you do end up using it, I created this nifty cheat-sheet that hangs over my desk now. Of course, the fact that you need a cheat-sheet to use an editor may be a deal breaker to some people. Then there is Cream – Vim for dummies – which trades some of the raw power for convenience and familiar conventions (ctrl+s to save, etc…). It could be worth checking out.
Picking the right text editor is a deeply personal choice. It is a tool that you will be spending a lot of time in, so finding one that does everything you want, and does not get in your way may take you a while. But once you find the right one, you will be much happier and you will never want to look back. There is a lot you can learn about a man just by looking at his preferred text editor.
So this is what you do: you try a bunch of them. Save your work as a plain .txt file, and then the editor no longer matters. You can switch to a different one at a moment’s notice. There is no converting, no copying and pasting, no hassle. You just can’t go wrong with that format, and most importantly it is almost impossible to corrupt it. Binary Word documents go bad if you so much as sneeze at them, but text is simple and robust.
If your publisher, proofreading buddy or thesis adviser insists on Word, you can always just copy, paste and save a copy of your work in their preferred program. It will take you an additional minute or two to get it formatted the way they like it, but in the long run you will save many, many hours of productivity because you won’t have to deal with WYSIWYG issues.
Shamus, if you want a no frills, stripped down writing experience, have you checked out Q10 or Dark Room? They are full screen editors that remove all the toolbars and menus and allow you to immerse yourself in your work without any distractions. Creative writing types swear by these things.
Use Version Control
When I was writing my thesis, I was extremely paranoid about losing work. Even if technology cooperates with you, shit happens. I cannot tell you how many times I have lost hours of work because I accidentally deleted a chunk of text/code and did not notice it until like the next day, at which point there was no way to undo the change. Over the years, I have learned my lesson – never embark on a big project without version control.
This is a big hurdle for non-programmers. Version Control is something we techies learn to use at an early stage in our careers, but creative writing types never, ever actually see it. It is a tool for programmers, but there is no reason why you can’t use it for prose. Fortunately, Shamus is a programmer too, so I hope he will get this.
You don’t need anything fancy. A local Subversion or Git repository is perfect. Just set one up, check your work in, and then at the end of your work cycle, commit your work. The repository will basically take a snapshot of your document at every commit, and you will be able to pull up those snapshots at any time later. So if you suddenly decide that deleting that one chapter was a bad idea, you can easily get it back.
The advantage of a repository over saving dated copies by hand is that it hides all those redundant copies from you. It only stores incremental changes, so it takes up far less space, and it avoids the clutter and hassle of managing those copies yourself.
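If you want that end-of-session commit to be a one-liner, here is a rough sketch of how it could be scripted in Python. It assumes Git is installed and that the folder is already a repository (created once with git init); the function names and the folder name are made up for illustration.

```python
import subprocess
from pathlib import Path


def snapshot_commands(message):
    """Return the two Git commands that record a snapshot of the folder."""
    return [
        ["git", "add", "--all"],           # stage every new, changed or deleted file
        ["git", "commit", "-m", message],  # store the snapshot with a note to yourself
    ]


def commit_snapshot(folder, message, run=subprocess.run):
    """Run the snapshot commands inside `folder` (assumed to be a Git repository).

    `run` defaults to subprocess.run; with check=True a failing command raises,
    which includes the case where there is nothing new to commit.
    """
    for command in snapshot_commands(message):
        run(command, cwd=Path(folder), check=True)


if __name__ == "__main__":
    # "novel" is a hypothetical folder holding your .txt or .tex files.
    commit_snapshot("novel", "End of tonight's writing session")
```

Old snapshots can later be listed with git log and individual files pulled back out of them, which is the "get that deleted chapter back" scenario.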
If you are a programmer (like me and Shamus) chances are you are already familiar with at least one version control tool, and have at least one repository for your code. So it should be trivial to just start using it for your creative writing and/or academic papers.
If you are not a programmer, and you have never used a version control tool, don’t fret. I’ve got you covered: use Dropbox. It is dead simple. You sign up, you install a client and assign a single folder on your computer to be the “dropbox folder”. The client will then monitor that folder for changes. Every time you edit a file inside that folder, it will upload your changes into the cloud. Every time you hit save, it makes a copy, and you can later log into the web interface and revert your file to a previous state.
Granted, it is not as powerful as a real repository, but it will do in a pinch.
Make Backups
There is a common trope in movies and TV shows. A struggling writer is shown writing his latest novel on his trusty laptop. This novel is going to either make or break his career. He finishes the first draft the night before deadline, and then BAM – somehow he manages to destroy the laptop which contained the only copy of his work. It is lost in a fire, it falls out the window, he drops it into the tub, etc…
Don’t be that guy! Have redundant backups. Backing up your work is not just for geeks. I should not have to say it, but you would be amazed how many people don’t grasp this concept. If you created a document that can’t be easily reproduced from scratch in a day or two, then having only a single copy of it on a single machine is just stupid.
Especially since I already gave you a link to Dropbox. In addition to being a poor man’s version control, Dropbox is also a great backup tool. All your changes are automatically uploaded into their web portal. They give you 2GB of free space, but you can buy more if you need to. The free allowance is more than enough space for text though.
It gets even better though – if you install the Dropbox client on more than one computer, it will keep all your machines in sync. All of them will have the latest version of the file pushed to them as soon as you hit the save button. So you can start writing a document on your desktop, then grab your laptop, drive to your local coffee shop, and continue editing like nothing happened. It’s like using Google Docs, but you never have to log in, and all your files are stored locally and mirrored across all your computers in addition to being held in the cloud.
LaTeX is not Hard
No, seriously. It is not. If you follow my tip, wean yourself off WYSIWYG and start writing your documents in plain text, then you are only a step away from working in LaTeX. You are basically 80% there.
You see, the thing about LaTeX is that it is plain text. If you have a .txt document you can convert it to a fully functional LaTeX document by:
- Changing the extension to .tex
- Sticking these two lines at the beginning of your document:
\documentclass[letterpaper, 10pt]{article}
\begin{document}
- Appending this line to the end of your document:
\end{document}
That’s it. That’s all there is to it. Just three lines, and you are using TeX. The rest is just fine tuning and formatting. But, like I said, you don’t need to worry about that stuff until you’re done and ready to mess with the formatting.
In case you’re curious, I have a running series of posts in which I introduce total beginners to the ins and outs of this wonderful technology. Oh, and Windows users: here is a list of links that will get you started.
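If you would rather not do those three steps by hand, they can be scripted. The following is a minimal sketch in Python, not a polished tool. The escaping step is my own addition and not one of the three steps: characters such as % and & mean something to LaTeX (% starts a comment, & separates table columns), so plain prose that contains them needs them escaped before it will compile cleanly. The function names and the file name are made up for illustration.

```python
from pathlib import Path

# The two opening lines and the closing line from the steps above.
PREAMBLE = "\\documentclass[letterpaper, 10pt]{article}\n\\begin{document}\n"
CLOSING = "\n\\end{document}\n"

# Characters that LaTeX treats specially, with their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters one at a time, so nothing gets escaped twice."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def wrap_as_latex(text):
    """Wrap plain text in the minimal article skeleton."""
    return PREAMBLE + escape_latex(text) + CLOSING


def convert_file(txt_path):
    """Write a .tex file next to the given .txt file and return its path."""
    source = Path(txt_path)
    target = source.with_suffix(".tex")  # step one: change the extension
    target.write_text(wrap_as_latex(source.read_text(encoding="utf-8")),
                      encoding="utf-8")
    return target
```

Calling convert_file("novel.txt") would leave the original file alone and write novel.tex beside it. Run it on raw prose only: once you start adding real LaTeX commands to the .tex file, running the escape step over them again would break them.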
The huge advantage of LaTeX over WYSIWYG is that it is just like coding. If you mess up the formatting, the compiler will tell you that you did, and point at the line where you fucked up. There is no guess work, no bizarre undocumented behavior or quirks. If you run into a problem, you can just google the error message, and more often than not, you will find useful information relevant to your problem. As a programmer I find this much, much easier to deal with than seemingly random quirks of WYSIWYG engines. But if you are still not convinced, please go and read my article on why LaTeX is superior to Office.
Shamus, good luck on your book. I hope some of this advice will help you. Everyone works differently, but personally I would never trust anything longer than a 1-2 page letter to a WYSIWYG editor.
I wish I could do everything in latex, but half the time I’m working with a lab partner/group and they insist I use word since of course that’s what they use. Regardless, I think I’ll start using your text only approach when writing lab reports; the main drawback is that while I’m working on it, my equations won’t be formatted using the word equation editor so my partners will probably get bitchy (by the way, I LOVE having to use freaking tables to get equations properly aligned and numbered in word as seen here. Something which is native in latex).
Thanks for pointing out that dropbox has version history. Forgot it had that feature (luckily I haven’t had any situations where I would have needed it)
In the latex part you have several mistypings :)
\end{document}
for the end of the document and .txt extension instead of .tex
And also:
I can imagine the document crying “Please, Sir, have mercy! Add those two lines to me!”
You mention not worrying about formatting, which is definitely correct. But I wouldn’t go as far as to say not to use any markup at all. Semantic markup is still important, depending on what you’re writing. You don’t want to have to go back hunting for all the book titles in your text because they weren’t marked as such.
If you really don’t know what system you’re going to use (LaTeX, Markdown, whatever) and you’re not familiar with any of them yet, you can just use your own unique markings that can easily be found later with a string search. For example, say you do chapter headings with ===, titles with %%, foreign words with ***, and emphasis with __. Even though the last three will all probably be mapped to the same italics font, they’re semantically unique. Later on you can go back and replace them with your chosen system’s markup.
=== On Fantasy Books ===
In his last book, %%The Eye of the World%%, the ***Aes Sedai*** were discussing what to do about Rand, the ***ta’veren***. They saw him as __dangerous__.
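That later replacement pass is easy to script. Here is a rough sketch in Python, assuming the four markers from the example above and LaTeX as the target. \booktitle and \foreign are made-up macro names that you would have to define yourself (for instance with \newcommand), and \chapter assumes a book or report document class rather than article.

```python
import re

# One rule per marker: (pattern, replacement). In the replacement strings,
# "\\" produces a literal backslash and "\1" inserts the captured text.
RULES = [
    (re.compile(r"^=== (.+?) ===$", re.MULTILINE), r"\\chapter{\1}"),
    (re.compile(r"%%(.+?)%%"), r"\\booktitle{\1}"),       # hypothetical macro
    (re.compile(r"\*\*\*(.+?)\*\*\*"), r"\\foreign{\1}"),  # hypothetical macro
    (re.compile(r"__(.+?)__"), r"\\emph{\1}"),
]


def convert_markers(text):
    """Replace each home-made marker pair with its LaTeX command."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = (
        "=== On Fantasy Books ===\n"
        "The ***Aes Sedai*** saw him as __dangerous__."
    )
    print(convert_markers(sample))
```

Because the three inline markers stay separate commands, you can still decide later that book titles and foreign words should look different, without touching the manuscript again.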
You may not believe this, but WordPerfect is not only alive & well, but it is actually a decent WYSIWYG editor. It has this button, called “reveal codes,” that shows all the markup side-by-side with the WYSIWYG window. Not like the idiot paragraph symbol in word that just makes invisible characters show up, this shows everything.
Wonder why italics keep showing up in one area? Why a font won’t stick to a location? Why the stupid bullets don’t line up, or re-indent every time you add a new one? Reveal codes makes all that garbage an easy fix.
It is still crummy for documents with lots of math equations, tons of images, or figures. But it *is* much better than Word at even those. I had forgotten it existed until I was forced to use it in legal employment – it is huge in the legal community, it even has a “legal mode” for editing. Anyway, if lawyers do anything, it is put together long complicated documents replete with cites and footnotes and parentheticals, all of which must be in very specific font/size/shape/look. And WP makes that easy.
I’ll note that Word does have some really nice features. If you apply styles, it will create table of contents, figures, authorities, and indexes for you, and keep them updated. The citations and bibliography features are passable. All Office can do this, but embedding objects is dramatically underused and misused at the same time. WP does a lot of this, but not as well.
In plenty of circumstances, I think the aggravation of Word is worthwhile. A book is not one of those circumstances, not even if you’re keeping a separate file for each chapter. TeX is also great in a lot of circumstances, like anything with math, but probably not a long document with gobs of images – if I were doing a comic-type book I’d strongly consider InDesign (I think Quark is a mess).
Prefix, or prepend rather than append
I do all my textwork with LaTeX and I can’t work with OOffice or something for more than a minute. But I disagree that writing unformatted text is a good idea:
1.) Page count. Especially in academic work people will want to know their progress as to not write too big or too small chapters(=semantic blocks). It is in my opinion vital to know how many pages you actually have (in final, formatted form).
2.) Structured text helps to structure thoughts. At least it helps me with my thesis. I go top down, create empty chapters first, then sections and subsections and when i have my basic plan I can work in a relaxed way, knowing where the journey of writing a thesis will take me. Having no table of contents, no real headlines &ct would kill my overview.
I’d like to help my wife move from WYSIWYG writing, wherein she’s constantly distracted by formatting, to semantic writing. But I’m certain that TeX won’t work — it’s just too alien, brittle, and verbose for her. I’d lean towards something like Markdown but unfortunately it doesn’t have support for footnotes, and anyway I’d have to rig up a Markdown-to-properly-formatted-PDF script, which she wouldn’t be able to configure or customize in any way. So I’m considering seeing if I can get her to try out Ulysses, a Mac semantic word processor which is apparently inspired by TeX but is more approachable for non-technical people.
“…and Open/Libre Office editors take up lots of memory, … Their proprietary file formats are prone to corruption”
You’re not completely wrong with this essay here, but that’s one thing that should be put straight: OpenOffice works on the Open Document Format, which is not only, as the name suggests, open, it is also an ISO standard, and everyone can read it in plain text, you just need to feed the binary file to unzip. Granted, a layman will not understand any of what goes on in there, but the format is not proprietary. There are even other applications that use ODF as their main format, here’s a list:
http://en.wikipedia.org/wiki/OpenDocument#Software
… some of those programs might actually be better suited for writing a book, and you still get a file that you’ll be able to read when MS has long abandoned .doc, and .docx, and whatever the successor might be, because one of the programs in the list above will still be around somewhere.
One more error:
The TeX document should probably end with
\end{document}
:)
Other than that: I think most of your tips are quite right, although I think some of your personal preferences have crept in in a few spots.
I’ve used OpenOffice and LaTeX for writing theses (also Word, but that was a horror). In the end I stuck with OpenOffice (these days: LibreOffice).
Why? Because you do need some formatting while you write a thesis! You need to declare headings and chapters, literature references and images, and you’d better do it right when writing, while you still know what you’re trying to reference there…
That also works with both TeX and any office package, but the thing that drove me to OpenOffice is not having to remember all of those commands and stuff. And having figured out in what order things need to be done in OpenOffice. Also, I have problems with the placement of pictures in both programs, but in OpenOffice, at least I see immediately that something will be misplaced, not only after compiling.
But that, again, is probably a matter of taste or habit more than an objective decision. It completely depends on how you prefer to work.
As for backup and versioning: I’m using backintime for both at the same time. It’s probably not the same as git, but as a non-programmer I didn’t want to start learning that, and it’s a lot better than saving xxxx_v1, xxxxx_v2 … etc. to a USB disk.
I should probably go and learn to use the LO-internal versioning. You can just save a new version within the same file. But I can’t really say how it works and how reliable it is, and I’m not gonna test it now.
Speaking of which, I have a thesis to finish … cheers.
I work with Plone, and I have seen clients countless times type up something in Word, to then paste it in a WYSIWYG editor in their site. All those invisible tags get copied, too, and then can start breaking things. It’s getting better, at least now TinyMCE will recognize when you are pasting content, and will give you a plain text window for pasting.
I do all my development with TextMate, and use TextEdit (default Mac program) for .txt files.
I do my writing in Darkroom, it’s lean and you can adjust the formatting for something you like.
Darkroom has no real version control, but each time you start it up, it opens the last edited file as a new document, so when saving you can store it with a version number: filename1, filename2, etc.
@ Eric:
Clarification: With ‘formatting’ I meant – setting display options – for suicidal writers I do suggest ‘Comic Sans’ Size 15 bold in cyan on magenta background ;)
this is my favourite :)
http://www.xm1math.net/texmaker/
Org-mode + Emacs, great for structured text and works with minimal markup. Of course, I haven’t written anything over a few thousand words in length, so I’m extrapolating.
@ xWittaker:
Yeah, collaboration with non LaTex people sucks. I’m in the same situation at work – all the internal documents are in MS Office. Actually, I am required to have 3 versions of office installed on my machine in parallel because we do have some documents generated by our clients that only work in Office 2003 (hackish 3rd party macros crash terribly in 2007), some other documents that have add-ins that leverage Office 2007 features, and one mega-hack which consists of a clusterfuck of VBA code that hooks into Microsoft Binder (which was discontinued after Office 2000). Fun times. :)
@ Gui13:
Thank you. I have fixed it.
@ Chris:
Yes, this is exactly what I meant. When I create plain text documents I habitually make markdown style headings just because they look good.
@ jambarama:
Huh, what do you know. I thought they were out of the game. Glad to see they are still hanging around and found their niche. This was actually the first office suite I used on the PC platform. :)
@ smuf:
Personally I don’t really mind compiling to see how many pages I have written. LaTeX provides the semantic blocks just fine. Since I usually use a programmer’s editor like vim, or a dedicated TeX suite, stuff like \section{Foo} tends to stand out quite a bit.
When I do use plain text, I usually go for Markdown style headings.
@ Avi Flax:
Ulysses looks interesting. Have you tried LyX? It is a WYSIWYM editor (what you see is what you mean). It is still very entrenched in the WYSIWYG way of thinking but slightly better.
@ Zak McKracken:
Very good point, thank you. When I said “proprietary file formats” in my mind I was talking only about MS Office. :)
@ Zak McKracken:
You know, I find that with LaTeX you set up your figure/table/citation code once, and then it just works. It will stay in the same place, and look the same no matter what you do. With MS Office or Libre Office you will need to fiddle with it every time you add anything above it, or change document properties such as margins.
As for the internal versioning stuff – it’s probably a less than ideal system. For one, it balloons up the file size. Two, if the file gets corrupted, you lose the entire history.
@ Chrissy:
Ugh… I once inherited maintenance duties for a Joomla based page, with tons of content, all of which was copied and pasted from word. It was such a mess, that on certain pages, deleting a single character in the built-in visual editor in Joomla could cause a cascade of changes that would end up in certain fonts changing color, images shifting on page, and the entire layout breaking.
Also, I once had a boss who had an internal WordPress blog where he posted news, and interesting industry articles for the employees. He would write every single entry in Word and then paste it in.
The end result was somewhat interesting with every entry being in different font (Times New Roman, Arial, Verdana, etc..). I also loved when he would copy and paste quotes from online articles while preserving formatting.
So the entry would start in 10 point Times New Roman, then you would get some fancy sans-serif font for the quotation in 12 points, and the rest of the entry would continue in 11 point Arial.
@ Eric:
You think this is a joke, but I actually knew people who had this font/color combination set up in their Outlook. Their emails would quite literally (and by literally I mean figuratively) stab you in the eyes. :P
@ MrJones:
TexMaker is great. I used it on Linux because Kile didn’t have inline spell-checking and I liked it enough to phase out my use of TexNicCenter on Windows. :)
750m download for LaTeX, ouch…, not going to try it now. Why is it so massive?
(This link http://www.tug.org/protext/)