
LaTeX: Squeezing the Vertical White Space

Wednesday, September 19th, 2007

Here are some tips on how to “compress” your paper vertically by minimizing white space gaps between elements. I had to do this a few days ago when my school refused to duplicate my syllabus because it was 14 pages long. I got it down to 6 without any cuts, and then down to 3 after doing some reductions in text.

So how do we squeeze the vertical whitespace? There are many ways to do this, and some are more complicated than others. My tips are on the easy side, so you won’t need to write custom .sty files or redefine commands using intricate TeX sequences.

First we want to set the spacing between paragraphs as small as possible. The commands below should kill just about any space inducing setting in your paper unless you are doing something fancy:

\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
\setlength{\headsep}{0pt}
\setlength{\topskip}{0pt}
\setlength{\topmargin}{0pt}
\setlength{\topsep}{0pt}
\setlength{\partopsep}{0pt}

Now the paragraphs are snugly against each other so let’s take a look at line spacing. The \linespread command is usually used to increase the line spacing but we can exploit it to make it smaller by passing in a value smaller than 1:

\linespread{0.5}

You may want to play around with that value - if you set it too small, LaTeX will just reject it. Below a certain threshold some lines may start running into each other. For me a value of 0.5 did the trick, and slurped up swaths of white space.
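As a rough sketch of where this command goes: \linespread only takes effect the next time a font is selected, so the preamble is the simplest place for it. The 0.9 below is an arbitrary illustrative value, not the one I used above.

```latex
\documentclass{article}
% Scale the baseline distance; values below 1 tighten the lines.
% 0.9 is only an illustrative value, tune it for your own document.
\linespread{0.9}
\begin{document}
First paragraph of text that runs over more than one line so that
the reduced line spacing is actually visible in the output.

Second paragraph.
\end{document}
```

If you change the value in the middle of a document instead, follow it with \selectfont so the new spacing is picked up.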

Our next offenders are section headings. By default they have huge gaps above and below them. Totally wasteful, especially if you are trying to save trees by decreasing your page count without sacrificing content. So what do you do? There are very complex ways to change the spacing above and below the section headings but we are lazy bums and don’t feel like using them. So let’s use the titlesec package and zero out all the spaces:

\usepackage[compact]{titlesec}
\titlespacing{\section}{0pt}{*0}{*0}
\titlespacing{\subsection}{0pt}{*0}{*0}
\titlespacing{\subsubsection}{0pt}{*0}{*0}

If you look in the documentation, the arguments to \titlespacing are the command, left margin, above-skip and below-skip respectively. The * notation replaces the formal notation using plus/minus etc. If you set it to zero, headings will snug up to the paragraphs above and below them.
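If the star shorthand feels too opaque, the same arguments also accept ordinary TeX lengths. The sketch below spells them out; the specific values are my own illustrative picks, not titlesec defaults.

```latex
\usepackage{titlesec}
% \titlespacing*{<command>}{<left margin>}{<space above>}{<space below>}
% The starred form also suppresses the indent of the paragraph
% that follows the heading.
% All lengths below are illustrative assumptions.
\titlespacing*{\section}{0pt}{1ex plus 0.5ex minus 0.2ex}{0.5ex}
\titlespacing*{\subsection}{0pt}{0.5ex plus 0.5ex}{0.25ex}
```

A little stretch ("plus") in the space above gives LaTeX some room to balance pages, which plain zeros do not.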

Enumerations and itemizations are horrible space wasters too. By default, all the lists are double-spaced. Why? Don’t ask me, but it’s easy to get rid of that by using “compacted” lists provided by the mdwlist package. In your preamble add:

\usepackage{mdwlist}

Then instead of using normal lists use:

\begin{enumerate*}
	\item
\end{enumerate*}
 
\begin{itemize*}
	\item
\end{itemize*}

This is not as straightforward as the other steps, as it will require some search and replace in a pre-existing text. If you know a better way to do this, please let me know.
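One possible answer, assuming the enumitem package is available in your distribution: it can tighten the standard list environments globally, so the document body needs no search and replace. This is a different package from mdwlist and would be used instead of it; treat it as a sketch rather than a drop-in recipe.

```latex
\usepackage{enumitem}
% nosep removes the vertical space between items and around the list.
% With no optional argument, \setlist applies to itemize, enumerate
% and description at every nesting level.
\setlist{nosep}
```

With this in the preamble, the plain itemize and enumerate environments stay as they are in the text.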

Last thing I did was to change my margins using the geometry package. My command looked like this:

\usepackage[left=2cm,top=1cm,right=2cm,nohead,nofoot]{geometry}

My document had no headers or footers so I was able to disable them. You should probably experiment a bit with the values above to see what is the maximum range of your printer. Most devices won’t print all the way to the paper edge so you must set margins appropriate to your printer.

My document had no formulas or figures, but had several long item lists. Some of them were very narrow - 2-3 words per item. These types of lists are major space wasters so I set them in multi-column mode. Depending on your list you can use 2, 3 or even 4 columns. For me 2 columns were the right fit. I recommend using the multicol package. In your preamble add:

\usepackage{multicol}

Then surround your text to be “columnized” (shut up, it’s a new word I just made up) using:

\begin{multicols}{2}
	% your stuff goes here
\end{multicols}
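For a narrow list, the whole thing might look roughly like this. The list items are made-up placeholders, not content from my syllabus.

```latex
\documentclass{article}
\usepackage{multicol}
\begin{document}
Required materials:
% The list is wrapped as a whole; multicols balances the items
% across the two columns.
\begin{multicols}{2}
\begin{itemize}
  \item Notebook
  \item Pencils
  \item Calculator
  \item Ruler
  \item Flash drive
  \item Textbook
\end{itemize}
\end{multicols}
\end{document}
```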

Once I did all of that, the page count of my document was cut roughly in half. Feel free to add your vertical space squeezing tips in the comments.

Why LaTeX is Superior to Office

Monday, July 16th, 2007

If you are a regular reader, you have probably seen me mention LaTeX at various times. I always praise it as the superior solution - one above and beyond the traditional word processing products. So let me take a minute of your time and explain exactly why I choose to use LaTeX, and why you should consider learning it.

Output Quality

The most striking difference between text produced by Word or Open Office and LaTeX is the quality of the output. You have to keep in mind that LaTeX is not just a mere word processor - it is typesetting software. It does kerning, hyphenation and all the tricks used by professional typesetters who work in the publishing industry. This means that your text is always beautifully justified and balanced across the page. Please compare the two screen shots below. This is some random Lorem Ipsum text in MS Word:

[Figure: sample produced in MS Word]

Now compare it to the same text sample generated by LaTeX:

[Figure: sample produced in LaTeX]

They look similar, but you will notice that the LaTeX sample just flows better, and looks more professional. Let’s see what happens when I run some of the words together to create really long expressions, and justify the text in Word:

[Figure: justified text in Word]

You might be familiar with this situation. Word does not know how to break words across lines with hyphens. Nor does it know how to use kerning to bring some letters closer together, and push other ones apart to avoid creating huge white space gaps. LaTeX does both:

[Figure: sample of the same justified text in LaTeX]

The difference is striking. On one hand you have ugly gaps, on the other you have nicely flowing, justified text. Which one do you like more? You be the judge. And this is just the tip of the iceberg. I could show you many cases in which LaTeX beats conventional word processing in placing figures, floating text around images, displaying mathematical formulas, etc.

Transparency of Markup

When you write LaTeX you work with plain text files. You simply add markup commands to your text. This is very much like writing HTML - just with more features, and a more powerful parser on the receiving end. So what do I mean by transparency? You simply always see what is going on. For example, consider the following LaTeX code:

Plain Text. \textbf{Bold} Text.

The \textbf command is the equivalent of the HTML <strong> - it makes the text boldface. You clearly see what is going on - the text between braces will be bold, while the text outside will be plain. Word on the other hand utilizes hidden markup. Yes, there are markup characters in Word! You can actually see some of them, such as paragraph breaks, tabs and section breaks. Other ones, such as boldface and font tags, are hidden. But they are there. Consider the following situation:

[Figure: hidden formatting in Word]

Note that when I backspace too close to the boldface text, all of a sudden I find myself typing in bold again. This happens because the symbol that ends the boldface “tag” was inserted directly to the left of the space following the word Bold. Deleting that space deleted the markup symbol, throwing me back into boldface mode.

When I taught a Fluency in Technology course at MSU this was one of the consistent issues that frustrated the students to no end. All the faculty members I talked to noticed the same thing. We know why this happens - but an average person takes it as a weird software quirk, a bug, or some sort of failing on their part. If you ever see someone struggling with Word, look for this moment:

[Figure: Word paragraph formatting quirks]

Ask the user what they think happened, and how they feel about it. Reactions will likely range from disbelief to death threats addressed at Bill Gates. But this is the same exact problem that I illustrated above. Some hidden formatting symbol gets deleted, and messes up all your paragraph format. How to avoid situations like this one? You can control it by being careful with what you format, and where you put white space. But if someone gives you a file, there is really no way of clearly identifying issues like this.

In LaTeX, markup is transparent. There are no nasty surprises. You match the opening brace to the closing brace, or \begin statement to \end statement - and it’s easy to see if one is missing. It saves a lot of frustration.
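A small illustration of what I mean by matching delimiters. The text is placeholder content; the point is only that every scope starts and ends where you can see it.

```latex
\documentclass{article}
\begin{document}
% Bold runs exactly from the opening brace to its matching close;
% the nested \textit covers only its own pair of braces.
Plain text. \textbf{Bold text, \textit{bold italic text,} bold again.}
Plain again.

% Environments work the same way with \begin and \end.
\begin{center}
Centered until the matching end.
\end{center}
\end{document}
```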

Ease of Modifying Styles

This is something that happens to you when you try to publish a paper. You write a really long document using one citation and formatting style, and then you find out the journal requires a completely different style. So for example, you may need two columns, different spacing between paragraphs, different ordering of bibliographic references, a different style for figure captions, etc. If you are using Word, chances are that once you change the font, the margins, the column layout and paragraph spacing, all your figures will end up in weird unexpected places. You may need to move them around manually. Same goes for your page breaks, section breaks, etc. You probably have a lot of work ahead of you.

In LaTeX you will usually only have to edit a few lines in your preamble. If you are changing to a known style such as IEEE for example you can simply download appropriate templates, toss them into your project directory and add one or two lines to your document - for example:

\documentclass[peerreview]{IEEEtran}
\bibliographystyle{IEEEtran}

Or something along those lines. The rest is done by the parser and typesetting engine. It will move around the figures as appropriate, format the text the way it should be, change the way the bibliography and table of contents are displayed, etc. It’s easy, and requires almost no manual tweaking, unless you were doing something very complex and specific. As opposed to Word - it just works.
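To show where those two lines sit, here is a bare-bones skeleton. It assumes IEEEtran.cls and IEEEtran.bst are installed or copied into the project directory; the file name refs.bib, the citation key and all the text are hypothetical placeholders.

```latex
% Switching styles mostly means changing this line...
\documentclass[peerreview]{IEEEtran}
\begin{document}
\title{A Placeholder Title}
\author{A. Author}
\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Introduction}
Body text with a citation \cite{placeholder-key}.

% ...and this one. refs.bib is a hypothetical BibTeX file.
\bibliographystyle{IEEEtran}
\bibliography{refs}
\end{document}
```

The body of the paper stays untouched; only the class and bibliography style lines differ from what you would write for a plain article.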

Ease of Debugging

Most issues with a LaTeX document can be fixed by an analytical process analogous to debugging code. The markup language has a strict syntax, and most mistakes will generate errors, and force you to deal with them immediately. A LaTeX user usually faces problems such as:

  • What is causing the error in this particular line of text?
  • What commands do I use to make this line up properly?
  • Did I miss a closing brace or an end statement here?
  • What do I need to put in the preamble to change the spacing and text flow here?

All of these problems are analytical problems that can be solved by careful elimination process, or researched via googling the error messages, or warnings and by reading extensive documentation available online. Word users on the other hand, tend to struggle with OMG WTF kind of issues:

  • Why does my whole paragraph lose formatting when I hit backspace (see above)?
  • How the hell did this embedded object get corrupted?
  • I put some section breaks in the document and now my paging is all messed up
  • Why does Word merge these tables when I paste them?
  • OMG! Everything breaks when I paste this into the multi-column section!

The only way you can debug weird formatting issues in Word is by trial and error, and hitting undo many times, until you get it right. You can’t study the code and find out what you did wrong because most formatting marks in your text are hidden, and handled internally. Documentation is usually lacking, and chances of finding good hits via google are directly proportional to how well you can describe the issue in a short search phrase.

No Vendor Lock In

Microsoft makes Office to make a profit. They want to lock you into their platform because they want your money. And unless we can make ODF the national standard, they will tweak their file formats with every edition, release half-assed APIs and specs and do everything to make interoperability difficult. Open Office and other products will always be playing catchup. ODF is the only way out of the lock in - but things don’t look so great on this front. My hunch tells me that Microsoft will succeed in pushing their OpenXML specification (which, btw is neither Open, nor a specification - more of an incomplete set of purposefully confusing notes with a restrictive license attached) as the de-facto format in the upcoming years.

So thanks, but no thanks! LaTeX is completely open, and well documented. It has been ported to virtually every platform and architecture. And it does not require a bloated, slow editor to use. You can edit your documents in vim, emacs, notepad, or even Eclipse. You are not tied to a single company, and you are not tied to a single editor. You can pick and choose. And choice, ladies and gentlemen, is an essential component of freedom, and individualism.

Don’t listen to the people who talk about the paradox of choice. If you feel paralyzed and unable to make a decision when faced with a plethora of different choices, then you are probably either an indecisive person to begin with, a lazy bum who doesn’t want to do the research, or a brainwashed zombie-sheep. Some people are all of the above. The majority of people are at least one of those. But you and me - we are different. If you read this far in this long post, you are probably at least a little bit intrigued by LaTeX. And so you probably embrace choice and freedom. LaTeX gives you that in the same way using the ODF format does. The only difference is that ODF is currently supported by very few Office applications - while LaTeX does not require any specialized tools. Just a text editor and the parser/compiler.

Designed for Excellence

TeX, the core of the language, was designed by Donald Knuth - a man who made some impressive contributions to the field of theoretical Computer Science, and is hailed by many as a living legend. He is considered a father of algorithm analysis. The man is a genius, and he wrote TeX because he was disappointed by all the publishing software that existed at the time.

TeX was then embellished, and improved by Leslie Lamport (who also has an impressive set of contributions to the field), becoming LaTeX. Quite an impressive pedigree if you look at it. The system was designed by some of the most brilliant people in our field, to be the be-all-and-end-all of document preparation and publishing. Then it was released into the wild as an open source project, to be picked apart by millions of eyes and hands.

Word on the other hand… It was designed to compete with and emulate Bravo - the first original WYSIWYG text editor for Xerox Alto. It was initially developed by Charles Simonyi and Richard Brodie. Simonyi was hired to work at Xerox directly from Stanford, and his crowning achievement before Word was development of the mentioned Bravo editor. Brodie was a code jockey who also worked on Bravo, and more recently a professional poker player. Since then it was slaved over by hundreds of developers on tight schedules, in a high pressure work environment, with conflicting goals of maintaining backwards compatibility while preventing interoperability with 3rd party software. It’s not an impressive pedigree. Also, if you think about it, Word has all the qualifying prerequisites for a genuine OMG! WTF??? corporate monster of a project. One of those that you’d be likely to see on The Daily WTF.

I don’t know about you, but I’d rather go with the open source system designed by legends than with a kludge-worthy, proprietary, corporate monstrosity. But that’s just me.

Mature and Stable

LaTeX is still in active development (at least last time I checked) but the software is mature and stable. The development process is slow and steady. The current releases of the software are rock solid - I have never, ever encountered, or even heard about an issue with the compiler/parser software. The markup language is thoroughly documented, and widely used. Whenever you need to do something fancy, chances are that someone already did that before you and either published a complete package or at least a well documented solution online.

Office on the other hand is in constant flux. Each version tweaks the file formats, changes the menus, adds more useless functions, and more bloat. Software is unstable, and prone to crashes, and corrupting the binary files for no reason. Because of the poor design it is used by malware writers as an infection vector.

Open Office, while more secure, is plagued by the same set of issues. It’s a big application that is constantly changing, trying to catch up to the industry leader. Bugs abound, and interoperability with MS Office is still not where it should be.

Unfortunately, LaTeX is not for everyone

All of that said, LaTeX is not for everyone. The learning curve is steep compared to Word or Open Office. You can’t just pick up LaTeX by messing around with the UI. You have to understand the syntax, and learn its basic rules before you do anything. Users should at least have a basic understanding of programming, markup languages, compiling and debugging software. Thus, it’s probably not something that you’d just give to a novice computer user. It’s a tool for technologically competent people who would rather work with a well documented document preparation system with strict, transparent markup syntax than with a quirky, buggy and unstable WYSIWYG setup.

LaTeX is not something that would find much use in a corporate office environment. Who is LaTeX for?

  1. College Students - there is no better software for writing essays, term papers, presentations, and research papers
  2. Grad Students - I wrote my Master’s thesis in LaTeX. I can’t imagine doing anything like that in Word. It would be suicide. Don’t do it to yourself. Use the proper tool for this task - that tool is LaTeX.
  3. Writers - LaTeX was designed for publishing, and there is no other tool that will produce high quality almost-ready-to-print documents out of your manuscripts
  4. Scientists and Researchers - no other tool makes it so easy to write research papers, books and journal articles
  5. Professors - in the past I used LaTeX to generate great looking multiple choice exams, lecture presentation slides, lecture notes, etc.
  6. Developers - you want great looking documentation and user manuals? This is the tool for you. LaTeX can also output to HTML so you can easily create googlable online copies of all your documents.
  7. You - if you read this far, you probably have what it takes to try it, and use it well.

I don’t know if any of this convinces anyone. But those are some of my reasons for using it. Please feel free to add your reasons if you happen to be a LaTeX user. And if you are not, check it out. It’s worth it. Don’t forget to let me know what you think of it once you try it.

If you guys want, I could run a few introductory articles teaching the basics of LaTeX here on this blog. For example, if you are too lazy (or busy) to research it on your own, but you wouldn’t mind seeing some simple examples, and code samples here, please let me know.

Convert PS and EPS images to JPEG

Monday, April 23rd, 2007

The reason why I post stuff like this here is twofold. Firstly (is that even a word?), I forget. Chances are that in 3 months I will need this shit again, and won’t remember the exact syntax, or the name of the tool I used. So instead of googling for it, I can just search through my blog archives.

Secondly, chances are that there are quite a few intermediate or beginner LaTeX users out there who may or may not find this useful.

Here is the scenario: you created a beautiful LaTeX document. In fact it is so awesome that someone approaches you and asks you to send them that incredibly cool chart or figure you used in your paper. So you send the person the eps/ps file and they can’t open it.

The proper reaction in such a situation is of course repeatedly whacking said person on the head with a blunt object until they get a clue. Sadly forcible clue insertion is not always an option, so every once in a while you will need to convert your images into some more luser friendly format.

You can convert any ps or eps file into a jpeg using ghostscript:

gs -sDEVICE=jpeg -dJPEGQ=100 -dNOPAUSE -dBATCH -dSAFER -r300 -sOutputFile=myfile.jpg myfile.eps

This method has one flaw. It produces humongous files. Depending on the eps file you may get something like 2000×3000 pixels which is slightly on the insane side. Also the file size of the JPG will be about 10 times that of the eps.

We will now need to trim and resize the file using some ImageMagick tools:

mogrify -trim -resize 800x600 myfile.jpg

Obviously you can put your own dimensions instead of the 800×600. Mogrify will resize the image, and also cut down the size of the file to a manageable level. Still, in most cases files obtained using this method were bigger than the eps files used to produce them. It’s best to simply use your photo editing tool (gimp?) and re-export the image/chart to a conventional image format. It will probably yield much better results than this conversion.

On the other hand, if a ps/eps is all you have, this might be useful.

Convert JPG and PNG to EPS on Windows

Friday, April 20th, 2007

There are two kinds of people in this world - those who are always on the lookout for the nifty tools that will convert the xyz image format to Encapsulated Postscript (eps) and those who have no clue what Encapsulated Postscript is.

Why do we like to convert all kinds of shit to eps so much? It’s because of LaTeX. And while pdflatex has support for most non-postscript image types, you sometimes want to compile your documents into postscript format first for best results.

On most unix systems converting back and forth between common graphical formats and eps is relatively trivial. There are dozens of small command line apps (most likely developed by and/or for frustrated LaTeX users) that will do just that for you. Many systems will actually ship with a decent assortment of these tools out of the box.

However if you find yourself editing LaTeX documents on Windows you might be lacking these tools. Or not. I found myself in this situation the other day, and I was able to find a Windows port of jpeg2eps as well as a port of the awesome sam2p app that works on bitmaps, PNGs and a bunch of other file formats. The direct link to the binaries is here.

Once you have the right tools, the rest is easy.
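Once the eps file exists, including it could look roughly like the sketch below. The file name chart is a hypothetical placeholder. Leaving off the extension lets graphicx look for whichever format the current compiler supports, so the same source can be built through the postscript route with chart.eps or with pdflatex using chart.png, chart.jpg or chart.pdf.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{figure}[htp]
  \centering
  % No extension: latex + dvips will look for chart.eps,
  % pdflatex for chart.pdf, chart.png or chart.jpg.
  \includegraphics[width=0.8\textwidth]{chart}
  \caption{A placeholder chart}
  \label{fig:chart}
\end{figure}
\end{document}
```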

LaTeX: Fixing Wrong Figure Numbers

Saturday, April 14th, 2007

What I tell you right now may save you hours of extensive debugging, cursing under your breath, commenting out custom code dealing with figure layout and much frustration. Whenever you use figures, always (and I mean ALWAYS EVER FOREVER ALWAYS) put \caption first, and \label second like this:

\begin{figure}[htp]
  \centering
  \includegraphics{image.eps}
  \caption{Some Image}
  \label{fig:some-image}
\end{figure}

If you put the \label above \caption you will run into trouble when referencing figures inside subsections. In my case, the caption underneath the figure would say Fig. 4.2 but the output of \ref would be 4.3.10 because somehow it was picking up the section numbers wrong. The whole damn chapter 4 had the caption/label pairs flipped - but the rest of the document was fine. I have no clue what possessed me to write it this way.
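As far as I understand it, \label records whichever counter was stepped most recently, and the figure counter is only stepped by \caption. A \label placed before \caption therefore still sees the enclosing section-level counter. Here is a minimal sketch that reproduces the effect, with a framed placeholder standing in for the image and made-up label names:

```latex
\documentclass{article}
\begin{document}
\section{First}
\subsection{Sub}

\begin{figure}[htp]
  \centering
  \label{fig:wrong}  % too early: still points at subsection 1.1
  \fbox{placeholder}
  \caption{Some Image}
  \label{fig:right}  % after \caption: points at the figure
\end{figure}

Wrong reference: \ref{fig:wrong}. Right reference: \ref{fig:right}.
\end{document}
```

After compiling twice so the references resolve, the first \ref should show the subsection number and the second the figure number.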

Now I know better. Those are 3 hours of my life that I will never get back. All because I put a label before a caption. Do not do that to yourself!