programming – Terminally Incoherent

Unit Testing Sinatra Apps

Luke Maciak — Tue, 24 Feb 2015 21:14:30 +0000

Testing is the safety harness for your code. They are not magical, and they will not prevent you from missing bugs you did not anticipate. They do however automate the boring chore of making sure various edge cases and special conditions do not blow up your code. As such they help to catch bugs you could have totally anticipated, but did not bother checking for because of reasons.

Manually testing web apps is a nightmare because it forces you to pull up a web browser, refresh pages, make sure you clear your cache between tests, etc.. No one wants to fiddle with the browser all day, so automating basic testing tasks will not only save time, but also greatly improve your workflow.

Unfortunately testing web apps can be a bit tricky sometimes. They are designed to be accessed over the network, and to render in a web browser, and so they require your test framework to do network-like and browser-like things to emulate those conditions. While unit tests for classes can be easily mocked, pretending to be a web browser is definitely non trivial. When I work with PHP I usually use the excellent Codeception tool-set to do acceptance testing. When I work in Node or just build front end stuff, I typically use Grunt with Phantom.js.

When working with the Sinatra framework, most unit/acceptance style testing can be easily done by using the minitest and rack-test gems. Let me show you how.

Let’s set up a simple Sinatra app. Our folder structure ought to be something like this:

myapp/
├── app.rb
├── Gemfile
├── public/
│   └── style.css
├── Rakefile
├── tests/
│   └── app_test.rb
└── views/
    ├── layout.erb
    └── main.erb

When setting up your dependencies in your Gemfile it is a good idea to isolate the test related gems from the actual run time dependencies. You can do this by using the group keyword:

source 'https://rubygems.org'
gem 'sinatra'

group :test do
  gem 'minitest'
  gem 'rack-test'
end

When deploying to production you can exclude any group using the –without argument:

bundle install --without test

If you are deploying to Heroku, they exclude test and development groups by default, so you don’t even have to worry yourself about it.

Here is a simple Sinatra app:

require 'bundler'
require 'bundler/setup'
require 'sinatra'

get '/' do
  erb :main
end

You know how this works right? The above will render the contents of main.erb and envelop them in layout.erb which is the auto-magical default template. For the time being lets assume that the contents of the former are simply the words “Hello World” and that the later provides a basic html structure.

To test this application we need to create a test file somewhere (I put them in the test/ directory) and inside create a class derived from Minitest::Test and include the Rack::Test::Methods mixin.

Mixins are a wonderful Ruby feature that let you declare a module and then use the include keyword to inject it’s methods into a class. These methods become “mixed in” and act as if they were regular instance methods. It’s a little bit like multiple inheritance, but not really.

In the example below, this gives us access to standard Rack/Sinatra mock request methods such as get, post and etc…

ENV['RACK_ENV'] = 'test'

require 'minitest/autorun'
require 'rack/test'
require_relative '../app'

class MainAppTest < Minitest::Test
  include Rack::Test::Methods 

  def app
    Sinatra::Application
  end

  def test_displays_main_page
    get '/'
    assert last_response.ok?
    assert last_response.body.include?('Hello World')
  end
end

Once you invoke the mock request method (see line 15 above) the last_request and last_response objects become available for making assertions. The last_response object is an instance of Rack::MockResponse which inherits from Rack::Response and contains all the members and methods you could expect. For example, to check whether or not my app actually displayed "Hello World" I simply had to test if that string was somewhere inside last_response.body (see line 17).

To run this test you simply do:

ruby tests/app_test.rb

The minitest gem takes care of all the boring details. We just run the test and see the results.

Let me give you another example. Here is a bunch of tests I written when working on the Minion Academy web service. My goal here was to make sure my routing rules worked correctly, that the requested pages returned valid JSON objects, with the right number of nodes, and that no JSON would be generated if the URL was formatted wrong:

 
  def test_json_with_1
    get '/json/1'
    assert last_response.ok?
    response = JSON.parse(last_response.body)
    assert_equal 1, response.count
  end

  def test_json_with_1_trailing_slash
    get '/json/1/'
    assert last_response.ok?
    response = JSON.parse(last_response.body)
    assert_equal 1, response.count
  end

  def test_json_with_0
    get '/json/0'
    assert last_response.ok?
    response = JSON.parse(last_response.body)
    assert_equal 0, response.count
  end

  def test_json_with_100
    get '/json/100'
    assert last_response.ok?
    response = JSON.parse(last_response.body)
    assert_equal 50, response.count
  end
  
  def test_json_with_alphanumeric
    get '/json/abcd'
    assert_equal 404, last_response.status
  end

Note that those are not all of the tests I have written for this particular bit, but merely a representative sample.

The neat thing is that these tests will seamlessly integrate with other unit tests you write against regular, non-Sinatra, non-Rack related classes. You can simply dump all the test files in the tests/ directory and then add the following to your Rakefile:

require 'rake/testtask'

Rake::TestTask.new do |t|
  t.pattern = 'tests/*_test.rb'
end

This will add a test task you can run at any time that will iterate through all the files matching t.pattern and run them in a sequence.

If you're like me and you don't feel content unless successful tests are rendered in green, and errors in bright red, I recommend using the purdytest gem which colorizes the minitest output. There are many test report filters out there that make the output prettier, but Purdytest probably the simplest and least obtrusive. You simply require it at the top of your test files and then forget all about it.

Minion Academy

Luke Maciak — Mon, 27 Oct 2014 14:35:35 +0000

Have I mentioned that the nemesis system in Shadow of Mordor was really cool? Because it was. Playing that game made me wander what else could be done with it. For example, I have always been fond of RPG oracles and general random generators for pen and paper RPG games. I am firm believer that every NPC and/or enemy, no matter how minor or unimportant should have a name and a few distinguishing features. A good game master can usually make up such details on the spot, but keeping track of dozens of throw away characters which may or may not die or conversely become important at some point can be difficult. So random generators are GM’s best friend – especially when trying to populate the game world with diverse collection of characters and not just standard “dwarf with brown beard, elf with blond hair” type stand-ins which is what you usually come with when you need to make up a character on the spot.

While there are dozens of random NPC generators, I figured I might as well write my own. It seemed like a fun and quick side project. How would one go about procedurally generating non player characters though?

First and foremost I figured it should be easy to modify and expand. Instead of hard coding values into the generator itself, I figured it should be stored as some sort of a structured list. I went with YAML because unlike many data serialization formats what claim to be “human readable” it actually is. Well, at least for me it is – your opinion may of course vary and it is not like YAML is without a lot of weird quirks. But I basically just needed a simple data format that could be easy to edit by hand, and that could be consumed by my code without doing a lot of parsing. Seeing how in Ruby can slurp a YAML file into an associative array in a single line of code, it was more or less perfect.

Moreover, I wanted my generator not to be “fully” random but rather use weighted probability scores for specific things. For example, it should be relatively rare to see a Rogue wearing a plate armor, but it would be common to see it on Warrior characters. How do you implement that? There is a few ways. For example you could find the discrete cumulative density function (CDF) of your list, generate a random number in the range between 0 and the sum of all weights, do a binary search to find this number… Actually, no. Scratch that. This is a solved problem and there is literally no reason to re-invent it other than as a classroom exercise maybe (or if you are worrying about performance). Instead of spending hours writing and debugging CDF code, we could just grab a gem such as this one and be done with it.

The basic idea was to let me write a weighted list like this in YAML (higher the number, the better likelihood the item gets picked):

race:
    Human     : 6
    Elf       : 6
    Dwarf     : 6
    Orc       : 6
    Goblin    : 5
    Halfling  : 4
    Hobgoblin : 3
    Kobold    : 2
    Ogre      : 2
    Troll     : 1
    
class:
    Fighter  : 4
    Soldier  : 3
    Cleric   : 1
    Bard     : 1
    Diplomat : 2
    Ranger   : 5
    Rogue    : 5
    Sage     : 1
    Scout    : 3
    Warrior  : 6

social:
    Commoner : 5
    Noble    : 2

Then in Ruby I could pull stuff out of it like this:

require 'yaml'
require 'pickup'

data = YAML.load_file('stuff.yml')
race = Pickup.new(data['race']).pick(1)
class = Pickup.new(data['class']).pick(1)

This was basically as complex as the code would get. As it is usually the case with this kind of projects the bulk of the work went into actually generating the data files that would yield not only a good deal of variety but also return both mundane and ordinary foot soldiers as well as funky and exceptional fun characters from time to time. It is more of a creative endeavor rather than programming.

What kind of weapons are appropriate for a rogue? What kind of armor should be worn by scouts? What color can Orc eyes be, and would this be any different for goblins? What kind of scale colors are most popular amongst the Kobolds? These were the sort of questions I had to answer while making this tool.

If you follow me on Twitter (as you should) you have probably seen me posting screenshots of the minions I was generating on the console:

Here are some more randomly-generated minions, now with personality traits and descriptions. pic.twitter.com/QERJtDXWP0

— Luke Maciak (@LukeMaciak) October 21, 2014

This is back when I still had “barbarian” as a class which I later decided against including. Why? Well, to me it seems like every other class (warrior, rogue, bard, cleric, etc..) is something you choose to be. Barbarian, on the other hand is something you are. It is more often than not used to describe a social caste or grouping of people rather than a profession / calling. So I removed it and replaced it with Fighter and Soldier to have 3 solid close combat classes. In my mind warriors fight out of conviction (they have a duty, seek glory, want justice, etc..), fighters do it because they like it (they are the brawler, trouble-maker types that start fights in taverns for shits and giggles) and soldiers do it strictly for money.

Creating plausibly sounding names proved to be a whole separate problem. I knew that when it came to elves and Dwarfs, I could just shamelessly crib from Tolkien if I wanted to because there are billions of good names for both of these races in the Middle Earth lore. But I didn’t just want to have gigantic copy pasted lists. So I opted for something slightly more clever. I grabbed some interesting names, broke them into (usually) two parts, and then randomly recombined them. For example, here is a sample of Orc name table:

Orc:
    given:
        Male:
            prefix:
                grish: 5
                gor: 5
                muz: 5
                maz: 5
                lag: 5
                lat: 5
                lar: 5
                lir: 5
                rad: 5
                shá: 5
                rag: 5
                rat: 5
                urf: 5
                goth: 5
                núr: 5
                nir: 5
                fár: 5

            postfix:
                nákh: 5
                bag: 5
                gash: 5
                gnash: 5
                bash: 5
                mash: 5
                gol: 5
                mol: 5
                duf: 5
                buf: 5
                rúf: 5
                muf: 5
                dúr: 5
                grat: 5
                gnat: 5
                thrak: 5
                lúk: 5
                múk: 5
                mog: 5
                rog: 5

This particular selection can yield names like Grishnákh, Gorbag and Muzgash (all of whom are named characters from Lord of the Rings) as well as dozens more or less plausibly sounding names.

Most races have gendered first names and last names dictated by social status. So for example a noble’s name may include the name of their estate, or name of their father, whereas the names of commoners are typically nicknames or trade profession related. Elves, Hobgoblins and Trolls ended up with gender neutral names just because of how alien they sounded and because I wanted to have at least one group which did not have a concept of gendered names.

Once I had basic data files created, I wrapped it up in a bit nicer interface and started generating minions by dozens. It was interesting just to read their short descriptions and try to imagine how they would look and what their personalities would be. At some point I even noticed emergent little micro-stories popping up every once in a while. For example, here are two randomly generated Orcs I got the other day:

Noble born warrior Ragma travels with her attendant Mizni seeking glory in combat 2 prove she is worthy of her title pic.twitter.com/FcUfYJuCi8

— Luke Maciak (@LukeMaciak) October 26, 2014

I found it interesting that they were both ambitious and feared losing face. It felt like they were connected somehow. Ragma was a noble born warrior while Mizni one was a commoner and a ranger. Possibly Ragma’s attendant and a guide? They were likely traveling companions: Ragma young, impetus, and irresponsible, but eager to make a name for herself. The older, wiser Mizni was likely appointed by her parents to keep the young warrior in check, and make sure she returns home safely from their adventures. They both driven by their ambition. Ragma wants to prove she can live up to the high standards of heroism set by her parents. Mizni wants to prove her value to the family by taking on a challenge of keeping the wild and irresponsible Ragma in check. You could literally write a short story about them, just based on this relationship.

This is the beauty of randomly generated content: sometimes a short little blurb can strike a chord with you and your imagination will immediately fill in the blanks creating interesting and meaningful relationships and scenarios. I figured it was worth sharing this little thing that I have done with others.

Minion Academy Screenshot

I set it up on Heroku cloud platform, and named it Minion Academy, mainly because I managed to snag that as a URL. So it is now up at minion.academy. When you visit the page you will get five randomly generated NPC’s and you can refresh the page for five new ones. It’s very basic, and still rather rough around the edges. There is still some work I want to do with it.

For example, I want to add more armor choices. Right now it’s basically just cloth, leather, chain or plate. I would like to expand it so that you could have a wide variety of different armor types for each of these categories. You might have also noticed there are no magic user types being generated right now. This is partly by design (I was initially trying to make a minion specific generator which kinda grew to cover all kind of NPC’s) but I’d like to add some wizards and sorcerers at some point.

If you notice a bug, I have the source code on Github so feel free to submit a bug report. As usual, let me know what you think in the comments.

Make Your Web Forms Time Lord Friendly

Luke Maciak — Mon, 13 Oct 2014 14:09:23 +0000

This was a conversation rolling through my Twitter feed lately: how do we design good web service signup form? One that is unobtrusive, intuitive and inclusive. How many fields do you need? What is the best way to arrange them? What kind of questions are important to ask your users? Turns out that there is a lot of disagreement on this, and a lot of misinformation and false myths floating around.

For example, is this a good sign up form?

Facebook Signup Form as of Oct 2014.

I would argue that it is not great. In my opinion splitting the users name is absolutely pointless. Even, assuming your service needs to use the legal names of your customers (which 99.9% of web services do not “need” to do, they just choose to do so because of reasons) you really only need a single field. This is not a very popular opinion, and a lot of programmers are very, very defensive of their first name/last name split.

I get it, though. I too was taught the mantra of “always be normalizing” when it comes to a database scheme design. The software engineer in me wants (even needs) to have human identity split into two or more clearly labeled forms so that it can be properly sorted. But, asking for first and last name does not work for everyone. As soon as you normalize this way, you are automatically starting to exclude swaths of users whose names do not conform to the particular pattern you chose.

You probably heard of this little factoid: in some cultures you list your family name first, and your given name last. That alone should give you a pause, and make you re-consider using a two field strategy. Some people think that simply labeling the fields as “given” and “family” instead of “first” and “last” will do the trick. I also saw a developer claiming that his app is going to be primarily used by English speaking Americans so it does not matter. But that’s wrong too, because even in that narrow demographic you are going to have a number of people whose names do not fit into the first/last pattern. You want examples? How about Madonna, Eminem, Pink, Xzibit, Nelly, Sinbad, Rihanna, Kesha, Mr. T, Lady Gaga or “The Artist Formerly Known as Prince”. There is a strong history of performers adopting mononyms or stage names which either become their legal names, or at the very least are more publicly recognizable than their birth names.

The fact that I could rattle a dozen names of the top of my head, all of which belong to prominent and recognizable celebrities is a proof that this practice is very much part of western culture. Mononyms and funky stage names are as American as apple pie. So you can’t really use “culture” to defend the over-normalization of the name field, when your own culture has a large group of very prominent outliers.

People make a lot of assumptions as to how people’s names work, but all of them are false. Yes, all of them. The single, uniform field for name is not just something I pulled out of my ass for the purpose of this article. It is actually the best practice recommended by W3C.

Same goes for sex. Why does Facebook think it is necessary to ask its users what kind of genitals they have? I can see how this could be a valuable data point for a dating service, since people use those specifically to facilitate mutual mashing of genitals together. So it makes sense to let people sort and filter potential future romantic partners based on their sex and gender preferences in addition to other criteria. Facebook however, like most social and anti-social web apps in existence has virtually no business to ask this question.

Don’t even try to sell me on “demographics” and “advertising” argument because it is bullshit, at least with respect to Facebook since they track your behavior and browsing habits anyone. There is nothing your sex tells their advertisers that they could not get from analyzing your posts, likes and social graph interactions. In fact, the tracking data is more valuable and more accurate way to target advertising than an empty data point that designates you as “man” or “woman”.

Also, why is it a strict binary choice? I mean, unless you’re building something like Christian Mingle type service (where religious dogma only allows you to recognize an arbitrarily chosen set of genders and appropriate parings), why would you want to wantonly ignore biology? If you are going to ask this question (and you have no business doing so in the first place), why not ask it the right way?

Is the Facebook form asking for sex, or gender? Because I honestly can’t tell? This is an important question to ask because Facebook has weird “real name” policies that could result in the suspension of your account if their support staff determines you “lied” on this question. So what do you put down biological sex does not match the gender you identify with? What if you don’t identify neither as male nor as female?

I think Twitter does this right:

Twitter Signup Form as of Oct 2014.

A single field for “full name” and no unnecessary questions about sex and gender. This is how it should be.

My personal rule of thumb for designing web forms: make them Time Lord friendly. Whenever you want to add or normalize a field, think how the protagonist of BBC’s Doctor Who series would fill it out. Your form should allow one to use The Doctor as the single and only personal identifier.

The Doctor does not have a first name
The Doctor does not have a last name
The Doctor does not have a middle name or middle initial
The Doctor does not have a set of initials based on name
The Doctor is not a given name
The Doctor does not have a family name
The Doctor does not use a honorific – it’s just The Doctor
No, you can not abbreviate Doctor as Dr. or anything else
The Doctor does not have a short name or nickname. You address him as Doctor
You can’t use Doctor’s date of birth to calculate age because he is a time traveler
The Doctor’s age won’t fit in your two-digit age field
The Doctor’s does not have a fixed height, eye color, hair color, etc..
The Doctor does not have a fixed ethnicity or skin color
The Doctor does not have a fixed gender

If you keep these things in mind you can avoid common pitfalls of web form design and build signup forms that are not only intuitive but also maximally inclusive.

Every time you touch the UI you break someone’s workflow

Luke Maciak — Wed, 13 Aug 2014 14:07:02 +0000

Let’s assume your app is currently in production, and has a non-trivial number of users. By non-trivial I mean a number that makes it impractical for you to write a personalized apology email to each and every single one of them when you lose their data. When you reach that sort of penetration, every time your developers touch the UI or anything directly adjacent to the UI it is bound to break someone’s workflow.

You might think you are fixing a long standing UI bug, or making the user interface more consistent and therefore user friendly, but it does not matter. At least one of your users probably worked the side-effects of said bug into the way they do things and it will appear broken to them afterwards.

Let me give you a few examples from personal experience. This is not a project I am personally involved in at the development side. For once I am actually sitting at the user-end and watching the fireworks. Let me set the stage for you: we have been using a third party time tracking tool for ages now. When we first deployed it, it was a self hosted application that we had to maintain ourselves. This involved periodically rebooting the server due to memory leaks, and applying the infrequent patches and upgrades. Prior to every upgrade we would first test it on a dummy instance, and would have the folks who used the tool extensively to do scheduling and processing time and expenses give it a once-over before we deployed it to production. If there were issues we would work with the vendor to iron them out prior to the deployment. It worked well.

Unfortunately about a year ago they discontinued support and licensing for the self-hosted version and we had to upgrade to their “state of the art” cloud based service. This was nice for me because it meant we no longer had to expend time and resources to maintain the tool internally. The end users were also happy because they would be getting all kinds of new bells and whistles to play with. The vendor promised the cloud version is developed and improved very aggressively based on user suggestions and that their new agile development process can deploy fixes and custom patches much faster than before. It sounded great on paper, but it turned out to be a disaster.

Our users after switching to the cloud platform.

The vendor likes to push out minor updates and patches every other Monday, and like clockwork this results in our ticketing system getting clogged up with timesheet software related issues. We verify all of these and tag-team grouping and compiling them into support requests who get forwarded to the vendor support team, and cc’d to our account manager. This is our third account manager since the switch and I suspect our company single-handedly got the last two fired by maintaining an unstoppable barrage of open tickets and constant demands for discounts and downtime compensation.

Most of the problems we are having stem from trivial “fixes” that make perfect sense if you are on the development team. For example, recently someone noticed that the box you use to specify how many hours you worked can accept negative values. There was no validation so the system wouldn’t even blink if you entered say negative five hours on a Monday. So they went in, added input validation, and just to be on the save side they fixed it in their database. And by fixed, I mean they took an absolute value of the relevant column, and then they changed the datatype to unsigned integer. Because if there were negative values there, they had to be in by a mistake, right? Because who in their right mind would use negative time? Well, it turns out it was my team. Somehow they figured out a way to use this bug to easily fudge time balances on the admin site. For example, if someone was supposed to work five hours on a Monday, but had an emergency and left three hours early, the admin would just go in and add -3 work hours to the timesheet with a comment. It allowed them to have both the record of what the person was supposed to do, and what actually happened. Needless to say, after the “fix” all our reports were wrong.

Ok, so you divide by zero here, put NULL in all these fields, put -1 here, and 1=1 in the date field, click through the segfault message and your report is ready.

More recently, they noticed that there were two ways for people to request time off in the system. You could create a time-off request ahead of time (which had to be approved by a supervisor) or you could submit it as a part your timesheet by putting in 8 hours as “personal day” or whatever. Someone on the vendor’s dev team decided to “streamline” the process and removed the ability to enter time off from the timesheet page. To them it made perfect sense to only have a single system pathway for entering time. Unfortunately my team relied on that functionality. We had a special use case for the hourly contractors which simply required them to record their downtime as “unpaid leave” (don’t ask me why – I did not come up with that). Before they could do that by simply filling out their time sheet. After the upgrade they had to go to the time off tab, and fill out a time of request for every partial day that week, then have that request approved by a supervisor before they could actually submit a timesheet. So their workflow went from clicking on a box and typing in a few numbers to going through 3-5 multi-stage dialog boxes and then waiting for an approval.

To the vendor’s credit, they are addressing most of these problems in a timely manner, and their rapid development cycle means we don’t have to wait long for the patches. They do however have serious issues with feature creep and each “fix” creates three new problems on average.

Pictured here: pair programming as implemented by our vendor.

Majority of these stem from the fact that our users are not using the software the way the developers intended to. They are using the application wrong… But whose fault is that? Should paying customers be punished or even chastised for becoming power users and employing the software in new, emergent ways rather than using it as you imagined they would? Every botched, incomplete or ill conceived UI element or behavior in your software is either an exploit or a power user “feature” in waiting.

I guess the point I’m trying to say is that once you deploy your software into production, and make it available to a non-trivial amount of users, it is no longer yours. From that point on, any “bug fix” can an will affect entire teams of people who rely on it. A shitty feature you’ve been campaigning to remove is probably someone’s favorite thing about your software. A forgotten validation rule is probably some teams “productivity crutch” and they are hopeless without it.

Full test coverage may help to limit the amount of “holes” your users may creatively take advantage of, but it only takes you so far. There is no way to automate testing for something you never anticipated users doing. You won’t even discover these emergent, colorful “power user tricks” by dog-fooding your app, because your team will use it as intended, rather than randomly flail around until they find a sequence of bugs that triggers an interesting side-effect and then make it the core of their workflow. This is something you can only find out if you work with genuine end users that treat your software like a magical, sentient black box that they are a little scared off.

LaTex: Continous Background Compilation

Luke Maciak — Mon, 09 Jun 2014 14:07:08 +0000

What I’m about to propose here is a bit unconventional, but it really works for me. I’ve been doing a lot of front and back end web stuff lately and as a result every machine I own or work with has Node installed by default. Not that I do much in node itself, but because it gives me access to some amazing front-end tools and utilities such as Bower, Yeoman and Grunt. That last one especially has become my go-to build tool as of late. Not necessarily because I love it’s syntax (it’s not great) but because of how it works.

It is a build tool unlike any other: instead of providing users with a monolithic set of build tasks and commands it is completely modularized, allowing you to mix and match both official and community made plugins. And it has a vibrant community that builds plugins for just about everything. Including LaTex.

Combining Latex and Grunt

One of my favorite features of Grunt is the ability to set up watch tasks that monitor your project files for changes and will continuously build and test your code while you work. You can combine it with a live-reload functionality which automatically refreshes your browser. When I work on web projects these days I can automatically see my changes taking effect on the second monitor. I can’t emphasize enough how much does this improve your productivity, and how cool it feels.

I decided I want that kind of setup when I work in LaTex, and it turns out that it can be accomplished using the same tools I use for web development. This is perhaps not a pure LaTex environment, and I am probably committing some horrible transgression here by altering the “proper” Tex workflow. This is definitely not how St. Knuth and St. Lamport intended their tools to be used. But it works really well.

The basic idea is to use Grunt to set up a watch task that will re-compile your document every time you save it. Web-developers probably already know where I’m going with this, but since this is a LaTex post, I’m going to assume you are not familiar with these very web-specific tools. So if you never used Grunt I’m going to explain it step by step.

First, you need to install Node, and then use the built in package manager known as npm to install the Grunt command line tool:

npm install -g grunt-cli

The -g parameter means “global”. By default npm installs packages into node_modules folder in the current directory, but the command line client must be installed globally. While npm is working and spewing all kinds of information into the console, you might go ahead and put that node_modules directory into your .gitignore file.

Now, that you are done with that, go into your project directory and install Grunt proper. Yes, you install it twice. The tool used for issuing build commands is installed globally, whereas the actual meat of the build engine along with all the plugins is installed locally in your project folder. If you think about it, it is a brilliant move as it ensures that every project gets it’s own self-contained environment. Not only that, but you can simultaneously work on two different projects: one which requires a bleeding edge version of the build tool, and one which uses deprecated logic tied to a very old version and they will never conflict with each other.

npm install grunt

It will be installed to the aforementioned local directory:

Installing Grunt

Next, you will need to install at least two plugins. One of them is the official watch plugin which will monitor files and trigger build tasks when they are changed. The other one is an excellent LaTex wrapper written by Tim von Oldenburg:

npm install grunt-contrib-watch grunt-latex

You can specify both plugins on the same line. In fact, you can install both grunt and the associated plugins in one command.

Installing Plugins

The nice thing about Tim von Oldenburg’s package is that it is configured to work with all the popular LaTex distributions so it should work on any platform. I tested it both on Windows and Linux and it was working flawlessly.

Once you have all the components of your build system in place, you can set up your “make” file, here known as a Gruntfile. In most circumstances these build files are written in Javascript and saved as Gruntfile.js in the root of your project directory. But I highly recommend using CoffeeScript instead, because it results in much smaller and readable files. Grunt does not care which of the two languages you use.

Don’t worry if you have never used CoffeeScript – the basic syntax needed to make a Gruntfile is no more complex than that of Make. Here is a very basic Gruntfile.coffee you can use for your project:

# Grunt boilerplate
module.exports = ->
    
    # Set up individual tasks
    @initConfig
        latex:
            src: ['main.tex']
        watch:
            files: 'main.tex'
            tasks: ['latex']

    # Specify tasks you want to use
    @loadNpmTasks 'grunt-latex'
    @loadNpmTasks 'grunt-contrib-watch'

    # Tell grunt what to do if no arguments are specified
    @registerTask 'default', ['latex', 'watch']

Let me explain. Lines 2 and 5 are just standard boilerplate that has to be there in every Gruntfile. In a CoffeeScript they are just two trow-away lines, whereas in the Javascript version they would be two nested closures which are about seven times as scary to someone who is not used to the “sideways x-mas tree” shape that language usually ends up with. CoffeeScript flattens it out a bit and makes it quite more readable.

Lines 13 and 14 declare the plugins that you will be using. We then set up specific option for each plugin in the @initConfig block (lines 6-10). By default the name for each plugin based task is the unique part of the plugin name. So it is latex for grunt-latex and watch for grunt-contrib-watch (all the “official” plugins have the contrib prefix). You use that name to refer to them in the initialization block.

For example the latex plugin requires only one option, which is src and it is used to specify your main .tex file. The watch plugin needs to options. First specifies which files to watch, and second is which tasks to run (which in our case is the latex task).

Finally, the last line defines the “default” taks – which is the one to be run if no tasks are specified on the command line. Normally, you can run a specific task by supplying it to grunt as an argument. For example:

grunt latex

Will run the latex task. If no arguments are specified the task default is run, if it exist. So it is always a good idea to register it in your Gruntfile. I set it to compile the document once, and then watch for changes from that point on. Running grun then throws it into an infinite loop, where it waits for file changes and re-compiles my document on the fly. Observe:

Using Grunt to Live Compile Latex (click to enlarge)

To stop Grunt from it’s active watch-loop you simply hit Ctrl+C.

Most PDF readers will auto-refresh your document when it changes, so as long as you keep it open on the side, or on the second monitor you will see the changes as they stream in. Adobe Reader for some reason locks any file it has open, and prevents LaTex from overwriting it, so you cannot use it with this setup. Pretty much anything else will work fine. In the screenshot above I’m using Evince which is the default PDF viewer on Ubuntu which has a perfectly serviceable Windows version.

You should keep in mind that this method is not perfect. For example, while the grunt-latex plugin is very robust and set up to work in just about every environment, it is not very versatile. It does have some configuration options, but it does not run external commands. This means that if you are using BibTex or a similar package which requires an additional binary to be run before or after latex, the plugin won’t help you.

The good news is that Grunt is incredibly flexible, so it is easy to work around such limitations. For example, in the document I was working on I needed to build a word index. This requires you to run the makeindex command each time you compile. I figured how to make this happen using the grunt-shell plugin which lets you configure a task that will run an arbitrary shell command for you. My Gruntfile ended up looking like this:

module.exports = ->
    
    @initConfig
        latex:
            src: ['main.tex']
        watch:
            files: 'main.tex'
            tasks: ['latex', 'shell', 'latex']
        shell:
            makeIndex:
                command: 'makeindex main.idx'

    @loadNpmTasks 'grunt-latex'
    @loadNpmTasks 'grunt-contrib-watch'
    @loadNpmTasks 'grunt-shell'

    @registerTask 'build', ['latex', 'shell', 'latex']
    @registerTask 'default', ['build', 'watch']

If I needed to add BibTex support I could just another entry under shell and then whenever that task was invoked both the indexing and BibTex tasks would run. Or, I could run them individually by specifying their name after the colon (shell:makeIndex instead of just shell).

What do you think of this setup? How do you compile your LaTex documents? Do you use a makefile of any sort, or do you rely on an IDE to run all the build tasks for you? And if you use an IDE, then which one? Let me know in the comments.

Scraping Reddit’s Json for Cool Pics

Luke Maciak — Wed, 04 Jun 2014 14:13:05 +0000

Did you know that you can add /.json to any Reddit URL to get a machine readable JSON document you can screw around with? You can test it yourself. For example, go to /r/vim/.json. It works for pretty much any kind of url, including multiredits. This has been part of the Reddit API for about seven centuries now, but I have never really paid attention. Until now, that is.

People sometimes ask me where do I get inspiration for shit like Ravenflight. Part of the explanation is of course being a natural genius like I am. Part is hanging out with other nerds, because crazy random stuff is bound to come up in a conversation. Finally, part of is the stuff I get exposed to on the internet. For example, I subscribe to a multitude of picture subs. Pretty much if it has “Imaginary” in the title or it is part of the SFWPorn thing (no, it’s not porn, it’s just pictures… Though /r/AnimalPorn should really consider changing the name to something that would raise less eyebrows).

One day I got a bright idea: what if I could create a multiredit of all these cool picture subs, and then scrape it for cool pictures and display them as a scrolling gallery. This way they would be much easier to browse (no need to click on the links or use the RES to expand them) and I could distill away all the unpleasant Redditry. Like the obligatory: “you idiot, why isn’t this imgur” or “way to go posting imgur instead of linking to source, you idiot” fight that happens every time anyone posts a picture on the internet ever. But mostly it was just a cool idea… Once I realized reddit was serving jSOn files for everything it was just too tempting not to mess around with them.

I briefly flirted with the idea of using an API wrapper such as ReditKit or Snooby and doing everything on the server side, but I quickly gave up on the idea. Part of it had to do with the fact that none of the wrappers I looked at actually did any rate limiting, which is one of the chief reasons why I wanted to use one in the first place. Syntatic sugar is really nice, but parsing jSon is relatively painless, whereas designing throttling and caching is exactly the kind of dumb and boring busy work I was trying to avoid. It also did not help that after an hour of impatiently flipping through the docs and running things in irb I still had no idea how to parse multiredits. It seems that 90% of the documentation was written with the expectation that people using these wrappers would be building funny comment-bots, and that remaining 10% of stuff was either self-explanatory or irrelevant.

Eventually I got annoyed and started fucking around in JSFiddle just so see if what I was thinking about was possible. It turns out it was, and that it was working remarkably well on the client side. You can see my prototype here:

Click on the results tab to see how it looks. I’m not sure if this is an impolite script because I’m still doing no caching or rate limiting here. But since all the fetching and processing it is happening on the client side I think I might be getting off on a technicality here. Even though the code might generate a lot of simultaneous requests, they will all technically come from different IP addresses so perhaps admins won’t yell at me for doing this.

I went ahead and dressed it up a little bit, and slapped a final, polished version at imaginary.pics for everyone to enjoy. So any time you want to look at some fantasy themed pictures of monsters and heroes, you can just type that into the address box in your browser and get inspired.

And yes, that’s a .pics domain, because why not. I like descriptive domains and I’m not afraid to use non-standard TLD’s if I can get away with it. You should have known that about me after I committed dontspoil.us back in the day. I’m quite excited about the crazy new TLD’s and being able to register all kins of dumb domains. Btw, it took me like an hour to stop clicking the “go again” link on that website, so you’re welcome. I’m calling dibs on wank.bank when that becomes available: I’m gonna just copy-paste some buggy porn-tube-clone code onto that and make like $millions.

By the way, the dumb.domains site seems to have an affiliate deal with somewhat shady registrar. If you are actually planning to buy a fun domain name, I’d recommend iwantmyname.com. Someone recommended them to me, and I really like the cut of their jib. Then again it might just be me. I previously bought domains through sites like Godaddy and Network Solutions so I was actually really confused when the registration process did not involve clicking through 17 pages of up-sell bullshit, and some lady’s cleavage was not being thrusted into my face from advertising banners. Their site is well designed, everything is intuitive and they seem like cool people. I wish I knew about them years ago.

Where do you usually buy your domains? Are you currently sitting on any domains that you bought because they were cool, but never actually put them to a good use? Have you ever stupidly bought a domain just to host five lines of Javascript like I just did? If so, what did you host?

Building a Jekyll Site

Luke Maciak — Mon, 05 May 2014 14:03:00 +0000

Back in 2009 I got a brilliant idea into my head: I was going to build a site on top of Joomla. Why? I still don’t exactly understand my own thought process that lead me to that decision. I think it had something to do that it was branded as a content management system and I had some content I wanted to manage. Perhaps it was because it looked serious and enterprisey and I wanted to try something different hoping it would be less of a pain in the ass than WordPress. Or perhaps it was a bout of temporary insanity.

Don’t get me wrong, Joomla is a wonderful CMS with a billion features that will let you do just about anything, but typically in the least convenient and most convoluted manner. I’d be tempted to say that Joomla engineers never actually tested their software on live humans before pushing it out into the public but I suspect that’s wrong. I suspect they have done a ton of usability testing, and purposefully picked the least friendly and the most annoying user experience. Because, fuck you for using Joomla.

Granted, they might have made some great improvements since 2009, but I wouldn’t know because upon slapping it on the server, and vomiting my content all over it, I decided I never actually want to touch anything on the Admin side of it ever again. On day two of my adventure with Joomla I decided that shit needed to go, but since I just manually migrated (read copy and pasted) like dozens of Kilobytes of content into it, I couldn’t be bothered. So I took out my scheduling book, penciled the site upgrade for “when I get around to it” and then threw the book out the window, because scheduling things makes me sad and hungry which is why I never do it.

Fast forward to 2014 and I was still happily “getting around to it”, when my host sent me a nasty-gram saying my Joomla is literally stinking up their data center. I had no clue what they were on about, since the installation was pristine clean, vintage 2009 build in a virgin state of never having been patched, updated or maintained. But since they threatened to shut all of my things down unless I get that shit off their server I decided it was time. I got around to it.

Fist step was straight up deleting Joomla. Second step was picking the right toolkit for the job. I briefly considered WordPress, but that’s a whole other can of worms, but for different reasons. WordPress is actually pretty great as long as no one is reading your blog. As soon as you get readers, the fame goes to it’s head and it decides it owns all the memory and all of the CPU time on the server, and demands a monthly sacrifice of additional Rams as your user base grows. It is literally the bane of shared servers, and most WordPress “optimization” guides start by telling you to abandon all that you know, and run like seventeen layers of load-balanced proxy servers in front of it. Not that Joomla performance is any better, but that site had no readers so it was usable. But since I was getting around to updating it, one of the goals was making it more robust and scalable, rather than trading a nightmarish clustefuck of crap for a moderately unpleasant pile of excrement. I figured I might as well go for broke and trade it for something good: like a mound of fragrant poop or something.

Since the site was on a shared host with a Quintillion users, and I didn’t feel like paying for and setting up yet another droplet I opted for a statically generated site. I tried a few static site generators and Jekyll is the one that did not make me want to punch the wall in the face (if it had a face) so I opted for that. Plus, I already had some basic layout done, so I figured I might as well use it.

The huge benefit of having a static site running on a shared host is that in theory you will never have to touch it, other than to update the content. The host will take care of the updating the underlying OS and web server, and since you have no actual “code” running on your end, there is noting there to break. Once you put it up, it can run forever without framework or plarform upgrades. It is a low effort way to run a site.

As far as front end went, I knew I wanted to work with HTML5 and that I wanted a grid based systems because making floating sidebars is a pain in the ass. So I whipped out Bower and installed Bootstrap 3.

I know what you were going to say: fuck Bootstrap, and I agree. Bootstrap is terrible, awful and overused. In fact, I think the authors of the project realized how much of a crutch it is, which is why they introduced a conflict with Google Custom Search in the latest version. Bootstrap literally breakes Google’s dynamic search box code, because fuck you for using Bootsrap.

But, it’s easy, clean and I love it, so I bowered it. Bootstrap consumes jQuery as a dependency so I got that for free. This is another useful framework people love to shit all over (though for good reasons) but since I already got it I figured I might as well use it for… Something.

One crappy thing about Bower is that when it fetches dependencies it puts all of them in bower_components directory, including useless garbage such as Readme files, build files and etc. Some people package their distributable code for Bower, but most projects don’t give a shit and just use their main repository and give you un-compressed, un-minified files along with all the associated miscellaneous garbage. I loathe having Readme files showing up on a deployment server, so I decided to manually minify and concatenate my scripts and stylesheet with Bootrstrap ones. For
that I needed Grunt. For grunt I needed Node. And so it goes. It is funny how one decision cascades into a dependency tree.

Runtime Dependencies

Pretty much at the onset, I decided I will be using the following:

Bootstrap & jQuery
Font Awesome for more icons
Google toolchain including Analytics, Custom Search and Webmaster Tools

Only the first item on the list is something you would want to install locally. The rest can be run off a CDN pretty reliably. Actually, you could run jQuery off a CDN too, but I decided not to. This makes your bower.json incredibly simple:

{
  "name": "My Site",
  "version": "0.0.0",
  "authors": [
    "Luke Maciak "
  ],
  "description": "Blah blah blah, website",
  "license": "MIT",
  "homepage": "http://example.com",
  "private": true,
  "ignore": [
    "**/.*",
    "node_modules",
    "bower_components",
    "test",
    "tests"
  ],
  "dependencies": {
    "bootstrap": "~3.1.1"
  }
}

This is the nice thing about static sites. Your production does not need a lot of setup – you just copy the files over and your done. All the heavy lifting is done at development time.

Dev Dependencies

Here our list is longer. I need things to build the code, manage the dependencies and some way of deploying it all to a server in a non annoying way.

Ruby and Gems
Jekyll
Node and NPM
Grunt
rsync for moving the files around between servers

I already had Ruby, Jekyll node and Grunt running on my system because… Well, why wouldn’t you. I mean, that’s sort of basic stuff you install on the first day when you get a new computer. So all I had to do was to steal a project.json file from another project and slap it in my directory:

{
  "author": "Luke Maciak",
  "name": "My Website",
  "version": "1.0.0",
  "dependencies": {},
  "devDependencies": {
    "grunt-html-validation": "~0.1.6",
    "grunt-contrib-watch": "~0.5.3",
    "grunt-contrib-jshint": "~0.1.1",
    "grunt-contrib-uglify": "~0.1.1",
    "grunt-contrib-concat": "~0.1.3",
    "grunt-contrib-cssmin": "~0.5.0",
    "grunt-contrib-csslint": "~0.1.2",
    "grunt-contrib-copy": "~0.4.1",
    "grunt-shell": "^0.7.0"
  }
}

Once it was in place, fetching all the grunt dependencies was a matter of running npm install. Now comes the hard part: setting up your Gruntfile.

Basic Setup

For the sake of completion, here is my complete Gruntfile

/*global module:false*/
module.exports = function(grunt) {

    // Project configuration.
    grunt.initConfig({
        validation: {
            options: {
                reset: grunt.option('reset') || true,
            },
            files: "_site/**/!(google*).html"
        },
    watch: {
        files: "",
        tasks: 'validate'
    },
    jshint: {
      files: [  'Ggruntfile.js', 
                'scripts.js'
             ],
      options: {
        white: false,
        curly: true,
        eqeqeq: true,
        immed: true,
        latedef: true,
        newcap: true,
        noarg: true,
        sub: true,
        undef: true,
        boss: true,
        eqnull: true,
        smarttabs: true,
        browser: true,
        globals: {
            $: false,
            jQuery: false,

            // Underscore.js
            _: false,

            // Chrome console
            console: false,

          }
      },
    },
    csslint: {
        lint: {
            options: {
               'ids': false,
               'box-sizing': false
            },
            src: ['style.css']
        }
    },
    cssmin: {
        compress: {
            files: {
                'style.tmp.min.css': ['style.css'],
            }
        }
    },
    concat: {
        options: {
            separator: ';' + grunt.util.linefeed,
            stripBanners: true,
        },
        js: {
            src: [
                    'bower_components/jquery/dist/jquery.min.js',
                    'bower_components/bootstrap/dist/js/bootstrap.min.js',
                    'scripts.tmp.min.js'
            ],
            dest: 'resources/js/scripts.min.js'
        },
        css: {
            src: [
                    'bower_components/bootstrap/dist/css/bootstrap.min.css',
                    'style.tmp.min.css'
            ],
            dest: 'resources/css/style.min.css'
        }
    },
    copy: {
        main: {
            files: [  
                {   expand: true, 
                    flatten: true,
                    src: 'bower_components/bootstrap/dist/fonts/*', 
                    dest: 'resources/fonts', 
                    filter: 'isFile'
                }
            ]
        },
    },
    uglify : {
        main: {
                 src: ['scripts.js'],
                 dest: 'scripts.tmp.min.js'
             }
    },
    shell: {
        jekyll: {
            command: 'jekyll build'
        }
    }
    });

    grunt.loadNpmTasks('grunt-contrib-watch');
    grunt.loadNpmTasks('grunt-html-validation');
    grunt.loadNpmTasks('grunt-contrib-uglify');
    grunt.loadNpmTasks('grunt-contrib-jshint');
    grunt.loadNpmTasks('grunt-contrib-concat');
    grunt.loadNpmTasks('grunt-contrib-cssmin');
    grunt.loadNpmTasks('grunt-contrib-csslint');
    grunt.loadNpmTasks('grunt-contrib-copy');
    grunt.loadNpmTasks('grunt-shell');

    grunt.registerTask('default', ['jshint', 'uglify', 'csslint', 'cssmin', 'copy', 'concat']);
    grunt.registerTask('all', ['default', 'shell', 'validation']);
};

It is a huge, monolithic pile of configuration so let me explain what I’m trying to accomplish here. In an ideal world, you want to have a single CSS file linked at the top of your page, and a single JavaScript file linked on the bottom. If you use Bower to handle dependencies (as you should) this is not possible, because every little thing you install gets it’s own folder in bower_components folder. So your first task is to pick out the important parts from each of those folders, smush them together into these two files. This is what is happening here.

For example, lines 56-62 run my custom CSS rules (style.css) through a minifier (using grunt-contrib-ccsmin) that removes all the spaces, and makes it super-ugly for the purpose of loading faster. Likes 96-101 do the exact same thing to my custom JavaScript code in scripts.js via grunt-contrib-uglify. So I end up with two very ugly files style.tmp.min.css and scripts.tmp.min.js. All of these files will be excluded from Jekyll compilation via _config.yml exclude list.

Once I have those, I use the grunt-contrib-concat plugin to concatenate my custom stylesheets and scripts with those provided by Bootstrap. You can see that in lines 63-83. I end up with my two ideal production ready files named: style.min.css and scripts.min.css. The new files are placed in resources/ directory.

A side effect of re-locating the Bootstrap script and CSS is that you break the glyphicons. The css files have relative paths to the web-font included in the bootstrap package, so if you want it to work it has to be in the fonts/ directory relative to the css location. This is what the 84-95 section is about. I’m taking all the files from bootstrap_components/dist/fonts/ and placing them in resources/fonts/ like this:

resources/
├── css
│   └── style.min.css
├── fonts
│   ├── glyphicons-halflings-regular.eot
│   ├── glyphicons-halflings-regular.svg
│   ├── glyphicons-halflings-regular.ttf
│   └── glyphicons-halflings-regular.woff
└── js
    └── scripts.min.js

The rest of the file is mostly concerned with linting. I check my css code with grunt-contrib-csslint and my JavaScript with grunt-contrib-jshint which is fairly standard. In both cases I’m relaxing the linting notes a little bit to preserve my own sanity, and to get around ugly hacks. For example ‘box-sizing’: false on line 51 is there to allow me to fix the aforementioned css that completely breaks Google’s Custom Search functionality. Similarly on line 35 I’m declaring $ as a global, because JSHint does not uderstand jQuery and freaks out for no reason.

I’m also using the excellent grunt-html-validation plugin to make sure my HTML is valid.

Finally, here is my _config.yaml file for Jekyll. It is mostly unremarkable, save for the exclusion list where I prevent Jekyll from copying all of the useless files into production.

name: My Site
description: blah blah blah
author: Luke

category_dir: /
url: http://example.com

markdown: rdiscount
permalink: pretty
paginate: 5

exclude: [
            package.json, 
            bower.json, 
            grunt.js,
            Gruntfile.js, 
            node_modules, 
            bower_components,
            validation-report.json, 
            validation-status.json,
            scripts.js, 
            scripts.tmp.min.js,
            style.css,
            style.tmp.min.css,
            lgoo.psd,
            Makefile,
            exclude.rsync,
            README.markdown
         ]

Grunt takes care of compiling and linting all the front end code, while Jekyll builds the site from an assortment of html and markdown files. I already wrote a lengthy article about setting up a basic Jekyll site before, so I won’t bore you with the details here. The basic skeleton looks like this though:

.
├── _config.yml
├── _drafts/
├── _layouts/
│   ├── category_index.html
│   ├── default.html
│   ├── page.html
│   └── post.html
├── _plugins/
│   ├── generate_categories.rb
│   └── generate_sitemap.rb
├── bower.json
├── bower_components/
│   ├── bootstrap/
│   └── jquery/
├── exclude.rsync
├── favicon.ico
├── feed.xml
├── Gruntfile.js
├── imag/
├── index.html
├── Makefile
├── node_modules/
├── package.json
├── README.markdown
├── resources/
│   ├── css/
│   ├── fonts/
│   └── js/
├── robots.txt
├── _site/
├── scripts.js
└── style.css

The bower_components directory as well as the “naked” JavaScript and CSS files are excluded from compilation, in lieu of the resources which contains files generated by Grunt. Other than that this is a fairly standard structure.

Deployment

As I said before I decided to use rsync to deploy the website. There are many ways to deploy a Jekyll website, but this is probably the most efficient tool for the job. In an ideal world, you compile a Jekyll site, and then rsync compares your _site directory to what is on the server and only copies/deletes files that are different. This means you first upload will be massive, but from that point on, you are just going to transfer the deltas.

There is a little caveat here though: by default rsync compares files based on timestamps. This is a problem because Jekyll clobbers the _site directory every time you build your site. This means that every file inside of it will look brand spanking new to rsync even if it has not technically changed. This downgrades our delta-sync tool to just a crude file uploader that is no more sophisticated than an rm -rf command followed by scp _site/* host:~/site.

Fortunately, I found an excellent tip by Nathan Grigg which suggests telling rsync to use checksums instead of timestamps. By force of habit, when setting up an rsync script most of us might be tempted to write something like:

rsync -az --delete _site/* user@host:~/site

This is the traditional and wrong way of doing this. What Nathan suggests instead is:

rsync -crz --delete _site/* user@host:~/site

Or perhaps, more descriptively:

rsync --compress --recursive --checksum --delete _site/* luke@myhost:~/site/

I actually like to use the long arguments when I write scripts, because years down the road they will make it easy to understand what is going on without looking up cryptic letter assignments in the man pages.

To simplify deployment I wrote myself a little Makefile like this:

.PHONY: check, deploy, tunnel-deploy, build

default: check, build

check:
	@command -v ssh >/dev/null 2>&1 || { echo "ERROR: please install ssh"; exit 1; }
	@command -v rsync >/dev/null 2>&1 || { echo "ERROR: please install rsync"; exit 1; }
	@command -v grunt >/dev/null 2>&1 || { echo "ERROR: please install grunt"; exit 1; }
	@command -v jekyll >/dev/null 2>&1 || { echo "ERROR: please install jekyll"; exit 1; }
	@[ -d "_site" ] || { echo "ERROR: Missing the _site folder."; exit 1; }

build: check
	grunt
	jekyll build

deploy: check, build
	rsync --compress --recursive --checksum --delete --itemize-changes --exclude-from exclude.rsync _site/* luke@myhost:~/site/
	ssh luke@myhost 'chmod -R 755 ~/site'

Tagging

Jekyll kinda supports tags and categories, but those are still rather underdeveloped features. When I build Jekyll sites I like to use Dave Perret’s plugin to get nice category archive pages. It also injects the category names into the “pretty” permalinks adding taxonomy to your url structure.

I have a very specific idea abut how tags and categories should be handled and how they differ. For me, categories group posts of a certain broad type, while tags are used to indicate specific topics/keywords that cut across the categories. So for example, you could have a category named “videos” and bunch of tags like “interview”, “trailer”, etc.. That said, the tag “interview” is not unique to the “videos” category and could also be used to tag posts in other categories like “pictures” for example. I like to have one category per post, but multiple tags. These is not a hard rules and most systems out there allow for more liberal use of both concepts. Dave’s plugin actually allows for multiple categories per post. But I typically stick to one. It is a personal preference of mine.

Categories are big, broad and there are few of them. I will often list them on the sidebar, and use them for navigation. Tags are different – they are more messy. So I opted to have a single page that would essentially be a table of contents by tag.

Michael Lanyon did an excellent writeup on how to alphabetize your tag list using nothing but Liquid tags in your template.

{% capture site_tags %}{% for tag in site.tags %}{{ tag | first }}{% unless forloop.last %},{% endunless %}{% endfor %}{% endcapture %}
{% assign tag_words = site_tags | split:',' | sort %}


   Table of Contents:
      
         {% for item in (0..site.tags.size) %}{% unless forloop.last %}
            {% capture this_word %}{{ tag_words[item] | strip_newlines }}{% endcapture %}
               
                  {{ this_word }} 
                  {{ site.tags[this_word].size }}
               
            {% endunless %}{% endfor %}
      

   Posts for each tag:

   

      {% for item in (0..site.tags.size) %}{% unless forloop.last %}
         {% capture this_word %}{{ tag_words[item] | strip_newlines }}{% endcapture %}

            
               
                  {{ this_word }}

               
                  {% for post in site.tags[this_word] %}{% if post.title != null %}
                     
                        
                           
                              {{ post.date | date: "%b %d, %Y" }}
                        {{ post.title }}
                  {% endif %}{% endfor %}
               
            
      {% endunless %}{% endfor %}

This works very nicely generating an alphabetized list that is easy to search through. You can link to the list for individual tag using a hashmark in the URL: http://example.com/tags/#tagname and it will take you to that section. That said, it can be a bit confusing for the user to get dumped into the middle of a huge list of unrelated things. So I built upon Dave’s idea and added some Javascript to the mix.

I figured I already have a jQuery dependence, so I might as well use it:

var MySite = MySite || {};

MySite.showSingleTag = function showSingleTag() {
    $(".tag-list").hide();
    $(window.location.hash).show();
};


$( document ).ready(function() {
    if( window.location.hash )
    {
        MySite.showSingleTag();
    }

    $(window).on('hashchange', function() {
        Gigi.showSingleTag();

        // scroll to the element
        $('html,body').animate({scrollTop: 
            $(window.location.hash).offset().top},0);
    });       
});

This script detects if there is a hash in the URL, and if so hides all the entries, except the ones related to the relevant tag. I left the list of tags alone, because I figured the user might want to explore what else is available. Because of this a little bit of additional logic was added. If you click on a hash-link the browser page won’t reload, and thus my hashmark check won’t trigger. So on line 15 I check if the URL hash changes, and if so I re-do the hiding, and then I forcefully scroll the user’s viewport back to the tag list.

TL;DR

I have successfully switched from Joomla to Jekyll and it’s great. I’m totally not going to regret this choice 5 years down the road, right? I mean, what could go wrong, other than everything. Actually, I’m already begging to see cracks forming in this master plan. You see, the site has a lot of images. They are mostly low to medium resolution screen-shots, but there are a lot of them, and there will be many more if I actually keep updating this thing more than once a year. As part of the update I added about 100MB worth of images, which is not a terrible lot but it has slowed the Jekyll compilation times quite a bit. So this is bound to get super annoying real quick… But I guess that’s par for the course: all software sucks, and it is a fucking miracle the internet even works seeing how nearly every website in existence is held in place with a digital equivalent of duct tape.

You can see the fruits of my labor at gigiedgleyfansite.com. While it’s not perfect, I think it is a huge improvement over the old Joomla based site. Let me know what you think.

3 Tiny Vim Plugins That Will Make Your Life Easier

Luke Maciak — Wed, 02 Apr 2014 14:02:44 +0000

There is a religious movement within the Vim community which emphasizes purity of the environment and rejects superfluous plugins and advises adherents to meticulously prune their .vimrc to keep it nearly empty and thus clean from impurities. I personally do not agree with this philosophy, but I do see a point in it. Sometimes having too many plugins might lead to conflicts and weird behavior. That said not all plugins are created equal.

Some, like the infamous Unite plugin are expansive and complex because aim to be kitchen sinks and fountains of utility. Others are tiny and follow the unix philosophy of doing only one thing, but doing it well. Today I would like to talk about three of such tiny plugins that you might find worthwhile additions to your web design toolkit.

The first nifty little plugin adds a feature to vim which I see requested and asked about all the time. Most modern IDE’s or editors aspiring to be developer tools have built in auto-completion for common block and scope identifiers such as parentheses, braces, square brackets and quotation marks. As you open a brace, the editor automatically inserts a matching close brace character on the other side of your cursor. This is exactly the functionality offered by vim-AutoClose.

Vim-AutoClose Functionality

In the example above you might notice that after I typed the keyword test( and the plugin automatically added the matching ) my cursor next leaped behind it. This is because I actually hit the ) key and vim just skipped the character. This is by design. The author likely realized that auto-closing brackets could easily break the flow of your typing, unless it allowed you to over-type on top of what it inserts. In other words, you can completely ignore that this plugin is enabled and type as you would normally do and all will be well. Only every once in a while you will be spared typing a few characters here and there as it will automatically close the scopes for you.

That’s it. That’s all it does. There is no configuration, no key-bindings and no side effects. It simply auto-closes brackets and quotes for you. It will work everywhere, though the most obvious benefit is in C style languages that use a lot of braces, and brackets.

Speaking of closing things, one of my least favorite things about HTML is closing tags. It is just a lot of extra typing that’s basically busywork. Some IDE’s actually try to spare you that extra work, by adopting auto-close approach like they do with brackets and braces which may or may not be something you want. Other tools, like the editor built into WordPress offer a button you can press to close opened tags at your leisure which is much less intrusive. The Closetag plugin offers you just that: an extra button for closing tags:

CloseTag Plugin in Action

As you can see above, I have a bunch of nested HTML tags and all I have to do to close all of them is to hit Ctrl+_ (that’s control and underscore character) three times to close all of them. Each time this combo is pressed, vim searches for the nearest tag that is not closed and generates a matching pair. It works even if your text spans multiple lines, and it wisely ignores tags like which do not need to be matched.

Vanilla vim lets you use the % key to jump between opening and closing brackets or parentheses. It lets you rapidly move around C style code bases, but it does nothing when you are editing HTML. The Vim-MatchIt plugin extends this functionality to HTML files and lets you use % to jump between opening and closing HTML tags:

Vim-MatchIt in Action

I’ve been told that this actual plugin gets included by default in many mainstream vim distributions these days. I had it in my plugin directory for ages now, so I just carry it around just in case.

There you have it: three super-minimalistic, unobtrusive plugins that do not change much int terms of core behavior, but add a lot of convenience into your workflow. What are some of your favorite tiny vim plugins? How about Emacs or Sublime people? Let me know in the comments.

Let’s Learn LaTex: Part 7

Luke Maciak — Wed, 26 Mar 2014 14:02:50 +0000

Let’s talk about embedding code snippets in your LaTex documents. All of us here are programmers (except for those who aren’t) so this is a natural thing we might want to do. Despite being built by programmers for programmers, LaTex doesn’t really have a code mode. Knuth spent around a lot of time on making sure that Tex is the best tool around for typesetting math equations, which was probably a good choice. After all, math is the universal language of science and formulas are the “meat” of any serious research paper. This goes double for Computer Science papers which should be liberally sprinkled with it. Snippets of code are “implementation details” and thus of little importance in the grand scheme of things.

Still, embedding code in documents is something we may want to do. The traditional method is using the verbatim environment which respect white space and creates a nice, preformated, mono-spaced code blocks. For example, lets take the following snippet:

\noindent Here is a code snippet in Python:

\begin{verbatim}
# Hello world
def Hello():
    foo = 10
    print "Hello World!"
\end{verbatim}

Upon compiling, it will look like this:

Verbatim Environment

While this is not terribly impressive, it is good enough for most research papers. That said, if you are using LaTex for writing manuals, homeworks or school papers you might want your code blocks to be a little bit more flashy. Perhaps they could include line numbers, and some color. How do you do that?

With a package of course. There are a number of different packages that offer this kind of functionality, but my favorite is probably the one named lstlistings. It is not the most powerful or flexible one, but it does not have any dependencies (except maybe the color package), does not rely on external tools and it is easy to use.

You include it in your preamble like this:

\usepackage{listings}
\usepackage{color}

Then you put your code snippet inside the lstlisting environment instead of verbatim:

\noindent Here is a code snippet in Python:

\begin{lstlisting}
# Hello world
def Hello():
    foo = 10
	print "Hello World!"
\end{lstlisting}

The result is rather unimpressive:

Listings package without any configuration.

Yes, it does actually look worse than the verbatim environment, but that’s only because we have never configured it. You see, the lstlistings package is old school, and like many LaTex packages it adheres to the fuck reasonable defaults philosophy. The idea behind this attitude is simple: the user should know what they are doing, or else go die in a fire. Yes, I know – it’s charming. But what are you gonna do? Go back to using Microsoft Word? I didn’t think so.

Configuring our code environment is actually not a major pain in the ass. You basically need to include a short lstset block somewhere in the preamble. It can be as brief or as detailed as you want. I typically go with something like this:

\lstset{ %
	language=python,
	numbers=left,
	frame=single,
	showstringspaces=true,
        basicstyle=\ttfamily,
       	keywordstyle=\color{blue}\ttfamily\textbf,
	identifierstyle=\color{magenta}\ttfamily,
	stringstyle=\color{red}\ttfamily,
	commentstyle=\color{cyan}\ttfamily\textit
}

The above will set the default language for all the code blocks in the document to be python (don’t worry, this can be overriden), toggles on line numbers, puts a border (frame) around the code and sets up some basic font and color formatting. You should remember commands like \textbf from Part 2. The color related commands are provided by the color package.

The resulting code block will look like this:

Listings package with config.

The lstlistings package makes up for having no useful defaults by being very configurable. You can control just about any aspect of the way your code is displayed. You not only have control over fonts and colors, but you can mess around with the line numbers, visual indicators for whitespace characters, and change which words are treated as keywords. Here is a nearly complete list of config options you may find useful:

\lstset{ %
  language=python,                 % the language of the code
  basicstyle=\ttfamily,        	   % style for all of the code
  keywordstyle=\color{blue},       % keyword style
  identifierstyle=\color{magenta}, % style for variables and identifiers
  commentstyle=\color{green},      % comment style
  stringstyle=\color{mymauve},     % style for string literals

  tabsize=2,                       % sets default tabsize (in spaces)
  backgroundcolor=\color{white},   % requires the color package
  frame=single,                    % put a border around the code

  numbers=left,                    % line numbers (none, left, right)
  numberstyle=\tiny\color{red},    % the style that is used for line numbers
  numbersep=5pt,                   % how far the line-numbers are from the code 
  
  showspaces=false,                % visibly indicate spaces everywhere
  showstringspaces=false,          % visibly indicate spaces within strings only
  showtabs=false,                  % visibly indicate tabs within strings
  
  breaklines=true,                 % toggle automated line wrapping
  breakatwhitespace=false,         % if wrapping is on, enable to break on whitespace only

  deletekeywords={...},            % exclude keywords from chosen language
  morekeywords={*,...},            % add words to be treated as keywords
      
  captionpos=b                     % sets the caption-position (bottom)
}

The package supports a good number of languages out of the box. All the popular ones are represented, alongside some rather obscure ones (like Motorola assembler). Here is a table of languages ripped straight out of the manual:

Supported Languages. Dialects are listed in parentheses with defaults underlined. If there is no default, dialect must be specified.

Dialects are listed in parentheses, with the defaults being underlined. As you probably noticed, not all languages that have dialects include a default (because fuck defaults, remember?). For those languages you must specify one in the declaration. The method of doing it is rather awkward:

\lstset{
	language={[x86masm]Assembler}
}

It doesn’t look pretty, but it works.

I mentioned before that you can override the language setting. You can also skip it from the preamble declaration, but putting it there is usually a good idea. In most documents you will be posting snippets in the same language over and over again, so this saves you typing. If you want to use a different language, or you did not specify one in preamble, you will need to include it as an optional argument when you open the lstlisting environment like this:

\begin{lstlisting}[language=ruby]
	puts "Hello World!"
\end{lstlisting}

You can also override any of the other options this way. Just list them after the language declaration.

If you are super lazy, and can’t be bothered to copy and paste a code snippet into your paper, you can include it using the \lstinputlisting command. It has one mandatory argument which is the file name:

\lstinputlisting[language=python, firstline=7, lastline=18]{filename.py}

You can specify any regular \lstset settings as optiononal arguments. The firstline and lastline arguments can be used to import only a part of the file. If you skip them, the entire file will be embedded.

Let’s Learn LaTex
<< Prev	Next >>

About Code Snippets

Luke Maciak — Mon, 25 Nov 2013 15:04:27 +0000

This will be an exercise in brevity. Originally I wanted to post this thought on Twitter, but I could not figure out how to compress it into 140 characters. I could have posted it on Google+ but I figured I might as well publish it somewhere where at least few people will actually read it.

Let’s say you made a thing: be it a function, a library, a framework or perhaps a utility of some sort. You publish it online for the world to see, and as it is our ancient custom, you provide a working example to show users how it works. This example typically takes a form of a concise code snippet that aims to showcase how your thing works. Here is something a lot of people don’t think about when they post code snippets online: what happens with them afterwards. Here is a hint:

Copy and Paste into Production

This is a well known fact: most code snippets that get posted online, invariably find their way into production code somewhere. No matter how many dire warnings you post before, after and even inside the snippet, someone, somewhere will straight up copy and paste it directly into their mission critical production environment. If your code includes an obvious security hole, or stealthy hisen-bug waiting for specific use case to manifest, it will be enshrined there forever. Even if you fix it on your page, the buggy version will likely never get updated.

There is nothing you can do about it, and the potential problems this could cause will be on them, but… Now that you know this, you can take steps to ensure that snippets you publish online are of high quality, and that examples you use are actually appropriate. I think the best example of this was from in the vast repository of terrible code snippets that is php.net where one of the contributors said something along the lines of this function is not rated for crypto and should not be used for authentication, but let me show you how to generate random salts with it. I can’t remember where exactly I have read this, and it might have been excised and nullified by now. Still, you should take my word for it, because I have a blog which means means I am a journalist, and journalists have integrity. Integrity is a fancy way of saying their shit is legit.

Is this a real thing?

Unfortunately, making sure your code is of high quality not enough. If your example is long, boring and complicated, someone is bound to take it, rip out important, crucial chunks and post a simplified, buggified and incomplete version for all of the world to see. And then people will use that snippet because it is short, and easier to grok on the first glance. I know this, because I have done this. I’ve been on both ends of this equation.

By that I mean that more than once I have taken a snippet of code, deleted lines I did not comprehend or care to research. As long as the resulting abridged version compiled I typically declared victory, and triumphantly vomited it onto the internet via a blog post or two. Not on this blog of course, but in other places. Actually, maybe on this blog too… Yeah, this definitely happened on this blog, didn’t it?

I’ve been on the other side too. Went to the official documentation site and found it wanting and so I copied and pasted from somewhere else. Not into production of course… Actually, probably into production. But if something works, it works, until it doesn’t and then you blame it on the intern. Preferably the one that no longer works here, for plausible dependability.

I guess what I’m saying is that any code you post online not only ought to be correct and feature the best, most secure way of doing things. It should also be concise to the point of being irreducible. Because if it isn’t, someone will reduce it for you, and then we are back to square one. How do you accomplish such a feat? It’s easy: pick good defaults.

Whenever you make something, the easiest way to call or execute it should always fall back onto the “right” defaults, and follow best practices. The people who want to half-ass things, jerry-rig or tinker then can boot-strap their own little disasters by passing parameters or using command line switches or what not. But the clean and concise way should always try to do the right thing. Of course this is not always possible, but it never hurts to try.