I don’t know if any of you guys use WordPress on a regular basis. Probably not, because true geeks use Jekyll to blog these days. I don’t blame you. I love it too, but a site of this size and complexity is just at the edge of maintainability with Jekyll. Once your post count exceeds a few thousand, the poor thing starts dragging its feet when generating. Granted, once you generate your blog, nothing beats the speed and responsiveness of a fully static site, but WordPress works for me for the most part. Which is not to say it is devoid of issues or bugs. Up until recently, however, I haven’t really experienced any major issues with the software, other than its tendency to eat up all the available memory on my VM and then crash it whenever there is a slight traffic bump. But that’s more or less normal.
Yes, WordPress is kinda famous for performing like absolute shit under heavy loads. The people who build and maintain servers running WordPress in high-stress environments highly recommend putting WP and Apache behind an Nginx proxy with hard caching, and offloading highly dynamic stuff like comment feeds to client-side third-party plugins like Disqus. This way readers get static pages, whereas content creators can hit the live site to make changes. But I digress.
WordPress works for me. I have been using it for many years now, and it has improved quite a bit in that span of time. It has grown new features that I’m fond of. One such feature is the post revision history, which in my honest opinion should be baked into every OS and platform by default. I mention this feature specifically because this is where my problem started. One day I noticed that the revision history on some of my posts was full of revisions I never made, all of them with a blank author.
What is this, I don’t even… Have we been hacked?
Fortunately, no. If someone 0wnd my server, there would likely be some traces of malicious activity. Like log entries, or maybe a dick enlargement ad in a malware-installing iframe on my front page. Nothing like that was going on, though. In fact, these phantom revisions with a blank author seemed to do stuff that was almost the opposite of what you would expect a malicious script to do.
Malware that takes over blog sites usually seeks to inject unsafe HTML and script tags into posts and comments in order to perform drive-by downloads, infections, MITM attacks and the like. My ghost posts were doing the opposite. They were stripping any nonstandard and suspicious tags from my posts.
If you look at my programming articles, you have probably noticed that I use a lot of styling to make code snippets look nice. I use a GeSHi-driven plugin to colorize them (GeSHi is the same code highlighter I used for that PasteBin project). It hooks a filter that fires when the post is rendered and looks for a pattern like:
<pre lang="php"> // code here </pre>
Everything inside the <pre> tags gets auto-escaped and colorized, so I don’t have to worry about sanitizing the code. I can actually paste raw HTML snippets into my article without fear of breaking my layout. Unless, of course, some rogue phantom decides to reopen my post and strip all the attributes from my tags, turning every instance of <pre lang="php"> into a plain <pre>.
Some of you pointed this out in the comments: when some of my PHP Like a Pro posts went up, they would be horribly broken, with text boxes in the middle of the page, a missing sidebar and the like. This was my ghost poster sanitizing my HTML to conform to some weird purity standard. In addition to stripping attributes from <pre> tags, it would also completely remove the more obscure tags such as <samp>, <var> and <kbd> (which I use extensively to make inline code stand out or to get a fancy keyboard-button-like font), and strip things like YouTube iframe embed code.
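If you suspect the ghost has already been through some of your posts, one way to find the damage is to look for <pre> tags that have lost their lang attribute. Here is a minimal sketch. It assumes you already have the post bodies as HTML files on disk; how you get them there (export, database dump) is up to you. The function name find_bare_pre is made up for illustration.

```shell
# find_bare_pre FILE...
# Hypothetical helper: prints the names of files that contain a bare <pre>
# tag, i.e. one with no lang="..." attribute left on it. It will also flag
# bare <pre> tags you wrote on purpose, so treat the hits as candidates.
find_bare_pre() {
    # -l prints only matching file names; -F matches the pattern literally
    grep -lF '<pre>' "$@"
}
```

Running find_bare_pre posts/*.html would list every file worth a second look. Missing <samp>, <var> and <kbd> tags are harder to catch this way, because there is nothing left in the post to grep for.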
This was far from malicious but, as you can imagine, extremely annoying. Fortunately, it only seemed to happen to posts that were queued up to auto-publish in the future. So an immediate workaround was to publish posts manually instead of putting them in the queue. Sadly, this was messing with my workflow and schedule. I always write my blog posts ahead of time and schedule them to publish around 10am on weekdays. This way posts go out whether or not I happen to be busy at work, and I don’t need to think about the process. So I was kinda desperate for a solution.
After much digging, I discovered that this was an actual, genuine, honest-to-goodness WordPress bug, filed as ticket #22944 on the WordPress Trac.
Even better, the issue has already been fixed by one Sergey Biryukov in a patch posted on the WordPress Trac, and the fix is scheduled to be rolled into the 3.5.1 release. Which is scheduled to drop… Actually, I have no clue. I tried checking their release schedule and deadlines, but after a little while I developed a mild case of who-gives-a-fuck and moved on with my life.
I can report that I tested the patch on a spare server and found it working. Then I applied it to the production server. I have been running with it for over two weeks now, and nothing has broken yet. The scheduled posts are working as they should, and I have no more ghost revisions anywhere.
If you happen to be in the same boat as me, running WordPress 3.5 and hitting this bug, here is what you do. First, ssh into your server, then:
cd /your/wordpress/path/wp-includes
wget http://core.trac.wordpress.org/raw-attachment/ticket/22944/22944.3.patch
patch -b post.php 22944.3.patch
rm 22944.3.patch
The -b on the patch command is for backup. It creates a post.php.orig backup file in the same directory, so you can restore it in case the patch fouls something up (which it shouldn’t).
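If you would rather not find out the hard way whether the patch fouls something up, GNU patch can do a dry run first (--dry-run), and the .orig file turns rolling back into a one-liner. Below is a minimal sketch that wraps both steps. The function names apply_with_dry_run and restore_from_backup are hypothetical. The -N flag tells patch to skip a patch that looks already applied instead of stopping to ask about it.

```shell
# apply_with_dry_run TARGET PATCHFILE
# Hypothetical helper: only modifies TARGET if the patch applies cleanly.
apply_with_dry_run() {
    target=$1
    patchfile=$2
    # --dry-run reports what would happen without modifying any files;
    # -N skips patches that look reversed or already applied instead of prompting
    if patch -N --dry-run "$target" "$patchfile" >/dev/null; then
        # -b leaves the untouched original behind as TARGET.orig
        patch -N -b "$target" "$patchfile" >/dev/null
    else
        echo "patch does not apply cleanly; $target left untouched" >&2
        return 1
    fi
}

# restore_from_backup TARGET
# Hypothetical helper: moves TARGET.orig (created by patch -b) back over TARGET.
restore_from_backup() {
    target=$1
    if [ -f "$target.orig" ]; then
        mv "$target.orig" "$target"
    else
        echo "no backup found at $target.orig" >&2
        return 1
    fi
}
```

Run the same cd and wget steps as before, then use apply_with_dry_run post.php 22944.3.patch in place of the bare patch call. If something does go sideways, restore_from_backup post.php puts the original file back.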
Anyway, I hope this was helpful to someone out there. I figured it was a good idea to document this bug for posterity. If you don’t want to fuck around with patches, I recommend buckling up and waiting until 3.5.1 is released, which should be pretty soon.
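Not sure which version you are on? WordPress keeps its version string in wp-includes/version.php, on a line like $wp_version = '3.5';. Here is a small sketch that pulls the version out and tells you whether you are on the affected release. The function names are hypothetical. The check assumes, as described above, that 3.5 is the release with the bug and 3.5.1 is the one with the fix.

```shell
# wp_version WP_ROOT
# Hypothetical helper: prints the version string from WP_ROOT/wp-includes/version.php.
wp_version() {
    # [$] matches a literal dollar sign; \(...\) captures what sits between the quotes
    sed -n "s/^[$]wp_version = '\([^']*\)';.*/\1/p" "$1/wp-includes/version.php"
}

# needs_22944_patch WP_ROOT
# Hypothetical helper: succeeds (exit 0) only when the install reports version 3.5.
needs_22944_patch() {
    [ "$(wp_version "$1")" = "3.5" ]
}
```

Applying the patch by hand does not change version.php, so this only tells you which release you are running. It cannot tell you whether you have already patched post.php.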
TL;DR: I exorcised a ghost out of my WordPress and all is well now. Move along.