WordPress: Vanishing Categories

About a month ago, something weird happened to this website. It was one of those unsettling glitches that make you question your own sanity, because they come out of nowhere and have seemingly no reasonable explanation. I was busy typing up a new post when I noticed that all my tags and categories had simply vanished.

The content, mind you, was still there. All the posts and images were intact; they had simply lost their category and tag associations. I had never seen anything like this before, so my first thought was “database corruption”. I’m not sure how the DB could get corrupted, but WordPress is famously finicky about these sorts of issues. It is not uncommon to see a badly written plugin touch one of the core WordPress tables in a bad way and make it freak out.

I promptly logged into the server and started running exploratory queries, but most of them came back looking perfectly normal. The tag and category tables still had all of their entries, and the posts were still linked to them through the term relationship table. The schemas of all the tables looked normal, and I couldn’t detect any sign of plugin-induced damage or even malicious tampering. All the information was in the database, but the UI refused to make the connections.
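For anyone who wants to run a similar sanity check, here is a minimal sketch of one such exploratory query, wrapped in a small shell function that prints the SQL so it can be piped into the `mysql` client. It assumes the default `wp_` table prefix and the standard WordPress taxonomy tables (`wp_term_relationships` and `wp_term_taxonomy`); the function name `taxonomy_check_sql` and the database name and user in the usage comment are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch: print a query that counts post/term links per taxonomy.
# Assumes the default "wp_" table prefix; pass another prefix as $1 if yours differs.
taxonomy_check_sql() {
    local prefix="${1:-wp_}"
    # Unquoted heredoc so ${prefix} expands; no globbing happens inside it.
    cat <<SQL
-- How many post/term links exist per taxonomy (category, post_tag, ...)?
SELECT tt.taxonomy, COUNT(*) AS links
FROM ${prefix}term_relationships AS tr
JOIN ${prefix}term_taxonomy AS tt
  ON tt.term_taxonomy_id = tr.term_taxonomy_id
GROUP BY tt.taxonomy;
SQL
}

# Usage (database name and user are placeholders, not real values):
#   taxonomy_check_sql | mysql -u wp_user -p wordpress_db
```

If the per-taxonomy counts look sane, as they did here, the data is probably intact and the problem lies somewhere other than the tables themselves.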

Luckily, this is not the first time (and probably not the last) I have seen WordPress go completely wonky. I have learned that running an active WordPress site without nightly backups is pretty much asking for headaches. So after scratching my head for two hours, I decided to roll my VM back to last night’s snapshot and see if that fixed the issue. Before doing that, though, I checked how much disk space I had left.

It turned out that I had literally zero bytes.
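To get that number from a script rather than by eyeballing `df -h`, something like the following sketch works. `free_kb` is a hypothetical helper, not part of any tool: it asks `df` for POSIX-format output (`-P`, one line per filesystem) in 1 KiB blocks (`-k`) and pulls out the "Available" column for the filesystem holding a given path.

```shell
#!/bin/bash
# Sketch: print the free space, in KiB, on the filesystem containing a path.
# -P keeps each filesystem on a single line and -k fixes the unit at 1024 bytes,
# so "Available" is reliably field 4 of line 2.
free_kb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

echo "Free on /: $(free_kb /) KiB"
```

On a completely full disk, as in this case, the helper prints `0`.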

On a lark, I blew away the contents of ~/temp and refreshed the site. The categories and tags magically returned, but only partially. It appears that in order to render the tags, categories, and associated pages, WordPress needs to write a bunch of temp files to disk. I have no clue why this happens, but I’m assuming it is an optimization strategy of some sort, intended to limit the number of database reads per page view. If your disk is filled to the brim, however, it can’t do that, so it fails silently and does its best to render the page without the additional information.

On one hand, it is quite amazing that the inability to perform some internal core caching does not bring the entire site down. On the other hand, it seems wrong to me that such an operation is necessary at all. But despite using WordPress for many years, I have never felt compelled to lift the hood and look at its database queries, so I guess I shouldn’t criticize something I don’t know all that much about.

Over the years I have gotten pretty good at cleaning accumulated temp-file cruft off Linux machines, so it only took me a few minutes to identify the source of my disk bloat. It was the temp directory used by my WP-Cache plugin, which had somehow ballooned to a few gigabytes. Apparently the old cache files were being discarded but never deleted. Blowing away the entire cache reduced my disk usage from 100% to 35%.
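Since the culprit was stale cache files that were never deleted, one possible stopgap is to prune them by age with `find`. The sketch below is a hypothetical helper, not part of WP-Cache; the seven-day default and the cache path in the usage comment are assumptions to adjust for your own setup.

```shell
#!/bin/bash
# Sketch: delete cache files that have not been modified for N days.
# Hypothetical helper; the directory and age are supplied by the caller.
prune_stale_cache() {
    local dir="$1" days="${2:-7}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -type f: regular files only
    # -mtime +N: last modified more than N whole days ago (age rounded down)
    # -print -delete: list each file, then remove it
    find "$dir" -type f -mtime +"$days" -print -delete
}

# Usage (path is a placeholder):
#   prune_stale_cache /srv/www/example.com/public_html/wp-content/cache 7
```

Deleting by modification time keeps recently written cache entries, so the cache keeps working while the backlog of abandoned files is cleared.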

To prevent this from happening again, I wrote a tiny guard-dog script that checks my disk usage on a weekly basis and prints out a nice report, which a cron job then emails to me:

```shell
#!/bin/bash

# use colors if available (defines Color_Green, Color_Red, Color_Off)
[ -f "$HOME/scripts/colors" ] && source "$HOME/scripts/colors"

# bail out early if any required tool is missing
for cmd in awk du df sort; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "$cmd not found. Please install it and try again"; exit 1; }
done

# grab the % usage of the root partition
# -P keeps each filesystem on one line, so the value is always line 2, col 5
USAGE=$(df -P / | awk 'NR == 2 { print $5 }')

# green below 60% usage, red at 60% or above
if [ "${USAGE%\%}" -lt 60 ]; then
    Color_On=$Color_Green
else
    Color_On=$Color_Red
fi

echo -e "\nDisk Usage Report"
echo -e "-----------------\n"

echo -e "Disk usage: \t $Color_On$USAGE$Color_Off\n"

df -h

echo -e "\nLog file spot check:\n"

# -s prints one summary line per directory; -BM reports every size in
# megabytes so that sort -n can order them correctly.
# Output goes straight to sort instead of a temp file (which would fail
# on a full disk anyway). Globs that match nothing make du complain on
# stderr, which is discarded.
du -s -BM /tmp /var/log /srv/www/*/logs \
    /srv/www/*/*/*/wp-content/cache \
    /srv/www/*/*/*/wp-content/uploads 2>/dev/null | sort -nr
```

The full version of the script is actually available here. The colors script I’m importing up top is also on GitHub if you want to check it out.
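The emailing part relies on cron, which mails whatever a job prints to the address in `MAILTO` (or to the job's owner if it is unset). As a sketch, assuming the script is saved as `~/scripts/diskusage.sh` and the address is `you@example.com` (both hypothetical), the crontab lines could be generated like this and pasted into `crontab -e`:

```shell
#!/bin/bash
# Sketch: print crontab lines that run the disk report once a week.
# The script path and e-mail address are hypothetical placeholders.
REPORT_SCRIPT="$HOME/scripts/diskusage.sh"
REPORT_EMAIL="you@example.com"

# crontab fields: minute hour day-of-month month day-of-week command
# "0 7 * * 1" means 07:00 every Monday.
weekly_entry() {
    printf '0 7 * * 1 %s\n' "$1"
}

# cron mails the job's output to MAILTO
echo "MAILTO=$REPORT_EMAIL"
weekly_entry "$REPORT_SCRIPT"
```

Any schedule works; weekly simply matches the cadence described above.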

If you ever notice tags or categories vanishing from your blog, don’t panic. It probably just means your disk is full.

This entry was posted in sysadmin notes.



2 Responses to WordPress: Vanishing Categories

  1. IceBrain says:

    What do you use for mail delivery? Nowadays I run my own mail server, so I can whitelist these emails, but back when I used Gmail I was always worried they’d get caught in the spam filter, since they came from essentially unknown and therefore untrusted servers. The alternative is creating an account on some delivery service and doing the whole smarthost configuration, but that sounds like overkill for a personal VPS.

  2. Luke Maciak says:

    @ IceBrain:

    The server is running its own SMTP server so that WordPress can send out notifications, but incoming email is handled by Gmail. I set it up via Dreamhost back in the day, and they handled most of the work, but I think it mostly involved setting up an “Apps for Your Domain” account on Google and then pointing the MX records at their mail servers. I never really had any problems with it or any need to tinker, hence I don’t even remember the details.

    Google doesn’t seem to mind emails sent from weird servers. Their spam filter is fantastic IMHO, and instead of running blacklists/whitelists they just do heuristics on the content.

    Google has gotten so good at identifying 419 emails that the scammers recently started moving towards obfuscation methods. The latest scam mail I got had no body and no subject; its sole content was a badly scanned JPG attachment of a printed letter, hand signed by the Crown Prince of all Nigeria and the president of the Nigerian National Bank. :P

    Meanwhile the emails sent by my bash scripts go through without an issue.

