Scraping Reddit’s JSON for Cool Pics

Did you know that you can add /.json to any Reddit URL to get a machine-readable JSON document you can screw around with? You can test it yourself. For example, go to /r/vim/.json. It works for pretty much any kind of URL, including multireddits. This has been part of the Reddit API for about seven centuries now, but I never really paid attention. Until now, that is.
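To make the trick concrete, here is a minimal sketch of a helper that turns a regular Reddit page URL into its JSON counterpart. The function name `toJsonUrl` is made up for illustration, and it assumes an environment with the standard `URL` class (any modern browser, or Node).

```javascript
// Hypothetical helper (not part of any Reddit library): append "/.json"
// to the path of a Reddit URL, leaving any query string intact.
function toJsonUrl(redditUrl) {
  const url = new URL(redditUrl);
  // Drop trailing slashes so we never end up with "//.json".
  const path = url.pathname.replace(/\/+$/, "");
  // Leave the path alone if it already points at a .json document.
  url.pathname = path.endsWith(".json") ? path : path + "/.json";
  return url.toString();
}
```

Calling `toJsonUrl("https://www.reddit.com/r/vim/")` gives back the /r/vim/.json address mentioned above.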

People sometimes ask me where I get inspiration for shit like Ravenflight. Part of the explanation is of course being a natural genius like I am. Part is hanging out with other nerds, because crazy random stuff is bound to come up in a conversation. Finally, part of it is the stuff I get exposed to on the internet. For example, I subscribe to a multitude of picture subs: pretty much anything that has “Imaginary” in the title or is part of the SFWPorn thing (no, it’s not porn, it’s just pictures… Though /r/AnimalPorn should really consider changing the name to something that would raise fewer eyebrows).

One day I got a bright idea: what if I could create a multireddit of all these cool picture subs, and then scrape it for cool pictures and display them as a scrolling gallery? This way they would be much easier to browse (no need to click on the links or use RES to expand them) and I could distill away all the unpleasant Redditry. Like the obligatory “you idiot, why isn’t this on imgur” or “way to go posting imgur instead of linking to source, you idiot” fight that happens every time anyone posts a picture on the internet ever. But mostly it was just a cool idea… Once I realized Reddit was serving JSON files for everything it was just too tempting not to mess around with them.
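The scraping part boils down to walking the listing and keeping the posts that link straight to an image. Here is a rough sketch of that step, not the actual code behind the gallery. It relies on the shape of a Reddit listing (`data.children`, with each post’s fields under `data`); the name `extractImages` and the extension filter are assumptions made up for this example.

```javascript
// Assumed filter: treat a post as a picture only if its URL ends in a
// common image extension (optionally followed by a query string).
const IMAGE_EXTENSION = /\.(jpe?g|png|gif)(\?.*)?$/i;

// Hypothetical helper: pull { title, url } pairs for direct image links
// out of an already-parsed Reddit listing object.
function extractImages(listing) {
  const children = (listing && listing.data && listing.data.children) || [];
  return children
    .map((child) => child.data)
    .filter((post) => post && typeof post.url === "string" && IMAGE_EXTENSION.test(post.url))
    .map((post) => ({ title: post.title, url: post.url }));
}
```

In the browser you would fetch the multireddit’s .json URL, parse the response, pass it to `extractImages`, and create one `img` element per result. Links to album or gallery pages are skipped by this filter, so a real gallery would probably want something smarter.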

I briefly flirted with the idea of using an API wrapper such as RedditKit or Snooby and doing everything on the server side, but I quickly gave up on that. Part of it had to do with the fact that none of the wrappers I looked at actually did any rate limiting, which is one of the chief reasons why I wanted to use one in the first place. Syntactic sugar is really nice, but parsing JSON is relatively painless, whereas designing throttling and caching is exactly the kind of dumb and boring busy work I was trying to avoid. It also did not help that after an hour of impatiently flipping through the docs and running things in irb I still had no idea how to parse multireddits. It seems that 90% of the documentation was written with the expectation that people using these wrappers would be building funny comment-bots, and the remaining 10% was either self-explanatory or irrelevant.

Eventually I got annoyed and started fucking around in JSFiddle just to see if what I was thinking about was possible. It turns out it was, and it worked remarkably well on the client side. You can see my prototype here:

Click on the results tab to see how it looks. I’m not sure if this is an impolite script, because I’m still doing no caching or rate limiting here. But since all the fetching and processing is happening on the client side, I think I might be getting off on a technicality. Even though the code might generate a lot of simultaneous requests, they will all technically come from different IP addresses, so perhaps the admins won’t yell at me for doing this.

I went ahead and dressed it up a little bit, and slapped a final, polished version at imaginary.pics for everyone to enjoy. So any time you want to look at some fantasy-themed pictures of monsters and heroes, you can just type that into the address box in your browser and get inspired.

And yes, that’s a .pics domain, because why not. I like descriptive domains and I’m not afraid to use non-standard TLDs if I can get away with it. You should have known that about me after I committed dontspoil.us back in the day. I’m quite excited about the crazy new TLDs and being able to register all kinds of dumb domains. Btw, it took me like an hour to stop clicking the “go again” link on that website, so you’re welcome. I’m calling dibs on wank.bank when that becomes available: I’m gonna just copy-paste some buggy porn-tube-clone code onto that and make like $millions.

By the way, the dumb.domains site seems to have an affiliate deal with a somewhat shady registrar. If you are actually planning to buy a fun domain name, I’d recommend iwantmyname.com. Someone recommended them to me, and I really like the cut of their jib. Then again it might just be me. I previously bought domains through sites like GoDaddy and Network Solutions, so I was actually really confused when the registration process did not involve clicking through 17 pages of up-sell bullshit, and some lady’s cleavage was not being thrust into my face from advertising banners. Their site is well designed, everything is intuitive and they seem like cool people. I wish I had known about them years ago.

Where do you usually buy your domains? Are you currently sitting on any domains that you bought because they were cool, but never actually put to good use? Have you ever stupidly bought a domain just to host five lines of JavaScript like I just did? If so, what did you host?

6 Responses to Scraping Reddit’s JSON for Cool Pics

Chris Wellons says:

June 4, 2014 at 12:10 pm

This is really slick! Would you consider adding infinite scrolling? You could use the “after” argument to cleanly continue from where it left off.

I’ve been considering getting a domain for my shadowban test tool: nullprogram.com/am-i-shadowbanned/ (notice the URL change when querying a name). I haven’t come up with any good ideas, though. A domain like shadowb.an could be cool, but .an isn’t available anymore. There’s no .ed, which would have been good for an am-i-shadowbann.ed domain.

Wesley says:

June 4, 2014 at 2:32 pm

iwantmyname asking for money for .tk domains is bullshit.

You can get them for free.

It’s really, really sketchy, but free, so I use it a lot for quick/temporary one-off DNS entries.
Also, beware that they save your (non-hashed) password.

Gothmog says:

June 4, 2014 at 2:35 pm

Wow, Luke! Well done! You should have a form at the top of your page that we could add different subreddits so we could cook our own…

Luke Maciak says:

June 6, 2014 at 9:43 am

@ Chris Wellons:

Yes, infinite scrolling is a great idea. I was considering adding it, but I figured I should get the basic functionality first. But yeah, that’s the next thing I’m planning to add.

As for the url, how about shadowban.me or shadowban.us? Shadowban.club? Shadowban.technology? :P

@ Wesley:

Oh, neat. I didn’t actually realize you could get a .tk for free. Thanks for the link. I do not see that many .tk domains around these days. I guess this might be a bit like the ultra-cheap .info domains that everyone stopped using because they became really popular among scammers and spammers.

What really sold me on iwantmyname is that they offer WHOIS masking for free. GoDaddy, on the other hand, charges like $18/year for an equivalent service.

@ Gothmog:

That’s actually an excellent idea. I’m going to add that after I’m done with infinite scrolling. :)

Gothmog says:

June 9, 2014 at 5:00 pm

Heck, you could buy reddit.pics and set it up so others can just plug in subs to their heart’s content. Of course that might take off and you might be stuck with a hefty server bandwidth cost. Heck, if the internet at large figures out how to use it to build NSFW collages, etc…

Y’know, maybe you shouldn’t make it extensible. It’ll all end in tears and flames.

Howard says:

June 29, 2015 at 8:50 pm

Another thing to do with a Reddit JSON feed: you can paste the URL into json-csv.com and it will turn the JSON into a CSV file. Then you can mess around with the data in Excel.
