Do you ever feel that siren call of code that needs to be written? Sometimes I get an idea into my head, and then spend the next few days thinking about little else. I’m thinking about the code in the shower, on the toilet, in bed before sleep, while I sleep. Half the time it’s not even that interesting of a project… But it is a project, and I want to get it done. This is what happened to me this weekend.
My university used to allow students to create personal websites via the Novell NetDrive service. It had a rather clunky but perfectly usable web interface that allowed anyone to log in and manage files in their PUBLIC_HTML directory from anywhere in the universe (provided they could get an internet connection). I used that service extensively for the HTML lab and final project assignments. But alas, the OIT decided to phase out all the Novell stuff and replace it with something much more difficult to use for the average student.
The new system requires you to mount the networked drive using WebDAV, which is already a hurdle much too high for most of my students. But to add insult to injury, the difficulty is compounded by two additional issues:
- For some strange reason you must change the password of your campus-wide ID for this new system to even bother talking to you
- Computers in the lab are locked down so tightly that without an admin password students can’t mount shit
I figured that I might be able to skirt around these issues somehow, but despite my best attempts I haven’t been able to mount the damned drive on my system for like 3 days. So having filed a tech support ticket into the black hole that is the OIT support system, I got an idea: I could write a bare bones NetDrive replacement over the weekend.
I’m not sure how I settled on using Google App Engine for this. I think I just didn’t want students to put their filthy files on the same server as my blog, I didn’t want to run a home server seeing how I don’t have a spare box, and I didn’t want to pay for the pleasure. So somewhere along the way I decided that App Engine is a great idea, even though it does not actually have a real file system. But App Engine is free, and you can easily save files in its Blobstore.
So the idea is simple: the user comes along and uploads a file to the Blobstore. We save his username, the file name and the Blobstore reference into the Datastore. Then we allow the user to retrieve his file using a neat URL that looks something like:
http://example.com/pub/username/foo.html
How do we do that? First let’s set up our handlers like this:
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

def main():
    application = webapp.WSGIApplication(
        [('/', MainHandler),
         ('/upload', UploadHandler),
         ('/list', ListHandler),
         ('/pub/([^/]+)/([^/]+)?', ServeHandler),
        ], debug=True)
    run_wsgi_app(application)
So the upload form will sit at /, the upload action is gonna happen at /upload and we will be serving the files at /pub/username/filename.ext. So far so good.
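To see what the /pub/ route actually captures, here is a small sketch in plain Python (no App Engine needed). The names PUB_ROUTE and parse_pub_path are made up for illustration, and using fullmatch assumes the framework matches the pattern against the whole request path.

```python
import re

# Same pattern as the ServeHandler route above.
# Assumption: the framework anchors the pattern to the whole path,
# which is what fullmatch() imitates here.
PUB_ROUTE = re.compile(r'/pub/([^/]+)/([^/]+)?')


def parse_pub_path(path):
    """Return (username, filename) for a /pub/ path, or None if it does not match."""
    match = PUB_ROUTE.fullmatch(path)
    if match is None:
        return None
    return match.groups()
```

Note that the second group is optional, so a path like /pub/username/ still matches and the file name comes back as None. A handler on this route should be ready for that.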
This is how we are going to store our file information:
from google.appengine.ext import blobstore, db

class MyBlobFile(db.Model):
    userName = db.StringProperty()
    fileName = db.StringProperty()
    blobstoreKey = blobstore.BlobReferenceProperty()
Apparently you have to use the BlobReferenceProperty to store the unique blob key in the Datastore. Initially I set this field as a StringProperty but App Engine was complaining like a little bitch. So I changed it.
Setting up a form is easy, but you need to remember the silly create_upload_url call. Make sure you include it. Otherwise it won’t work.
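class MainHandler(webapp.RequestHandler):
    def get(self):
        username = 'whatever'  # get this from your session/login handler
        upload_url = blobstore.create_upload_url('/upload')  # don't forget this
        # the form must POST multipart data to upload_url, not to /upload directly
        self.response.out.write("""
            <form action="%s" method="POST" enctype="multipart/form-data">
                <input type="hidden" name="username" value="%s">
                <input type="file" name="file">
                <input type="submit" value="Upload">
            </form>
            """ % (upload_url, username))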
Here is the actual upload code:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
        blob_info = upload_files[0]  # that's it, we're done
        # save in Datastore
        f = MyBlobFile()
        f.userName = self.request.get("username")
        f.fileName = str(blob_info.filename)
        f.blobstoreKey = blob_info.key()
        f.put()
Finally, this is how we serve the file. It’s also very, very simple:
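class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, ffolder, ffile):
        # look up the blob reference in the Datastore by username and file name
        q = db.GqlQuery("SELECT * FROM MyBlobFile WHERE userName = :1 AND fileName = :2",
                        str(ffolder), str(ffile))
        result = q.get()
        if result is None:
            # no such file (or no file name in the URL)
            self.error(404)
            return
        # send_blob accepts the BlobInfo that the reference property resolves to
        self.send_blob(result.blobstoreKey)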
If you are perceptive, you probably noticed a flaw in my logic here. What if the user uploads two files with the same name? The Blobstore and Datastore won’t care. They will simply assign a new random key to the new entry and call it a day. This is indeed an issue, but I got around it by simply checking whether the file already exists, running the same query when the file is uploaded. If there already is a Datastore entry that matches this username and filename then I delete it, along with the associated Blobstore entry. This mirrors what would normally happen in a filesystem: the file would get overwritten.
This, ladies and gentlebirds, is how you do it.
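The overwrite-on-upload behavior is easy to get subtly wrong, so here is a rough model of it in plain Python. This is not App Engine code: FileIndex is a made-up in-memory stand-in for the Datastore and Blobstore pair, and the "blob-N" keys are invented. It only illustrates the order of operations: look up the existing entry, drop the old blob, then store the new one.

```python
class FileIndex(object):
    """Hypothetical in-memory stand-in for the Datastore + Blobstore pair."""

    def __init__(self):
        self.entries = {}  # (user_name, file_name) -> blob key
        self.blobs = {}    # blob key -> file contents
        self._counter = 0

    def save(self, user_name, file_name, data):
        """Store a file, replacing any earlier upload with the same user and name."""
        old_key = self.entries.get((user_name, file_name))
        if old_key is not None:
            # Same check as on upload: an entry already exists, so remove
            # the old blob first to avoid leaving an orphan behind.
            del self.blobs[old_key]
        self._counter += 1
        new_key = 'blob-%d' % self._counter  # made-up key format
        self.blobs[new_key] = data
        self.entries[(user_name, file_name)] = new_key
        return new_key
```

Uploading foo.html twice for the same user leaves exactly one blob behind, which is the filesystem-like behavior described above. In the real app the delete would have to hit both the Datastore entity and the blob it references.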
In retrospect, I probably did not need Blobstore for this. You see, Blobstore is a “billing only” feature of App Engine. I did not know that when I started this project but it bears mentioning: you will need to enable billing in order to use it. So if you can get away with it, it is probably a better idea to store your files in the Datastore using BlobProperty. But it is nowhere near as nice: you essentially have to implement the send_blob() function yourself, send the correct MIME type headers, etc.
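The "send the correct headers yourself" part could look roughly like the sketch below. It is plain Python using the standard mimetypes module. The function name build_response_headers is made up, and falling back to application/octet-stream for unknown types is my assumption, not something the app is known to do.

```python
import mimetypes


def build_response_headers(file_name, data):
    """Guess a Content-Type from the file name and compute the Content-Length."""
    content_type, _encoding = mimetypes.guess_type(file_name)
    if content_type is None:
        # Assumed fallback for files with an unknown or missing extension.
        content_type = 'application/octet-stream'
    return {
        'Content-Type': content_type,
        'Content-Length': str(len(data)),
    }
```

A handler would then copy these into the response headers and write the stored bytes as the body.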
I’m actually considering rewriting my code to do it that way. For the time being, I enabled billing for my app, but set the daily budget to $0 which should keep me in the free quota range of 1GB of storage space. Since I’m only going to expose this app to 26 users I’m hoping this will be enough. It will be interesting to see if they will blow through the bandwidth and concurrent access quota during the lab session. They shouldn’t but then again, you never know.
Quick note: you can’t set your quota to $0 when you first enable billing. You have to set it to $1 first, and give them your credit card information (nothing is charged up front though). Then you wait 15-20 minutes till their system makes up its mind about the whole billing change, and go back in to change it down to $0. At this point it will accept 0 as a valid input value.
Thursday will be a genuine trial by fire for my code. I will let you know the damage on Friday or next week.
In the meantime, if you want to mess around with the code, I have it up on GitHub.
Please note that I opted not to use the built in user/session handling mechanisms and instead bolted on custom session handling with GAE Sessions. If you wanted to use my code for your project, this might be something you would want to change. My justification for doing this is that Google account registration is a pain in the ass sometimes. I actually don’t know how many of my students have Gmail accounts and I don’t feel like walking 20 people through Google registration pages during the lab. So I went with something quick, easy and hackish.
And yes, I’m storing passwords as unsalted md5 hashes. That’s like 4 WTF’s all rolled up into one right there. Sue me. I just didn’t care enough to write something more robust. If Thursday is not a complete disaster, and I decide to continue using this tool, I will probably fix this.
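For whenever that fix happens, a salted and deliberately slow hash does not take much code. This is only a sketch using the standard library, not what the app currently does. The function names and the iteration count are my own choices, and hashlib.pbkdf2_hmac needs Python 2.7.8+ or 3.4+.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed work factor; tune it for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest), generating a fresh random salt when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _salt, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

You would store both the salt and the digest next to the username, and call verify_password at login instead of comparing md5 strings.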
Also, can someone explain to me why almost all non-programmers assume that it is a great idea to try to chit-chat with you when they see you have code on your computer screen? It’s like “Oh, I see you are busy writing code. Let me interrupt you by telling you about my day, and asking irrelevant questions about the weather forecast for Monday.” That shit is getting notorious lately.
Fuck me, that’s exactly what I’ve been looking for just this weekend.
I tried to hack something using indexer and uploadify but it’s just garbage and too dangerous…
I’ll try your thing, thanks!
Hey, nice! I’m glad I could help.
If you are using the code you’d probably want to comment out the Course ID check in RegisterHandler. Otherwise you won’t be able to register without a CourseID and you won’t be able to add a CourseID without registering. I’m using that check to prevent random people from registering for the service – they have to know the “secret code” to get in.
Also, there is currently no way to promote a user to admin in the UI. You have to do this manually in the Datastore.
I’m not sure if the free quotas are enough. I’m doing a test run with 26 users who will be uploading their pages pretty much simultaneously so it might get ugly. I will let you know how that worked after Thursday. :)
The OIT doesn’t allow students to access their public_html folders via ssh/sftp? That’s medieval.
Also, with respect to people interrupting you while you’ve got code on your screen… My wife is finally pretty well-trained to leave me alone while I’m “typing into that black screen”. Everyone else still annoys the piss out of me, though, while I’m coding.
@ road:
Well, computer science students get full unix shell accounts with SSH access. They don’t usually deactivate them when you graduate either. Mine got taken away though because I sort of abused it… Not sure exactly what got me in trouble. It could have been port scanning the school network using nmap. It could have been using it to compile various security tools (password crackers, port scanners, etc…). It could have been the fact I was using it to run a blog, which later got spammed into oblivion. It could have been the fact that I was putting up various binaries for download (mostly stuff I wrote, but I also had various open source tools that I compiled and patched to work on the ancient Solaris server we were using). It’s probably all of these things combined.
I talked to them and promised to be good but they won’t give it back cause I’m no longer a student. It’s ok for them to forget to disable accounts after people graduate, but they can’t really justify handing out new ones to adjuncts.
Anyways, OIT is mostly clueless. The unix accounts I mentioned are handled by a mini-IT dept that services CSAM (computer science and mathematics) depts. They are only slightly less clueless.
Also, these are highly desirable traits in a potential life mate of either sex:
– not interrupting when you code
– not rolling eyes/yawning when you talk about technical stuff
– not minding the fact that you like to play video games
:)