PHP Like a Pro: Part 1

PHP gets a lot of flak for being a shitty language. This reputation is unfortunately well deserved. It is an ugly, quirky, idiosyncratic mess. But at the same time, PHP remains one of the most popular web programming languages out in the wild. Why? Because it is easy. Easy to install, easy to learn, easy to get working pages starting from scratch. The learning curve is gentle and the time it takes to go from a blank editor window to an actual dynamically generated page is extremely short.

But like most things that are easy to learn, PHP tends to be difficult to master. While you can start building functional pages right away, you likely won’t be winning any awards for nicely structured code. PHP lends itself to disturbingly messy, ugly and insecure implementations.

So, seeing how we are stuck with it, how do we make PHP suck less? Well, let’s talk about this. Let’s try to sit down and establish how we would go about starting a new PHP-based project in such a way that its future maintainer is not going to want to strangle you in your sleep. Ideally we would like to create something that a random code monkey can pick up and go “Well… I don’t completely hate this…”

Why not go for “I like”? Because that just does not happen. Firstly, we’re talking about PHP here. Secondly, no one ever likes someone else’s code. Ever. Someone could give me the cleanest, most expressive and brilliant piece of code in existence, and there is a high chance I would be like “Fuck this guy and fuck his bullshit K&R indenting!”

Don’t try this at home!

First, let me show you what can, but shouldn’t be done with PHP. Let’s take a very simplistic application and build it the quick and dirty way. What is the simplest web app you can think of? For me it’s a pastebin – just a simple page with a text box hooked up to a database with a single table that only has two columns: id and content. It is something you can implement in about 15 minutes, and 15 lines of code.

No seriously, here is a functional pastebin app in 15 lines of terrible, terrible, ugly and buggy PHP:

Paste It
          

"; } else { $mysqli->query("insert into pastebin (content) values ('$_POST[content]')"); echo "

Thanks

Your paste is here: insert_id>PASTE #$mysqli->insert_id

"; } } else { $tmp = mysqli_fetch_assoc($mysqli->query("select * from pastebin where id=$_GET[paste]")); echo "

PASTE #$tmp[id]

$tmp[content]new paste"; } ?>

This code actually works. It is small. It fits on the screen, and it fits in your head, which is actually a good thing. Code that doesn’t fit on a screen tends not to fit in your head, which makes it difficult to debug. This code is succinct and to the point.

So what exactly is wrong with it? Well, where do I start? Actually, I know – let’s start with the most glaring mistake also known as the Bobby Tables problem:

$mysqli->query("insert into pastebin (content) values ('$_POST[content]')");

If this code ever touched the live internet, the effects could be disastrous, as it is a prime target for SQL injection. Now I might have had enough forethought not to give the testuser privileges to drop tables, but it likely can insert, update and delete at will. A few lines below this, I take the same un-escaped user provided string and echo it to the page, opening myself up to an XSS attack. If an industrious user dropped a properly formatted iframe in there, he could serve just about any code from my page, creating all kinds of mischief.
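To see why the string interpolation is the culprit, here is a minimal sketch that needs no database at all. The hostile values are made up for illustration; they stand in for whatever an attacker might send as `$_GET['paste']` and `$_POST['content']`.

```php
<?php
// Hypothetical hostile input standing in for $_GET['paste'].
$paste = "0 OR 1=1";

// Same interpolation as in the pastebin: the input becomes part of the SQL itself.
$sql = "select * from pastebin where id=$paste";
echo $sql, PHP_EOL; // the WHERE clause now matches every row, not just one

// Hypothetical hostile input standing in for $_POST['content'].
$content = "<script>alert('pwned')</script>";

// Echoing $content verbatim would hand that markup to the browser to run.
// htmlspecialchars() turns the special characters into harmless entities.
$escaped = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
echo $escaped, PHP_EOL;
```

Nothing here is specific to my pastebin: any query or page assembled by gluing user input into a string has the same problem.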

Then of course there is a whole litany of less glaring issues that would nevertheless induce murderous rage in most programmers. For example, there is no error checking or input validation in there whatsoever. So if anything were to go wrong, the entire app would explode in spectacular ways. There is really no robust way to test for problems either. Since it’s only 15 lines, refactoring this code shouldn’t be a huge issue – but imagine a few hundred lines written in this style – weaving a thick, convoluted tapestry of overlapping PHP, HTML and SQL. How do you keep it all in order? You really can’t.

Granted, this code is this terrible because I was golfing – I was trying to write my app in as few lines of code as possible so that it fits neatly in a little code box here on the blog. But you can find code of this quality out there in the wild… And some of it is probably mine. Or yours.

What should we do then?

How should you write PHP code? Well, not like that. Just don’t do what I did just now and you should be fine. Solid advice, eh? But I think I can do a little better than this. So here are some guidelines that are probably going to help you make things in PHP that don’t suck as much as my pastebin example.

Follow the Standards

I think everyone agrees that rule #1 of PHP (and programming in general) is “don’t write shitty code”. Or at least that’s one of the many #1 rules of programming. I can’t really keep track of all of them, but this is definitely an important one. The problem with this rule is that everyone has their own definition of what can be considered “shitty code”. Please refer to the “Fuck this guy and fuck his bullshit K&R indenting!” commentary just a few paragraphs above to see that this is true.

Hence we invented coding standards. Coding standards are rules and guidelines that everyone hates, but agrees to use for the sake of consistency. They are usually chosen in such a way as to offend every programmer equally regardless of their favorite variable naming conventions, indent styles or coding habits.

As is to be expected with PHP, it has about six billion conflicting standards. Zend has one, PEAR has one, Horde has one. Which one should you adhere to? Good question. I think the best bet is to go with the standards published by the PHP Framework Interop Group, cryptically named PSR-0, PSR-1 and PSR-2.

I’m fairly sure these numbers are supposed to represent the number of fucks you give about interoperability of your code. So if you don’t give a fuck, you should at the very least try to conform to PSR-0, unless you enjoy reading issues like “y u no psr?” in your issue tracker on Github. If you actually give a fuck, then you should read PSR-1. Or if you really want to be an awesome dude, and you give all the fucks, you should conform to PSR-2.

All of these are mostly common sense stuff, and probably won’t piss you off much. They also won’t prevent you from writing terribly shitty code, but they may help you avoid common mistakes just by forcing you to drop (or preventing you from developing) harmful habits.

Orient yourself Objectively

OOP is great. It also sucks. You take the good with the bad, but most of the time it will help. At the very least it will force you to organize your code. If I had thought about “pastes” as objects that can be initialized with some text and then stored in the database, I could have abstracted all that logic into a separate class file, keeping it separate from the display layer available to the user.

Once you start encapsulating logic into methods, it puts you in a completely different frame of mind. In my purely procedural quickie code above, I was mainly concerned with stuff flying into and out of the database, with my code being more or less the revolving door in between. Re-thinking that code in OOP terms would make me more likely to actually add some validation and error checking code in there, because that’s just what you do.

Not to mention the class could be isolated and tested separately from the interface.
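Here is a rough sketch of what that might look like. The `Paste` class, its method names and the size limit are all hypothetical, the database part is left out entirely, and the typed property assumes PHP 7.4 or newer. The point is only that once the paste is an object, the validation has an obvious place to live.

```php
<?php
// Hypothetical Paste class; names and the size limit are illustrative only.
final class Paste
{
    // Assumed limit; pick whatever fits your database column.
    private const MAX_LENGTH = 65535;

    private string $content;

    public function __construct(string $content)
    {
        // Validation lives in one place instead of being sprinkled over the page.
        if (trim($content) === '') {
            throw new InvalidArgumentException('A paste cannot be empty.');
        }
        if (strlen($content) > self::MAX_LENGTH) {
            throw new InvalidArgumentException('A paste is too long.');
        }
        $this->content = $content;
    }

    public function getContent(): string
    {
        return $this->content;
    }
}

$paste = new Paste("echo 'hello';");
echo $paste->getContent(), PHP_EOL;
```

An empty or oversized paste now fails loudly in the constructor, long before it gets anywhere near the database or the page.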

Unit Testing

My pastebin is a bit too simplistic an example for this. But imagine building something a tad bigger – like a blog, or a shopping cart. Chances are you will be organizing your code into neatly encapsulated classes. But you can’t write these blind and just hope things will work. So in most cases you will be building little dirty hackish test mock-ups with a lot of echo statements to make sure your methods are working properly, that properties are accessible, that you are polymorphing your stuff adequately, etc…

What you might not realize is that you are essentially unit testing your code this way. You are just doing a really terrible job at it, and your coverage blows, but it still counts. What if, instead of hitting the reload button on the browser after every change in code, you could just fire up a test from the command line and watch it beat your code up?

So instead of spending time half-assing custom made test pages, you ought to just bite the bullet and write formal unit tests. Might as well do it correctly, no? Here is what no one tells young programmers about unit tests: they are not there to detect errors. At least not at first. At first they won’t detect shit, because you will write a test, and then you will write the code which will pass said test. Which makes it seem like unit tests are just needless busy work. But the secret is that they make you start thinking about edge cases and oddball behaviors in your code. If you didn’t have to put something in that unit test class, chances are you would just test the 2-3 most common inputs and then fuck off to do the next thing on your check-list. So the initial benefit is that they help you focus your mind on things you would likely overlook otherwise.

The error detection thing comes in handy later, when you go back, change a bunch of things in your code and watch the carefully constructed tests fail because you of course forgot to sanitize some input, or some other stupid thing.
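To make the edge case point concrete, here is a bare-bones sketch in plain PHP. Both `parsePasteId()` and the little loop that plays the role of a test runner are hypothetical stand-ins; a real project would more likely lean on a proper framework such as PHPUnit.

```php
<?php
// Hypothetical helper under test: turns the raw "paste" query parameter
// into a positive integer ID, or null when the input is not a valid ID.
function parsePasteId(string $raw): ?int
{
    // One to nine digits, no leading zero, nothing else.
    // \z (not $) so that a trailing newline does not sneak through.
    if (preg_match('/^[1-9][0-9]{0,8}\z/', $raw) !== 1) {
        return null;
    }
    return (int) $raw;
}

// Each case is [input, expected result]. Writing this list is where
// the oddball inputs start coming to mind.
$cases = [
    'plain id'          => ['42', 42],
    'empty string'      => ['', null],
    'letters'           => ['abc', null],
    'negative number'   => ['-1', null],
    'leading zero'      => ['007', null],
    'injection attempt' => ['1 OR 1=1', null],
    'trailing newline'  => ["42\n", null],
];

// Stand-in for a test runner: compare and collect the failures.
$failures = [];
foreach ($cases as $label => [$input, $expected]) {
    if (parsePasteId($input) !== $expected) {
        $failures[] = $label;
    }
}

echo ($failures === [] ? 'all tests passed' : 'failed: ' . implode(', ', $failures)), PHP_EOL;
```

Run it from the command line with `php` and the file name. The trailing newline case is exactly the kind of thing I would never have tried by poking at the page in a browser.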

Document everything

There is no such thing as self-documenting code. No matter how rigorous and clean your coding standards are, your code could always benefit from some comments. And by comments I mean real comments, not pseudo code or redundant crap like this:

// loop through array and display each item
foreach ($array as $value) {
    echo $value;
}

Really? You don’t say? Shit, if you didn’t comment that I wouldn’t have known. How about you tell me why you are doing this. I can see what the code is doing, but it always helps to know why it is doing it. What’s the purpose? Where is it going to be used? Of course I could read your code, and jump from method to method until I can actually have a clear picture of what is going on in my mind, but if you tell me, I don’t have to do any of that.

You know what is also nice? That JavaDoc thing that Java has. You know, those neatly defined comment blocks that describe each class and method, that can then be slurped out of your code and puked out into HTML to create a nifty user guide and API reference? That shit is great, because it actually makes people comment things in the correct way. You know that your blurb will become part of a web style manual that someone is supposed to be able to read in order to learn how to properly use your class/method without actually looking at the implementation, and that’s how you write it. You describe the behavior without going into detail about implementation.

You should be doing this for all your PHP classes. And yes, there is a way to create those nice HTML docs too.
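For a rough idea of what such a comment block looks like in PHP, here is a sketch. The `pastePreview()` function is made up for the occasion; the `@param`, `@return` and `@throws` tags are the usual docblock tags that generators such as phpDocumentor pick up.

```php
<?php
/**
 * Shortens a paste to a one-line preview for listing pages.
 *
 * Line breaks and runs of whitespace are collapsed to single spaces.
 * Previews longer than the limit are cut and end with "...".
 *
 * @param string $content   Raw paste text.
 * @param int    $maxLength Maximum preview length in bytes, ellipsis included.
 *
 * @return string The preview, never longer than $maxLength.
 *
 * @throws InvalidArgumentException If $maxLength is smaller than 4.
 */
function pastePreview(string $content, int $maxLength = 40): string
{
    if ($maxLength < 4) {
        throw new InvalidArgumentException('maxLength must be at least 4.');
    }
    $oneLine = trim(preg_replace('/\s+/', ' ', $content));
    if (strlen($oneLine) <= $maxLength) {
        return $oneLine;
    }
    return substr($oneLine, 0, $maxLength - 3) . '...';
}

echo pastePreview("first line\nsecond line", 15), PHP_EOL;
```

Notice that the comment says what you get back and when it blows up, and says nothing about regular expressions or `substr()`. Someone reading the generated manual never needs to know.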

Do the MVC

For most web projects the Model-View-Controller pattern just makes sense. Even my pastebin example, simplistic as it is, could use some clear separation between the code that handles the database CRUD operations and the display layer that the user interacts with.

Keep in mind I’m not saying you need to get married to a full blown MVC framework. Marriage is a beautiful thing, but you probably shouldn’t make such a commitment unless you are absolutely sure this is what you want. CakePHP might be great for a medium sized project but I really think it would be overkill to use it for my pastebin. There are cases where it is probably a good idea to roll your own MVC-like pattern, or choose something really lightweight to do it for you.

Always Escape!

Don’t trust the users. No, seriously. I can’t emphasize this enough. Shit they put into text boxes should never, ever touch the back end without being beaten into submission and sanitized. Most programmers know not to do stuff like this:

echo "Looking up record with id: " . $_GET["id"]; // XSS FAIL
$result = $mysql->query("SELECT * FROM sometable WHERE id='$_GET[id]'"); // SQL Injection FAIL

Unfortunately, sometimes you won’t use the user provided data until much later in your code. Sometimes you get it, stuff it into an innocent looking variable like $foo, then let it percolate throughout your code. Many screen-fulls later you finally get to the point where you are about to store $foo in the database, completely forgetting it contains a tainted, unsafe and potentially destructive payload.

Joel Spolsky actually wrote a long-ass article about this a while ago (“Making Wrong Code Look Wrong”). There are ways you can actually force yourself to remember these sorts of things. How? Hungarian notation is one such method.

Wait! Stop! Sit your ass back down! I’m not talking about the brain-dead version where you use prefixes to hint at data types… Which actually is not a terrible idea in a dynamically typed language like PHP. But that’s not what I mean. I mean the Spolsky brand of sane Hungarian where you use prefixes to differentiate between different kinds of data (not types, kinds) – like safe and unsafe strings for example.

When building crap that will face the internet, I like to prefix my variables with U if they contain Un-escaped, Unsafe, Unseemly User generated data. Conversely I use S for Safe, Sanitized data ready to be Shown or Stored in the database. So for example:

$uName = $_POST["name"]; // unsafe string
$sName = Sanitize($uName); // now safe via some magic function

Why does it work? Because eventually you train yourself to catch logical errors as if they were syntax errors. For example, seeing a U variable anywhere near database related code should make you very uneasy.
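Here is a slightly less magical sketch of the same idea. The `sanitizeForHtml()` helper is hypothetical, and “safe” in it only means “safe to echo into an HTML page”; data headed for a SQL query needs different treatment, such as bound parameters.

```php
<?php
// Hypothetical helper: "safe" here means "safe to echo into HTML" and nothing more.
function sanitizeForHtml(string $uText): string
{
    return htmlspecialchars($uText, ENT_QUOTES, 'UTF-8');
}

// Stand-in for $_POST["name"] so the sketch runs on its own.
$uName = '<b onmouseover="evil()">Bob</b>';
$sName = sanitizeForHtml($uName);

// Fine: an s-prefixed value going to the page.
echo "Hello, $sName", PHP_EOL;

// Looks wrong at a glance: a u-prefixed value next to echo.
// echo "Hello, $uName";
```

The commented-out line is the whole trick: you don’t have to trace where `$uName` came from to know it has no business being there.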

And yeah, I know the PSR guidelines say that Hungarian notation should burn in hell, and that you shouldn’t put visibility or type information in the variable names… But I think in this case we might be justified – we’re not indicating type after all, but rather an important piece of logic here.

Don’t make a soup out of your code

I mentioned this before – my pastebin example is like a bad soup – an unholy mix of ingredients that probably should not co-exist in a single document: PHP, HTML and SQL. In a perfect world, your HTML should reside in its own little universe away from dynamic code.

There is just something dirty and unsophisticated about using print statements to assemble HTML code. It should not be done that way. Ideally you would want your HTML to be made by a designer. You know, one of those nice people who don’t know shit about coding but have kick-ass skills in Photoshop, and can actually wizard up a page that doesn’t look like shit in IE6 without committing suicide. Those guys like to work with pure HTML templates that merely have little place-holder text and “holes” where the dynamically generated content will be added later.

Even if you don’t have a friendly designer prettifying your pages for you, HTML and CSS are much easier to work with and maintain if they’re not churned out by print statements, or interleaved with finicky business logic. That template/placeholder thing is a really good idea.

So, why not make it a thing and use a templating engine of some sort? This way you separate your client side and server side code almost completely, making your life much easier.
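To show the placeholder idea without committing to any particular engine, here is a hand-rolled sketch. The `{{name}}` syntax and the `renderTemplate()` helper are made up for illustration; a real templating engine such as Twig or Smarty does far more than this.

```php
<?php
// Hypothetical template: in a real project this would live in its own .html
// file that a designer can edit without touching any PHP.
$template = <<<'HTML'
<h3>PASTE #{{id}}</h3>
<pre>{{content}}</pre>
HTML;

// Hypothetical render helper: swaps each {{name}} placeholder for an escaped value.
function renderTemplate(string $template, array $values): string
{
    $replacements = [];
    foreach ($values as $name => $value) {
        $replacements['{{' . $name . '}}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    // strtr() does not re-scan replaced text, so a paste containing
    // "{{id}}" cannot trigger a second round of substitution.
    return strtr($template, $replacements);
}

$html = renderTemplate($template, ['id' => 7, 'content' => '<script>alert(1)</script>']);
echo $html, PHP_EOL;
```

The markup sits in one place, the data in another, and the escaping happens in exactly one spot instead of wherever I happened to remember it.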

Similarly, you really don’t want to be embedding a lot of SQL in your PHP. It is prone to errors, can be exploited way too easily, and it is not straightforward to test. Ideally, your code should manipulate objects, which then in turn ought to transparently save their data to the database.

Don’t get me wrong, I am one of those weirdos that actually likes writing SQL statements. I mean, I wrote half a fucking thesis on blind database integration. It is very likely that I have forgotten more algebra than many of you ever knew existed. Can I write a better query than some shmuck who threw together this or that ORM? Probably… Well, maybe…

Actually, no. If it is a sizable ORM with a decent sized community, chances are it was pretty robustly tested. There are many eyes on that code, so to speak, whereas I am very likely to just go “fuck it, this worked for the two cases I just tried, therefore I pronounce it bug free and bulletproof.” Not to mention, anyone reading my queries may actually lose sanity points just for trying to follow my “optimization” attempts. So, even if you know your S, your Q and your L very well, sometimes it might be safer and more kosher to abstract this stuff onto some ORM engine.

Would I need an enterprise level ORM for my pastebin example? No, definitely not. In that case I would probably just use PDO instead of the native driver and call it a day. PDO has built in parametrized query support which would help me with those nasty SQL injection issues. On the other hand pastebins actually come in all shapes and sizes. If I wanted to make it extensible, I could probably make a case for picking a robust ORM to preserve sanity and make code more maintainer friendly.
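As a rough sketch of what the PDO route could look like: the example below swaps my MySQL server for an in-memory SQLite database so it can run on its own, which assumes the pdo_sqlite extension is available. The table layout mirrors the two-column pastebin table, and the hostile string is made up.

```php
<?php
// Assumption: an in-memory SQLite database (needs the pdo_sqlite extension)
// stands in for the MySQL server so the sketch runs on its own.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE pastebin (id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT NOT NULL)');

// Hostile-looking input ends up stored as plain data, because it is bound
// as a parameter and never becomes part of the SQL text.
$uContent = "Robert'); DROP TABLE pastebin; --";

$insert = $pdo->prepare('INSERT INTO pastebin (content) VALUES (:content)');
$insert->execute([':content' => $uContent]);
$id = (int) $pdo->lastInsertId();

$select = $pdo->prepare('SELECT id, content FROM pastebin WHERE id = :id');
$select->execute([':id' => $id]);
$row = $select->fetch(PDO::FETCH_ASSOC);

// Still escape on the way out: the database stored the text faithfully, markup and all.
echo htmlspecialchars($row['content'], ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

Pointing this at MySQL should mostly be a matter of changing the DSN and credentials passed to the `PDO` constructor and adjusting the `CREATE TABLE` syntax; the prepared statements themselves stay the same.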

Manage Dependencies

If you were keeping track, I just mentioned a whole bunch of third party code you might or might not be pulling into your project. Let’s count them: a unit testing framework, a documentation builder, a templating engine, an ORM, maybe some more stuff. Is this smart? Is this a good practice?

Well, it depends. In general, you don’t want to re-invent the wheel every time you start a new project. The more code you have to write, the more bugs you are going to introduce. So by using other people’s code you limit the attack surface of your own code, and also decrease the amount of code you have to keep in mind at any given time. So it makes sense.

But if you will be pulling in a lot of other people’s code, you ought to have some way of managing these dependencies. Ideally, you would want a tool that would let you automatically download and update and configure all the things you need before you even write a single line of code.

A Modest Proposal

Here is a crazy idea: how about we take this pitiful pastebin code and re-design it, doing it right? And by right, I mean going completely overboard. We are going to use an MVC-like approach and implement it using a full blown templating engine and an ORM engine, we will write formal unit tests for everything that can be tested, and we will automatically manage run time and development time dependencies. Yes, it is complete overkill, but I guess that’s sort of the point.

This project is small enough to be able to do this in just a few short posts, and the code should be succinct enough to post here almost in its entirety. I should be able to illustrate the tools and concepts much better suited to bigger projects, using something very simple and easy to grasp.

In part 2 I’m going to discuss the tools/frameworks/engines I picked for this project, and show you how to set them up so that they can be managed almost without any conscious effort.

6 Responses to PHP Like a Pro: Part 1

  1. This sounds like a fun series! However, I’ve never used PHP and I intend to avoid it until it’s dead, so I made my own in Elisp. I’ll probably follow along with your series. This is the initial quick and dirty version.

    https://gist.github.com/4324599/bda3aef96235891bd4f78f3ab0f5847b884c25d5

    Here’s an example paste in action (which I’ll keep hosted here for a week or so):

    http://zeus.nullprogram.com/pastebin/1191fd7b

  2. Luke Maciak says:

    @ Chris Wellons:

    Wow, very nice. I like how succinct it is. :)

    And yes, it is probably smart to stay away from PHP. I think it is very telling that its very name is about as messy and idiosyncratic and nonsensical as some of its constructs.

    What does PHP stand for?
    – HTML Preprocessor

    Yeah, I don’t know how they got that either. Still, I do have a strange fondness for the language.

  3. Jason *StDoodle* Wood says:

    Well, IIRC the first “P” stands for PHP. Apparently, recursive acronyms are a thing. Personally, I think they’re about as “neat” as fart jokes, but to each their own…

  4. Luke Maciak says:

    @ Jason *StDoodle* Wood:

    Fuck it, I’m calling my new software GNU is Not HURD. ;)

    Does recursive acronym + doubly recursive acronym = triple recursive acronym?

  5. Nathan says:

    Great post! I look forward to part 2.
    Prefixing unsafe variables with ‘U’ and safe ones with ‘S’ is a great idea. Will start doing this.

  6. Luke Maciak says:

    @ Nathan:

    Thanks, I shamelessly stole that idea from Joel Spolsky, but I will gladly take credit for it. :)

