Word Cloud in PHP

Word clouds (or tag clouds, if you will) are an interesting way of visualizing textual data that became popular recently with the emergence of tagging and the success of folksonomy sites such as Flickr, del.icio.us and the like. One aspect of word clouds that interests me is word frequency analysis. You can size the words in your cloud by how often they appear in the text and get a clear visual indication of the key concepts and issues that dominate it. For example, some people use them to analyze presidential State of the Union addresses.
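To make the frequency-to-size idea concrete, here is a minimal sketch. It counts words with array_count_values() and scales each word's font size linearly between two bounds. The function name wordSizes() and the 0.8em/4em bounds are assumptions chosen for illustration; the class below uses stepped thresholds rather than a linear scale.

```php
<?php
// Minimal sketch (assumption: linear scaling; wordSizes() is a hypothetical name).
// Takes a word => count array and returns word => font size in em.
function wordSizes(array $counts, float $minEm = 0.8, float $maxEm = 4.0): array
{
    $sizes = array();
    if (empty($counts)) {
        return $sizes; // nothing to scale
    }
    $max = max($counts);
    foreach ($counts as $word => $count) {
        // Fraction of the most frequent word's count, in (0, 1].
        $ratio = $count / $max;
        // The most frequent word gets $maxEm; rarer words shrink toward $minEm.
        $sizes[$word] = round($minEm + ($maxEm - $minEm) * $ratio, 2);
    }
    return $sizes;
}

// Usage: count a small list of words, then size them.
$counts = array_count_values(array('cloud', 'word', 'cloud', 'tag', 'cloud', 'word'));
print_r(wordSizes($counts));
```

A linear scale is the simplest choice; stepped thresholds (as in the class) make small differences in frequency less visible, which can be easier on the eye for long texts.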

Tag Crowd is a very good online tool for generating nice-looking clouds from files or from copied-and-pasted text. But I decided to write one myself, just for shits and giggles (and to see how difficult it would be). I’m posting the complete WordCloud class below. The code is loosely based on the example posted on lotsofcode.com.

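<?php
class WordCloud
{
    // Maps each lowercased word to the number of times it occurs.
    public $words = array();

    public function __construct($text)
    {
        // Turn every non-word character into a space, then split the text into words.
        // preg_split() replaces split(), which was removed in PHP 7; PREG_SPLIT_NO_EMPTY
        // skips the empty strings that runs of spaces would otherwise produce.
        $text = preg_replace('/\W/', ' ', $text);
        $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $this->addWord($word);
        }
    }

    // Add $value occurrences of $word (case-insensitive) to the tally.
    public function addWord($word, $value = 1)
    {
        $word = strtolower($word);
        if (array_key_exists($word, $this->words)) {
            $this->words[$word] += $value;
        } else {
            $this->words[$word] = $value;
        }
    }

    // Map a frequency, expressed as a percentage of the most frequent word, to a CSS font size.
    public function getSize($percent)
    {
        $size = "font-size: ";
        if ($percent >= 99) {
            $size .= "4em;";
        } elseif ($percent >= 95) {
            $size .= "3.8em;";
        } elseif ($percent >= 80) {
            $size .= "3.5em;";
        } elseif ($percent >= 70) {
            $size .= "3em;";
        } elseif ($percent >= 60) {
            $size .= "2.8em;";
        } elseif ($percent >= 50) {
            $size .= "2.5em;";
        } elseif ($percent >= 40) {
            $size .= "2.3em;";
        } elseif ($percent >= 30) {
            $size .= "2.1em;";
        } elseif ($percent >= 25) {
            $size .= "2.0em;";
        } elseif ($percent >= 20) {
            $size .= "1.8em;";
        } elseif ($percent >= 15) {
            $size .= "1.6em;";
        } elseif ($percent >= 10) {
            $size .= "1.3em;";
        } elseif ($percent >= 5) {
            $size .= "1.0em;";
        } else {
            $size .= "0.8em;";
        }
        return $size;
    }

    // Render every word as an inline-styled <span>; pass true to append each word's count in <sup>.
    public function showCloud($show_freq = false)
    {
        $out = "";
        if (empty($this->words)) {
            return $out; // nothing to draw (and max() fails on an empty array)
        }
        $max = max($this->words);
        foreach ($this->words as $word => $freq) {
            $size = $this->getSize(($freq / $max) * 100);
            $disp_freq = $show_freq ? "($freq)" : "";
            $out .= "<span style='font-family: Tahoma; padding: 4px 4px 4px 4px; letter-spacing: 3px; $size'>
                        &nbsp; {$word}<sup>$disp_freq</sup> &nbsp; </span>";
        }
        return $out;
    }
}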

Yes, I didn’t write it from scratch. In the time-honored tradition, I stole the code and adapted it to my needs. Why waste precious wetware cycles in my brain if someone has already done most of the legwork?

The main difference is that my script takes unformatted text and breaks it up into words, so you can essentially pass it straight from $_POST. The rest of the code pretty much follows the example. I also stripped out the external stylesheet and built the styling into the code, so you can include the class anywhere without worrying that it will send output (and therefore headers) too early. Finally, I made the sizing steps more gradual and got rid of the color variation.
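The text-to-words step can be tried on its own. This is a minimal sketch that uses the same \W-to-space approach as the class; countWords() is a hypothetical helper name, and array_count_values() stands in for the class's addWord() loop. Note that \W treats an apostrophe as a separator, so "cat's" yields "cat" plus a stray "s".

```php
<?php
// Sketch of "unformatted text in, word counts out" (countWords() is hypothetical).
function countWords(string $text): array
{
    // \W matches any character that is not a letter, digit or underscore.
    $text = preg_replace('/\W/', ' ', $text);
    // PREG_SPLIT_NO_EMPTY drops the empty strings produced by leading/trailing spaces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Count case-insensitively, like addWord() does.
    return array_count_values(array_map('strtolower', $words));
}

// Usage: punctuation disappears, "The" and "the" are merged.
$counts = countWords("The cat sat. The cat's hat!");
print_r($counts);
```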

If you pass in a really long text, you will get a big cloud, because every word is represented. Feel free to add some conditional statements to showCloud() to keep low-frequency words from being displayed. I’m too lazy at this point, and I want to get this post out.
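One way to do that filtering, rather than editing showCloud() itself, is to trim the word list before rendering. The sketch below assumes a word => count array, which is the shape of the class's $words property; the helper names filterRareWords() and topWords(), and the thresholds in the usage lines, are hypothetical. The first keeps words seen at least $minFreq times, the second keeps only the $limit most frequent words.

```php
<?php
// Keep only words that occur at least $minFreq times (keys and order are preserved).
function filterRareWords(array $words, int $minFreq): array
{
    return array_filter($words, function ($freq) use ($minFreq) {
        return $freq >= $minFreq;
    });
}

// Keep only the $limit most frequent words.
function topWords(array $words, int $limit): array
{
    arsort($words);                              // sort by count, highest first, keeping keys
    return array_slice($words, 0, $limit, true); // true also preserves numeric keys
}

// Usage with a small word => count array.
$words = array('cloud' => 5, 'tag' => 1, 'word' => 3, 'php' => 2);
$common = filterRareWords($words, 2); // cloud, word, php
$top = topWords($words, 2);           // cloud, word
```

With the class above, either helper could be applied to $cloud->words before calling showCloud(), e.g. $cloud->words = topWords($cloud->words, 50);. If a filter can leave no words at all, check for an empty array first, since max() does not accept one.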

Here is the usage:

$txt = "The text to be turned into a cloud";
$cloud = new WordCloud($txt);
echo $cloud->showCloud(true);

Passing true to showCloud() displays each word’s frequency as a superscript. It may or may not be a useful feature. The default is false, so if you call the function without arguments, no frequencies are displayed.

Again, it’s not a perfect solution, but it is short, sweet and it works – at least for the most part.

Tags: word cloud, tag cloud, tag crowd, tags, cloud, word frequency, text analysis

This entry was posted in programming.

6 Responses to Word Cloud in PHP

  1. Adam Dempsey says:

    Thanks for the great example, I’ve been looking for something like this for a while!

  2. Fr3d says:

    echo $cloud->showCloud(ture);

    ture? :P

    Nice bit of code though :)

  3. Luke says:

    LOL! ture!

    I can’t even copy and paste code without somehow introducing typos. I swear, it was spelled correctly in the code, because it worked when I tested it.

  4. Jason says:

    It’s a WOW widget! Have it Ajax contexts around mouseovered tags, it’d be priceless.

  5. Luke says:

    Oh wow! Good idea. I don’t know what you mean by AJAX here (mouseover effects wouldn’t be asynchronous, nor would they use XML), but yeah – some sort of effect on mouseover is a good idea!

  6. Del says:

    “Yes, I didn’t write it from scratch. In the time honored tradition, I stole the code, and adapted it to my needs. Why waste the precious wetware cycles in my brain, if someone already did most of the legwork.”

    Very true, I am glad you found it useful! I am going to update the script soon to allow a string to be converted to a cloud (like above) and also for the ability to assign a URL to a keyword.

    Thanks again!
