Word clouds (or tag clouds, if you will) are one of the more interesting methods of visualizing textual data. They got popular recently with the emergence of tagging and the success of folksonomy sites such as Flickr and del.icio.us. One of the aspects of word clouds that interests me is word frequency analysis. You can size the words in your cloud based on how often they appear in the text, and get a clear visual indication of the key concepts and issues that are prevalent in it. For example, some people use it to analyze the presidential State of the Union addresses.
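To make the sizing idea concrete, here is a minimal sketch that scales a word's count linearly into a font size. It is only an illustration: the function name fontSizeFor and the 0.8em to 4em range are assumptions, and the class further down uses a ladder of thresholds instead.

```php
<?php
// Hypothetical helper: scale a word's count linearly into a CSS font size.
// The 0.8em and 4em bounds are illustrative assumptions.
function fontSizeFor(int $count, int $maxCount, float $minEm = 0.8, float $maxEm = 4.0): string
{
    // Relative frequency between 0 and 1; guard against an empty word list.
    $ratio = $maxCount > 0 ? $count / $maxCount : 0;
    $em = $minEm + ($maxEm - $minEm) * $ratio;
    // %F formats the float without depending on the current locale.
    return sprintf('font-size: %.1Fem;', $em);
}

echo fontSizeFor(10, 10), "\n"; // the most frequent word gets the largest size
echo fontSizeFor(5, 10), "\n";  // a word half as frequent lands mid-range
```

A linear scale is the simplest choice; if one or two words dominate the text, everything else ends up tiny, which is one reason to prefer stepped thresholds.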
Tag Crowd is a very good online tool for generating nice-looking clouds from files or from copied and pasted text. But I decided to write one myself, just for shits and giggles (and to see how difficult it would be). I’m posting the complete Word Cloud class below. The code is loosely based on the example posted on lotsofcode.com.
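<?php
// Note: the opening of this listing (class declaration and the start of the
// constructor) and the HTML markup in showCloud() are reconstructed. The class
// name WordCloud, the splitting regex and the span element are assumptions.
class WordCloud
{
    public $words = array();
    public $max = 0;

    public function __construct($text)
    {
        // Break the raw text into words on any run of non-word characters.
        $tokens = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $key => $value)
        {
            $this->addWord($value);
        }
    }

    // Count $value more occurrences of $word, ignoring case.
    public function addWord($word, $value = 1)
    {
        $word = strtolower($word);
        if (array_key_exists($word, $this->words))
            $this->words[$word] += $value;
        else
            $this->words[$word] = $value;
    }

    // Map a relative frequency (percent of the top word's count) to a font size.
    public function getSize($percent)
    {
        $size = "font-size: ";
        if ($percent >= 99)
            $size .= "4em;";
        else if ($percent >= 95)
            $size .= "3.8em;";
        else if ($percent >= 80)
            $size .= "3.5em;";
        else if ($percent >= 70)
            $size .= "3em;";
        else if ($percent >= 60)
            $size .= "2.8em;";
        else if ($percent >= 50)
            $size .= "2.5em;";
        else if ($percent >= 40)
            $size .= "2.3em;";
        else if ($percent >= 30)
            $size .= "2.1em;";
        else if ($percent >= 25)
            $size .= "2.0em;";
        else if ($percent >= 20)
            $size .= "1.8em;";
        else if ($percent >= 15)
            $size .= "1.6em;";
        else if ($percent >= 10)
            $size .= "1.3em;";
        else if ($percent >= 5)
            $size .= "1.0em;";
        else
            $size .= "0.8em;";
        return $size;
    }

    // Build the cloud markup; pass true to show each word's count as a superscript.
    public function showCloud($show_freq = false)
    {
        $out = "";
        if (empty($this->words))
            return $out; // max() cannot be called on an empty array
        $this->max = max($this->words);
        foreach ($this->words as $word => $freq)
        {
            $word = (string) $word; // numeric words come back as integer keys
            if ($word !== "")
            {
                $size = $this->getSize(($freq / $this->max) * 100);
                if ($show_freq) $disp_freq = "<sup>($freq)</sup>"; else $disp_freq = "";
                // Escape the word, since the text may come straight from user input.
                $safe = htmlspecialchars($word);
                $out .= "<span style=\"$size\">{$safe}{$disp_freq}</span> ";
            }
        }
        return $out;
    }
}
?>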
Yes, I didn’t write it from scratch. In the time-honored tradition, I stole the code and adapted it to my needs. Why waste the precious wetware cycles in my brain if someone already did most of the legwork?
The main difference is that my script takes unformatted text and then breaks it up into words. You can essentially pass it straight from the $_POST. The rest of the code pretty much follows the example. I also stripped the external stylesheet specification and built the styling into the code, so you can import the class anywhere without fearing that it will send HTML headers too early. I also made the sizing steps more gradual, and got rid of the color variation.
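If you are curious what "breaking it up into words" boils down to, here is a minimal standalone sketch. It is not lifted from the class: the helper name countWords and the regex are assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the class: split raw text into lowercase
// words and count how often each one occurs.
function countWords(string $text): array
{
    // Split on runs of non-word characters and drop empty pieces.
    $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // array_count_values() returns a word => occurrences array.
    return array_count_values($tokens);
}

$counts = countWords("The cat saw the other cat. The end!");
arsort($counts); // most frequent words first
print_r($counts);
```

In this sample, "the" is counted three times and "cat" twice. Note that \W also splits on apostrophes, so "don't" becomes "don" and "t"; adjust the pattern if that matters for your text.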
If you pass in a really long text you will get a big cloud, as every word is represented. Feel free to add some conditional statements to showCloud to prevent words with low frequency from being displayed. I’m too lazy at this point, and I want to get this post out.
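For the non-lazy, such a filter could look roughly like this. The helper name dropRareWords and the default cutoff of 2 are assumptions; it expects a word => frequency array like the one the class builds up.

```php
<?php
// Hypothetical helper: keep only the words that occur at least $min times.
function dropRareWords(array $words, int $min = 2): array
{
    // array_filter() preserves the keys, so the words stay attached to their counts.
    return array_filter($words, function ($freq) use ($min) {
        return $freq >= $min;
    });
}

$words = array('cloud' => 5, 'tag' => 2, 'lazy' => 1);
$frequent = dropRareWords($words, 2);
print_r($frequent); // 'lazy' is dropped, 'cloud' and 'tag' remain
```

Inside the class you could run the word list through something like this at the top of showCloud, before the loop that builds the output.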
Here is the usage:
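<?php
// The start of this snippet is reconstructed: the class name and the 'text'
// form field are assumptions.
$cloud = new WordCloud($_POST['text']);
echo $cloud->showCloud(true); // originally mistyped as "ture", see the comments below
?>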
Passing a boolean true to showCloud will display the word frequency as a superscript next to each word. It may or may not be a useful feature. The default is false, so if you just call the function without arguments you will get no frequencies displayed.
Again, it’s not a perfect solution, but it is short, sweet and it works – at least for the most part.
Thanks for the great example, I’ve been looking for something like this for a while!
ture? :P
Nice bit of code though :)
LOL! ture!
I can’t even copy and paste code without somehow introducing typos. I swear, it was spelled correctly in the code, because it worked when I tested it.
It’s a WOW widget! Have it Ajax contexts around mouseovered tags, it’d be priceless.
Oh wow! Good idea. I don’t know what you mean by AJAX here (mouseover effects wouldn’t be asynchronous, nor would they use XML), but yeah – some sort of effect on mouseover is a good idea!
Very true, I am glad you found it useful! I am going to update the script soon to allow a string to be converted to a cloud (like above) and also for the ability to assign a URL to a keyword.
Thanks again!