Implementing related items for a Django blog

After a lot of searching for a way to show related articles alongside a blog post or article, I stumbled upon this gem of an example (linked under Sources), which I am now using. It works by comparing the words in two texts and how often each one occurs, dropping any common words, and giving a result between 0 and 1, with 1 being a 100% match. I have also modified the implementation to sort the results by how closely related they are.
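To make the idea concrete, here is a minimal sketch of that kind of comparison in plain Python. It is my own illustration rather than the code from the linked post: the function names, the tiny STOP_WORDS list and the `limit` parameter are all assumptions, and a real stop-word list would be much longer.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def term_frequencies(text):
    """Count how often each non-common word occurs in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def similarity(text_a, text_b):
    """Cosine similarity of the two word-count vectors, from 0 to 1."""
    freq_a = term_frequencies(text_a)
    freq_b = term_frequencies(text_b)
    # Dot product over the words of text_a; a Counter returns 0 for missing words.
    dot = sum(count * freq_b[word] for word, count in freq_a.items())
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one text has no usable words, so nothing can match
    return dot / (norm_a * norm_b)


def related(target, candidates, limit=5):
    """Return (score, text) pairs for matching candidates, best match first."""
    scored = [(similarity(target, text), text) for text in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

In a Django view, the candidates would presumably be the titles or bodies of the other posts, with the top few results passed to the template.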

Now the problems with this method. The only issue I have had so far is that the calculations are slow: even comparing just the titles of articles adds a small lag, and comparing the full documents adds extra seconds to the load time. This leads me nicely on to the possible solutions I am considering so far: caching the results and/or creating lookup tables for them, or moving the comparison to the point where an article is saved and storing the results in the database.
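The save-time option could look roughly like the sketch below. This is only an outline of the approach, under some loud assumptions: RelatedIndex is a hypothetical class, a plain dict stands in for the database table, and word_overlap is a cheap placeholder scorer rather than the real comparison. In Django the `save` step would more likely hang off the model's save() or a post_save signal.

```python
class RelatedIndex:
    """Lookup table of precomputed scores; a dict stands in for a database table."""

    def __init__(self, score):
        self.score = score  # any function (text_a, text_b) -> float
        self.texts = {}     # article_id -> text
        self.table = {}     # article_id -> {other_id: score}

    def save(self, article_id, text):
        """Do the slow comparison once, when an article is saved."""
        # Drop stale scores that other articles hold against the old version.
        for scores in self.table.values():
            scores.pop(article_id, None)
        self.texts[article_id] = text
        row = {}
        for other_id, other_text in self.texts.items():
            if other_id == article_id:
                continue
            value = self.score(text, other_text)
            if value > 0:
                row[other_id] = value
                # The score is symmetric, so store it in both directions.
                self.table.setdefault(other_id, {})[article_id] = value
        self.table[article_id] = row

    def related(self, article_id, limit=5):
        """Cheap read at page-load time: sort the stored scores, best first."""
        row = self.table.get(article_id, {})
        return sorted(row.items(), key=lambda item: item[1], reverse=True)[:limit]


def word_overlap(text_a, text_b):
    """Placeholder scorer (shared words / all words), not the real comparison."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


index = RelatedIndex(word_overlap)
index.save(1, "django blog related posts")
index.save(2, "django blog caching")
index.save(3, "gardening tips")
print(index.related(1))
```

The trade-off is that every save now compares the article against all the others, but the page load only has to read and sort a handful of stored numbers.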

Sources:

http://allmybrain.com/2007/10/19/similarity-of-texts-the-vector-space-model-with-python/
