
On JavaBlogs


There has been some recent discussion about how JavaBlogs is changing as it grows, including complaints about what some people see as low-quality posts.

Gerard has outlined the four main methods of making a community scale, but I would like to suggest a fifth. I believe that automated text categorisation can increase the size a community can scale to without requiring non-software intervention.

I’ve done some experimentation with using text analysis algorithms for simple match/non-match categorisation. I believe something as simple as Bayesian classification for blog posts can go some way to improving the quality of links on the “Hot List”.
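To make the match/non-match idea concrete, here is a minimal sketch of a two-class naive Bayes classifier in plain Java. It is not the code from my experiments and not the Classifier4J API: the class name `HotListFilterSketch`, its methods, the tokenisation rule and the add-one smoothing are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class naive Bayes sketch; names are illustrative only. */
final class HotListFilterSketch {
    // Per-class word counts gathered from the training posts.
    private final Map<String, Integer> matchCounts = new HashMap<>();
    private final Map<String, Integer> nonMatchCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int matchTokens = 0;
    private int nonMatchTokens = 0;
    private int matchDocs = 0;
    private int nonMatchDocs = 0;

    // Assumption: lower-case the text and split on anything that is not a-z or 0-9.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Records one post as an example of a match or a non-match. */
    void train(String text, boolean isMatch) {
        if (isMatch) {
            matchDocs++;
        } else {
            nonMatchDocs++;
        }
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            vocabulary.add(token);
            if (isMatch) {
                matchCounts.merge(token, 1, Integer::sum);
                matchTokens++;
            } else {
                nonMatchCounts.merge(token, 1, Integer::sum);
                nonMatchTokens++;
            }
        }
    }

    /** Estimates P(match | text); needs at least one non-empty example of each class. */
    double matchProbability(String text) {
        if (matchDocs == 0 || nonMatchDocs == 0) {
            throw new IllegalStateException("train both classes first");
        }
        double totalDocs = matchDocs + nonMatchDocs;
        // Work in log space so long posts do not underflow to zero.
        double logMatch = Math.log(matchDocs / totalDocs);
        double logNonMatch = Math.log(nonMatchDocs / totalDocs);
        int vocabularySize = vocabulary.size();
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            // Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
            logMatch += Math.log(
                (matchCounts.getOrDefault(token, 0) + 1.0) / (matchTokens + vocabularySize));
            logNonMatch += Math.log(
                (nonMatchCounts.getOrDefault(token, 0) + 1.0) / (nonMatchTokens + vocabularySize));
        }
        // Normalise the two class scores into a probability for "match".
        return 1.0 / (1.0 + Math.exp(logNonMatch - logMatch));
    }
}
```

In use, you would call `train` with posts a reader has marked as interesting or not, then only let a post onto the “Hot List” if `matchProbability` is above some threshold such as 0.5. The threshold, like everything else here, would need tuning against real posts.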

Today’s Java.Blogs posts

Ultimately, I think that some of the more advanced text categorisation algorithms might be even more useful. For instance, Google News manages to categorise its stories fairly well, and I believe they do most of that automatically. NewsInEssence categorises news into “clusters” automatically. A quick look on CiteSeer shows plenty of algorithms around, and I’m pretty sure the author of Classifier4J might be interested in implementing at least one.