On JavaBlogs

There has been some recent discussion about how JavaBlogs is changing as it grows, including a few problems with what some people see as low-quality posts.

Gerard has outlined the four main methods of making a community scale, but I would like to suggest a fifth. I believe that automated text categorisation can increase the size to which a community can scale without requiring human (non-software) intervention.

I’ve done some experimentation with using text analysis algorithms for simple match/non-match categorisation. I believe something as simple as Bayesian classification for blog posts can go some way to improving the quality of links on the “Hot List”.
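To make the match/non-match idea concrete, here is a minimal sketch of a two-class naive Bayes classifier in plain Java. It is an illustration only, not Classifier4J's API: the class name `BlogPostClassifier`, its methods, and the choice of add-one smoothing are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class naive Bayes sketch; not Classifier4J's API. */
class BlogPostClassifier {
    private final Map<String, Integer> matchCounts = new HashMap<>();
    private final Map<String, Integer> nonMatchCounts = new HashMap<>();
    private int matchWords = 0;     // total words seen in "match" posts
    private int nonMatchWords = 0;  // total words seen in "non-match" posts
    private int matchDocs = 0;
    private int nonMatchDocs = 0;

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Records one training post as either a match or a non-match. */
    void teach(String text, boolean isMatch) {
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            if (isMatch) {
                matchCounts.merge(word, 1, Integer::sum);
                matchWords++;
            } else {
                nonMatchCounts.merge(word, 1, Integer::sum);
                nonMatchWords++;
            }
        }
        if (isMatch) matchDocs++; else nonMatchDocs++;
    }

    /** Log of (smoothed class prior times smoothed word likelihoods). */
    private double logScore(String text, Map<String, Integer> counts,
                            int totalWords, int docs, int vocabSize) {
        double score = Math.log((docs + 1.0) / (matchDocs + nonMatchDocs + 2.0));
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            int count = counts.getOrDefault(word, 0);
            // Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += Math.log((count + 1.0) / (totalWords + vocabSize));
        }
        return score;
    }

    /** Returns a value between 0 and 1; above 0.5 means "more likely a match". */
    double matchProbability(String text) {
        Set<String> vocab = new HashSet<>(matchCounts.keySet());
        vocab.addAll(nonMatchCounts.keySet());
        int vocabSize = Math.max(1, vocab.size());
        double match = logScore(text, matchCounts, matchWords, matchDocs, vocabSize);
        double nonMatch = logScore(text, nonMatchCounts, nonMatchWords, nonMatchDocs, vocabSize);
        // Normalise the two log scores into a probability.
        return 1.0 / (1.0 + Math.exp(nonMatch - match));
    }
}
```

In use, you would call `teach` with posts that readers have already judged as worth listing or not, then keep only new posts whose `matchProbability` clears some threshold. The threshold and the training data are the parts that would need real experimentation.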

Today’s Java.Blogs posts

Ultimately, I think that some of the more advanced text categorisation algorithms might be even more useful. For instance, Google News manages to categorise its stories fairly well, and I believe they do most of that automatically. NewsInEssence categorises news into “clusters” automatically. A quick look on CiteSeer shows plenty of algorithms around, and I’m pretty sure the author of Classifier4J might be interested in implementing at least one.
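Clustering approaches generally need some measure of how alike two documents are. As a rough sketch of one common choice, the code below compares posts by the cosine similarity of their term-frequency vectors. This is not a claim about what Google News or NewsInEssence use, and the class and method names (`PostSimilarity`, `termFrequencies`, `cosine`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: cosine similarity over simple term-frequency vectors. */
class PostSimilarity {

    /** Counts how often each lower-cased word occurs in the text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns 1.0 for identical word distributions and 0.0 for no shared words. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * (double) b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as similar to nothing
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }
}
```

A clustering pass could then group posts whose pairwise similarity exceeds a chosen cut-off. Real systems usually weight terms (for example with TF-IDF) rather than using raw counts, so treat this as a starting point only.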

Fixing Velocity's character encoding problems

Java Vector Space Search and Latent Semantic Indexing

Classifier4J, NNTP//RSS and Bayesian Blog Classification.
