Computer-based reasoning

Recently, Clay Shirky gave a good rundown of the problems the semantic web faces in actually working. I agree with his basic premise: the semantic web won’t deliver what it promises.

However, I believe that the more reliable metadata we have, the better. The web already holds so much good information that computers are beginning to give a good approximation of good answers to almost any question in many fields, at least at a level roughly comparable to humans.

In support of my argument, I offer 1 Billion Pages = 1 Million Dollars? Mining the Web to Play “Who Wants to be a Millionaire?” from Overture Research. The abstract reads:

We exploit the redundancy and volume of information on the web to build a computerized player for the ABC TV game show “Who Wants To Be A Millionaire?”. The player consists of a question-answering module and a decision-making module. The question-answering module utilizes question transformation techniques, natural language parsing, multiple information retrieval algorithms, and multiple search engines; results are combined in the spirit of ensemble learning using an adaptive scheme. Empirically, the system correctly answers about 75% of questions from the Millionaire CD-ROM, 3rd edition —general-interest trivia questions often about popular culture and common knowledge. The decision-making module chooses from allowable actions in the game in order to maximize expected risk-adjusted winnings, where the estimated probability of answering correctly is a function of past performance and confidence in correctly answering the current question. When given a six question head start (i.e., when starting from the $2,000 level), we find that the system performs about as well on average as humans starting at the beginning. Our system demonstrates the potential of simple but well-chosen techniques for mining answers from unstructured information such as the web.
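The decision-making module in the abstract can be pictured as an expected-utility comparison between answering and walking away. The sketch below is a minimal, one-step illustration under my own assumptions, not the paper's actual method: the log utility used to model "risk-adjusted" winnings, the `estimate_p` blend of past accuracy and confidence, all function names, and the prize amounts are hypothetical, and lifelines and multi-question lookahead are ignored.

```python
import math


def utility(amount: float) -> float:
    # Assumed risk-averse utility: log(1 + amount). Being concave, it values
    # a sure sum above a gamble with the same expected dollar value.
    return math.log1p(amount)


def estimate_p(historical_accuracy: float, confidence: float, weight: float = 0.5) -> float:
    # Hypothetical blend of past performance and confidence in the current
    # answer, clamped so the result is a valid probability.
    p = weight * historical_accuracy + (1 - weight) * confidence
    return min(1.0, max(0.0, p))


def choose_action(p_correct: float, current: float, next_prize: float, fallback: float) -> str:
    # One-step (myopic) comparison: answer now, or walk away with `current`.
    # A correct answer wins `next_prize`; a wrong one drops the player to
    # `fallback`, the last guaranteed amount.
    eu_answer = p_correct * utility(next_prize) + (1 - p_correct) * utility(fallback)
    eu_walk = utility(current)
    return "answer" if eu_answer > eu_walk else "walk away"


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    p = estimate_p(historical_accuracy=0.75, confidence=0.25)
    print(p, choose_action(p, current=250_000, next_prize=500_000, fallback=32_000))
```

With these numbers p works out to 0.5 and the sketch walks away: under log utility, a coin flip between $500,000 and $32,000 is worth less than a sure $250,000. A risk-neutral player comparing raw dollar amounts would answer instead, since the expected value of the gamble is $266,000. That gap is what a risk adjustment is meant to capture.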

So humans are (only?) six questions better at “Who Wants To Be A Millionaire?” than a computer, and the computer isn’t even using the semantic web. With even imperfect metadata, it’s hard to imagine the computer not getting better over time (IMO, of course).
