Tuesday, September 07, 2004

Microsoft Research (future of search): Economist.com | Computing

Economist.com | Computing: "a way of getting answers from the web"

The article states: "But the real prize will surely go to whoever can use the web to deliver a straight answer to a straight question. And Eric Brill, a researcher at Microsoft, intends that his firm will be the first to do that.

Ask MSR uses information on web pages to respond to questions to which the answer is a single word or phrase—such as “When was Marilyn Monroe born?” Ask MSR starts by manipulating the question in various ways: by identifying the verb, for example, and then changing its tense or moving it into different positions in the sentence (“Marilyn was Monroe born”, “Marilyn Monroe was born” and so on). The resulting phrases are then fed into a search engine, and documents containing matching strings of words are retrieved. It sounds a promiscuous strategy, but gibberish phrases produce few matches, so, as Dr Brill puts it, “being wrong is very cheap.”

Ask MSR is still a prototype, although Microsoft is trying to improve it and it may be launched commercially under the name AnswerBot."
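The rewrite-and-match idea described above can be sketched in a few lines of Python, under some loud assumptions. The web search engine is replaced by a hypothetical in-memory list of snippets (`SNIPPETS`). The question is assumed to have the shape "<wh-word> <verb> <rest>?". Because a "When" question expects a date, candidate answers are hypothetically restricted to four-digit years that follow a matched phrase. None of the names here (`rewrites`, `answer`, `SNIPPETS`, `YEAR`) come from Microsoft's system; they are illustrative only.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stand-in for search-engine results (illustration only).
SNIPPETS = [
    "Marilyn Monroe was born on June 1, 1926, in Los Angeles.",
    "Actress Marilyn Monroe was born in 1926 and died in 1962.",
    "Biography: Marilyn Monroe was born Norma Jeane Mortenson in 1926.",
    "Marilyn Monroe starred in Some Like It Hot (1959).",
]

# Assumed answer-type filter for "When" questions: a four-digit year.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def rewrites(question: str) -> list[str]:
    """Drop the wh-word and move the verb to every position in the rest.

    Assumes the shape "<wh-word> <verb> <rest>?", e.g. "When was Marilyn Monroe born?".
    """
    words = question.rstrip("?").split()
    verb, rest = words[1], words[2:]
    return [" ".join(rest[:i] + [verb] + rest[i:]) for i in range(len(rest) + 1)]


def answer(question: str, snippets: list[str]) -> tuple[Optional[str], dict[str, int]]:
    """Return the most-voted year and how many snippets each rewrite matched."""
    hits: dict[str, int] = {}
    votes: Counter = Counter()
    for phrase in rewrites(question):
        hits[phrase] = 0
        for snippet in snippets:
            pos = snippet.lower().find(phrase.lower())
            if pos == -1:
                continue  # gibberish rewrites land here: being wrong is cheap
            hits[phrase] += 1
            # Look for a candidate answer in the text after the matched phrase.
            match = YEAR.search(snippet, pos + len(phrase))
            if match:
                votes[match.group(1)] += 1
    best = votes.most_common(1)[0][0] if votes else None
    return best, hits
```

With these snippets, `answer("When was Marilyn Monroe born?", SNIPPETS)` returns `"1926"`. The hit counts show why wrong guesses cost little: rewrites such as "Marilyn was Monroe born" match no snippet and simply contribute nothing to the vote.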

2) “Beyond the Factoid”

Dr Brill, meanwhile, has moved to a more difficult task. One of his most recent papers, written jointly with Radu Soricut of the University of Southern California, is entitled “Beyond the Factoid”. It describes his efforts to build a system capable of providing 50-word answers to questions such as “What are the rules for qualifying for the Academy Awards?” This is harder than finding a single-word answer, but Dr Brill thinks it should be possible using something called a “noisy channel” model (NCM).

NCM works by modelling the transformation between what a user means (in spell-checking, the word he intended to type) and what he does (the garbled word actually typed). Dr Brill's question-answering system does something similar. Many question-and-answer pairs exist on the web, in the form of “frequently asked questions” (FAQ) pages. Dr Brill trained his system using a million such pairs, to create a model that, given a question, can work out various structures that the answer could take. These structures are then used to generate search queries, and the matching documents found on the web are scanned for things that look like answers.

The current prototype provides appropriate answers about 40% of the time. Not brilliant, but not bad. And it should improve as the web grows. Rather than relying on a traditional “artificial intelligence” approach of parsing sentences and trying to work out what a question actually means, this quick-and-dirty method draws instead on the collective, ever-growing intelligence of the web itself.
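In the spell-checking case mentioned above, the noisy channel idea amounts to choosing the intended word w that maximizes P(w) × P(typed | w): a prior on what people mean, multiplied by a model of how that meaning gets garbled on the way out. The sketch below illustrates only this spell-checking analogy. The article does not say how Brill and Soricut's question-answering model is parameterized, so no attempt is made to reproduce it. The toy corpus, the edit-distance channel model and the per-edit probability `p_edit = 0.01` are all hypothetical choices.

```python
from collections import Counter

# Hypothetical toy corpus used to estimate the prior P(intended word).
CORPUS = "the answer to the question is on the web the web has many answers".split()
PRIOR = Counter(CORPUS)
TOTAL = sum(PRIOR.values())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def channel_prob(typed: str, intended: str, p_edit: float = 0.01) -> float:
    """Assumed channel model: each edit independently multiplies in p_edit."""
    return p_edit ** edit_distance(typed, intended)


def correct(typed: str) -> str:
    """Noisy channel decoding: argmax over w of P(w) * P(typed | w)."""
    return max(PRIOR, key=lambda w: (PRIOR[w] / TOTAL) * channel_prob(typed, w))
```

With this toy corpus, `correct("teh")` returns `"the"`: several words are two edits away, but "the" has the highest prior. A practical system would estimate both the prior and the channel probabilities from large amounts of data rather than fixing them by hand.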

This work is licensed under a Creative Commons License.