Thursday, February 26, 2004

Yahoo!: Birth of a New Machine (Article)

Yahoo! Search new home page

Finally, a few facts about the new Yahoo rather than the usual guesswork:

Yahoo isn't replacing Google with Inktomi. Rather, the company developed a brand-new search engine, drawing on lessons learned from what the company calls the "critical mass" of search engineering talent it assembled through hiring and acquisitions, as well as investment in infrastructure and product quality.

Last week's launch begins a progressive rollout that takes place over the next few weeks. It's the start of numerous planned enhancements focusing on Web search, personalization, and vertical search.

Note the new search engine is for Web results only. Image search remains powered by Google. News search is still a combination of Yahoo's own editorial and technological resources.

How does Yahoo's new search engine differ from Google's? Results presentation is very similar. Yahoo wisely opted to keep things looking mostly the same, with a few exceptions. There's a link to the cached copy of each indexed page -- now served from Yahoo, not Google. Just about everything else on search result pages looks the same.

Actual results returned by Google and Yahoo depend on the query. For popular or common queries, there seems to be very little difference between the two engines in the top few results. Once past those, results tend to diverge dramatically. For less common or unpopular queries, Yahoo results look quite different from Google's.

Although Yahoo and Google likely use similar algorithms, one reason for the differences is that Yahoo's e-mail and search teams leverage what they've learned about spam. Yahoo Mail processes billions of e-mail messages, so this knowledge is likely quite helpful in providing Yahoo with a much deeper understanding of spam characteristics -- and helps keep nasty stuff out of the Web page index.

Bottom line: I'm impressed with the quality of the results Yahoo delivers. It's a very viable alternative to Google and the other "last engine standing," Ask Jeeves/Teoma.

"The Yahoo Search index captures the full text of Web pages, up to a 500K limit. That's greater than the 101K maximum indexed by Google.
A broad range of file types, including HTML, PDF, and Microsoft Office documents, is included in the mix.

How big is Yahoo's index? The portal isn't saying, despite Google's recent announcement it's expanded its index to nearly 4.3 billion documents (6 billion, counting images and newsgroup postings, which Google does).

In almost all of my tests with random queries, Yahoo reported more results found than Google. Does this mean Yahoo's index is bigger? Perhaps. But reported results are estimates, not exact counts. They can include factors other than keyword matches, making them notoriously unreliable measures of overall index size. Suffice it to say Yahoo's index is comparable to Google's for most queries."
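To make the size limits quoted above concrete, here is a minimal Python sketch of what a per-page indexing cap means in practice. It assumes "500K" and "101K" mean kilobytes of 1,024 bytes and that content past the cap is simply ignored; neither engine's actual handling is described in the article, and the names `indexable_portion`, `fraction_indexed`, `YAHOO_CAP_BYTES`, and `GOOGLE_CAP_BYTES` are hypothetical.

```python
# Hypothetical illustration of a per-page indexing cap.
# Assumption: "500K" / "101K" are kilobytes of 1,024 bytes, and anything
# past the cap is ignored. This is not either engine's real code.

YAHOO_CAP_BYTES = 500 * 1024
GOOGLE_CAP_BYTES = 101 * 1024


def indexable_portion(page: bytes, cap_bytes: int) -> bytes:
    """Return the leading part of a page that fits under the cap."""
    return page[:cap_bytes]


def fraction_indexed(page: bytes, cap_bytes: int) -> float:
    """Share of the page (0.0 to 1.0) that falls under the cap."""
    if not page:
        return 1.0  # an empty page is trivially fully covered
    return len(indexable_portion(page, cap_bytes)) / len(page)


if __name__ == "__main__":
    page = b"x" * (300 * 1024)  # a made-up 300 KB page
    print(fraction_indexed(page, YAHOO_CAP_BYTES))   # whole page fits under 500K
    print(fraction_indexed(page, GOOGLE_CAP_BYTES))  # only the first 101K counts
```

Under these assumptions, a long page is fully searchable under the larger cap, while text beyond the first 101K would not match any query under the smaller one.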

Yahoo plans particular emphasis in coming months on personalization and vertical search. The company's My Yahoo portal already offers extensive content customization options.

The newly released SmartSort option in Yahoo Shopping, which provides very specific product advice for digital cameras, MP3 players, computers, and other electronic devices based on criteria the user enters, is one example. The ability to add RSS feeds to the My Yahoo page is another.

"Ultimately we want to understand the intention of the user, and I think we're going to get closer to that through personalization," said Weiner.

In the vertical search arena, Yahoo plans to focus on local, travel, personals, and its HotJobs search portal.

These moves are clearly only the beginning of many more to come at Yahoo. "Over time, you're going to see Yahoo extend our search technology, and ultimately into our media properties," said Weiner. "To a large extent that will help drive our growth."

All this gives Google, Ask Jeeves, and Microsoft's fledgling Web search initiative good reason to be even more attentive to the quality of their search results. It promises to be a very good year for searchers.

This work is licensed under a Creative Commons License.