Monday, October 04, 2004

Definition: 'hubs and authorities': Teoma, Ask Jeeves, HITS and Clever

Mike Grehan's eMarketing News - Internet Marketing Tips: Interview by Mike Grehan with Paul Gardi, SVP Search at Ask Jeeves/Teoma, and Alexa Rudin, Director of Communications at Ask Jeeves.

The main topic discussed is social network theory, "which is about understanding how people interact and how networked structures are predictive of certain links, like hubs and authorities." This goes further than the initial academic citation model....

"Kleinberg...is... the developer of an algorithm known as HITS (Hypertext Induced Topic Search). The intuition behind HITS is very important as it's based on the notion of 'hubs and authorities',"

Main points:

Quote: "

o Authority comes from in-edges (pages which point to yours)

o Being a good hub comes from out-edges (pages which you point to)

This creates a mutually reinforcing relationship:

o A good authority is a page that is pointed to by many good hubs.

o A good hub is a page that points to many good authorities.

However, it's vital to remember that this process is a way of not just identifying linkage patterns, but also identifying web communities and the major players within them...

...with search engines, some links are certainly more equal than others: and some are infinitely more equal.

This is why we talk about "link quality" and not just quantity. A single quality link can frequently have 50 times more power than 100 random or "less qualified" links.

Both Google and Teoma are prime examples of search engines which base their ranking algorithm around the nature and the characteristics of linkage data."

Teoma's algorithm is based on HITS but develops it further by amalgamating it with another variation on the algorithm called CLEVER, which was an IBM project.
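Comment: the mutually reinforcing relationship between hubs and authorities quoted above can be sketched in a few lines of Python. This is only a minimal illustration of the basic HITS iteration as Kleinberg describes it, not Teoma's or CLEVER's actual method (the interview says those differ). The function names `hits` and `normalize`, the `iterations` parameter and the toy page names are assumptions made up for this example.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so the sum of their squares is 1 (keeps values bounded)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a small link graph.

    links maps each page to the list of pages it points to (its out-edges).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority comes from in-edges: add up the hub scores of the
        # pages that point to each page.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Being a good hub comes from out-edges: add up the authority
        # scores of the pages each page points to.
        new_hub = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph (hypothetical pages): three link pages pointing at two content pages.
example_links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
}
example_hub, example_auth = hits(example_links)
print(sorted(example_auth, key=example_auth.get, reverse=True))
```

In the toy graph, page "a" is pointed to by all three link pages and so ends up with a higher authority score than "b", while "hub3", which points to only one of them, ends up a weaker hub than "hub1" and "hub2".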

Quote on Ask Jeeves & Teoma: "The combination of algorithmic search and other data we have to identify structures is incredible. What we do ensure with paid inclusion though, is that it has no impact whatsoever on relevance, i.e. paid inclusion is guaranteed entry to the index, but no priority or preference is shown. We maintain absolute integrity within the ranked results. We've come so far with all of this research into structures and hubs and authorities in order to be able to determine exactly what are the authoritative sites. So we're all about absolute relevance. If you see the shift in Jeeves - Jeeves has come a long way in terms of relevance.... If it's just about being found, then you can try submitting if it's free anywhere, but if you're linked well - we'll find you eventually anyway..."

On Flash: Quote: "Flash, through the methods we use, we'd be able to find that page more easily. (Comment: by analysing link patterns of hubs and authorities.) We would have a better picture of what that page is and what it's about than most. But more to the point, we'd understand why people were looking for it. Even though it's a Flash page. Whereas, if our crawler looked at that Flash page - it just sees nothing... maybe just a couple of words that really aren't relevant..."

Paul: They're learning all the time. That's why no one can know what they really are. These algorithms are continually being tweaked and tuned...

o Mike: Is it too far-fetched to start and imagine these machines beginning to start and think for themselves?

o Paul: This is real, what's actually happening. It becomes an interesting philosophical discussion...

Paul: How intelligent are these machines? Think about your brain and think about how things work, like how do we make decisions? So how could a computer make decisions? How different is that? In the same way, I guess, that the brain uses facts, figures, intuition.

What the algorithms are doing is gathering this type of information. And in the case of Teoma, it turns out, because we're going down to the level and depth of information we are doing, it happens to be extremely valuable information. This is very targeted stuff and these machines are very smart and they can find this very valuable information and then process it at these almost unimaginable speeds. No one could have imagined this... It is artificial intelligence. Our job is to put all of that information, to input it to the engine so that it gets smarter and smarter and has more and more to think about and things come together in a better completeness.


On Spam
Quote: "Spam is a significant issue on the web because it affects our user experience and that's what we are concerned about. If I take another perspective on it, the truth is, spammers spend more time looking at these methods, when in fact, if they spent more time creating great content, they'd score anyway - without fear of retribution. You know, we have ways of dealing with spam. We don't talk about them, naturally. We're successful where we have to be. We always see new techniques being used and we watch it, and if we don't like it - why would anyone else?"

On "Good SEO"... NB this metaphor summarises the thinking behind authority sites and hubs...

Quote: "I'd say the same thing as I'd say to some guy who's just arrived in town and said, you know... I need a job. I'd say: 'What are you good at?'

And then what kind of advice would you give that person? Well this is exactly the way the web works. I'd say knock on a few doors. Go and meet people. Talk to them and tell them what you're good at. And the ones who will like you for that will put you in their address book or their filing system so they have a record of you. They can then refer to you when they have a question, or refer you to someone who may be looking for whatever it is you do. You become known for something and you become a member of a community. You mix in that community.

If you're good at tennis, you join the tennis club. You become a member of a community and if you're a genuine value provider, of genuine interest they will treat you and accept you as part of the community as well. And it doesn't take as long as some people think it takes. The structures are already there, we're refreshing the pages all the time. If a new page comes up, we may not find it straight away, so you can use paid inclusion so we actually know you're there right at the beginning. And if you've got some links on your page going out, we'll see where they are going and we'll start understanding what community you're in and... well I can't go much further, but that's the key to success. Certainly at our level of understanding. We're mostly concerned with, are you in communities and are they good communities. Are you an authority or not an authority? And by that I don't mean that you have to be the leading authority at something, it's just about being weighted as part of that subject community.

Of course, we don't know what the subject is, we simply can't know exactly what all these subjects are, but when someone types in a word [or words] that brings up that community, as a subject specific community and related to that word [those words] then you're part of that and we know. And you'll be found quicker than you might think. It's not that hard to do. And communities don't have to be that big. If it's a good community on a subject...
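Comment: the idea that typing in a word "brings up" a subject-specific community resembles the first step of Kleinberg's published HITS method, which starts from a root set of pages matching the query and expands it along links into a base set before scoring. The sketch below illustrates only that published idea, under simplifying assumptions; it is not a description of how Teoma does it. The function name `focused_subgraph`, the `page_text` dictionary, the `max_in_links` cap and the sample pages are all hypothetical.

```python
def focused_subgraph(links, page_text, query, max_in_links=50):
    """Restrict a link graph to the pages around a query's matching pages.

    links maps each page to the list of pages it points to.
    page_text maps each page to its text (used for a naive substring match).
    """
    words = query.lower().split()

    # Root set: pages whose text contains every query word.
    root = {
        page
        for page, text in page_text.items()
        if all(word in text.lower() for word in words)
    }

    base = set(root)
    for page in root:
        # Add the pages each root page points to.
        base.update(links.get(page, []))
        # Add pages that point to the root page, capped so that one very
        # popular page cannot swamp the community.
        pointing_in = [src for src, targets in links.items() if page in targets]
        base.update(pointing_in[:max_in_links])

    # Keep only links whose source and target are both inside the base set.
    return {
        src: [dst for dst in links.get(src, []) if dst in base]
        for src in base
    }


# Hypothetical mini web: only pages near the "tennis" pages survive.
sample_links = {
    "tennis-club": ["rackets", "coach"],
    "directory": ["rackets", "cars"],
}
sample_text = {
    "tennis-club": "Local tennis club",
    "rackets": "Tennis rackets for sale",
    "coach": "Private coaching",
    "cars": "Used cars",
    "directory": "Links to shops",
}
community = focused_subgraph(sample_links, sample_text, "tennis")
print(sorted(community))
```

Here the "cars" page drops out because it neither matches the query nor links to or from a matching page, while "directory" stays in because it points to a matching page. Hub and authority scores would then be computed on this smaller graph only.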

On themed web sites, the theory is:
Quote: "that if you have a web site which sticks to one theme and one theme only, which is centred around a few keywords then this is the ticket to success. A themed web site wins by pure mass, or dense aggregation, or something...

o Paul: You mean creating page after page on the same subject? Again, they're focusing on the wrong thing...

o Mike: Let me jump in again and put it this way: "Does the guy who has a blue widget web site with 100 pages beat the guy who has only one page - but one very IMPORTANT page?"

o Paul: No, the larger site does not do better, because we don't count the number of pages. We care about this: are other pages on the same subject considering this to be a GOOD PAGE? And you know, even Google and what they do and the other methods, they can't do this. Sure, they do look at who's referring to the page but they don't look at the subject - the subject of the page. Yes, we look at all the information that the others do as well as everything else... If you want to be prominent then simply become known on your subject. Become good at what you do, become valuable to somebody else online for something. Go ahead and optimise your page - but don't make stupid mistakes... If you're selling, I don't know, window dressings, just make sure you've got a term on there that says "window dressings". You know, there are many people who make that kind of stupid mistake by not having the actual text on the page. And we are matching text at some level... The main point: become a member of your community. It's not so hard. If you're about something commercial there are many places you can go to get noticed. And then, of course, someone linking to you, well that's a good reference....

Again, it's essential to come out of the realm of only thinking about this as the realm of ones and zeros, like it's only complex mathematical equations and very complex architectures and just think about it this way: How do I relate to other people and organisations?...



Paul: Philosophically our approach is the same - or similar should I say. (Comment: as HITS and Clever) But the methodology is not the same. In fact completely different.

BECOME A "PARTNER FOR PRIVILEGED INFORMATION"... QUOTE: "Paul: We're already developing relationships with selected partners. We have our partners in the paid inclusion program. And these really are trusted partners. And any partner who violated that trust would get noticed immediately. Because we allow them to provide us with information that expands our ability to understand what's there, we have to be certain that it's the right information as it's just slightly below our defences.

o Mike: So you've got two levels here: you've got third party suppliers who work on the pay for inclusion side. Whether that's subscription or an XML trusted feed like at Position Technologies. And then at the next level you've got the guys with the search engine marketing firms, the smaller agencies (and the larger for that matter), can they just apply to become a partner?

o Paul: Absolutely, but I do have to say that we limit the number because we can't manage that many ourselves right now. Personally, I'm always open to new partners coming in. We'll work with them as long as they meet a minimum threshold. If they prove themselves to be good partners and valuable assets to their own customers, which is very important to us, then we can choose to work with them on a more permanent basis."

On Teoma brand: QUOTE:

"Paul: Well sure, it's Teoma versus Google and all the other engines in the market place. Whether it's branded Teoma and delivering results to Ask Jeeves and not being recognised is not that important as such."

Algo and beyond: beyond the algo, incorporate search data. Quote: "Paul: Yes, there is a level of understanding which goes beyond the algorithm... Ask Jeeves for instance, we layer in what's called 'The Knowledge Base'. When we see an opportunity which we consider to be statistically inside a range where we know that somebody is asking for something where we know we have additional knowledge then we pull that to the top. Basically we analyse and look at the GUI [graphic user interface] very closely. We have the Direct Hit technology...

Click popularity, as it has been known, is a very important aspect of how to rank pages...

The large players are realising the power behind the algorithm. Algorithms scale more efficiently, more predictably than humans if you think about it. Teoma is a great example. Because we can use the 'hubs' we don't need one hundred editors working for us. We have 50 million editors working for us [big grin]"

Interview ends with: "At this point I want to delve more deeply into algorithmic search. Paul is happy to continue the conversation, but says he'd be much more relaxed about it without the tape running. I switch it off and we talk for another 15 minutes during which Paul is very candid. This further information is reserved for the third edition of Search Engine Marketing: The Essential Best Practice Guide."

Free document about HITS and linkage based algorithms here:

http://www.e-marketing-news.co.uk/topic_distillation


http://www.teoma.com/

This work is licensed under a Creative Commons License.