Sunday, October 03, 2004

Yahoo: Mike Grehan Interviews Jon Glick, Yahoo!'s Senior Manager for Web Search (eMarketing News)

Mike Grehan's eMarketing News - Internet Marketing Tips:

Jon Glick is Yahoo!'s Senior Manager for Web Search, managing the core relevancy initiatives for Yahoo! Search. Prior to joining Yahoo!, Jon served as the Director of Internet Search at AltaVista and has held positions in new product development, strategic analysis and product management at Raychem Corp., Booz Allen & Hamilton Consulting and the Lincoln Electric Co. Jon has a BS in Computer-Aided Engineering from Cornell University and an MBA from Harvard Business School.

Jon quote: "to get a good search engine... the best for our end users, everything has to be working well. You know, if you have a great relevancy algorithm and lousy Spam detection you just get a bad experience, for instance. You really can't fall down on any of these areas. If you don't have good de-aliasing tables, users get a bad experience. It's all about a lot of things coming together with a very good team. And I think that's what the Yahoo! search team has done very, very well."

[Note: Jon uses the term de-aliasing in reference to knowing that URLs such as www.coke.com and www.coca-cola.com point to the same content. If Yahoo! were to show both URLs following a search on 'coke', the user wouldn't be getting the diversity of results which would be optimal. He's also happy to point out that a search for 'coke' at Google is representative of the problem!]
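
[A minimal sketch of what such a de-aliasing step might look like, purely as illustration: a table maps known alias hostnames to a canonical host, and any result whose canonical host has already been shown is dropped. The table contents and function names here are hypothetical, not Yahoo!'s actual implementation.]

```python
# Minimal sketch of de-aliasing: collapse URLs whose hosts are known aliases
# of one another so only one canonical result is shown. The alias table and
# hostnames below are hypothetical examples, not Yahoo!'s actual data.
from urllib.parse import urlparse

ALIASES = {
    "www.coke.com": "www.coca-cola.com",   # both serve the same content
}

def canonical_host(url: str) -> str:
    host = urlparse(url).netloc.lower()
    return ALIASES.get(host, host)

def dedupe_results(urls):
    seen, kept = set(), []
    for url in urls:
        host = canonical_host(url)
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept

print(dedupe_results([
    "http://www.coke.com/",
    "http://www.coca-cola.com/",
    "http://www.pepsi.com/",
]))  # -> keeps one of the Coke URLs plus pepsi.com
```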

And our first goal, as I've said, is to give our users the best experience: full stop. Without that, nothing else really matters. They're the engine that drives everything. But we do also realise that the people who create pages, the content providers do have a curiosity about what they're doing that's working; what they're doing that isn't working... And this is part of transparency. So, we try and give that kind of fuel bar score in the same way as we'd try and answer questions in a forum. We want people to be able to do the right things. It's something we're considering along with a lot of other things. And if it makes sense, we'll roll it out. The other thing is... well, you mentioned that we'd touch on personalisation. For me it seems as though there have been two phases in search. The first phase was all about what was on the page. The second generation of engines started to look at what it was they could find out about that page by looking at what else there was on the web that gave more information about it. The directory listings, the connectivity, the anchor text etc. And we're still in phase two.

For me, and this is me speaking personally, the next phase will be where you're able to take into account information about the user. And of course local, because local search is a subset of personalisation. For local to really work, you need to know where the person is. So, the issue of: "I'm number one for this keyword"... may not exist at all in a few years. You know, you'll be number one for that keyword depending on who types it in! And from where and on what day... and... It is going to get more complex than something that can simply be summed up in a ranking algorithm, let alone how many checks somebody has on a toolbar.

Comment: The promised clarity and feedback to webmasters etc. has not been much in evidence. The fuel bar score has also been disabled.


Site Match
There are three components to the Site Match program.

1) Site Match: the basic per-URL submission. "It's a subscription charge plus a cost per click. We do this for a number of reasons. If you take a look at what you would have had to have done to get into all the individual subscription programs, AltaVista Express Inclusion, Inktomi Site Submit etc., you'd generate a subscription fee of over 150 dollars. But now the base fee for the first year is 49 dollars, and it drops for subsequent URLs. So it's much more economical, especially for a small site that wants to get across a large network. Also, it means that people who are going into a category where they're going to generate a lot of traffic, where there's very high value, have a chance to do it on an ROI basis which they can measure. So it's a more tuned program that we're offering."


2) Public Site Match: "This is where we take high-quality feeds from governmental sites, not-for-profit organisations, the Library of Congress and that type of source. This helps to improve the comprehensiveness of our index and also...

3) Site Match Xchange

o Mike:

How does this XML feed, or "sheet feeds" as they're known, which is basically meta data, blend with the ranking data from a crawl? I mean, the feed is data about data; it's not actually being crawled at all. How do you know which is which, and what about the linkage and connectivity data?

o Jon:

"We still have connectivity values for the sites because there's a lot of information that we take from the free crawl which factors in.
For example, an individual eBay auction may not be linked to, but we know what the connectivity score is for eBay on aggregate, so we can take that into account. And as part of the Site Match program, editors are going through and making sure that there is quality to the content and evaluating the quality of that content. For example, pages which are included in the Site Match Xchange program have to have unique titles and they have to have meta data, things which are not necessarily requirements for a page to be free-crawled out on the web. The standards are actually higher because our goal is simply to add quality content. The intention of the entire Site Match program is to increase both the comprehensiveness and also the relevancy of results to our users. We run our own tests to monitor user behaviour: what links users click on; do they click higher on the page; when are we giving users a better experience...
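
A rough sketch of the domain-level fallback Jon describes, assuming a feed URL with no in-links of its own simply inherits an aggregate score for its domain. The scores and lookup structure are invented for illustration, not Yahoo!'s actual data:

```python
# Sketch of falling back to a domain-level connectivity score when an
# individual feed URL has no in-links of its own. Scores are hypothetical.
DOMAIN_CONNECTIVITY = {"ebay.com": 0.92}   # aggregate score for the whole site

def connectivity(url_score, domain):
    # use the page's own score if it has one, otherwise inherit the domain's
    return url_score if url_score is not None else DOMAIN_CONNECTIVITY.get(domain, 0.0)

print(connectivity(None, "ebay.com"))   # 0.92 -> an unlinked auction page inherits it
```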

o Mike:

Just before I forget to mention it Jon: What about the Yahoo! directory and the 299 dollars for inclusion?

o Jon:

That does still exist. The Yahoo! directory is there for the different ways that people decide to look for information on the web. Some people like to parse a hierarchy, some people want to find other sites that are related within a certain category. And other people take the more direct route of: "I know what I want, I know the keywords..." and they just go directly to the search.

If by benefit you mean ranking - no there's not. It's an inclusion program. It is just about inclusion. It gives us an opportunity to use resources to go through and give them an editorial review of their site and puts them on a one-to-one relationship with the folks at Yahoo! And if you go to Site Match Xchange then you get some good customer service support. It's not going to do anything to influence their ranking. But let's take an example of say, a travel company. The Yahoo! Slurp crawler typically is going to come around and visit a site every three to four weeks. If you're a travel company... two weeks ago you wanted to sell Mardi Gras Getaways. But that's finished and nobody's buying those breaks now. It's Spring breaks for college students maybe. Now if your content changes that dramatically, having us come back and crawl your site every 48 hours may have a significant impact on your business. If you have a page which doesn’t change much, like consumer electronics... standard web crawl may be fine. There's a guy who came to see me earlier and he's doing an art exhibit and they won't have the pages ready until a few days before they're in each city. So waiting for the free crawl to come around may mean that they're not in when they need to be. It is an additional service and if it makes sense for people then they're welcome to take advantage of it. If they're happy with it and they're positioned well and have the crawl frequency, then use it. People who don't use the program will never be disadvantaged in the rankings as compared to other people who do."

Meta data and Yahoo!

"Yes we do use meta keywords. So let me touch on meta tags real fast
.
We index the meta description tag. It counts similarly to body text. It's also a good fallback for us if there's no text on the page for us to lift an abstract to show to users. It won't always be used, because we prefer to have the user's search terms in what we show. So if we find those in the body text we're going to show that, so that people can see a little snippet of what they're going to see when they land on that page. Other meta tags we deal with are things like noindex, nofollow and nocache; we respect those. For the meta keywords tag... well, originally it was a good idea. To me it's a great idea which unfortunately went wrong because it's so heavily spammed. It's like, the people who knew how to use it also knew how to abuse it! What we use it for right now is... I'd explain it as match and not rank. Let me give a better description of what that really means. Obviously, for a page to show up for a user's query, it has to contain all the terms that the user types, either on the page, through the meta data, or in the anchor text of a link. So, if you have a product which is frequently misspelled, or if you're located in one community but do business in several surrounding communities, having the names of those communities or those alternate spellings in your meta keywords tag means that your page is now a candidate to show up in that search. That doesn't say that it'll rank, but at least it's considered. Whereas, if those words never appear, it can't be considered...
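
Jon's "match, not rank" distinction can be read as a simple candidate filter: a page is only eligible for a query if every query term appears somewhere in its body text, meta data or inbound anchor text, and ranking happens afterwards. A rough sketch under that reading; the field names and example data are illustrative, not Yahoo!'s actual pipeline:

```python
# Rough sketch of "match, not rank": a page becomes a candidate for a query
# only if every query term appears in its body, meta data, or anchor text.
# Field names and example data are illustrative assumptions.
def is_candidate(query: str, page: dict) -> bool:
    terms = query.lower().split()
    haystack = " ".join([
        page.get("body", ""),
        page.get("meta_keywords", ""),
        page.get("meta_description", ""),
        page.get("anchor_text", ""),
    ]).lower()
    return all(term in haystack for term in terms)

page = {
    "body": "Fast laptop repair in Springfield.",
    "meta_keywords": "shelbyville, ogdenville",   # neighbouring communities
}
print(is_candidate("laptop repair ogdenville", page))   # True: eligible to be ranked
```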

o Mike:

How many keywords do you put in a meta keywords tag before you start to flag yourself up as spamming?

o Jon:

Okay, here are a couple of parameters. Each keyword is an individual token separated by commas. So that's that: you want to separate these things with commas and not just put one long string of text. The more keywords that are put in and the more they're repeated, the greater the chance our spam team is going to want to check out that page. It doesn't mean that page is going to get any specific judgement, but it is very much a red flag. For best practice you just need to remember it's for matching, not ranking. Repeating the same word 20 times is only going to raise a red flag... It doesn't increase your likelihood of showing up on any given set of search results. It's just a risk with no benefit.
"So I could put, I don't know... er... for instance, ‘laptop computers, desktop computers, palm computers...’

o Jon:

Exactly, and, of course, since each of those is separated by commas, then ‘laptop computers’ will count for ‘laptop computers’ and not ‘laptop’ or ‘computers’ separately. So doing it like that means that you're not going to be penalised for keyword spamming on the word ‘computers’.
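
A small sketch of the comma-delimited tokenisation Jon describes, where each comma-separated entry is kept as a single phrase token rather than broken into individual words. The parsing code is illustrative only:

```python
# Sketch of comma-delimited meta keyword tokenisation: each comma-separated
# entry is one token/phrase, so repeating a word inside different phrases is
# not the same as stuffing that word on its own. Illustrative only.
def parse_meta_keywords(content: str):
    return [token.strip().lower() for token in content.split(",") if token.strip()]

tokens = parse_meta_keywords("laptop computers, desktop computers, palm computers")
print(tokens)
# ['laptop computers', 'desktop computers', 'palm computers']
# 'laptop computers' counts for the phrase 'laptop computers', not 'computers' alone.
```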

o Mike:

Okay, let's take the description tag now. That gives us a little bit of editorial control still?

o Jon:

The description tag does give you just a little bit of editorial control, depending on what your body text looks like. Ideally we like to find the keywords the user typed in your body text. But this can be a very good fallback for search engines in the event that you have something like, for example, an all-Flash page which can't be well indexed in terms of text..."
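
A sketch of the snippet fallback Jon outlines: prefer a body-text passage containing the user's terms, and fall back to the meta description when the body offers nothing usable, such as an all-Flash page. The selection logic here is a simplified assumption, not Yahoo!'s actual abstract generator:

```python
# Sketch of the abstract/snippet fallback: prefer a body-text sentence that
# contains the user's terms; if the body has none (e.g. an all-Flash page),
# fall back to the meta description. Illustrative logic only.
def choose_snippet(query: str, body: str, meta_description: str) -> str:
    terms = query.lower().split()
    for sentence in body.split("."):
        if any(term in sentence.lower() for term in terms):
            return sentence.strip()
    return meta_description   # fallback when the body yields nothing usable

print(choose_snippet("flash animation studio",
                     body="",   # an all-Flash page exposes no indexable text
                     meta_description="A small studio building Flash animation."))
```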


Spam

Quote: "Alright then Jon: It's been mentioned again. The dark side that is. Let's talk Spam! Of course it's a huge problem with search engines. People who are creating web pages in the industry worry so much about what they're doing with the pages and how they're linking and submitting... and will I get banned... I get asked a lot of questions like: "If I link to my other web site will they know it's mine and ban me?" Or: "My hotel is in New York, New York, will I get banned for keyword stuffing?" Crazy worries. I guess for most of the smaller businesses which aren't up to speed with search engine optimisation, they hear a lot of propaganda which worries them. But at the other end of the scale, I tend to hear more from you guys at the search engines about the activities of less ethical affiliate marketers out there. Now those guys certainly live by their own rules. How do you deal with it?

o Jon:

Well, let me just say first that, in that sense, Spam has gotten a lot better over the years. You don't see as many people trying to appear for off-topic terms as you used to.

How Yahoo! deals with affiliate sites and duplicate content:
Quote: "You now have people who are trying to be very relevant. They're trying to offer a service, but the issue with affiliate Spam is that they're trying to offer the same service as three hundred other people. And the way we look at that is... we look at that the same as we look at duplicate content. If someone searches for a book and there are affiliates in there, we're giving the user ten opportunities to see the same information, to buy the same product, from the same store, at the same price. If that happens, we haven't given our user a good service or a good experience. We've given them one result. So we are looking at how we can filter a lot of this stuff out....

There are a lot of free sign-up affiliate programs. They've pretty much mushroomed over the past few years. The plus side is, they're on topic. They're not showing up where they shouldn't... it's the other way... they're showing up too much where they should [laughs]. We look at it like this: what does a site bring to the table? Is there some unique information here? Or is the sole purpose of that site to transact on another site, so that someone can get a commission? If that's the case, we'd rather put them directly in the store ourselves than send them to someone else who's simply telling them how to get to the store.
o Mike:

You guys must get Spam reports the same as all the other engines. So when somebody does a search on a particular product and it turns out that there are ten affiliates in there, whether they're Spamming or not, it's likely that the affiliates could be turning up before the merchant ever does. If you get a high level of that occurring, do you ever go back to the merchant with some feedback? You know, say, like: guys, do you want to optimise your web site, or just do something about your own ranking?
o Jon:

We do actually talk to a lot of companies. We obviously have a relationship with many of them through the various Yahoo! properties. Different companies often take a different tack. For instance, eBay is a company which has been very, very good at working with us and listening to us on the affiliate issue. Their feeling is really twofold. One is, the people that are confusing the results in the search engines are the same people who are doing things that they don't like on eBay; for them, bad actors in one space tend to be bad actors in another. The other thing, of course, is if you have someone who is using a cloaked page, so that to a search engine it's a huge bundle of keywords and massive interlinking of domains on different IPs, and for a user coming in with IE 5 it's an automatic redirect to pages on eBay... they know that the user doesn't think: "Oh, it's an affiliate Spammer." The perception for the user is simply this: eBay tricked me! There's a link that I clicked that said "get something free", I clicked it and ended up on eBay. And they wonder why eBay would do that to them. And they know that those things hurt their brand. So that's why they have been very proactive in working with us to ensure that those kinds of affiliates are not part of their program.

But... some other merchants may look at it and say: since we're paying on a CPA (cost per acquisition) basis we're actually indifferent as to how that traffic comes to us. They may say, it's like, we don't want to monitor our affiliates, or we can't monitor our affiliates... whatever, we'll take the traffic because there's no downside. It's a different way that they may look at it. And you know, it depends what position they're in, and more, how much they care about their brand, or don't care...

o Mike:

And a similar kind of thing happens on the paid side. I don't want to get too much into that because this is the organic side, and I don't want you to get too embroiled in it as I don't know if you're much connected with it. But in PPC, with a campaign you can only bid once on the same keyword. It's not possible for you to fix it so that you turn up at one, two and three on the paid search side. So what tends to happen there is that the merchants don't mind if the affiliates are bidding on the same keywords. One way or another, if they can't hold all the positions down the right-hand side, the affiliates will help them, and at least that way they get the sale anyway.

o Jon:

The downside of that for some of them... I actually covered this in a session yesterday. They're competing with their own affiliates, who are bidding up to the point of zero margin on their CPA against the cost of those bid clicks, because their landing pages were just, you know, one page with a link on it that said: "Click here to shop at Nordstrom." And their marketing spend was actually going up. They were paying people to get traffic that they were likely to have gotten anyway, and they need to roll that back. It may make some kind of sense for a product, but it often doesn't make sense for a brand. People are probably going to find their own way to your brand name without the affiliate inserting themselves, in that case unnecessarily, into the value chain. And I think people are getting a little more savvy about their affiliate programs. Now they're thinking more about "here's what you can do, here's what you can't do", and a bit more about the ways that affiliates can give them distribution: here are ways that can optimise sales, or hurt the brand. They know that people don't view them as affiliates; they view them as their representatives. If you make lousy pages for people, it reflects badly on the brand.
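
The zero-margin bidding Jon mentions is easier to see with numbers. A back-of-the-envelope sketch, where the commission and conversion-rate figures are entirely made up for illustration:

```python
# Back-of-the-envelope sketch of an affiliate's break-even (zero-margin) bid.
# All figures are hypothetical, purely to illustrate Jon's point.
commission_per_sale = 10.00   # CPA payout the merchant pays the affiliate, in dollars
conversion_rate = 0.02        # fraction of clicks that turn into a sale

break_even_cpc = commission_per_sale * conversion_rate
print(f"Affiliate can bid up to ${break_even_cpc:.2f} per click before losing money")
# -> $0.20 per click; the merchant ends up paying $10 commissions on sales it
#    might well have captured directly from a brand search anyway.
```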

o Mike:

So, to finish off the affiliate and Spamming fear factor... because your lunch is getting cold, if for no other reason [laughs]. What is it that gets you banned, if at all? Is it cloaking, mini networks...

o Jon:

Mike, there isn't an exhaustive list. There are new technologies coming out all of the time. At the highest, or fundamental, level, someone who is doing something with the intent of distorting search results to users... that's pretty much the overarching view of what would be considered a violation of our content policies. In terms of specifics... um... let's do some notes on cloaking. If you're showing vastly different content to different user agents, that's basically cloaking. Two different pages, one for IE and one for Netscape with a formatting difference between them, or having different presentation formats for people coming in on a mobile device perhaps, or just a different type of GUI, that's acceptable. That's helpful.

o Mike:

What about a Flash site with cloaked text pages just describing the content, but a true description of the content?

o Jon:

Exactly. For a Flash site which has good text embedded in it, and the cloaked page simply says the non-cloaked page has the following text in it... no problem with that. That being said, if someone cloaks the content, that will raise the red flag and the Spam teams are going to look at it. If what they see is a legitimate representation of the content, that's fine. If what they see does NOT represent the content, I mean something entirely different to what the users would get... they're going to look at that and probably apply a penalty.
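
A crude way to picture the check Jon describes: fetch the same URL with different user-agent strings and compare what comes back; a large mismatch is the red flag, and human review then decides whether the cloaked copy fairly represents the content. The threshold and user-agent strings below are arbitrary illustrations, not Yahoo!'s actual detection:

```python
# Crude sketch of a cloaking check: fetch a URL as a regular browser and as a
# crawler, then compare the two responses. A large mismatch raises a flag for
# human review. The threshold and user-agent strings are illustrative.
from difflib import SequenceMatcher
from urllib.request import Request, urlopen

def fetch(url: str, user_agent: str) -> str:
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="ignore")

def looks_cloaked(url: str, threshold: float = 0.5) -> bool:
    as_browser = fetch(url, "Mozilla/5.0")
    as_crawler = fetch(url, "ExampleCrawler/1.0")   # hypothetical crawler UA
    similarity = SequenceMatcher(None, as_browser, as_crawler).ratio()
    return similarity < threshold   # very different content -> flag for review
```
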
o Mike:

Linkage data... obviously people are going to do this. They know that links count with search engines, maybe not exactly why, though... so the quest begins to get links... any links. Some will buy a thousand fake domains and have them all interlinked and pointing back to the main site...

o Jon:

Yeah. Massively interlinked domains will most definitely get you banned. Again, it's spotted as an attempt to distort the results of the search engine. The general rule is that we're looking at popularity on the web via in-links. The links are viewed as votes for other pages, and part of voting is that you can't vote for yourself. People who buy multiple domains and interlink them for the purpose of falsely increasing popularity are doing just that: voting for themselves. And the same applies to people who join reciprocal link programs. Unfortunately there are many people who join these because they're fairly new to search engine marketing and maybe someone tells them that this is a great way to do things. That's very dangerous. People linking to you for financial or mutual-gain reasons, as opposed to linking to your site because it's a great site, a site they would go to themselves and would want their visitors to see, are doing it the wrong way. Let's just take the travel space again. Someone who has 30 pages of links buried behind the home page, literally each with several hundred links, with everything from... golf carts, to roofing, to... who knows. You know, that's kind of like: hey, if you like our travel to Jamaica site, you may also be interested in our roofing site... [Mike and Jon burst out laughing here]

o Mike:

It's a shame really. People seem so desperate for links but frequently just have no idea where they're going to get them from. It's my mantra over and over again, and I know you've heard me saying it many times at the conferences: the importance is in the quality of the links you have - not the quantity. And of course, everyone wants to do incoming links. They don't want to do reciprocal linking. They even worry too much about whether they should link out themselves. Getting links in is a lovely blessing, but should people worry too much about linking out?

o Jon:

The thing to remember here, Mike, is who you're linking out to. If you hang out in bad neighbourhoods, as we say, then you will get more scrutiny; that's inevitable. If you end up linking to a lot of people who are bad actors and maybe have their sites banned, then linking to them means you're more likely to be scrutinised to see if you're part of that chain. The other thing, of course, is that when you take a look at connectivity, every site has a certain amount of weight that it gets when it's voting on the web, and that is based on its in-links. And it gets to distribute that... energy... via its out-links. And by that, I mean links outside the domain.

Navigational links and other links within a domain don't help connectivity; they help crawlers find their way through the site. I'm just talking here about the true out-links, those outside of the domain. For those, how much each link counts is divided by the number of links that exist. So if you have a couple of partners or suppliers you're working with and have an affinity with, and you link out to them, then that helps a lot. If you have... 3, 4, 5 of them... well, if you added 300 random reciprocal links, then you've just diluted the value of the links that you gave to the other people you have the real relationship with. It's as simple as this: people who have massive link farms aren't really giving much of a vote to anyone, because they're diluting their own voting capability across so many other people. So you need to consider the number of out-links you have on a page, because each additional link makes them all count for less.
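
Jon's description boils down to a page's outbound "vote" being split evenly across its external out-links, so every extra link dilutes the rest. A toy sketch of that idea; the weight value is made up:

```python
# Toy sketch of out-link dilution as Jon describes it: a page's voting weight
# is split evenly across its external out-links, so adding links dilutes each
# one. The weight value is an arbitrary illustration.
def vote_per_link(page_weight: float, external_out_links: int) -> float:
    if external_out_links == 0:
        return 0.0
    return page_weight / external_out_links

weight = 1.0
print(vote_per_link(weight, 5))     # 0.2 -> a few real partners each get a solid vote
print(vote_per_link(weight, 305))   # ~0.0033 -> 300 reciprocal links dilute them all
```
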
o Mike:

Jon... I feel as though I've virtually exhausted you. This has been so useful and I really do appreciate the time you've given, not just to discuss your own Yahoo! properties but for giving such a wonderful insight into search engine marketing best practice. I honestly believe your contribution here will help the entire readership, at whatever level they're at in the field, to have a more comprehensive knowledge. Thank you so much.

Jon:

No problem Mike. Anytime at all. It's always good to talk with you.


Intermission Time - Go get a drink, take a break, but come back! There's more good stuff to come!

Study Shows How Searchers Use The Engines
by Christine Churchill

Usability has always been one of my favorite subjects, so when Enquiro published a new study showing how users interact with search engines, it was a must-read. The study turned out to be such a fascinating report, I had to share it.

Gord Hotchkiss, President of Enquiro, and his team of able research assistants ran 24 demographically diverse participants through a series of tests to observe and record their behavior as they interacted with search engines. While everyone will agree that 24 is not a statistically significant sample size, I think the results of the project show interesting findings that are worth considering.

As I read the study, a number of its findings on user behavior correlated with other studies I've read. For example, Gord mentions that almost 60% of his users started with one search engine (usually Google) and then would switch to a different engine if the results weren't satisfying. This finding is consistent with data from ComScore Media Metrix about user fickleness toward search engines. CNET writer Stephanie Olsen did a great job summarizing that data in her article on search wars. The message to the search engines is "Stay on your toes, guys, and show us relevant results or we're out of here."

The Enquiro team found that there was no consistent search method. Everyone in the study did it a little differently. People doing research used engines differently than people on a buying mission. Women searchers differed from men in their searching techniques. Gord tells us "an organic listing in the number 8 position on Google might not have been seen by almost half the men in the group, but would have been seen by the majority of the women." Let's hear it for women's powers of observation!

One finding of the study that is near and dear to every search engine marketer's heart is, "If no relevant results were found on the first results page, only 5 participants (20.8%) went to the second page."

This is consistent with numerous studies documenting that users don't go very far in the results pages for answers. Probably the most famous research to document this behavior was the study by Amanda Spink and Bernard Jansen where they found 58% of users did not access any results past the first page. I had the pleasure of talking with Amanda a few years ago when I was first moving to Dallas and she was moving out of it. She's a fun lady with a flair for writing provocative titles to research papers on search engines. Expect to hear more from her in the future.

A finding that warmed my longtime SEO innards was that there was a "sweet spot" for being found on a search engine's results page, and that place was in the "above the fold organic results," that is to say, in the portion of the free listings that can be viewed without scrolling. Considering how cluttered some search engine results pages are getting, this is good news! According to Gord, "All 24 participants checked these 2 or 3 top organic rankings."

I suppose it shouldn't be too surprising to find the "prime real estate" in the middle section of the page; this is consistent with eye-tracking studies that show the center column to be the first place users look on a web page. Of course, one might wonder why users tended to skip over the category and product search lists. Gord's team asked users why none of them bothered to look at the news and shopping feeds that appear at the top of the organic results. Users said they didn't know what they were.

I had a déjà vu moment when I read that because this is almost identical to a comment that was made to me by a usability tester in an in-house usability test. My tester said they skipped over the product search section because they were unfamiliar with it and it "looked confusing". They jumped straight to what they recognized as "safe" - that being the organic list of results.

Another finding I found myself agreeing emphatically with was that top sponsored positions had "a 40% advantage in click throughs over sponsored links on the right side of the screen". It makes sense when you think about it: the spot is so in your face, users can't miss it. The fact that this spot produced a great click-through was a well-known PPC insider secret, and many of us who do PPC management had devised elaborate methods to get our clients into those top spots. We've been hearing evil rumors that Google may be phasing this spot out in the future. It was still there today when I checked, so maybe Google is planning on keeping it awhile.

A finding that could be affected by Google's recent ad overhaul was that users of Google were more likely to resist looking at sponsored ads than users of other engines. Part of the explanation is that Google's ads looked more like ads than they do on other sites: hey, they were in little colored boxes off to the right that practically screamed "Ad!" You couldn't possibly mistake them for content or organic results. Since Google has dropped the little colored boxes and gone with plain text for the ads, one can't help but wonder if users will be less resistant to ads now.

The Enquiro study includes a summary section toward the end of the report. Here they identified items that captured the searchers' attention enough to make them click, and listed important items to include on a landing page. I won't give away the store by telling you everything, but I will tell you, as you may expect, that the title and description shown on the results page were the most important eye magnets for attracting users' attention.

Perhaps the most intriguing of the report findings was that search is a circular and complex process, not a linear process as we sometimes like to simplify it into. Instead, search is a multi-step process with multiple interactions with sites and search engine results pages. Gord's team found that "a typical online research interaction can involve 5 to 6 different queries and interactions with 15 to 20 different sites." That's a lot of sites and a lot of back and forth between sites and search engines.

The takeaway point from this study is that search is definitely more complicated than it appears at first glance. I guess that's what makes search marketing so absorbing. For everything you learn about it, there are ten more questions yet unanswered. Sounds like we need a sequel to this report - eh, Gord?

Check out the study yourself by downloading it off the Enquiro web site. It's a fascinating report and it's only 30 pages including lots of pictures. Happy reading!

Creative Commons Licence
This work is licensed under a Creative Commons License.