Sunday, October 10, 2004

Filthy Linking Rich by Mike Grehan TBC

Bias caused by dependence on linking...Filthy Linking Rich by Mike Grehan: "Are search engines giving a fair representation of what's actually available on the web? Not really. If pages were judged on the quality and the relevance for ranking, then there would be less search engine bias towards pages which are simply popular by 'linkage voting'. Unfortunately, quality is subjective so finding a universally acceptable measurement or metric is not going to be easy"

Search engines apply "random graph theory to the web, they have viewed it as a type of static, equilibrium network with a classic Poisson type distribution of connections", but the net is not static...

network theory throws light on a number of social mechanisms which operate beyond the world wide web to structure it...

Lada Adamic, of Xerox, Palo Alto Research Centre... discovered that, just as in the social sphere, one could pick two sites at random and get from one to the other within four clicks...

one of the primary features of a random graph is that its degree distribution always has a particular mathematical form known as Poisson distribution...

physicist Albert-László Barabási ...has shown that many networks in the real world have degree distributions that don't look anything like a Poisson distribution. Instead, they follow what is known as a power law...relates to links and nodes. ...He has discovered that all networks have a deep underlying order and operate according to simple but powerful rules...
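To make the Poisson versus power-law contrast concrete, here is a minimal sketch in Python. It is not taken from Grehan's article: the function names, the network size and the one-link-per-new-node rule are all assumptions chosen for illustration. It builds one network by joining nodes uniformly at random and another where each new node links to an existing node with probability proportional to the links that node already has, then compares the best-connected node in each.

```python
import random

def random_graph_degrees(n, m_edges, rng):
    """Degrees when each link joins two nodes picked uniformly at random.
    (Repeated links between the same pair are allowed in this toy version.)"""
    deg = [0] * n
    for _ in range(m_edges):
        a, b = rng.sample(range(n), 2)
        deg[a] += 1
        deg[b] += 1
    return deg

def preferential_attachment_degrees(n, rng):
    """Degrees when each new node links to one existing node chosen
    with probability proportional to that node's current degree."""
    deg = [1, 1]        # start with two nodes joined by one link
    endpoints = [0, 1]  # every node appears once per link it holds
    for new in range(2, n):
        target = rng.choice(endpoints)  # degree-proportional pick
        deg[target] += 1
        deg.append(1)
        endpoints.extend([target, new])
    return deg

rng = random.Random(0)  # fixed seed so runs are repeatable
n = 10000               # hypothetical network size
uniform = random_graph_degrees(n, n - 1, rng)
preferential = preferential_attachment_degrees(n, rng)

# Both networks have the same number of links; only the wiring rule differs.
print("max degree, random graph:", max(uniform))
print("max degree, preferential:", max(preferential))
```

Both networks hold the same number of links, so any difference comes from the wiring rule alone. In the uniform case degrees stay bunched near the average, while the preferential case tends to produce a few heavily linked nodes, the "hubs" mentioned below.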

Clustering ...is an almost universal feature, not just in social networks, but of networks in general...Perhaps the greatest discovery of the laws of network organisation focuses on the idea of "hubs" and how they form. These are the centrepieces of networks, around which many links form...


Hyperlink-based "popularity" algorithms (especially PageRank) are "inherently biased against new and unknown pages."

It is essential to be aware that "the importance or quality of page" is distinct from "the relevance of page" to a user query.

Relevance is query specific...

He quotes an experiment which found that "the top 20% of the pages with the highest number of incoming links obtained 70% of the new links after seven months, while the bottom 60% of the pages obtained virtually no incoming links at all during that period."
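A toy simulation shows how this kind of skew can arise from popularity alone. This is only a sketch under stated assumptions, not the experiment Grehan quotes: the starting link counts (1000 divided by rank), the number of new links and the function name are all hypothetical. Each new link goes to a page with probability proportional to the links it already has, and the share captured by the top 20% and bottom 60% of pages is reported.

```python
import random

def share_of_new_links(initial_links, new_links, rng):
    """Hand out new links in proportion to current link counts.
    Returns the fraction gained by the top 20% and the bottom 60%
    of pages, ranked by their initial link counts."""
    n = len(initial_links)
    order = sorted(range(n), key=lambda i: initial_links[i], reverse=True)
    top = set(order[: n // 5])
    bottom = set(order[n - (3 * n) // 5:])

    # One pool entry per existing link, so a uniform pick from the pool
    # is a pick proportional to each page's current link count.
    pool = [page for page, count in enumerate(initial_links)
            for _ in range(count)]
    gained_top = gained_bottom = 0
    for _ in range(new_links):
        page = rng.choice(pool)
        pool.append(page)  # the new link makes this page more popular
        if page in top:
            gained_top += 1
        elif page in bottom:
            gained_bottom += 1
    return gained_top / new_links, gained_bottom / new_links

rng = random.Random(1)  # fixed seed so runs are repeatable
# Hypothetical starting point: the page at rank r has 1000 // r links.
initial = [1000 // rank for rank in range(1, 1001)]
top_share, bottom_share = share_of_new_links(initial, 5000, rng)
print(f"top 20% share of new links:    {top_share:.0%}")
print(f"bottom 60% share of new links: {bottom_share:.0%}")
```

The exact percentages depend entirely on the assumed starting distribution, so they should not be read against the 70% figure above. The point is only the direction of the effect: pages that start with more links collect most of the new ones.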

Ends with the teaser that "A new model has been developed which can be used to predict and analyse competition and diversity in different communities on the web."

This has to be clustering (see Teoma).

References:
"rich get richer" problem at Google: Impact Of Search Engines On Page Popularity.


A New Paradigm For Ranking Web Pages On The World Wide Web. http://www2003.org/cdrom/papers/refereed/p042/paper42_html/p42-tomlin.htm

On network science he recommends: Evolution of Networks



This work is licensed under a Creative Commons License.