PageRank Citation Ranking: Bringing Order to the Web

7 Applications

7.1 Estimating Web Traffic

Because PageRank roughly corresponds to a random web surfer (see Section 2.5), it is interesting to see how PageRank corresponds to actual usage. We used the counts of web page accesses from the NLANR [NLA] proxy cache and compared these to PageRank. The NLANR data came from several national proxy caches over a period of several months and consisted of 11,817,665 unique URLs, with the highest hit count going to Altavista with 638,657 hits. There were 2.6 million pages in the intersection of the cache data and our 75 million URL database. It is extremely difficult to compare these datasets analytically, for several reasons. Many of the URLs in the cache access data are people reading their personal mail on free email services. Duplicate server names and page names are a serious problem. Incompleteness and bias are a problem in both the PageRank data and the usage data.

However, we did see some interesting trends in the data. There seems to be a high usage of pornographic sites in the cache data, but these sites generally had low PageRanks. We believe this is because people do not want to link to pornographic sites from their own web pages. Using this technique of looking for differences between PageRank and usage, it may be possible to find things that people like to look at but do not want to mention on their web pages. There are some sites that have very high usage but low PageRank, such as netscape.yahoo.com. We believe there is probably an important backlink which is simply omitted from our database (we only have a partial link structure of the web). It may be possible to use usage data as a start vector for PageRank, and then iterate PageRank a few times. This might allow filling in holes in the usage data. In any case, these types of comparisons are an interesting topic for future study.

7.2 PageRank as Backlink Predictor

One justification for PageRank is that it is a predictor for backlinks. In [CGMP98] we explore the issue of how to crawl the web efficiently, trying to crawl better documents first. We found on tests of the Stanford web that PageRank is a better predictor of future citation counts than citation counts themselves.

The experiment assumes that the system starts out with only a single URL and no other information, and the goal is to crawl the pages in as close to the optimal order as possible. The optimal order is to crawl pages in exactly the order of their rank according to an evaluation function. For the purposes here, the evaluation function is simply the number of citations, given complete information. The catch is that the information needed to calculate the evaluation function is not fully available until after all the documents have been crawled. It turns out that, using the incomplete data, PageRank is a more effective way to order the crawling than the number of known citations. In other words, PageRank is a better predictor than citation counting even when the measure is the number of citations! The explanation for this seems to be that PageRank avoids the local maxima that citation counting gets stuck in. For example, citation counting tends to get stuck in local collections like the Stanford CS web pages, taking a long time to branch out and find highly cited pages in other areas. PageRank quickly finds that the Stanford homepage is important, and gives preference to its children, resulting in an efficient, broad search.

This ability of PageRank to predict citation counts is a powerful justification for using PageRank. Since it is very difficult to map the citation structure of the web completely, PageRank may even be a better citation count approximation than citation counts themselves.
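The crawl-ordering experiment of Section 7.2 can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions, not the crawler from [CGMP98]: the link graph and page names are hypothetical, the helper names `pagerank`, `crawl_order`, and `top_k_coverage` are invented for illustration, PageRank uses a damping factor of 0.85 with uniform teleportation and uniform redistribution of rank from pages without out-links (one common convention, not necessarily the paper's), and crawl quality is measured as the share of the k most-cited pages fetched within the first n steps.

```python
from collections import defaultdict

# Hypothetical helpers sketching the crawl-ordering experiment; not the
# crawler used in the paper or in [CGMP98].


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank over `nodes`, using only edges among them.

    Assumed conventions: damping 0.85, uniform teleportation, and rank from
    pages with no known out-links spread uniformly over all nodes.
    """
    nodes = sorted(nodes)  # fixed order keeps floating-point sums reproducible
    n = len(nodes)
    node_set = set(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        if u in node_set and v in node_set:
            out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def crawl_order(graph, seed, strategy):
    """Simulate crawling `graph` (page -> list of out-links) from a single seed.

    The crawler only knows links found on pages it has already fetched. Each
    step it fetches the frontier page with the highest score: "backlinks" is
    the number of known in-links, "pagerank" is PageRank on the known subgraph.
    Ties go to the alphabetically first page.
    """
    crawled, order, frontier = set(), [], {seed}
    while frontier:
        known_edges = [(u, v) for u in crawled for v in graph[u]]
        if strategy == "backlinks":
            score = defaultdict(int)
            for _, v in known_edges:
                score[v] += 1
        elif strategy == "pagerank":
            score = pagerank(crawled | frontier, known_edges)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        nxt = min(frontier, key=lambda p: (-score[p], p))
        frontier.remove(nxt)
        crawled.add(nxt)
        order.append(nxt)
        frontier |= {v for v in graph[nxt] if v not in crawled}
    return order


def top_k_coverage(order, graph, k, n):
    """Fraction of the k most-cited pages (true in-degree over the whole graph,
    the 'complete information' evaluation function) among the first n crawled."""
    indeg = defaultdict(int)
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1
    top = sorted(graph, key=lambda p: (-indeg[p], p))[:k]
    return len(set(top) & set(order[:n])) / k


# Hypothetical toy web: a small CS cluster plus two other departments.
toy = {
    "stanford": ["cs", "med", "law"],
    "cs": ["cs/a", "cs/b", "stanford"],
    "cs/a": ["cs/b", "cs"],
    "cs/b": ["cs/a", "cs"],
    "med": ["med/a", "stanford"],
    "med/a": ["med", "law"],
    "law": ["stanford", "med"],
}

for strategy in ("backlinks", "pagerank"):
    order = crawl_order(toy, "cs/a", strategy)
    print(strategy, order, top_k_coverage(order, toy, k=3, n=4))
```

On this toy graph both strategies happen to fetch the pages in the same order (2 of the 3 most-cited pages within the first 4 fetches); the graph is likely too small to show the local-maximum effect, so it only exercises the mechanics and does not reproduce the Stanford-web result. Recomputing PageRank from scratch after every fetch is done here only for clarity and would be far too slow at web scale.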

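Section 7.1 suggests two analyses that combine a link graph with usage counts: looking for pages whose usage and PageRank disagree, and using usage data as a start vector for PageRank followed by a few iterations to fill in holes in the usage data. The Python sketch below illustrates both under assumptions chosen here for illustration: the link graph and hit counts are hypothetical toy data, the function names are invented, the damping factor of 0.85 with uniform teleportation and uniform redistribution of dangling rank is one common PageRank convention rather than necessarily the paper's, "a few" iterations is taken to be three, and usage and PageRank are compared by rank position.

```python
# Hypothetical helpers sketching the Section 7.1 ideas; not code from the paper.


def pagerank_step(rank, links, damping=0.85):
    """Apply one PageRank power-iteration step to the distribution `rank`.

    `links` maps every page to its list of out-links. Assumed conventions:
    uniform teleportation and uniform redistribution of dangling-page rank.
    """
    pages = sorted(links)
    n = len(pages)
    dangling = sum(rank[p] for p in pages if not links[p])
    new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
    for p in pages:
        for q in links[p]:
            new[q] += damping * rank[p] / len(links[p])
    return new


def usage_seeded_pagerank(links, hits, steps=3, damping=0.85):
    """Start from normalized usage counts instead of a uniform vector, then
    run only a few PageRank steps. Pages absent from the usage log (hits 0)
    receive rank through links, one reading of 'filling in holes'."""
    total = sum(hits.get(p, 0) for p in links)
    if total == 0:
        raise ValueError("no usage data for any page in the link graph")
    rank = {p: hits.get(p, 0) / total for p in links}
    for _ in range(steps):
        rank = pagerank_step(rank, links, damping)
    return rank


def usage_vs_pagerank_gaps(links, hits, iters=50):
    """Order pages by how much better they place by usage than by PageRank.

    Pages that come first are visited far more than they are linked to."""
    pr = {p: 1.0 / len(links) for p in links}
    for _ in range(iters):
        pr = pagerank_step(pr, links)
    by_usage = sorted(links, key=lambda p: (-hits.get(p, 0), p))
    by_pr = sorted(links, key=lambda p: (-pr[p], p))
    usage_pos = {p: i for i, p in enumerate(by_usage)}
    pr_pos = {p: i for i, p in enumerate(by_pr)}
    return sorted(links, key=lambda p: (usage_pos[p] - pr_pos[p], p))


# Hypothetical toy data: "mail" stands in for a heavily used, rarely linked
# webmail page; "news" and "orphan" have no usage counts at all.
links = {
    "home": ["news", "mail"],
    "news": ["home"],
    "mail": [],
    "blog": ["home", "news"],
    "orphan": ["home"],
}
hits = {"mail": 400, "home": 50, "blog": 5}

print(usage_seeded_pagerank(links, hits))
print(usage_vs_pagerank_gaps(links, hits))  # → ['mail', 'blog', 'orphan', 'home', 'news']
```

Stopping after a few steps keeps the seeded vector partly anchored to the observed usage while still giving pages without hits some rank. Running the iteration to convergence would instead approach ordinary PageRank, because with a damping factor below 1 the limit of this power iteration does not depend on the start vector.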

Original File Name Searched:

1999-66.pdf