Applications

Estimating Web Traffic

Because PageRank roughly corresponds to a random web surfer (see Section [...]), it is interesting to see how PageRank corresponds to actual usage. We used the counts of web page accesses from the NLANR [NLA] proxy cache and compared these to PageRank. The NLANR data was from several national proxy caches over a period of several months and consisted of [...] unique URLs, with the highest hit count going to Altavista with [...] hits. There were [...] million pages in the intersection of the cache data and our [...] million URL database.

It is extremely difficult to compare these datasets analytically, for a number of different reasons. Many of the URLs in the cache access data are people reading their personal mail on free email services. Duplicate server names and page names are a serious problem. Incompleteness and bias are a problem in both the PageRank data and the usage data.

However, we did see some interesting trends in the data. There seems to be a high usage of pornographic sites in the cache data, but these sites generally had low PageRanks. We believe this is because people do not want to link to pornographic sites from their own web pages. Using this technique of looking for differences between PageRank and usage, it may be possible to find things that people like to look at but do not want to mention on their web pages.

There are some sites that have a very high usage but low PageRank, such as netscape.yahoo.com. We believe there is probably an important backlink which simply is omitted from our database (we only have a partial link structure of the web). It may be possible to use usage data as a start vector for PageRank, and then iterate PageRank a few times. This might allow filling in holes in the usage data. In any case, these types of comparisons are an interesting topic for future study.

PageRank as Backlink Predictor

One justification for PageRank is that it is a predictor for backlinks. In [CGMP] we explore the issue of how to crawl the web efficiently, trying to crawl better documents first. We found on tests of the Stanford web that PageRank is a better predictor of future citation counts than citation counts themselves.

The experiment assumes that the system starts out with only a single URL and no other information, and the goal is to try to crawl the pages in as close to the optimal order as possible. The optimal order is to crawl pages in exactly the order of their rank according to an evaluation function. For the purposes here, the evaluation function is simply the number of citations, given complete information. The catch is that all the information needed to calculate the evaluation function is not available until after all the documents have been crawled.

It turns out that, using the incomplete data, PageRank is a more effective way to order the crawling than the number of known citations. In other words, PageRank is a better predictor than citation counting even when the measure is the number of citations. The explanation for this seems to be that PageRank avoids the local maxima that citation counting gets stuck in. For example, citation counting tends to get stuck in local collections like the Stanford CS web pages, taking a long time to branch out and find highly cited pages in other areas. PageRank quickly finds that the Stanford homepage is important, and gives preference to its children, resulting in an efficient, broad search.

This ability of PageRank to predict citation counts is a powerful justification for using PageRank. Since it is very difficult to map the citation structure of the web completely, PageRank may even be a better citation count approximation than citation counts themselves.
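
The crawl-ordering experiment described above can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not the setup used in the experiment: the link graph `web` is a tiny hypothetical example, the names `pagerank`, `backlink_counts`, and `crawl_order` are illustrative, the damping factor 0.85 is an assumed value, and rank from pages with no known outlinks is spread uniformly, which is one common convention rather than necessarily the paper's treatment of dangling links. At each step the crawler fetches the frontier page that scores highest on the links seen so far, either by PageRank on the partial graph or by the number of known backlinks.

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [outlinks]}.

    Assumption: pages with no known outlinks spread their rank uniformly.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


def backlink_counts(links):
    """Number of distinct known pages linking to each page."""
    counts = defaultdict(int)
    for targets in links.values():
        for t in set(targets):
            counts[t] += 1
    return counts


def crawl_order(web, seed, score_fn):
    """Crawl `web` from `seed`, always fetching the frontier page that
    scores highest under `score_fn` applied to the links seen so far."""
    known = {}        # outlinks of pages crawled so far (the partial graph)
    order = []
    frontier = {seed}
    while frontier:
        scores = score_fn(known) if known else {}
        # Ties are broken alphabetically so the result is deterministic.
        page = max(sorted(frontier), key=lambda p: scores.get(p, 0.0))
        frontier.discard(page)
        order.append(page)
        known[page] = list(web.get(page, []))
        for t in known[page]:
            if t not in known:
                frontier.add(t)
    return order


# Hypothetical toy web: a tightly linked "cs" cluster plus a homepage
# that leads out to other areas.
web = {
    "home": ["cs", "news"],
    "cs": ["cs/a", "cs/b", "home"],
    "cs/a": ["cs", "cs/b"],
    "cs/b": ["cs", "cs/a"],
    "news": ["home", "library"],
    "library": ["home"],
}

print("PageRank order:", crawl_order(web, "cs/a", pagerank))
print("Backlink order:", crawl_order(web, "cs/a", backlink_counts))
```

Comparing each printed order against the pages sorted by their full-graph backlink counts (`backlink_counts(web)`) gives a rough sense of how close each strategy comes to the optimal order. On a graph this small the two strategies may well coincide, so the sketch shows the mechanics rather than reproducing the reported result.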
PDF Search Title: PageRank Citation Ranking Bringing Order to the Web
Original File Name Searched: 1999-66.pdf