MODELS AND ALGORITHMS FOR PAGERANK SENSITIVITY

4 · Random Alpha PageRank

4.5 Empirical distribution

As we argued in the previous sections, the RAPr model generalizes PageRank to multiple random surfers. Instead of picking a value of α to control when the random surfer teleports, RAPr forces us to pick a distribution for a random variable A that controls how likely surfers are to pick each value of α. This second task seems more problematic, but for a natural choice of A, it is not. This natural choice is to pick A according to how surfers actually behave on the web.

Recall that a single value of α is the probability that a user clicks a link on a web page. With a custom browser plug-in, you could compute your own value of α. It is a simple ratio:

$$\alpha \approx \frac{\text{number of pages viewed after clicking a link}}{\text{total number of pages viewed}}.$$

With more browsing, and thus more information, the approximation to α becomes more refined. If people tracked their own α, finding the empirical distribution for A would be just a matter of data collection.

Loosely speaking, browser toolbars collect precisely this type of information. That is, the Microsoft, Yahoo!, and Google browser toolbars, which users download and install into their browsers for a few added features, collect this data and send it back to Microsoft, Yahoo!, and Google. (Of course, each company ensures that users provide explicit consent for transmitting the data.) Toolbar logs, then, contain the information needed to compute A.

Following our presentation of the initial RAPr model at the Workshop on Algorithms for the Web Graph, Abraham Flaxman and Asela Gunawardana provided a summary of these logs. They reported values of α from one million “users” on the web, collected in a two-hour window. The mean value of α in this data is 0.375, and the data shows a good fit to a Beta(1.5, 0.5, [0, 1]) distribution (figure 4.6). For the figure, the analysis used a kernel density estimator [Asmussen and Glynn, 2007] to generate an approximate probability density from the raw data.
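To make the ratio estimate and the density step concrete, here is a minimal Python sketch. It assumes a hypothetical per-user log that records, for each page view, whether that page was reached by clicking a link. The names `estimate_alpha` and `gaussian_kde`, the Gaussian kernel, and the bandwidth of 0.1 are illustrative assumptions, not the estimator settings behind figure 4.6. The `pseudo_counts` flag implements the +1/+2 adjustment described in the next paragraph, and the usage drops raw values of exactly 0 and 1, as was done for the figure.

```python
import math
from typing import Callable, Sequence


def estimate_alpha(clicked: Sequence[bool], pseudo_counts: bool = False) -> float:
    """Estimate one surfer's alpha from a sequence of page views.

    clicked[i] is True when page view i was reached by clicking a link
    (a hypothetical log format, not an actual toolbar schema).
    """
    n_clicked = sum(1 for c in clicked if c)
    n_total = len(clicked)
    if pseudo_counts:
        # Pseudo-counts: +1 for the next link click and +1 for the next page
        # reached without a click, so the estimate lies strictly inside (0, 1).
        return (n_clicked + 1) / (n_total + 2)
    if n_total == 0:
        raise ValueError("no page views recorded")
    # Plain ratio: pages reached by a click over all pages viewed.
    return n_clicked / n_total


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return x -> kernel density estimate at x using a Gaussian kernel.

    The kernel and bandwidth are illustrative choices (assumptions).
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    n = len(samples)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps centred at each sample.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


if __name__ == "__main__":
    # Toy browsing logs for four hypothetical users.
    logs = [
        [True, True, False, True],
        [False, True],
        [True, False, False, False],
        [True, True],  # every view was a click, so the raw ratio is 1
    ]
    raw = [estimate_alpha(log) for log in logs]
    print(raw)  # [0.75, 0.5, 0.25, 1.0]
    # Drop the impossible raw values 0 and 1 before estimating the density.
    kept = [a for a in raw if 0.0 < a < 1.0]
    print(kept)  # [0.75, 0.5, 0.25]
    smoothed = [estimate_alpha(log, pseudo_counts=True) for log in logs]
    print([round(a, 3) for a in smoothed])  # [0.667, 0.5, 0.333, 0.75]
    f = gaussian_kde(kept, bandwidth=0.1)
    print(f(0.5))
```

A plain Gaussian kernel leaks some probability mass outside [0, 1]; with real data one would likely prefer a boundary-corrected kernel or a data-driven bandwidth, but the simple version keeps the idea visible.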
The kernel density estimate itself looks quite similar to a Beta distribution. A nonlinear least-squares fit produces a Beta(1.52, 0.53, [0, 1]) distribution. We instead use Beta(1.5, 0.5, [0, 1]), which is simpler and matches the mean of the data.

For the estimate displayed in the figure, we dropped all values of α measured at 0 and 1. Both of these values are impossible and represent problems with the sampling procedure. Ideally, the data would be collected with pseudo-counts [Agresti, 2002], where we estimate

$$\alpha \approx \frac{(\text{number of pages viewed after clicking a link}) + 1}{(\text{total number of pages viewed}) + 2}.$$

Pseudo-counts correct for two known, but unobserved, future actions. That is, a person will always click another link, so we add 1 to both totals. Also, a person will always visit a page without clicking a link, so we add another page to the total pages viewed. This adjustment fixes an important

Original File Name Searched:

gleich-pagerank-thesis.pdf

DIY PDF Search: Google It | Yahoo | Bing

Cruise Ship Reviews | Luxury Resort | Jet | Yacht | and Travel Tech More Info

Cruising Review Topics and Articles More Info

Software based on Filemaker for the travel industry More Info

The Burgenstock Resort: Reviews on CruisingReview website... More Info

Resort Reviews: World Class resorts... More Info

The Riffelalp Resort: Reviews on CruisingReview website... More Info

CONTACT TEL: 608-238-6001 Email: greg@cruisingreview.com (Standard Web Page)