logo

MODELS AND ALGORITHMS FOR PAGERANK SENSITIVITY

PDF Publication Title:

MODELS AND ALGORITHMS FOR PAGERANK SENSITIVITY ( models-and-algorithms-for-pagerank-sensitivity )

Previous Page View | Next Page View | Return to Search List

Text from PDF Page: 079

This vector shows that x5 and x6 have the highest standard deviation. In a traditional PageRank context, these pages are both in a sink-component and accumulate rank from the largest connected component (x1 , x2 , and x4 ). A high standard deviation signals that the rank of these pages is “more likely” to change for different realizations of A. Another interesting quantity derived from our model is the correlation coefficient between ranks. The correlation coefficient between two ranks xi and xj provides a measure of how xi will vary as xj varies with different realizations of A. If the correlation between xi and xj is positive then an increase in x i from separate realizations of A implies that x j tends to increase as well. If the correlation is negative, then an increase in xi implies a decrease in x j is likely. Here, we have computed the correlation coefficient between all pages:3 3 This matrix is symmetric, so we could simply present one half of it. ⎡⎢ 1.000 000 0.999 996 0.998 844 0.999 211 −0.999 951 −0.999 373 ⎤⎥ ⎢⎥ ⎢ 0.999 996 1.000 000 0.998 764 0.999 149 −0.999 936 −0.999 313 ⎥ ⎢⎥ ⎢ 0.998 844 0.998 764 1.000 000 0.999 963 −0.999 261 −0.999 920 ⎥ . ⎢ 0.999 211 0.999 149 0.999 963 1.000 000 −0.999 550 −0.999 989 ⎥ ⎢⎥ ⎢ −0.999 951 −0.999 936 −0.999 261 −0.999 550 1.000 000 ⎢⎣ −0.999 373 −0.999 313 −0.999 920 −0.999 989 0.999 667 The sign pattern in the correlation structure shows that there are effectively two groups of pages, (x1, x2, x3, x4) and (x5, x6). Thus far, we have seen a few useful quantities that we can derive from the RAPr model. In practice, we anticipate many problem-dependent uses for these quantities. Our experiments show that the standard deviation vector is uncorrelated (in a Kendall-τ sense4) with the PageRank vector itself. Because pages with a high standard deviation have highly variable PageRank values, the standard deviation vector could be an important input to a machine learning framework for web search or web page categorization. The correlation structure between the random ranks indicates that some of the pages form natural groups. One may explore connections between negatively correlated ranks to glean information from the underlying graph. We do not pursue this idea further, though it may aid in applications such as spam detection.5 Another application for these techniques is local site analysis. On a website such as Wikipedia, the entire graph structure is available. Further, site usage logs contain the information necessary to generate the vector v based on incoming searches. These same logs also contain the information necessary to estimate the distribution of A. With the RAPr formulation, extra information is then available to help the site owner understand how people use the site.6 More generally, the PageRank model has become a key tool for network and graph analysis. It has been used to find graph cuts [Andersen et al., 2006], infer missing values on a partially labeled graph [Zhou et al., 2005], find interesting genes [Morrison et al., 2005], and help match graph structures in protein networks [Singh et al., 2007].7 In all of these cases, the random surfer model does not directly apply. Each paper picks a particular value for α and computes a PageRank vector from that value. With RAPr, each 0.999 667 ⎥ 1.000 000 ⎥⎦ 4.2 ⋅ vision 59 4 Kendall’s τ difference in concordant and discordant pairs between two lists relative to an identical ordering and an inverted ordering. The τ value is 1 for identical lists and −1 for inverted lists. 5 There is already a paper on using a closely related idea for spam detection. See section 4.3.5. 6 One of the most useful observations from this model is when people use the site in a way that is not predicted by the random surfer model with fitted parameters. This indicates that random surfer models are not appropriate and could suggest monitoring a different set of statistics. 7 See section 1.4 for an informal descrip- tion of these topics.

PDF Image | MODELS AND ALGORITHMS FOR PAGERANK SENSITIVITY

models-and-algorithms-for-pagerank-sensitivity-079

PDF Search Title:

MODELS AND ALGORITHMS FOR PAGERANK SENSITIVITY

Original File Name Searched:

gleich-pagerank-thesis.pdf

DIY PDF Search: Google It | Yahoo | Bing

Cruise Ship Reviews | Luxury Resort | Jet | Yacht | and Travel Tech More Info

Cruising Review Topics and Articles More Info

Software based on Filemaker for the travel industry More Info

The Burgenstock Resort: Reviews on CruisingReview website... More Info

Resort Reviews: World Class resorts... More Info

The Riffelalp Resort: Reviews on CruisingReview website... More Info

CONTACT TEL: 608-238-6001 Email: greg@cruisingreview.com | RSS | AMP