ALGORITHMS FOR PAGERANK SENSITIVITY DISSERTATION

Text from PDF Page: 040

2.4 Algorithms

So far, we have seen how PageRank is formulated as the linear equation (I − αP)x = (1 − α)v or the eigensystem Mx = x. Both formulations correspond to extremely well studied problems: solving linear systems and computing eigenvectors, respectively [Golub and van Loan, 1996]. So why do we need to write about algorithms for PageRank? The property that makes PageRank an interesting problem is that the matrices are HUGE! Recent reports about the size of the web establish that there are over one trillion (1,000,000,000,000) pages [Alpert and Hajaj, 2008], although many are duplicates, and at least one search engine has crawled over 180 billion pages [Cuil, 2009]. Algorithms to compute PageRank, then, must cope with matrices derived from such graphs. In this case, classic iterative algorithms for linear systems and eigenvectors actually perform well, partly because they use only one or two working vectors. To take advantage of additional structure in the strongly personalized PageRank problem, we derive all the algorithms for a sub-stochastic matrix P̄ and a graph. In terms of figure 2.2, this structure is why the algorithms lie before the theory. To discuss specializations of these algorithms for the PageRank problem properly, we need one surprising preliminary discussion.

2.4.1 Important implementation details

Let's begin with a silly question: given a list of positive numbers, how should we compute their sum? Given this task, most people would produce a simple code that resembles the following four lines of Matlab:

    sumx = 0;
    for i=1:numel(x)
      sumx = sumx + x(i);
    end

This routine correctly sums the numbers in exact arithmetic. When computed in floating-point arithmetic, each individual addition contributes an error of at most ε because

    fl(sumx + x(i)) = (1 + δ)(sumx + x(i)),   ∣δ∣ ≤ ε,

for the machine epsilon ε. After n of these additions, the output sumx has error bounded by nε, where n is the number of summands [Higham, 2002].
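This rounding behavior is easy to observe even at tiny scale. As a quick illustration (in Python rather than the chapter's Matlab, purely for demonstration), accumulating ten copies of 0.1 with the naive loop does not produce exactly 1.0:

```python
# Accumulate ten copies of 0.1 exactly as the naive summation loop does.
s = 0.0
for _ in range(10):
    s += 0.1

print(s == 1.0)      # False: the accumulated value is not exactly 1.0
print(abs(s - 1.0))  # a discrepancy on the order of the machine epsilon
```

Each addition rounds its result, and the rounding errors accumulate; with n summands the worst case grows like nε, which is why the bound matters at web scale.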
For a large PageRank problem, n ∼ 10⁹, and for double-precision arithmetic, ε = 2.2 ⋅ 10⁻¹⁶; in that case nε is not small. Computing sums is a common operation in PageRank algorithms, so we need a better algorithm. This problem has been studied nearly since the dawn of computation [Higham, 2002]. (In fact, the reference for the remainder of this section is Higham [2002], which contains an exhaustive treatment of the problem.) The proposed solutions range from phenomenally complicated to subtly simple,
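One classic scheme on the "subtly simple" end of that range, treated at length by Higham [2002], is compensated (Kahan) summation, which carries a running correction term that recaptures the low-order bits lost in each addition. A hedged sketch in Python (rather than the chapter's Matlab, purely for illustration), compared against `math.fsum`, which computes a correctly rounded sum:

```python
import math
import random

def kahan_sum(xs):
    """Compensated (Kahan) summation.

    Carries a running correction c that recaptures the low-order
    bits lost when each summand is added to the large partial sum.
    """
    s = 0.0
    c = 0.0
    for x in xs:
        y = x - c        # apply the correction from the previous step
        t = s + y        # big + small: low-order bits of y are lost here
        c = (t - s) - y  # algebraically zero; numerically, the lost bits
        s = t
    return s

random.seed(7)
xs = [random.random() for _ in range(100_000)]

exact = math.fsum(xs)  # correctly rounded reference sum
naive = 0.0
for x in xs:
    naive += x

print(abs(naive - exact))          # accumulated error of the naive loop
print(abs(kahan_sum(xs) - exact))  # much smaller compensated error
```

The appeal for a problem of PageRank's scale is that compensated summation replaces the nε worst-case bound with roughly 2ε ⋅ Σ∣xᵢ∣ (up to O(nε²) terms), essentially independent of the number of summands, while still using only a couple of scalar work variables.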

Original File Name Searched:

gleich.pdf
