Andrew Thangaraj
Aug-Nov 2020
Origin of Google
Before Google, world wide web pages had extensive directory sites
Search mostly used to struggle to give relevant results first
Larry Page and Sergey Brin
Developed PageRank as an algorithm to rank web pages based on their importance
Importance was determined by the links connecting the pages
Algorithm rooted in linear algebra, eigenvalues and eigenvectors
Construction
Each web page is a node
Edge from Node \(i\) to Node \(j\) if
Entire www is a “huge” graph
Degrees
Out-degree of a node: number of outgoing links
In-degree of a node: number of nodes that backlink
High in-degree: could indicate “importance” or popularity
\(x_i\): real number that denotes importance of Node \(i\)
\(n_i\): out-degree of Node \(i\)
\(L_i\): set of nodes that link to Node \(i\)
Condition enforced on importance
\[x_i=\sum_{j\in L_i}\dfrac{x_j}{n_j}\]
\(A=\begin{bmatrix} 0&0&0&1/3&0\\ 1&0&0&1/3&0\\ 0&1/2&0&0&0\\ 0&1/2&1/2&0&1\\ 0&0&1/2&1/3&0 \end{bmatrix}\)
\(x=(x_0,x_1,x_2,x_3,x_4)\)
Condition on importance
\(x = Ax\)
Importance vector \(x\) is an eigenvector of \(A\) with eigenvalue 1
\(A\): \((i,j)\)-th entry is \(\dfrac{1}{n_j}\)
Node \(j\) links to Node \(i\)
\(n_j\): out-degree of Node \(j\)
columns of \(A\) add to 1
or, rows of \(A^T\) add to 1
\(v=(1,1,\ldots,1)\) (all-1 vector) is an eigenvector of \(A^T\) with eigenvalue 1
\(\lambda=1\) is an eigenvalue of \(A\)
Importance vector: eigenvector of \(A\) corresponding to eigenvalue 1
\(A=\begin{bmatrix} 0&0&0&1/3&0\\ 1&0&0&1/3&0\\ 0&1/2&0&0&0\\ 0&1/2&1/2&0&1\\ 0&0&1/2&1/3&0 \end{bmatrix}\)
\(x=(2,4,2,6,3)\)
Highest importance: Node 3
Dataset from Laboratory for Web Algorithmics
Sample webgraph: European union domains in 2015
Nodes: 1,070,557,254
Edges: 91,792,261,600
average degree: 85.743
maximum in-degree: 20,252,239
maximum out-degree: 35,340
Importance vector: needs to be efficiently computed