{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import networkx as nx\n", "import time\n", "import math\n", "import pandas as pd\n", "import scipy as sp\n", "import plotly.express as px\n", "import plotly.graph_objs as go\n", "from scipy.sparse import *\n", "from scipy.sparse.linalg import norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build directed graphs from the edge lists downloaded from the SNAP database. Only web-Stanford is loaded here; web-BerkStan is left commented out." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G1 = nx.read_edgelist('../data/web-Stanford.txt', create_using=nx.DiGraph(), nodetype=int)\n", "\n", "# G2 = nx.read_edgelist('../data/web-BerkStan.txt', create_using=nx.DiGraph(), nodetype=int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating the transition probability matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# square matrix of size n x n, where n is the number of nodes in the graph. The matrix is filled with zeros and the (i,j) element is x if the node i is connected to the node j. 
Where x is 1/(number of nodes connected to i).\n", "\n", "def create_matrix(G):\n", "    n = G.number_of_nodes()\n", "    P = sp.sparse.lil_matrix((n,n))\n", "    for i in G.nodes():\n", "        for j in G[i]: # G[i] holds the successors of i, i.e. the nodes that i links to\n", "            P[i-1,j-1] = 1/len(G[i]) # assumes node labels run from 1 to n\n", "    return P" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To ensure that the random process has a unique stationary distribution and does not stagnate, the transition matrix P is usually modified into an irreducible stochastic matrix A (called the Google matrix) as follows\n", "\n", "$$ A = \\alpha \\tilde{P} + (1-\\alpha)v e^T$$\n", "\n", "where $\\tilde{P}$ is defined as \n", "\n", "$$ \\tilde{P} = P + v d^T$$\n", "\n", "Here $d \\in \\mathbb{N}^{n \\times 1}$ is a binary vector marking the dangling web pages, i.e., $d(i) = 1$ if the $i$-th page has no outgoing hyperlinks, $v \\in \\mathbb{R}^{n \\times 1}$ is a probability vector, $e = [1, 1, \\dots, 1]^T$, and $0 < \\alpha < 1$ is the so-called damping factor: the probability, in the model, that the surfer moves on by clicking a hyperlink rather than by other means." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = G1.number_of_nodes()\n", "P = create_matrix(G1) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The vector `d` handles the dangling-nodes problem" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define d as an n x 1 sparse matrix, where n is the number of nodes in the graph. 
The vector has d(i) = 1 if row i of the matrix P contains only zeros, otherwise 0\n", "\n", "# d is the vector of dangling nodes\n", "d = sp.sparse.lil_matrix((n,1))\n", "for i in range(n):\n", "    if P[i].sum() == 0:\n", "        d[i] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The vector v is a probability vector: the sum of its elements must be one" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define v as the probability vector of size n x 1, where n is the number of nodes in the graph. The vector is filled with 1/n\n", "# https://en.wikipedia.org/wiki/Probability_vector\n", "\n", "v = sp.sparse.lil_matrix((n,1))\n", "for i in range(n):\n", "    v[i] = 1/n " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute the modified transition matrix $\\tilde{P}$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create_matrix stores P[i,j] = 1/outdeg(i), so each non-dangling row sums to one.\n", "# The formula for the Google matrix is column-oriented (v d^T fills the columns of dangling pages), so P is transposed here.\n", "Pt = P.T + v.dot(d.T)\n", "\n", "# Pt is a sparse matrix too" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# e^T is stored as a 1 x n sparse matrix filled with ones\n", "e = sp.sparse.lil_matrix((1,n))\n", "for i in range(n):\n", "    e[0,i] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# # v*eT is a nxn sparse matrix filled all with 1/n, let's call it B\n", "\n", "# B = sp.sparse.lil_matrix((n,n))\n", "# for i in range(n):\n", "#     for j in range(n):\n", "#         B[i,j] = 1/n\n", "\n", "# A = alpha*Pt + (1-alpha)*B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithm 1 Shifted-Power method for PageRank with multiple damping factors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# pandas dataframe to store the results\n", "df = pd.DataFrame(columns=['alpha', 'iterations', 'tau', 'time'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "# Returns mv (the number of matrix-vector products performed), the lists x and r (one PageRank vector and one residual vector per damping factor in a), and the elapsed time

def Algorithm1(Pt, v, tau, max_mv, a: list):

    start_time = time.time()

    # The iterates are dense even though Pt is sparse, so work with 1-D NumPy arrays
    v = v.toarray().ravel() if sp.sparse.issparse(v) else np.asarray(v, dtype=float).ravel()

    u = Pt.dot(v) - v  # u_0 = Pt v - v
    mv = 1  # number of matrix-vector products so far
    Res = np.zeros(len(a))  # residual norm for each damping factor
    x = [v.copy() for _ in a]  # one PageRank vector per damping factor
    r = [np.zeros_like(v) for _ in a]  # one residual vector per damping factor

    for i in range(len(a)):
        r[i] = a[i] * u
        Res[i] = np.linalg.norm(r[i])

        if Res[i] > tau:
            x[i] = r[i] + v

    while Res.max() > tau and mv < max_mv:
        u = Pt.dot(u)  # u is overwritten on purpose: after k passes it holds Pt^k u_0
        mv += 1

        for i in range(len(a)):
            if Res[i] >= tau:  # skip the damping factors that have already converged
                # power-method increment x_k - x_(k-1) = a^k Pt^(k-1) u_0, with k = mv
                r[i] = (a[i]**mv) * u
                Res[i] = np.linalg.norm(r[i])

                if Res[i] > tau:
                    x[i] = r[i] + x[i]

    if Res.max() > tau:
        print("The algorithm didn't converge in ", max_mv, " iterations")
    else:
        print("The algorithm converged in ", mv, " iterations")

    total_time = time.time() - start_time
    print("The algorithm took ", total_time, " seconds")

    return mv, x, r, total_time
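
# Sanity check (an illustrative sketch, not part of the original notebook).
# The shifted power recurrence x_k = x_(k-1) + alpha^k * Pt^(k-1) * (Pt v - v)
# should converge to the solution of (I - alpha*Pt) x = (1 - alpha) v.
# Pt_toy, v_toy, alpha_toy and the tolerance are made-up values for a 3-page web,
# chosen only so the result can be compared with a direct dense solve.
import numpy as np

# Column-stochastic toy matrix: column j holds the out-link probabilities of page j.
Pt_toy = np.array([[0.0, 0.5, 1.0],
                   [0.5, 0.0, 0.0],
                   [0.5, 0.5, 0.0]])
v_toy = np.full(3, 1.0 / 3.0)
alpha_toy = 0.85

u_toy = Pt_toy @ v_toy - v_toy
x_toy = v_toy + alpha_toy * u_toy  # first iterate, k = 1
k = 1
while np.linalg.norm(alpha_toy**k * u_toy) > 1e-12 and k < 1000:
    u_toy = Pt_toy @ u_toy  # u_toy now holds Pt_toy^k applied to the initial u
    k += 1
    x_toy = x_toy + alpha_toy**k * u_toy

# Reference solution from the linear-system form of PageRank.
x_direct = np.linalg.solve(np.eye(3) - alpha_toy * Pt_toy, (1 - alpha_toy) * v_toy)
print("max abs difference vs direct solve:", np.abs(x_toy - x_direct).max())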