minor fixes

pull/1/head
Luca Lombardo 2 years ago
parent da82dcfa14
commit 362287fcde

1
.gitignore vendored

@ -277,3 +277,4 @@ TSWLatexianTemp*
*.txt *.txt
sources/istruzioni.md

@ -1,6 +1,6 @@
alpha,iterations,tau,time alpha,products m-v,tau,time
"[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",19,1e-05,10.23 "[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",19,1e-05,13.51
"[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",56,1e-06,26.99 "[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",56,1e-06,34.74
"[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",113,1e-07,63.37 "[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",113,1e-07,90.85
"[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",275,1e-08,99.99 "[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",275,1e-08,157.45
"[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",454,1e-09,136.72 "[0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]",454,1e-09,221.05

1 alpha iterations products m-v tau time
2 [0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] 19 19 1e-05 10.23 13.51
3 [0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] 56 56 1e-06 26.99 34.74
4 [0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] 113 113 1e-07 63.37 90.85
5 [0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] 275 275 1e-08 99.99 157.45
6 [0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] 454 454 1e-09 136.72 221.05

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -1,26 +0,0 @@
# EMAIL FROM BINI
Good morning,
the project for the Scientific Computing course must necessarily overlap with the topics covered in the lectures.
I therefore propose this project, which touches on the power method, the Arnoldi process, and GMRES with restart, and which concerns PageRank.
In the attached paper (_authors Shen, Su, Carpentieri and Weng_), several methods are introduced and compared for computing the PageRank vector of the same network with different values of the damping parameter. The authors motivate this interest by writing that this computational problem arises in the design of anti-spam devices, and for this they cite the paper _P.G. Constantine, D.F. Gleich, Random alpha PageRank, Internet Math. 6 (2009) 189-236._
The attached paper introduces and compares several methods for carrying out this computation. Specifically:
`Algorithm 1 (p. 6):` Power method adapted to the computation of PageRank with different values of the damping parameter
`Algorithm 4 (p. 10):` GMRES method adapted to the computation of PageRank with different values of the damping parameter. This algorithm uses the Arnoldi method and GMRES with restart, which the authors report for completeness in Algorithm 2 on p. 7 and Algorithm 3 on p. 8, respectively.
The authors then combine `Algorithm 4` and `Algorithm 1` into `Algorithm 5` on p. 10 to increase efficiency.
The project would consist of implementing `Algorithm 1` and `Algorithm 4` and comparing them to assess their efficiency in terms of number of iterations and CPU time. Valid test problems are the web-Stanford and web-BerkStan matrices.
If any points in the description of the algorithms are unclear, you need to read the part of the text that describes them. For example, line 14 of Algorithm 4 contains a vector z that the authors do not define inside the algorithm. This vector appears in formula (40) and is in fact defined in the line right after (40).
Let me know if you have difficulty understanding the algorithms (the authors are sometimes not very precise in reporting their results). We will then be in touch again to finalize the tests to run.
See you soon,
Dario Bini

Binary file not shown.

12
tex/main.tex vendored

@ -2,6 +2,8 @@
\usepackage[margin=1in]{geometry} \usepackage[margin=1in]{geometry}
\usepackage[utf8]{inputenc} \usepackage[utf8]{inputenc}
\usepackage[english]{babel} \usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage{fourier}
\usepackage{amsthm} \usepackage{amsthm}
\usepackage{amssymb} \usepackage{amssymb}
\usepackage{amsmath} \usepackage{amsmath}
@ -62,9 +64,10 @@ or, equivalently, as the solution of the linear system
\begin{equation}\label{eq:pr2} \begin{equation}\label{eq:pr2}
(I - \alpha \tilde P)x = (1 - \alpha)v (I - \alpha \tilde P)x = (1 - \alpha)v
\end{equation} \end{equation}
In the past decade or so, considerable research attention has been devoted to the efficient solution of problems \ref{eq:pr} and \ref{eq:pr2}, especially when $n$ is very large. For moderate values of the damping factor, e.g. for $\alpha = 0.85$ as initially suggested by Google for search engine rankings, solution strategies based on the simple Power method have proved to be very effective. However, when $\alpha$ approaches 1, as is required in some applications, the convergence rates of classical stationary iterative methods including the Power method tend to deteriorate sharply, and more robust algorithms need to be used. \\
One area that is largely unexplored in PageRank computations is the efficient solution of problems with the same network structure but multiple damping factors. For example, in the Random Alpha PageRank model used in the design of anti-spam mechanisms \cite{Constantine2009Random}, the rankings corresponding to many different damping factors close to 1 need to be computed simultaneously. This problem can be expressed mathematically as solving a sequence of linear systems \noindent In the past decade or so, considerable research attention has been devoted to the efficient solution of problems \ref{eq:pr} and \ref{eq:pr2}, especially when $n$ is very large. For moderate values of the damping factor, e.g. for $\alpha = 0.85$ as initially suggested by Google for search engine rankings, solution strategies based on the simple Power method have proved to be very effective. However, when $\alpha$ approaches 1, as is required in some applications, the convergence rates of classical stationary iterative methods including the Power method tend to deteriorate sharply, and more robust algorithms need to be used. \vspace*{0.4cm}
\noindent One area that is largely unexplored in PageRank computations is the efficient solution of problems with the same network structure but multiple damping factors. For example, in the Random Alpha PageRank model used in the design of anti-spam mechanisms \cite{Constantine2009Random}, the rankings corresponding to many different damping factors close to 1 need to be computed simultaneously. This problem can be expressed mathematically as solving a sequence of linear systems
\begin{equation}\label{eq:pr3} \begin{equation}\label{eq:pr3}
(I - \alpha_i \tilde P)x_i = (1 - \alpha_i)v \quad \alpha_i \in (0, 1) \quad \forall i \in \{1, 2, ..., s\} S (I - \alpha_i \tilde P)x_i = (1 - \alpha_i)v \quad \alpha_i \in (0, 1) \quad \forall i \in \{1, 2, ..., s\} S
\end{equation} \end{equation}
@ -83,8 +86,9 @@ The Power method is considered one of the algorithms of choice for solving eithe
\end{equation} \end{equation}
The convergence behavior is determined mainly by the ratio between the two largest eigenvalues of $A$. When $\alpha$ gets closer to $1$, though, the convergence can slow down significantly. \\ The convergence behavior is determined mainly by the ratio between the two largest eigenvalues of $A$. When $\alpha$ gets closer to $1$, though, the convergence can slow down significantly. \\
\noindent As stated in \cite{SHEN2022126799}, the number of iterations required to reduce the initial residual down to a tolerance $\tau$, measured as $\tau = \lVert Ax_k - x_k \rVert = \lVert x_{k+1} - x_k \rVert$, can be estimated as $\frac{\log_10 \tau}{\log_10 \alpha}$. For example, when $\tau = 10^{-8}$ the Power method requires about 175 steps to converge for $\alpha = 0.9$, but the iteration count rapidly grows to 1833 for $\alpha = 0.99$. Therefore, for values of the damping parameter very close to 1, more robust alternatives to the simple Power algorithm should be used. \noindent As stated in \cite{SHEN2022126799}, the number of iterations required to reduce the initial residual down to a tolerance $\tau$, measured as $\tau = \lVert Ax_k - x_k \rVert = \lVert x_{k+1} - x_k \rVert$, can be estimated as $\frac{\log_{10} \tau}{\log_{10} \alpha}$. For example, when $\tau = 10^{-8}$ the Power method requires about 175 steps to converge for $\alpha = 0.9$, but the iteration count rapidly grows to 1833 for $\alpha = 0.99$. Therefore, for values of the damping parameter very close to 1, more robust alternatives to the simple Power algorithm should be used.
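As a quick check of the iteration estimate quoted in this paragraph, the following minimal Python sketch evaluates $\log_{10}\tau / \log_{10}\alpha$ and rounds up to the next integer. The function name `estimated_iterations` and the rounding choice are assumptions made for illustration; they are not part of the project code.

```python
import math


def estimated_iterations(tau: float, alpha: float) -> int:
    """Estimate Power-method iterations as log10(tau) / log10(alpha), rounded up."""
    if not (0.0 < alpha < 1.0) or not (0.0 < tau < 1.0):
        raise ValueError("alpha and tau must lie strictly between 0 and 1")
    return math.ceil(math.log10(tau) / math.log10(alpha))


# Reproduce the two figures quoted in the text for tau = 1e-8.
for alpha in (0.9, 0.99):
    print(alpha, estimated_iterations(1e-8, alpha))
```

With $\tau = 10^{-8}$ this gives 175 for $\alpha = 0.9$ and 1833 for $\alpha = 0.99$, matching the values in the text.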
\clearpage
\section{The shifted power method for PageRank computations} \section{The shifted power method for PageRank computations}
In this section we consider extensions of stationary iterative methods for the solution of PageRank problems with multiple damping factors. We look in particular at the Power method, the Gauss-Seidel method, and the GIO iteration scheme. We are concerned with how these methods can be executed with the highest efficiency for solving such problems, and in particular with the question of whether, for each method, there exists an implementation such that the computational cost of solving the PageRank problem with multiple damping factors is comparable to that of solving the ordinary PageRank problem with a single damping factor. In this section we consider extensions of stationary iterative methods for the solution of PageRank problems with multiple damping factors. We look in particular at the Power method, the Gauss-Seidel method, and the GIO iteration scheme. We are concerned with how these methods can be executed with the highest efficiency for solving such problems, and in particular with the question of whether, for each method, there exists an implementation such that the computational cost of solving the PageRank problem with multiple damping factors is comparable to that of solving the ordinary PageRank problem with a single damping factor.
@ -123,7 +127,7 @@ Since in general each of the $s$ linear systems may require a different number o
\Ensure $mv,~ x^{(i)},~ r^{(i)} ~ (1 \leq i \leq s)$ \Ensure $mv,~ x^{(i)},~ r^{(i)} ~ (1 \leq i \leq s)$
\State Compute $\mu = \tilde P v - v$ \State Compute $\mu = \tilde P v - v$
\State Set $mv =1$ \State Set $mv =1$
\For $i = 1:s$ \For {$i = 1:s$}
\State Compute $r^{(i)} = \alpha_i \mu$ \State Compute $r^{(i)} = \alpha_i \mu$
\State Compute $Res(i) = \lVert r^{(i)} \rVert$ \State Compute $Res(i) = \lVert r^{(i)} \rVert$
\If {$Res(i) \geq \tau$} \If {$Res(i) \geq \tau$}
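The initialization steps visible in this hunk can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the dense list-of-lists matrix `P`, the helper names `matvec` and `init_residuals`, the Euclidean norm, and the toy 3-node data are all hypothetical. The point it shows is that a single product $\tilde P v$ yields the initial residual of every shifted system, because with starting guess $x^{(i)} = v$ the residual of $(I - \alpha_i \tilde P)x = (1 - \alpha_i)v$ equals $\alpha_i(\tilde P v - v) = \alpha_i \mu$.

```python
import math


def matvec(P, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


def init_residuals(P, v, alphas, tau):
    """Compute r_i = alpha_i * (P v - v) for every damping factor.

    Returns (mv, residuals, norms, active), where `active` holds the indices
    whose residual norm is still >= tau.
    """
    Pv = matvec(P, v)
    mu = [a - b for a, b in zip(Pv, v)]  # mu = P v - v, computed once
    mv = 1                               # one matrix-vector product so far
    residuals, norms, active = [], [], []
    for i, alpha in enumerate(alphas):
        r = [alpha * m for m in mu]               # r^(i) = alpha_i * mu
        res = math.sqrt(sum(c * c for c in r))    # Euclidean norm (assumed)
        residuals.append(r)
        norms.append(res)
        if res >= tau:
            active.append(i)
    return mv, residuals, norms, active


# Toy column-stochastic matrix for a 3-node graph (hypothetical data).
P = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
v = [1.0, 0.0, 0.0]
alphas = [0.85, 0.9, 0.99]
mv, residuals, norms, active = init_residuals(P, v, alphas, 1e-8)
print(mv, norms, active)
```

Only one matrix-vector product is spent regardless of the number of damping factors, which is why the counter `mv` starts at 1.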
