diff --git a/tex/src/algorithm.tex b/tex/src/algorithm.tex
index eded834..247f8b9 100644
--- a/tex/src/algorithm.tex
+++ b/tex/src/algorithm.tex
@@ -10,7 +10,7 @@ where $c(v)$ is the closeness centrality defined in \eqref{closeness}. Since we
 \begin{equation}\label{wrong-farness}
 f(v) = \frac{1}{c(v)} = \frac{1}{r(v)-1} \displaystyle \sum_{w \in V} d(v,w)
 \end{equation}
-where $r(v) = |R(v)|$ is the cardinality of the set of reachable nodes from $v$. To avoid any problem during the computation, this formula still needs to be modified. Let's assume that the node $v$ that we are considering has just one link at distance $1$ with another node $w$ with \emph{out-degree} 0. If we consider the formula \eqref{wrong-farness} we will get a false result: $v$ would appear to be very central, even if it's obviously very peripheral. To avoid this problem, we can generalize the formula \eqref{wrong-farness} normalizing as suggested in \texttt{[Lin 1976; Wasserman and Faust 1994; Boldi and Vigna 2013; 2014; Olsen et al. 2014]}
+where $r(v) = |R(v)|$ is the number of nodes reachable from $v$. This formula still needs to be modified before it can be used in the computation. Assume that the node $v$ under consideration has a single out-link, to a node $w$ with \emph{out-degree} 0. Since $w$ is the only other node reachable from $v$, formula \eqref{wrong-farness} gives $f(v) = 1$, the lowest possible farness: $v$ would appear to be very central, even though it is clearly peripheral. To avoid this problem, we generalize \eqref{wrong-farness} with the normalization suggested in \cite{wasserman_faust_1994,doi:10.1080/15427951.2013.865686,olsen2014upoa}
 \begin{equation}\label{farness}
 f(v) = \frac{n-1}{(r(v)-1)^2} \sum_{w \in R(v)} d(v,w)
diff --git a/tex/src/improvement.tex b/tex/src/improvement.tex
index 5d79857..b415be5 100644
--- a/tex/src/improvement.tex
+++ b/tex/src/improvement.tex
@@ -2,7 +2,7 @@ The algorithm shown in this paper is very versatile.
 We have tested it with two different graphs and obtained excellent results. But there could be more.
-\s \nd It can be adapted very easily to compute other centralities, as the harmonic one. Given a graph $G = (V,E)$ and a node $v \in V$, it's defined as
+\s \nd It can easily be adapted to compute other centralities, such as harmonic centrality \cite{2000}. Given a graph $G = (V,E)$ and a node $v \in V$, the harmonic centrality of $v$ is defined as
 \begin{equation}
 h(v) = \sum_{w \neq v} \frac{1}{d(v,w)}
diff --git a/tex/src/introduction.tex b/tex/src/introduction.tex
index f10ed70..099d682 100644
--- a/tex/src/introduction.tex
+++ b/tex/src/introduction.tex
@@ -1,7 +1,7 @@
 \section{Introduction}
 A graph $G= (V,E)$ is a pair of a sets. Where $V = \{v_1,...,v_n\}$ is the set of \emph{nodes}, and $E \subseteq V \times V, ~ E = \{(v_i,v_j),...\}$ is the set of \emph{edges} (with $|E| = m \leq n^2$).
 \s
-\nd In this paper we discuss the problem of identifying the most central nodes in a network using the measure of \emph{closeness centrality}. Given a connected graph, the closeness centrality of a node $v \in V$ is defined as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Normalizing, we obtain the following formula:
+\nd In this paper we discuss the problem of identifying the most central nodes in a network using the measure of \emph{closeness centrality}. Given a connected graph, the closeness centrality of a node $v \in V$ is defined \cite{Sodeur2019} as the reciprocal of the sum of the lengths of the shortest paths between the node and all other nodes in the graph. Normalizing, we obtain the following formula:
 \begin{equation}\label{closeness}
 c(v) = \frac{n-1}{\displaystyle \sum_{w \in V} d(v,w)}
@@ -27,6 +27,6 @@ Since we are dealing with a web-scale network any brute force algorithm would re
 \noindent We can solve the APSP problem either using the fast matrix multiplication or, as made in this paper, implementing a breath-first-search (BFS) method. There are several reason to prefer this second approach over the first one in this type of problems. \s
-\noindent A graph is a data structure and we can describe it in different ways. Choosing one over another can have an enormous impact on performance. In this case, we need to remember the type of graph that we are dealing with: a very big and sparse one. The fast matrix multiplication implement the graph as an $n\times n$ matrix where the position $(i,j)$ is zero if the nodes $i,j$ are not linked, 1 (or a generic number if weighted) otherwise. This method requires $O(n^2)$ space in memory. That is an enormous quantity on a web-scale graph. Furthermore the time complexity is $O(n^{2.373} \log n)\}$ \texttt{[Zwick 2002; Williams 2012]} \s
+\noindent A graph is a data structure that can be represented in different ways \cite{skienna08}, and the choice of representation can have an enormous impact on performance. Here we must keep in mind the type of graph we are dealing with: a very large and sparse one. The fast matrix multiplication approach represents the graph as an $n\times n$ adjacency matrix, whose entry $(i,j)$ is 0 if nodes $i$ and $j$ are not linked and 1 (or the edge weight, if the graph is weighted) otherwise. This requires $O(n^2)$ memory, an enormous quantity for a web-scale graph. Furthermore, the time complexity is $O(n^{2.373} \log n)$ \cite{10.1145/567112.567114} \s
 \noindent Using the BFS method the space complexity is $O(n+m)$, which is a very lower value compared to the previous method. In terms of time, the complexity is $O(nm)$. Unfortunately, this is not enough to compute all the distances in a reasonable time. It has also been proven that this method can not be improved. In this paper I propose an exact algorithm to compute the top-$k$ nodes with the higher closeness centrality.
diff --git a/tex/src/main.pdf b/tex/src/main.pdf
index 65811df..8b4ff93 100644
Binary files a/tex/src/main.pdf and b/tex/src/main.pdf differ
diff --git a/tex/src/ref.bib b/tex/src/ref.bib
index 4596499..21e8bbb 100644
--- a/tex/src/ref.bib
+++ b/tex/src/ref.bib
@@ -15,3 +15,100 @@ biburl = {https://dblp.org/rec/journals/corr/BergaminiBCMM17.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
+
+@article{10.1145/567112.567114,
+author = {Zwick, Uri},
+title = {All Pairs Shortest Paths Using Bridging Sets and Rectangular Matrix Multiplication},
+year = {2002},
+issue_date = {May 2002},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+volume = {49},
+number = {3},
+issn = {0004-5411},
+url = {https://doi.org/10.1145/567112.567114},
+doi = {10.1145/567112.567114},
+abstract = {We present two new algorithms for solving the All Pairs Shortest Paths (APSP) problem for weighted directed graphs. Both algorithms use fast matrix multiplication algorithms. The first algorithm solves the APSP problem for weighted directed graphs in which the edge weights are integers of small absolute value in \~{O}(n2+μ) time, where μ satisfies the equation ω(1, μ, 1) = 1 + 2μ and ω(1, μ, 1) is the exponent of the multiplication of an n \texttimes{} nμ matrix by an nμ \texttimes{} n matrix. Currently, the best available bounds on ω(1, μ, 1), obtained by Coppersmith, imply that μ < 0.575. The running time of our algorithm is therefore O(n2.575). Our algorithm improves on the \~{O}(n(3c+ω)/2) time algorithm, where ω = ω(1, 1, 1) < 2.376 is the usual exponent of matrix multiplication, obtained by Alon et al., whose running time is only known to be O(n2.688). The second algorithm solves the APSP problem almost exactly for directed graphs with arbitrary nonnegative real weights. The algorithm runs in \~{O}((nω/ϵ) log(W/ϵ)) time, where ϵ > 0 is an error parameter and W is the largest edge weight in the graph, after the edge weights are scaled so that the smallest non-zero edge weight in the graph is 1. It returns estimates of all the distances in the graph with a stretch of at most 1 + ϵ. Corresponding paths can also be found efficiently.},
+journal = {J. ACM},
+month = {may},
+pages = {289--317},
+numpages = {29},
+keywords = {shortest paths, Matrix multiplication}
+}
+
+@book{skienna08,
+ abstract = {This expanded and updated second edition of a classic bestseller continues to take the mystery out of designing and analyzing algorithms and their efficacy and efficiency. Expanding on the highly successful formula of the first edition, the book now serves as the primary textbook of choice for any algorithm design course while maintaining its status as the premier practical reference guide to algorithms. NEW: (1) Incorporates twice the tutorial material and exercises. (2) Provides full online support for lecturers, and a completely updated and improved website component with lecture slides, audio and video. (3) Contains a highly unique catalog of the 75 most important algorithmic problems. (4) Includes new war stories and interview problems, relating experiences from real-world applications. Unique, handy reference package with a practical, hands-on appeal to a wide audience This classic bestseller has been expanded and updated with twice the original tutorial material and exercises Contains a highly unique catalog of the 75 most important algorithmic problems Additional useful information such as lecture slides and updates available via author's website.},
+ added-at = {2015-05-11T09:08:43.000+0200},
+ address = {London},
+ author = {Skiena, Steven S.},
+ biburl = {https://www.bibsonomy.org/bibtex/29b2f5050241fea63ff5f49cc29b5bebf/ytyoun},
+ doi = {10.1007/978-1-84800-070-4},
+ interhash = {1087fa2b9db733b071ece02fc207a88d},
+ intrahash = {9b2f5050241fea63ff5f49cc29b5bebf},
+ isbn = {9781848000704 1848000707 9781848000698 1848000693},
+ keywords = {algorithm programming textbook},
+ publisher = {Springer},
+ refid = {370729337},
+ timestamp = {2015-12-12T14:24:40.000+0100},
+ title = {The Algorithm Design Manual},
+ year = 2008
+}
+
+@Inbook{Sodeur2019,
+author="Sodeur, Wolfgang",
+editor="Holzer, Boris
+and Stegbauer, Christian",
+title="Bavelas (1950): Communication Patterns in Task-Oriented Groups",
+bookTitle="Schl{\"u}sselwerke der Netzwerkforschung",
+year="2019",
+publisher="Springer Fachmedien Wiesbaden",
+address="Wiesbaden",
+pages="35--38",
+abstract="Building on earlier work on group structure (Bavelas 1948), in this essay Bavelas develops proposals for studying the relationships between the availability of communication channels in groups and some of the consequences for group performance and for the satisfaction of group members. Bavelas had come from Iowa to M. I. T. with Kurt Lewin around 1945, while still a graduate student, and after Lewin's sudden death he was only able to complete his dissertation (1948) under Dorwin Cartwright. It was Cartwright and Leon Festinger who continued to support Bavelas at M. I. T. even after their move to Michigan. Festinger was probably also responsible for Bavelas's contact with R. Duncan Luce, who became an important member of Bavelas's working group even before completing his dissertation in mathematics. These and many further details can be found in Linton C. Freeman's book ({\textrightarrow} 2004, p. 67 ff.).",
+isbn="978-3-658-21742-6",
+doi="10.1007/978-3-658-21742-6_8",
+url="https://doi.org/10.1007/978-3-658-21742-6_8"
+}
+
+
+@article{2000,
+ title={Harmony in the small-world},
+ volume={285},
+ ISSN={0378-4371},
+ url={http://dx.doi.org/10.1016/S0378-4371(00)00311-3},
+ DOI={10.1016/s0378-4371(00)00311-3},
+ number={3--4},
+ journal={Physica A: Statistical Mechanics and its Applications},
+ publisher={Elsevier BV},
+ author={Marchiori, Massimo and Latora, Vito},
+ year={2000},
+ month={Oct},
+ pages={539--546} }
+
+
+@book{wasserman_faust_1994, place={Cambridge}, series={Structural Analysis in the Social Sciences}, title={Social Network Analysis: Methods and Applications}, DOI={10.1017/CBO9780511815478}, publisher={Cambridge University Press}, author={Wasserman, Stanley and Faust, Katherine}, year={1994}, collection={Structural Analysis in the Social Sciences}}
+
+@article{doi:10.1080/15427951.2013.865686,
+author = {Paolo Boldi and Sebastiano Vigna},
+title = {Axioms for Centrality},
+journal = {Internet Mathematics},
+volume = {10},
+number = {3-4},
+pages = {222--262},
+year = {2014},
+publisher = {Taylor \& Francis},
+doi = {10.1080/15427951.2013.865686},
+
+URL = {https://doi.org/10.1080/15427951.2013.865686},
+eprint = {https://doi.org/10.1080/15427951.2013.865686}
+}
+
+@inproceedings{olsen2014upoa,
+ author={Paul W. {Olsen} and Alan G. {Labouseur} and Jeong-Hyon {Hwang}},
+ title={{Efficient Top-k Closeness Centrality Search}},
+ booktitle={IEEE 30th International Conference on Data Engineering (ICDE)},
+ year={2014},
+ pages={196--207},
+ address={Chicago, IL, USA},
+ publisher={IEEE}
+}
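Both the normalized farness of \eqref{farness} in algorithm.tex, f(v) = (n-1)/(r(v)-1)^2 · Σ d(v,w) over w in R(v), and the harmonic centrality h(v) in improvement.tex can be obtained from one BFS per node. One BFS costs O(n+m), so repeating it for every node gives the O(nm) bound quoted in introduction.tex (for m ≥ n). The following is a minimal Python sketch under stated assumptions, not the paper's top-k algorithm: it assumes an unweighted directed graph stored as a dict of adjacency lists, counts v itself in R(v), sums distances only over reachable nodes, lets unreachable nodes contribute 0 to h(v), and returns infinity when v reaches no other node. The function names `bfs_distances`, `naive_farness`, `farness`, `harmonic` and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node it can reach (itself at distance 0)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:  # first visit gives the shortest hop distance (unweighted graph)
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def naive_farness(adj, v):
    """Unnormalized farness: average distance to the other reachable nodes (cf. wrong-farness)."""
    dist = bfs_distances(adj, v)
    r = len(dist)  # r(v), counting v itself (assumption)
    if r <= 1:
        return float("inf")  # assumption: v reaches no other node
    return sum(dist.values()) / (r - 1)


def farness(adj, v, n):
    """Normalized farness f(v) = (n-1)/(r(v)-1)^2 * sum of d(v,w) over reachable w."""
    dist = bfs_distances(adj, v)
    r = len(dist)
    if r <= 1:
        return float("inf")  # same assumption as naive_farness
    return (n - 1) / (r - 1) ** 2 * sum(dist.values())


def harmonic(adj, v):
    """Harmonic centrality h(v) = sum over w != v of 1/d(v,w); unreachable w contribute 0."""
    dist = bfs_distances(adj, v)
    return sum(1.0 / d for w, d in dist.items() if w != v)


if __name__ == "__main__":
    # Hypothetical toy graph: node 5 has a single out-link, to node 4, whose out-degree is 0.
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [3], 3: [0], 4: [], 5: [4]}
    n = len(adj)
    for v in (0, 5):
        print(v, naive_farness(adj, v), farness(adj, v, n), harmonic(adj, v))
```

On this toy graph, nodes 0 and 5 both get naive farness 1.0, reproducing the false result described in algorithm.tex, while the normalized farness gives node 5 the value 5.0 and node 0 about 1.67, so node 5 correctly ranks as the more peripheral node.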