diff --git a/.gitignore b/.gitignore index 57eeefa..8c523ca 100644 --- a/.gitignore +++ b/.gitignore @@ -147,3 +147,4 @@ data/ backup/ sources/ extra/ +html_graphs/ diff --git a/analysis_results.pkl b/analysis_results.pkl index b9a1193..2fb5818 100644 Binary files a/analysis_results.pkl and b/analysis_results.pkl differ diff --git a/main.ipynb b/main.ipynb index 234c4db..f067967 100644 --- a/main.ipynb +++ b/main.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -57,12 +57,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Background theory: The Erdős-Rényi model\n", + "# Random Networks: The Erdős-Rényi model\n", "\n", "\n", "\n", - "Prior to the 1960s, graph theory primarily focused on the characteristics of individual graphs. In the 1960s, Paul Erdős and Alfred Rényi introduced a systematic approach to studying random graphs, which involves analyzing a collection, or ensemble, of many different graphs. Each graph in the ensemble is assigned a probability, and a property is said to hold with probability $P$ if the total probability of the graphs in the ensemble possessing that property is $P$, or if the fraction of graphs in the ensemble with the property is $P$. This method allows for the application of probability theory in conjunction with discrete math to study ensembles of graphs. A property is considered to hold for a class of graphs if the fraction of graphs in the ensemble without the property has zero measure, which is typically referred to as being true for \"almost every\" graph in the ensemble. The terms \"almost surely\" and \"with high probability\" may also be used, with the former generally indicating that the residual probability decreases exponentially with the size of the system\n", + "Prior to the 1960s, graph theory primarily focused on the characteristics of individual graphs. 
In the 1960s, Paul Erdős and Alfred Rényi introduced a systematic approach to studying random graphs, which involves analyzing a collection, or ensemble, of many different graphs. Each graph in the ensemble is assigned a probability, and a property is said to hold with probability $P$ if the total probability of the graphs in the ensemble possessing that property is $P$, or if the fraction of graphs in the ensemble with the property is $P$. This method allows for the application of probability theory in conjunction with discrete math to study ensembles of graphs. A property is considered to hold for a class of graphs if the fraction of graphs in the ensemble without the property has zero measure, which is typically referred to as being true for \"almost every\" graph in the ensemble. \n", + "\n", + "## Definition of a random graph\n", + "\n", + "Let $E_{n,N}$ denote the set of all graphs having $n$ given labelled vertices $V_1,V_2, \\dots, V_n$ and $N$ edges. The graphs considered are assumed to be undirected, without parallel edges and without slings (self-loops). Thus a graph belonging to $E_{n,N}$ is obtained by choosing $N$ out of the $\\binom{n}{2}$ possible edges between the points $V_1,V_2, \\dots, V_n$, and therefore the number of elements of $E_{n,N}$ is given by the binomial coefficient $\\binom{\\binom{n}{2}}{N}$. \n", + "\n", + "A random graph $\\Gamma_{n,N}$ can be defined as an element of $E_{n,N}$ chosen at random, so that each of the elements of $E_{n,N}$ has the same probability of being chosen, namely $\\frac{1}{\\binom{\\binom{n}{2}}{N}}$.\n", + "\n", + "Let's try to modify this point of view and use a bit of probability theory. _We may consider the formation of a random graph as a stochastic process_ defined as follows: At time $t=1$ we choose one of the $\\binom{n}{2}$ possible edges between the points $V_1,V_2, \\dots, V_n$, each of these edges having the same probability of being chosen; let this edge be denoted as $e_1$. 
At time $t=2$ we choose one of the $\\binom{n}{2} -1$ remaining edges, different from $e_1$, all of these being equiprobable. Continuing this process, at time $t=k+1$ we choose one of the $\\binom{n}{2} -k$ remaining edges, different from $e_1, e_2, \\dots, e_k$, all of these being equiprobable, i.e. each having probability $\\frac{1}{\\binom{n}{2} -k}$. We denote by $\\Gamma_{n,N}$ the graph obtained by choosing $N$ edges in this way.\n", + "\n", + "> NOTE: the two definitions are equivalent, but the second one is more convenient for the study of the properties of random graphs. According to this interpretation we may study the evolution of random graphs, i.e. the step-by-step unraveling of the structure of the graph when $N$ increases. This will be an essential point in our study of the properties of small-worldness.\n", + "\n", + "\n", + "**SOURCES:** \n", "\n", + "- `[1]` On the evolution of random graphs, P. Erdős, A. Rényi, _Publ. Math. Inst. Hungar. Acad. Sci._, 5, 17-61 (1960).\n", "\n", "## Erdős-Rényi graphs\n", "\n", @@ -82,6 +96,9 @@ "\n", "Another property of interest is the average path length between any two nodes, which is typically of order $\\ln N$ in almost every graph of the ensemble (with $\\langle k \\rangle > 1$ and finite). This small, logarithmic distance is the source of the \"small-world\" phenomena that are characteristic of networks.\n", "\n", + "**SOURCE:**\n", + "- `[i]` Complex Networks: Structure, Robustness, and Function, R. Cohen, S. Havlin, D. ben-Avraham, H. E. Stanley, _Cambridge University Press, 2009_.\n", + "\n", "\n", "## Scale-free networks\n", "\n", @@ -141,6 +158,8 @@ "\n", "The degree distribution is not the only characteristic that can be used to describe a network. 
Other quantities, such as the degree-degree correlation (between connected nodes), spatial correlations, clustering coefficient, betweenness or centrality distribution, and self-similarity exponents, can also provide insight into the network's structure and behavior.\n", "\n", + "- `[i]` Complex Networks: Structure, Robustness, and Function, R. Cohen, S. Havlin, D. ben-Avraham, H. E. Stanley, _Cambridge University Press, 2009_.\n", + "\n", "# Diameter and fractal dimension\n", "\n", "" + "> **EXTRA:** If you want to see a visualization of a completely different graph, here you can check the collaboration network of the actors on the IMDb website. It has very distinct communities and clusters. Only actors with more than 100 movies have been considered. Click [here](https://lukefleed.xyz/graph/imdb-graph.html) to see the visualization." ] }, { @@ -883,18 +1010,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Introduzione da scrivere\n", + "# Properties of the networks\n", "\n", "To help us visualize the results of our analysis we can create a dataframe and fill it with all the information that we will retrive from our networks in this section.\n", "\n", - "As we'll see in the cells below, the full networks are very big, even after the filtering that we did. This leads to long run times for the functions that we are going to use. To avoid this, we are going to use a sub-sample of the networks. Depending on how much we want to sample, our results will be more or less accurate. \n", + "As we'll see in the cells below, the full networks are very big, even after the filtering that we did. This leads to long run times for the functions that we are going to use. To avoid this, we are going to use a sub-sample of the networks. Keep in mind that, depending on how much we want to sample, our results will be more or less accurate. 
\n", "\n", - "What I suggest to do while reviewing this network is to use higher values for the sampling rate, so that you can see the results faster. This will give you a general idea of how the implemented functions work. Then, at the end of this section I have provided a link from my GitHub repository where you can download the results obtained with very low sampling rates. In this way you can test the functions with mock-networks and see if they work as expected, then we can proceed with the analysis using the more accurate results that required more time to compute." + "What I suggest to do while reviewing this notebook is to use higher values for the sampling rate, so that you can see the results faster. This will give you a general idea of how the implemented functions work. Then, at the end of this section I have provided a link from my GitHub repository where you can download the results obtained with very low sampling rates. In this way you can test the functions with mock-networks and see if they work as expected, then we can proceed with the analysis using the more accurate results that required more time to compute." 
] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ @@ -906,7 +1033,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 15, "metadata": {}, "outputs": [ { @@ -938,6 +1065,7 @@ " log N\n", " Average Shortest Path Length\n", " betweenness centrality\n", + " omega-coefficient\n", " \n", " \n", " \n", @@ -951,6 +1079,7 @@ " 8.778480\n", " NaN\n", " NaN\n", + " NaN\n", " \n", " \n", " 1\n", @@ -962,6 +1091,7 @@ " 8.030410\n", " NaN\n", " NaN\n", + " NaN\n", " \n", " \n", " 2\n", @@ -973,26 +1103,29 @@ " 7.751045\n", " NaN\n", " NaN\n", + " NaN\n", " \n", " \n", " 3\n", " Brightkite Friendship Graph\n", - " 5420\n", - " 14690\n", + " 1500\n", + " 1170\n", + " NaN\n", " NaN\n", + " 7.313220\n", " NaN\n", - " 8.597851\n", " NaN\n", " NaN\n", " \n", " \n", " 4\n", - " (Filtered) Gowalla Friendship Graph\n", - " 2294\n", - " 5548\n", + " Gowalla Friendship Graph\n", + " 1500\n", + " 2300\n", + " NaN\n", " NaN\n", + " 7.313220\n", " NaN\n", - " 7.738052\n", " NaN\n", " NaN\n", " \n", @@ -1006,38 +1139,39 @@ " 7.242082\n", " NaN\n", " NaN\n", + " NaN\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", + " Graph Number of Nodes Number of Edges Average Degree \\\n", + "0 Brightkite Checkins Graph 6493 292973 NaN \n", + "1 Gowalla Checkins Graph 3073 62790 NaN \n", + "2 Foursquare Checkins Graph 2324 246702 NaN \n", + "3 Brightkite Friendship Graph 1500 1170 NaN \n", + "4 Gowalla Friendship Graph 1500 2300 NaN \n", + "5 Foursquare Friendship Graph 1397 5323 NaN \n", "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 NaN NaN 8.778480 
\n", - "1 NaN NaN 8.030410 \n", - "2 NaN NaN 7.751045 \n", - "3 NaN NaN 8.597851 \n", - "4 NaN NaN 7.738052 \n", - "5 NaN NaN 7.242082 \n", + " Average Clustering Coefficient log N Average Shortest Path Length \\\n", + "0 NaN 8.778480 NaN \n", + "1 NaN 8.030410 NaN \n", + "2 NaN 7.751045 NaN \n", + "3 NaN 7.313220 NaN \n", + "4 NaN 7.313220 NaN \n", + "5 NaN 7.242082 NaN \n", "\n", - " Average Shortest Path Length betweenness centrality \n", - "0 NaN NaN \n", - "1 NaN NaN \n", - "2 NaN NaN \n", - "3 NaN NaN \n", - "4 NaN NaN \n", - "5 NaN NaN " + " betweenness centrality omega-coefficient \n", + "0 NaN NaN \n", + "1 NaN NaN \n", + "2 NaN NaN \n", + "3 NaN NaN \n", + "4 NaN NaN \n", + "5 NaN NaN " ] }, - "execution_count": 38, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -1072,7 +1206,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -1097,113 +1231,55 @@ " \n", " \n", " Graph\n", - " Number of Nodes\n", - " Number of Edges\n", " Average Degree\n", - " Average Clustering Coefficient\n", - " log N\n", - " Average Shortest Path Length\n", - " betweenness centrality\n", " \n", " \n", " \n", " \n", " 0\n", " Brightkite Checkins Graph\n", - " 6493\n", - " 292973\n", " 90.242723\n", - " NaN\n", - " 8.778480\n", - " NaN\n", - " NaN\n", " \n", " \n", " 1\n", " Gowalla Checkins Graph\n", - " 3073\n", - " 62790\n", " 40.865604\n", - " NaN\n", - " 8.030410\n", - " NaN\n", - " NaN\n", " \n", " \n", " 2\n", " Foursquare Checkins Graph\n", - " 2324\n", - " 246702\n", " 212.30809\n", - " NaN\n", - " 7.751045\n", - " NaN\n", - " NaN\n", " \n", " \n", " 3\n", " Brightkite Friendship Graph\n", - " 5420\n", - " 14690\n", - " 5.420664\n", - " NaN\n", - " 8.597851\n", - " NaN\n", - " NaN\n", + " 1.56\n", " \n", " \n", " 4\n", - " (Filtered) Gowalla Friendship Graph\n", - " 2294\n", - " 5548\n", - " 4.836966\n", - " NaN\n", - " 7.738052\n", - " NaN\n", - " NaN\n", + " Gowalla Friendship 
Graph\n", + " 3.066667\n", " \n", " \n", " 5\n", " Foursquare Friendship Graph\n", - " 1397\n", - " 5323\n", " 7.620616\n", - " NaN\n", - " 7.242082\n", - " NaN\n", - " NaN\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", - "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 90.242723 NaN 8.778480 \n", - "1 40.865604 NaN 8.030410 \n", - "2 212.30809 NaN 7.751045 \n", - "3 5.420664 NaN 8.597851 \n", - "4 4.836966 NaN 7.738052 \n", - "5 7.620616 NaN 7.242082 \n", - "\n", - " Average Shortest Path Length betweenness centrality \n", - "0 NaN NaN \n", - "1 NaN NaN \n", - "2 NaN NaN \n", - "3 NaN NaN \n", - "4 NaN NaN \n", - "5 NaN NaN " + " Graph Average Degree\n", + "0 Brightkite Checkins Graph 90.242723\n", + "1 Gowalla Checkins Graph 40.865604\n", + "2 Foursquare Checkins Graph 212.30809\n", + "3 Brightkite Friendship Graph 1.56\n", + "4 Gowalla Friendship Graph 3.066667\n", + "5 Foursquare Friendship Graph 7.620616" ] }, - "execution_count": 39, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -1213,7 +1289,7 @@ " avg_deg = np.mean([d for n, d in G.degree()])\n", " analysis_results.loc[analysis_results['Graph'] == G.name, 'Average Degree'] = avg_deg\n", "\n", - "analysis_results" + "analysis_results[['Graph', 'Average Degree']]" ] }, { @@ -1225,9 +1301,7 @@ "\n", "The clustering coefficient is usually related to a community represented by local structures. The usual definition of clustering is related to the number of triangles in the network. The clustering is high if two nodes sharing a neighbor have a high probability of being connected to each other. 
There are two common definitions of clustering. The first is global,\n", "\n", - "\\begin{equation}\n", - " C = \\frac{3 \\times \\text{the number of triangles in the network}}{\\text{the number of connected triples of vertices}}\n", - "\\end{equation}\n", + "$$ C = \\frac{3 \\times \\text{the number of triangles in the network}}{\\text{the number of connected triples of vertices}}$$\n", "\n", "where a “connected triple” means a single vertex with edges running to an unordered\n", "pair of other vertices. \n", @@ -1246,11 +1320,13 @@ "\n", "In both cases the clustering is in the range $0 \\leq C \\leq 1$. \n", "\n", - "In random graph models such as the ER model and the configuration model, the clustering coefficient is low and decreases to $0$ as the system size increases. This is also the situation in many growing network models. However, in many real-world networks the clustering coefficient is rather high and remains constant for large network sizes. This observation led to the introduction of the small-world model, which offers a combination of a regular lattice with high clustering and a random graph. \n", + "In random graph models such as the ER model and the configuration model, the clustering coefficient is low and decreases to $0$ as the system size increases. This is also the situation in many growing network models. However, in many real-world networks the clustering coefficient is rather high and remains constant for large network sizes. \n", + "\n", + "> This observation led to the introduction of the small-world model, which offers a combination of a regular lattice with high clustering and a random graph. \n", "\n", "---\n", "\n", - "As one can imagine by the definition given above, this operation is very expensive. The library `networkx` provides a function to compute the clustering coefficient of a graph. In particular, the function `average_clustering` computes the average clustering coefficient of a graph. 
\n", + "The library `networkx` provides a function to compute the clustering coefficient of a graph. In particular, the function `average_clustering` computes the average clustering coefficient of a graph. \n", "\n", " 8\u001b[0m \u001b[39mfor\u001b[39;00m graph \u001b[39min\u001b[39;00m graphs_all:\n\u001b[1;32m 9\u001b[0m G \u001b[39m=\u001b[39m create_random_graphs(graph, model\u001b[39m=\u001b[39mmodel_name, save \u001b[39m=\u001b[39m \u001b[39mFalse\u001b[39;00m)\n\u001b[1;32m 10\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mRandom graph created for \u001b[39m\u001b[39m\"\u001b[39m, graph\u001b[39m.\u001b[39mname, \u001b[39m\"\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39mStarting computation of betweenness centrality...\u001b[39m\u001b[39m\"\u001b[39m)\n", + "\u001b[0;31mNameError\u001b[0m: name 'graphs_all' is not defined" ] } ], @@ -16622,14 +16881,13 @@ "# As said before, for a quick testing I suggest to use k=0.6 and at least k=0.4 for accurate results\n", "\n", "# uncomment the model that you want to use for the random graphs\n", - "\n", "# model_name = 'watts_strogatz'\n", "model_name = 'erdos_renyi'\n", "\n", "random_graphs = {}\n", "for graph in graphs_all:\n", " G = create_random_graphs(graph, model=model_name, save = False)\n", - " print(\"Random graph created for \", graph.name, \"Starting computation of betweenness centrality...\")\n", + " print(\"Random graph created for \", graph.name, \"\\nStarting computation of betweenness centrality...\")\n", " betweenness_centrality = np.mean(list(betweenness_centrality_parallel(G, 6, k = 0.4).values()))\n", " print(\"\\tBetweenness centrality for Erdos-Renyi random graph: \", betweenness_centrality)\n", " random_graphs[graph.name] = betweenness_centrality\n", @@ -16638,37 +16896,12 @@ }, { "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'Brightkite Checkins Graph': 0.0003728834232472551,\n", - " 'Gowalla Checkins 
Graph': 0.0009215261155179815,\n", - " 'Foursquare Checkins Graph': 0.0006522226121634739,\n", - " 'Brightkite Friendship Graph': 0.0016407812858385549,\n", - " 'Gowalla Friendship Graph': 0.0037251547240147328,\n", - " 'Foursquare Friendship Graph': 0.0042446600624415146}" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "random_graphs" - ] - }, - { - "cell_type": "code", - "execution_count": 16, + "execution_count": 22, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -16744,41 +16977,23 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 24, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'Brightkite Checkins Graph': 0.6426519071903248,\n", - " 'Gowalla Checkins Graph': 0.6159366386543966,\n", - " 'Foursquare Checkins Graph': 0.6949399573838294,\n", - " 'Brightkite Friendship Graph': 0.4044470961191924,\n", - " 'Gowalla Friendship Graph': 0.4228365321024048,\n", - " 'Foursquare Friendship Graph': 0.4585372995852263}" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "generalized_cc = {}\n", "for graph in graphs_all:\n", - " generalized_cc[graph.name] = generalized_average_clustering_coefficient(graph)\n", - "\n", - "generalized_cc" + " generalized_cc[graph.name] = generalized_average_clustering_coefficient(graph)" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 25, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -16788,10 +17003,6 @@ } ], "source": [ - "# now we can compare the results of the generalized average clustering coefficient with the original average clustering coefficient. Use matplotlib to plot the results as an histogram with two bars for each graph\n", - "\n", - "import matplotlib.pyplot as plt\n", - "\n", "fig, ax = plt.subplots(figsize=(15, 10))\n", "index = np.arange(len(generalized_cc))\n", "bar_width = 0.35\n", @@ -16800,12 +17011,12 @@ "rects1 = plt.bar(index, analysis_results['Average Clustering Coefficient'], bar_width,\n", "alpha=opacity,\n", "color='b',\n", - "label='Original Graph')\n", + "label='Standard Clustering')\n", "\n", "rects2 = plt.bar(index + bar_width, generalized_cc.values(), bar_width,\n", "alpha=opacity,\n", "color='g',\n", - "label='Generalized Graph')\n", + "label='Generalized Clustering')\n", "\n", "plt.xlabel('Graph')\n", "plt.ylabel('Average Clustering Coefficient')\n", @@ -16832,14 +17043,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Omega coefficient\n", + "## Conclusion: Omega coefficient\n", "\n", "We have already discussed a lot in the previous sections about this measure, let's see the results that we obtained after days of computations on the server:" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 25, "metadata": {}, "outputs": [ { @@ -16871,48 +17082,48 @@ " \n", " 0\n", " Brightkite Checkins Graph\n", - " -0.180\n", + " NaN\n", " \n", " \n", " 1\n", " Gowalla Checkins Graph\n", - " -0.240\n", + " NaN\n", " \n", " \n", " 2\n", " Foursquare Checkins Graph\n", - " -0.056\n", + " NaN\n", " \n", " \n", " 3\n", " Brightkite Friendship Graph\n", - " -0.200\n", + " NaN\n", " \n", " \n", " 4\n", " Gowalla Friendship Graph\n", - " -0.250\n", + " NaN\n", " \n", " \n", " 5\n", " Foursquare Friendship Graph\n", - " -0.170\n", + " NaN\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Graph omega-coefficient\n", - "0 Brightkite Checkins Graph -0.180\n", - "1 Gowalla Checkins Graph 
-0.240\n", - "2 Foursquare Checkins Graph -0.056\n", - "3 Brightkite Friendship Graph -0.200\n", - "4 Gowalla Friendship Graph -0.250\n", - "5 Foursquare Friendship Graph -0.170" + " Graph omega-coefficient\n", + "0 Brightkite Checkins Graph NaN\n", + "1 Gowalla Checkins Graph NaN\n", + "2 Foursquare Checkins Graph NaN\n", + "3 Brightkite Friendship Graph NaN\n", + "4 Gowalla Friendship Graph NaN\n", + "5 Foursquare Friendship Graph NaN" ] }, - "execution_count": 11, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } @@ -926,17 +1137,42 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This results are a bit of a surprise. The small-world coefficient (omega) measures how much a network is like a lattice or a random graph. Negative values mean G is similar to a lattice whereas positive values mean G is a random graph. Values close to 0 mean that G has small-world characteristics.\n", + "To give you a better idea of how time-consuming this computation is, I report below the time it took to compute the omega coefficient for each of these networks:\n", + "\n", + "\n", + "\n", + "| Network | Time |\n", + "|:-------:|:----:|\n", + "| Brightkite Checkins | 9d 11h 25m |\n", + "| Gowalla Checkins | 3d 2h 55m |\n", + "| FourSquare Checkins | 6d 14h 13m |\n", + "| Brightkite Friendships | 17h 55m |\n", + "| Gowalla Friendships | 2h 22m |\n", + "| FourSquare Friendships | 2h 9m |\n", + "\n", + "Note that due to the small size of the friendships graphs, I have been able to compute the omega coefficient for the whole networks. However, for the checkins graphs, I had to take a 50% sample of the nodes. In both cases, I used `niter` and `nrand` equal to 3." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "These results are a bit of a surprise. The small-world coefficient (omega) measures how much a network is like a lattice or a random graph. 
Negative values mean the graph is similar to a lattice whereas positive values mean the graph is more random-like. Values close to 0, instead, indicate small-world characteristics.\n", "\n", - "Based only on this metric, we may conclude that all the networks are small-worlds. In fact, all the values of the omega coefficient are $~0.2$ (with the exception of the foursquare checkins graph, whose value is very close to $0$). However, I don't this this is the case. \n", + "Based only on this metric, we may conclude that all the networks are small-worlds. In fact, all the values of the omega coefficient are ~$0.2$ (with the exception of the foursquare checkins graph, whose value is very close to $0$). However, I don't think this is the case. \n", "\n", - "# Conclusion\n", + "We have seen in the previous section that the $\\omega$ coefficient can be tricked by networks that have a very low clustering coefficient, and in my opinion this is exactly what is happening here. The networks generated from the friendships have a very low clustering coefficient, and therefore they are biasing the $\\omega$ coefficient. This conclusion is supported by the fact that measures such as the betweenness centrality and the clustering coefficient, shown earlier, suggest that the networks generated from the friendships are not small-world networks. \n", "\n", - "We have seen in the previous section that the $\\omega$ coefficient can be tricked by networks that have a very low clustering coefficient, and in my opinion this is excatly what is happening here. The networks generated from the friendships have a very low clustering coefficient, and therefore they are biasing the $\\omega$ coefficient. This conclusion is supported by the fact the measures like the betweenness centrality and the clustering coefficient that we have shown before, suggest that the networks generated from the friendships are not small-world networks. 
\n", "\n", + "Furthermore, on a more heuristic level, those graphs represent a social network with data taken in 2010, a time when social networks were not as popular as they are today. Therefore, I would not be surprised if those networks are not small-worlds. \n", "\n", - "Furthermore, on a more euristic level, those graphs represent a social network with data taken in 2010, a time when social networks were not as popular as they are today. Therefore, I would not be surprised if those networks were not small-world networks. \n", + "On the other hand, on a more technical level, I think that using `niter` and `nrand` equal to $3$ is not enough to reach a definitive conclusion. However, choosing bigger values would have considerably increased the time needed to compute the $\\omega$ coefficient, and reducing the number of nodes in the sample would have reduced the accuracy of the results. \n", + "\n", + "---\n", "\n", - "This study evidences why the charaterization of the small-world propriety of a real-world network is still subject of debate. Even if we have used the most reliable techniques that the literature has to offer, we still have not been able to reach a definitive conclusion." + "To summarize the work done: this study shows why the characterization of the small-world property of a real-world network is still a subject of debate. Even though we have used the most reliable techniques that the literature has to offer, we still have not been able to reach a definitive conclusion, and specific observations on the single networks were necessary. For real networks, we still have not reached the completeness (in a metaphorical way, not topological) of the theoretical models first proposed in the 60s by Erdős and Rényi." 
] } ], diff --git a/omega_sampled_server.py b/omega_sampled_server.py index e524afb..454a29e 100755 --- a/omega_sampled_server.py +++ b/omega_sampled_server.py @@ -31,20 +31,11 @@ if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("graph", help="Name of the graph to be used. Options are 'checkins-foursquare', 'checkins-gowalla', 'checkins-brightkite', 'friends-foursquare', 'friends-gowalla', 'friends-brightkite'") parser.add_argument("k", help="Percentage of nodes to be sampled. Needs to be a float between 0 and 1") - parser.add_argument("niter", help="Number of rewiring per edge. Needs to be an integer. Default is 5") - parser.add_argument("nrand", help="Number of random graphs. Needs to be an integer. Default is 5") + parser.add_argument("--niter", type=int, help="Number of rewiring per edge. Needs to be an integer. Default is 5", default=5) + parser.add_argument("--nrand", type=int, help="Number of random graphs. Needs to be an integer. Default is 5", default=5) parser.add_help = True args = parser.parse_args() - # if no input is given for niter and nrand, set them to default values - if args.niter == None: - print("No input for niter. Setting it to default value: 5") - args.niter = 5 - - if args.nrand == None: - print("No input for nrand. 
Setting it to default value: 5") - args.nrand = 5 - # the name of the graph is the first part of the input string name = args.graph.split('-')[1] if 'checkins' in args.graph: diff --git a/testing.ipynb b/testing.ipynb index 07edd25..888c4c0 100644 --- a/testing.ipynb +++ b/testing.ipynb @@ -16,809 +16,171 @@ "import pandas as pd\n", "import networkx as nx\n", "import plotly.graph_objects as go\n", - "# from utils import *\n", + "from utils import *\n", "from collections import Counter\n", "from tqdm import tqdm\n", "import time\n", "import geopandas as gpd\n", "import gdown # for downloading files from google drive\n", "import shutil\n", - "# ignore warnings\n", "import warnings\n", "import sys\n", + "from pyvis.network import Network\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 2, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
GraphNumber of NodesNumber of EdgesAverage DegreeAverage Clustering Coefficientlog NAverage Shortest Path Lengthbetweenness centrality
0Brightkite Checkins Graph649329297390.2427230.7139998.7784803.0133690.000534
1Gowalla Checkins Graph30736279040.8656040.5483728.0304103.5080310.001277
2Foursquare Checkins Graph2324246702212.308090.652737.7510452.1861120.000938
3Brightkite Friendship Graph5420146905.4206640.2185718.5978515.2318070.000664
4(Filtered) Gowalla Friendship Graph229455484.8369660.2342937.7380525.3964880.001331
5Foursquare Friendship Graph139753237.6206160.1834857.2420826.458410.001531
\n", - "
" - ], - "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", - "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 90.242723 0.713999 8.778480 \n", - "1 40.865604 0.548372 8.030410 \n", - "2 212.30809 0.65273 7.751045 \n", - "3 5.420664 0.218571 8.597851 \n", - "4 4.836966 0.234293 7.738052 \n", - "5 7.620616 0.183485 7.242082 \n", - "\n", - " Average Shortest Path Length betweenness centrality \n", - "0 3.013369 0.000534 \n", - "1 3.508031 0.001277 \n", - "2 2.186112 0.000938 \n", - "3 5.231807 0.000664 \n", - "4 5.396488 0.001331 \n", - "5 6.45841 0.001531 " - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "analysis_results = pd.read_pickle('analysis_results.pkl')\n", - "analysis_results" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
GraphNumber of NodesNumber of EdgesAverage DegreeAverage Clustering Coefficientlog NAverage Shortest Path Lengthbetweenness centralityomega-coefficient
0Brightkite Checkins Graph649329297390.2427230.7139998.7784803.0133690.000534NaN
1Gowalla Checkins Graph30736279040.8656040.5483728.0304103.5080310.001277NaN
2Foursquare Checkins Graph2324246702212.308090.652737.7510452.1861120.000938NaN
3Brightkite Friendship Graph5420146905.4206640.2185718.5978515.2318070.000664NaN
4(Filtered) Gowalla Friendship Graph229455484.8369660.2342937.7380525.3964880.001331NaN
5Foursquare Friendship Graph139753237.6206160.1834857.2420826.458410.001531NaN
\n", - "
" - ], - "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", - "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 90.242723 0.713999 8.778480 \n", - "1 40.865604 0.548372 8.030410 \n", - "2 212.30809 0.65273 7.751045 \n", - "3 5.420664 0.218571 8.597851 \n", - "4 4.836966 0.234293 7.738052 \n", - "5 7.620616 0.183485 7.242082 \n", - "\n", - " Average Shortest Path Length betweenness centrality omega-coefficient \n", - "0 3.013369 0.000534 NaN \n", - "1 3.508031 0.001277 NaN \n", - "2 2.186112 0.000938 NaN \n", - "3 5.231807 0.000664 NaN \n", - "4 5.396488 0.001331 NaN \n", - "5 6.45841 0.001531 NaN " - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "analysis_results['omega-coefficient'] = np.nan\n", - "analysis_results" + "import multiprocessing\n", + "import random\n", + "import networkx as nx\n", + "import numpy as np\n", + "import math\n", + "\n", + "def parallel_omega(G, nrand=10, seed=None):\n", + "\n", + " random.seed(seed)\n", + " if not nx.is_connected(G):\n", + " G = G.subgraph(max(nx.connected_components(G), key=len))\n", + "\n", + " if len(G) == 1:\n", + " return 0\n", + "\n", + " niter_lattice_reference = nrand\n", + " niter_random_reference = nrand * 2\n", + " \n", + " def worker(queue):\n", + " while True:\n", + " task = queue.get()\n", + " if task is None:\n", + " break\n", + " random_graph = nx.random_reference(G)\n", + " lattice_graph = nx.lattice_reference(G)\n", + " random_shortest_path = nx.average_shortest_path_length(random_graph)\n", + " lattice_clustering = nx.average_clustering(lattice_graph)\n", + " queue.put((random_shortest_path, 
lattice_clustering))\n", + " \n", + " n_processes = multiprocessing.cpu_count()\n", + " manager = multiprocessing.Manager()\n", + " queue = manager.Queue()\n", + " processes = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(n_processes)]\n", + " for process in processes:\n", + " process.start()\n", + " \n", + " for _ in range(nrand):\n", + " queue.put(1)\n", + " \n", + " for _ in range(n_processes):\n", + " queue.put(None)\n", + " \n", + " for process in processes:\n", + " process.join()\n", + " \n", + " shortest_paths = []\n", + " clustering_coeffs = []\n", + " while not queue.empty():\n", + " random_shortest_path, lattice_clustering = queue.get()\n", + " shortest_paths.append(random_shortest_path)\n", + " clustering_coeffs.append(lattice_clustering)\n", + " \n", + " L = nx.average_shortest_path_length(G)\n", + " C = nx.average_clustering(G)\n", + "\n", + " # kill the process\n", + " for process in processes:\n", + " process.terminate()\n", + " process.join()\n", + "\n", + " omega = (np.mean(shortest_paths) / L) - (C / np.mean(clustering_coeffs))\n", + "\n", + "\n", + " return omega" ] }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 3, "metadata": {}, "outputs": [ { "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
GraphNumber of NodesNumber of EdgesAverage DegreeAverage Clustering Coefficientlog NAverage Shortest Path Lengthbetweenness centralityomega-coefficient
0Brightkite Checkins Graph649329297390.2427230.7139998.7784803.0133690.000534NaN
1Gowalla Checkins Graph30736279040.8656040.5483728.0304103.5080310.001277NaN
2Foursquare Checkins Graph2324246702212.308090.652737.7510452.1861120.000938NaN
3Brightkite Friendship Graph5420146905.4206640.2185718.5978515.2318070.000664NaN
4(Filtered) Gowalla Friendship Graph229455484.8369660.2342937.7380525.3964880.001331NaN
5Foursquare Friendship Graph139753237.6206160.1834857.2420826.458410.001531NaN
\n", - "
" - ], "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", - "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 90.242723 0.713999 8.778480 \n", - "1 40.865604 0.548372 8.030410 \n", - "2 212.30809 0.65273 7.751045 \n", - "3 5.420664 0.218571 8.597851 \n", - "4 4.836966 0.234293 7.738052 \n", - "5 7.620616 0.183485 7.242082 \n", - "\n", - " Average Shortest Path Length betweenness centrality omega-coefficient \n", - "0 3.013369 0.000534 NaN \n", - "1 3.508031 0.001277 NaN \n", - "2 2.186112 0.000938 NaN \n", - "3 5.231807 0.000664 NaN \n", - "4 5.396488 0.001331 NaN \n", - "5 6.45841 0.001531 NaN " + "'Graph with 200 nodes and 584 edges'" ] }, - "execution_count": 16, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# rename (Filtered) Gowalla Friendship Graph in Gowalla Friendship Graph\n", - "analysis_results.loc[analysis_results['Graph'] == 'Filtered Gowalla Friendship Graph', 'Graph'] = 'Gowalla Friendship Graph'\n", - "analysis_results" + "G = nx.erdos_renyi_graph(200, 0.03)\n", + "G = G.subgraph(max(nx.connected_components(G), key=len))\n", + "nx.info(G)" ] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 5, "metadata": {}, "outputs": [ { "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
GraphNumber of NodesNumber of EdgesAverage DegreeAverage Clustering Coefficientlog NAverage Shortest Path Lengthbetweenness centralityomega-coefficient
0Brightkite Checkins Graph649329297390.2427230.7139998.7784803.0133690.000534-0.180
1Gowalla Checkins Graph30736279040.8656040.5483728.0304103.5080310.001277-0.240
2Foursquare Checkins Graph2324246702212.308090.652737.7510452.1861120.000938-0.056
3Brightkite Friendship Graph5420146905.4206640.2185718.5978515.2318070.000664NaN
4(Filtered) Gowalla Friendship Graph229455484.8369660.2342937.7380525.3964880.001331NaN
5Foursquare Friendship Graph139753237.6206160.1834857.2420826.458410.001531NaN
\n", - "
" - ], "text/plain": [ - " Graph Number of Nodes Number of Edges \\\n", - "0 Brightkite Checkins Graph 6493 292973 \n", - "1 Gowalla Checkins Graph 3073 62790 \n", - "2 Foursquare Checkins Graph 2324 246702 \n", - "3 Brightkite Friendship Graph 5420 14690 \n", - "4 (Filtered) Gowalla Friendship Graph 2294 5548 \n", - "5 Foursquare Friendship Graph 1397 5323 \n", - "\n", - " Average Degree Average Clustering Coefficient log N \\\n", - "0 90.242723 0.713999 8.778480 \n", - "1 40.865604 0.548372 8.030410 \n", - "2 212.30809 0.65273 7.751045 \n", - "3 5.420664 0.218571 8.597851 \n", - "4 4.836966 0.234293 7.738052 \n", - "5 7.620616 0.183485 7.242082 \n", - "\n", - " Average Shortest Path Length betweenness centrality omega-coefficient \n", - "0 3.013369 0.000534 -0.180 \n", - "1 3.508031 0.001277 -0.240 \n", - "2 2.186112 0.000938 -0.056 \n", - "3 5.231807 0.000664 NaN \n", - "4 5.396488 0.001331 NaN \n", - "5 6.45841 0.001531 NaN " + "0.6776975801779451" ] }, - "execution_count": 18, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Foursquare Checkins Graph : -0.056\n", - "# Gowalla Checkins Graph : -0.24\n", - "# Brightkite Checkins Graph : -0.18\n", - "\n", - "# add omega-coefficient to the respective graphs\n", - "analysis_results.loc[analysis_results['Graph'] == 'Foursquare Checkins Graph', 'omega-coefficient'] = -0.056\n", - "analysis_results.loc[analysis_results['Graph'] == 'Gowalla Checkins Graph', 'omega-coefficient'] = -0.24\n", - "analysis_results.loc[analysis_results['Graph'] == 'Brightkite Checkins Graph', 'omega-coefficient'] = -0.18\n", - "analysis_results" + "omega = parallel_omega(G, nrand=10, seed=42)\n", + "omega" ] }, { "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "# rename (Filtered) Gowalla Friendship Graph in Gowalla Friendship Graph\n", - "analysis_results.loc[analysis_results['Graph'] == '(Filtered) Gowalla Friendship Graph', 'Graph'] = 'Gowalla 
Friendship Graph'" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "# FourSquare Friendship Graph : -0.17\n", - "# Gowalla Friendship Graph : -0.25\n", - "# Brightkite Friendship Graph : -0.20\n", - "\n", - "# add omega-coefficient to the respective graphs\n", - "analysis_results.loc[analysis_results['Graph'] == 'Foursquare Friendship Graph', 'omega-coefficient'] = -0.17\n", - "analysis_results.loc[analysis_results['Graph'] == 'Gowalla Friendship Graph', 'omega-coefficient'] = -0.25\n", - "analysis_results.loc[analysis_results['Graph'] == 'Brightkite Friendship Graph', 'omega-coefficient'] = -0.20" - ] - }, - { - "cell_type": "code", - "execution_count": 27, + "execution_count": 4, "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
GraphNumber of NodesNumber of EdgesAverage DegreeAverage Clustering Coefficientlog NAverage Shortest Path Lengthbetweenness centralityomega-coefficient
0Brightkite Checkins Graph649329297390.2427230.7139998.7784803.0133690.000534-0.180
1Gowalla Checkins Graph30736279040.8656040.5483728.0304103.5080310.001277-0.240
2Foursquare Checkins Graph2324246702212.308090.652737.7510452.1861120.000938-0.056
3Brightkite Friendship Graph5420146905.4206640.2185718.5978515.2318070.000664-0.200
4Gowalla Friendship Graph229455484.8369660.2342937.7380525.3964880.001331-0.250
5Foursquare Friendship Graph139753237.6206160.1834857.2420826.458410.001531-0.170
\n", - "
" - ], - "text/plain": [ - " Graph Number of Nodes Number of Edges Average Degree \\\n", - "0 Brightkite Checkins Graph 6493 292973 90.242723 \n", - "1 Gowalla Checkins Graph 3073 62790 40.865604 \n", - "2 Foursquare Checkins Graph 2324 246702 212.30809 \n", - "3 Brightkite Friendship Graph 5420 14690 5.420664 \n", - "4 Gowalla Friendship Graph 2294 5548 4.836966 \n", - "5 Foursquare Friendship Graph 1397 5323 7.620616 \n", - "\n", - " Average Clustering Coefficient log N Average Shortest Path Length \\\n", - "0 0.713999 8.778480 3.013369 \n", - "1 0.548372 8.030410 3.508031 \n", - "2 0.65273 7.751045 2.186112 \n", - "3 0.218571 8.597851 5.231807 \n", - "4 0.234293 7.738052 5.396488 \n", - "5 0.183485 7.242082 6.45841 \n", - "\n", - " betweenness centrality omega-coefficient \n", - "0 0.000534 -0.180 \n", - "1 0.001277 -0.240 \n", - "2 0.000938 -0.056 \n", - "3 0.000664 -0.200 \n", - "4 0.001331 -0.250 \n", - "5 0.001531 -0.170 " - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" + "ename": "KeyboardInterrupt", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[4], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m standard_omega \u001b[39m=\u001b[39m nx\u001b[39m.\u001b[39;49momega(G, nrand\u001b[39m=\u001b[39;49m\u001b[39m10\u001b[39;49m, seed\u001b[39m=\u001b[39;49m\u001b[39m42\u001b[39;49m)\n\u001b[1;32m 2\u001b[0m standard_omega\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/utils/decorators.py:845\u001b[0m, in \u001b[0;36margmap.__call__..func\u001b[0;34m(_argmap__wrapper, *args, **kwargs)\u001b[0m\n\u001b[1;32m 844\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mfunc\u001b[39m(\u001b[39m*\u001b[39margs, __wrapper\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, 
\u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[0;32m--> 845\u001b[0m \u001b[39mreturn\u001b[39;00m argmap\u001b[39m.\u001b[39;49m_lazy_compile(__wrapper)(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", + "File \u001b[0;32m compilation 14:6\u001b[0m, in \u001b[0;36margmap_omega_9\u001b[0;34m(G, niter, nrand, seed)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39minspect\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39mitertools\u001b[39;00m\n\u001b[0;32m----> 6\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39mre\u001b[39;00m\n\u001b[1;32m 7\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mcollections\u001b[39;00m \u001b[39mimport\u001b[39;00m defaultdict\n\u001b[1;32m 8\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mcontextlib\u001b[39;00m \u001b[39mimport\u001b[39;00m contextmanager\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/smallworld.py:367\u001b[0m, in \u001b[0;36momega\u001b[0;34m(G, niter, nrand, seed)\u001b[0m\n\u001b[1;32m 363\u001b[0m niter_random_reference \u001b[39m=\u001b[39m niter \u001b[39m*\u001b[39m \u001b[39m2\u001b[39m\n\u001b[1;32m 365\u001b[0m \u001b[39mfor\u001b[39;00m _ \u001b[39min\u001b[39;00m \u001b[39mrange\u001b[39m(nrand):\n\u001b[1;32m 366\u001b[0m \u001b[39m# Generate random graph\u001b[39;00m\n\u001b[0;32m--> 367\u001b[0m Gr \u001b[39m=\u001b[39m random_reference(G, niter\u001b[39m=\u001b[39;49mniter_random_reference, seed\u001b[39m=\u001b[39;49mseed)\n\u001b[1;32m 368\u001b[0m randMetrics[\u001b[39m\"\u001b[39m\u001b[39mL\u001b[39m\u001b[39m\"\u001b[39m]\u001b[39m.\u001b[39mappend(nx\u001b[39m.\u001b[39maverage_shortest_path_length(Gr))\n\u001b[1;32m 370\u001b[0m \u001b[39m# Generate lattice graph\u001b[39;00m\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/utils/decorators.py:845\u001b[0m, in \u001b[0;36margmap.__call__..func\u001b[0;34m(_argmap__wrapper, *args, 
**kwargs)\u001b[0m\n\u001b[1;32m 844\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mfunc\u001b[39m(\u001b[39m*\u001b[39margs, __wrapper\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[0;32m--> 845\u001b[0m \u001b[39mreturn\u001b[39;00m argmap\u001b[39m.\u001b[39;49m_lazy_compile(__wrapper)(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", + "File \u001b[0;32m compilation 24:6\u001b[0m, in \u001b[0;36margmap_random_reference_19\u001b[0;34m(G, niter, connectivity, seed)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39minspect\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39mitertools\u001b[39;00m\n\u001b[0;32m----> 6\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39mre\u001b[39;00m\n\u001b[1;32m 7\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mcollections\u001b[39;00m \u001b[39mimport\u001b[39;00m defaultdict\n\u001b[1;32m 8\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mcontextlib\u001b[39;00m \u001b[39mimport\u001b[39;00m contextmanager\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/smallworld.py:100\u001b[0m, in \u001b[0;36mrandom_reference\u001b[0;34m(G, niter, connectivity, seed)\u001b[0m\n\u001b[1;32m 97\u001b[0m G\u001b[39m.\u001b[39mremove_edge(c, d)\n\u001b[1;32m 99\u001b[0m \u001b[39m# Check if the graph is still connected\u001b[39;00m\n\u001b[0;32m--> 100\u001b[0m \u001b[39mif\u001b[39;00m connectivity \u001b[39mand\u001b[39;00m local_conn(G, a, b) \u001b[39m==\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[1;32m 101\u001b[0m \u001b[39m# Not connected, revert the swap\u001b[39;00m\n\u001b[1;32m 102\u001b[0m G\u001b[39m.\u001b[39mremove_edge(a, d)\n\u001b[1;32m 103\u001b[0m G\u001b[39m.\u001b[39mremove_edge(c, b)\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/connectivity/connectivity.py:649\u001b[0m, in 
\u001b[0;36mlocal_edge_connectivity\u001b[0;34m(G, s, t, flow_func, auxiliary, residual, cutoff)\u001b[0m\n\u001b[1;32m 646\u001b[0m \u001b[39melif\u001b[39;00m flow_func \u001b[39mis\u001b[39;00m boykov_kolmogorov:\n\u001b[1;32m 647\u001b[0m kwargs[\u001b[39m\"\u001b[39m\u001b[39mcutoff\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m cutoff\n\u001b[0;32m--> 649\u001b[0m \u001b[39mreturn\u001b[39;00m nx\u001b[39m.\u001b[39;49mmaximum_flow_value(H, s, t, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/flow/maxflow.py:307\u001b[0m, in \u001b[0;36mmaximum_flow_value\u001b[0;34m(flowG, _s, _t, capacity, flow_func, **kwargs)\u001b[0m\n\u001b[1;32m 304\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m callable(flow_func):\n\u001b[1;32m 305\u001b[0m \u001b[39mraise\u001b[39;00m nx\u001b[39m.\u001b[39mNetworkXError(\u001b[39m\"\u001b[39m\u001b[39mflow_func has to be callable.\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m--> 307\u001b[0m R \u001b[39m=\u001b[39m flow_func(flowG, _s, _t, capacity\u001b[39m=\u001b[39;49mcapacity, value_only\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 309\u001b[0m \u001b[39mreturn\u001b[39;00m R\u001b[39m.\u001b[39mgraph[\u001b[39m\"\u001b[39m\u001b[39mflow_value\u001b[39m\u001b[39m\"\u001b[39m]\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/flow/edmondskarp.py:237\u001b[0m, in \u001b[0;36medmonds_karp\u001b[0;34m(G, s, t, capacity, residual, value_only, cutoff)\u001b[0m\n\u001b[1;32m 120\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39medmonds_karp\u001b[39m(\n\u001b[1;32m 121\u001b[0m G, s, t, capacity\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mcapacity\u001b[39m\u001b[39m\"\u001b[39m, residual\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, value_only\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m, 
cutoff\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m\n\u001b[1;32m 122\u001b[0m ):\n\u001b[1;32m 123\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"Find a maximum single-commodity flow using the Edmonds-Karp algorithm.\u001b[39;00m\n\u001b[1;32m 124\u001b[0m \n\u001b[1;32m 125\u001b[0m \u001b[39m This function returns the residual network resulting after computing\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 235\u001b[0m \n\u001b[1;32m 236\u001b[0m \u001b[39m \"\"\"\u001b[39;00m\n\u001b[0;32m--> 237\u001b[0m R \u001b[39m=\u001b[39m edmonds_karp_impl(G, s, t, capacity, residual, cutoff)\n\u001b[1;32m 238\u001b[0m R\u001b[39m.\u001b[39mgraph[\u001b[39m\"\u001b[39m\u001b[39malgorithm\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39m\"\u001b[39m\u001b[39medmonds_karp\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 239\u001b[0m \u001b[39mreturn\u001b[39;00m R\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/flow/edmondskarp.py:104\u001b[0m, in \u001b[0;36medmonds_karp_impl\u001b[0;34m(G, s, t, capacity, residual, cutoff)\u001b[0m\n\u001b[1;32m 101\u001b[0m \u001b[39mraise\u001b[39;00m nx\u001b[39m.\u001b[39mNetworkXError(\u001b[39m\"\u001b[39m\u001b[39msource and sink are the same node\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 103\u001b[0m \u001b[39mif\u001b[39;00m residual \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m--> 104\u001b[0m R \u001b[39m=\u001b[39m build_residual_network(G, capacity)\n\u001b[1;32m 105\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 106\u001b[0m R \u001b[39m=\u001b[39m residual\n", + "File \u001b[0;32m/usr/lib/python3.10/site-packages/networkx/algorithms/flow/utils.py:139\u001b[0m, in \u001b[0;36mbuild_residual_network\u001b[0;34m(G, capacity)\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m R\u001b[39m.\u001b[39mhas_edge(u, v):\n\u001b[1;32m 136\u001b[0m \u001b[39m# Both (u, v) and (v, u) must be present in the 
residual\u001b[39;00m\n\u001b[1;32m 137\u001b[0m \u001b[39m# network.\u001b[39;00m\n\u001b[1;32m 138\u001b[0m R\u001b[39m.\u001b[39madd_edge(u, v, capacity\u001b[39m=\u001b[39mr)\n\u001b[0;32m--> 139\u001b[0m R\u001b[39m.\u001b[39;49madd_edge(v, u, capacity\u001b[39m=\u001b[39;49m\u001b[39m0\u001b[39;49m)\n\u001b[1;32m 140\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 141\u001b[0m \u001b[39m# The edge (u, v) was added when (v, u) was visited.\u001b[39;00m\n\u001b[1;32m 142\u001b[0m R[u][v][\u001b[39m\"\u001b[39m\u001b[39mcapacity\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m r\n", + "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + ] } ], "source": [ - "analysis_results\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# save the results into a pickle file\n", - "analysis_results.to_pickle('analysis_results.pkl')" + "standard_omega = nx.omega(G, nrand=10, seed=42)\n", + "standard_omega" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3.10.8 64-bit", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -832,7 +194,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]" + "version": "3.10.9" }, "orig_nbformat": 4, "vscode": { diff --git a/utils.py b/utils.py index 184c6f1..31b89cf 100755 --- a/utils.py +++ b/utils.py @@ -23,6 +23,7 @@ import numpy as np import gdown from networkx.utils import py_random_state import shutil +from pyvis.network import Network # ------------------------------------------------------------------------# @@ -100,21 +101,15 @@ def download_datasets(): shutil.rmtree(os.path.join("data", "foursquare", "dataset_WWW2019")) shutil.rmtree(os.path.join("data", "foursquare", "__MACOSX")) - os.rename(os.path.join("data", "foursquare", "dataset_WWW_friendship_new.txt"), os.path.join("data", "foursquare", 
"foursquare_friends_edges.txt")) - - os.rename(os.path.join("data", "foursquare", "dataset_WWW_Checkins_anonymized.txt"), os.path.join("data", "foursquare", "foursquare_checkins.txt")) + os.rename(os.path.join("data", "foursquare", "dataset_WWW_Checkins_anonymized.txt"), os.path.join("data", "foursquare", "foursquare_checkins_full.txt")) ## BRIGHTKITE CLEANING ## - - os.rename(os.path.join("data", "brightkite", "loc-brightkite_totalCheckins.txt"), os.path.join("data", "brightkite", "brightkite_checkins.txt")) - + os.rename(os.path.join("data", "brightkite", "loc-brightkite_totalCheckins.txt"), os.path.join("data", "brightkite", "brightkite_checkins_full.txt")) os.rename(os.path.join("data", "brightkite", "loc-brightkite_edges.txt"), os.path.join("data", "brightkite", "brightkite_friends_edges.txt")) ## GOWALLA CLEANING ## - - os.rename(os.path.join("data", "gowalla", "loc-gowalla_totalCheckins.txt"), os.path.join("data", "gowalla", "gowalla_checkins.txt")) - + os.rename(os.path.join("data", "gowalla", "loc-gowalla_totalCheckins.txt"), os.path.join("data", "gowalla", "gowalla_checkins_full.txt")) os.rename(os.path.join("data", "gowalla", "loc-gowalla_edges.txt"), os.path.join("data", "gowalla", "gowalla_friends_edges.txt")) # ------------------------------------------------------------------------# @@ -392,7 +387,7 @@ def average_shortest_path(G: nx.Graph, k=None) -> float: ---------- `G` : networkx graph The graph to compute the average shortest path length of. - `k` : int + `k` : float percentage of nodes to remove from the graph. If k is None, the average shortest path length of each connected component is computed using all the nodes of the connected component. 
Returns @@ -548,3 +543,90 @@ def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph: print("\tThe file graph has been saved in the folder data/random/watts_strogatz with the syntax watts_strogatz_n_nodes_n_edges.gpickle") return G_random + + +def visualize_graphs(G: nx.Graph, k: float = None, connected = True): + + """ + Function to visualize the graph in an HTML page using pyvis + + Parameters + ---------- + G: nx.Graph + The graph to visualize + + k: float + The fraction of nodes (between 0 and 1) to remove from the graph. Default is None, in which case it is chosen so that about 1500 nodes remain in the sampled graph. Using the default value is strongly suggested, otherwise the visualization will be very slow. + + connected: bool + If True, only the largest connected component of the graph is considered + + Returns + ------- + html file + The html file containing the visualization of the graph + + Notes: + ------ + This is only an approximation: the sampled visualization gives a general idea of the graph, but it should not be used to study the graph in detail. + """ + + if k is None: + if len(G.nodes) > 1500: + k = 1 - 1500/len(G.nodes) + else: + k = 0 + + # remove a fraction k of the nodes + nodes_to_remove = np.random.choice(list(G.nodes), size=int(k*len(G.nodes)), replace=False) + G.remove_nodes_from(nodes_to_remove) + + if connected: + # take only the largest connected component + connected_components = list(nx.connected_components(G)) + largest_connected_component = max(connected_components, key=len) + G = G.subgraph(largest_connected_component) + + + # create a pyvis network + net = Network(directed=False, bgcolor='#1e1f29', font_color='white') + + # for some reason, if I put % values, the graph is not displayed correctly.
So I use pixels, sorry non FHD users + net.width = '1920px' + net.height = '1080px' + + # add nodes and edges + net.add_nodes(list(G.nodes)) + net.add_edges(list(G.edges)) + + # set the physics layout of the network + net.set_options(""" + var options = { + "edges": { + "color": { + "inherit": true + }, + "smooth": false + }, + "physics": { + "repulsion": { + "centralGravity": 0.25, + "nodeDistance": 500, + "damping": 0.67 + }, + "maxVelocity": 48, + "minVelocity": 0.39, + "solver": "repulsion" + } + } + """) + + name = G.name.replace(" ", "_").lower() + + if not os.path.exists("html_graphs"): + os.mkdir("html_graphs") + + # save the graph in a html file + net.show("html_graphs/{}.html".format(name)) + + print("The graph has been saved in the folder html_graphs with the name {}.html" .format(name))