diff --git a/main.ipynb b/main.ipynb index 9f9d293..a8bce4a 100644 --- a/main.ipynb +++ b/main.ipynb @@ -1288,9 +1288,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we can see, there is a clear difference between the betweenness centrality of the networks generated from the checkins and the networks generated from the friendships. Since the values of the betweenness centrality of the networks generated from the checkins are higher than the theoretical values of a random network, we can conclude that the networks generated from the checkins are more likely to be a small-world network. On the other hand, the networks generated from the friendships have a lower value of betweenness centrality than the theoretical values of a random network, therefore we can conclude that the networks generated from the friendships are less likely to be a small-world network.\n", + "We can observe a marked distinction between the betweenness centrality of the networks generated from checkins and the networks generated from friendships. As the values of the betweenness centrality of the networks generated from checkins surpass the theoretical values of a random network, we can deduce that these networks generated from checkins exhibit a higher probability of being a small-world network. Conversely, the networks generated from friendships display a lower value of betweenness centrality than the theoretical values of a random network, thus indicating a lower likelihood of being a small-world network.\n", "\n", - "This propriety appears both with the erdos-renyi and the watts-strogatz models." + "This property is consistent across both the Erdős-Rényi and Watts-Strogatz models." ] }, { @@ -1300,24 +1300,19 @@ "source": [ "## Clustering coefficient\n", "\n", - "The simplest way `[5]` to treat clustering analytically in a small-world network is to use the link addition, rather than the rewiring model. 
In the limit of large network size, $N \\to \\infty$, and for a fixed fraction of shortcuts $\\phi$, it is clear that the probability of forming triangle vanishes as we approach $1/N$, so the contribution of the shortcuts to the clustering is negligible. Therefore, the clustering of a small-world network is determined by its underlying ordered lattice. For example, consider a ring where each node is connected to its $k$ closest neighbors from each side. A node's number of neighbors is therefore $2k$, and thus it has $2k(2k - 1)/2 = k(2k - 1)$ pairs of neighbors. Consider a node, $i$. All of the $k$ nearest nodes on $i$'s left are connected to each other, and the same is true for the nodes on $i$'s right. This amounts to $2k(k - 1)/2 = k(k - 1)$ pairs. Now consider a node located $d$ places to the left of $k$. It is also connected to its $k$ nearest neighbors from each side. Therefore, it will be connected to $k - d$ neighbors on $i$'s right side. The total number of connected neighbor pairs is\n", + "The simplest `[5]` analytical treatment of clustering in a small-world network involves the use of the link addition model, as opposed to the rewiring model. As the network size $N$ approaches infinity and the fraction of shortcuts $\\phi$ remains fixed, the probability of forming a triangle vanishes and the contribution of shortcuts to the clustering becomes negligible. As a result, the clustering of a small-world network is determined by its underlying ordered lattice.\n", "\n", - "\\begin{equation}\n", - " k(k-1) + \\sum_{d=1}^k (k-d) = k(k-1) + \\frac{k(k-1)}{2} = \\frac{3}{2} k (k-1)\n", - "\\end{equation}\n", - "\n", - "and the clustering coefficient is:\n", + "As an example, consider a ring network where each node is connected to its $k$ closest neighbors on each side. The number of neighbors for a node is $2k$, resulting in $k(2k-1)$ pairs of neighbors. 
If we consider a node $i$, the $k$ nearest nodes on $i$'s left are connected to each other, as are the nodes on $i$'s right. This results in $k(k-1)$ pairs. If we consider a node located $d$ places to the left of node $i$, it will be connected to $k-d$ neighbors on $i$'s right side. The total number of connected neighbor pairs is $\\frac{3}{2} k (k-1)$.\n", "\n", - "\\begin{equation}\n", - " C = \\frac{\\frac{3}{2}k(k-1)}{k(2k-1)} =\\frac{3 (k-1)}{2(2k-1)}\n", - "\\end{equation}\n", + "Therefore, the clustering coefficient $C$ can be calculated as follows:\n", "\n", - "For every $k > 1$, this results in a constant larger than $0$, indicating that the clustering of a small-world network does not vanish for large networks. For large values of $k$, the clustering coefficient approaches $3/4$, that is, the clustering is very high. Note that for a regular two-dimensional grid, the clustering by definition is zero, since no triangles exist. However, it is clear that the grid has a neighborhood structure. `[2]`\n", + "$$ C = \\frac{\\frac{3}{2}k(k-1)}{k(2k-1)} =\\frac{3 (k-1)}{2(2k-1)} $$\n", "\n", + "For all values of $k > 1$, this results in a constant larger than $0$, indicating that the clustering of a small-world network does not vanish for large networks. For large values of $k$, the clustering coefficient approaches $\\frac{3}{4}$, signifying a very high clustering. It is important to note that for a regular two-dimensional grid, the clustering is defined as zero, since no triangles exist. However, the grid still has a neighborhood structure.\n", "\n", - "--- \n", + "---\n", "\n", - "We can compare the results of the clustering coefficient that we obtained with the standard formula, and the one that we obtained with the formula above. We can do that with the function `generalized_average_clustering_coefficient` in the `utils.py` file. The function takes as input a networkx graph object and returns a float: the average clustering coefficient of the graph." 
+ "We can compare the results of the clustering coefficient obtained through the standard formula and the formula derived above by using the function `generalized_average_clustering_coefficient` in the `utils` module. This function takes a networkx graph object as input and returns the average clustering coefficient of the graph as a float." ] }, { @@ -1359,7 +1354,7 @@ "plt.legend()\n", "\n", "plt.tight_layout()\n", - "plt.show()\n" + "plt.show()" ] }, { @@ -1367,9 +1362,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we can see, for the graphs generated from the checkins, the two values are very similar. However, for the graphs generated from the friendships, the values are very different. This is another suggestion that the checkins graphs are more likely to be a small-world network than the friendships graphs. \n", + "As is evident, for the graphs derived from the checkins, the two values are nearly identical. Conversely, for the graphs derived from the friendships, the values differ significantly. This serves as further evidence that the checkins graphs are more likely to be small-world networks than the friendships graphs.\n", "\n", - "But this is not enough to jump to conclusions" + "However, this alone is insufficient to arrive at a definitive conclusion." 
] }, { @@ -1377,7 +1372,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Conclusion: Omega coefficient\n", + "## Conclusions and the omega coefficient\n", "\n", "We have already discussed a lot in the previous sections about this measure, let's see the results that we obtained after days of computations on the server:" ] @@ -1396,7 +1391,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To give you a better idea of how time consuming is this computation, I will report below the time that it took to compute the omega coefficient for the networks generated from all this networks:\n", + "To illustrate how time-consuming the Omega Coefficient computation is, I report the computation time for each network below. These results were obtained by running `omega_sampled_server.py` with `niter` and `nrand` equal to 3. Note that this is not the parallelized version.\n", "\n", "\n", "\n", @@ -1409,7 +1404,7 @@ "| Gowalla Friendships | 2h 22m |\n", "| FourSquare Friendships | 2h 9m |\n", "\n", - "Note that due to the small size of the friendships graphs, I have been able to compute the omega coefficent for the whole networks. However, for the checkins graphs, I had to take a 50% sample of the nodes. In both cases, I used `niter` and `nrand` equal to 3." + "Because the friendships graphs are small, I was able to compute the Omega Coefficient for the entire networks. For the checkins graphs, on the other hand, I had to use a 50% sample of the nodes. In both cases, I set `niter` and `nrand` to 3." ] }, { @@ -1419,19 +1414,20 @@ "source": [ "---\n", "\n", - "This results are a bit of a surprise. The small-world coefficient (omega) measures how much a network is like a lattice or a random graph. Negative values mean the graph is similar to a lattice whereas positive values mean the graph is more random-like. 
Values close to 0 instead, should represent small-world characteristics.\n", + "The results of the Omega Coefficient computation are somewhat surprising. As stated previously, the small-world coefficient (Omega) assesses the extent to which a network resembles a lattice or a random graph. Negative values indicate that the graph is similar to a lattice, while positive values suggest that the graph is more random-like. Values close to 0 should represent small-world characteristics.\n", + "\n", + "Based solely on this metric, one may infer that all the studied networks exhibit small-world characteristics. In fact, all the values of the omega coefficient are approximately $0.2$ (with the exception of the Foursquare Checkins graph, whose value is very close to 0). Nonetheless, this conclusion requires further scrutiny.\n", "\n", - "Based only on this metric, we may conclude that all the networks are small-worlds. In fact, all the values of the omega coefficient are ~$0.2$ (with the exception of the foursquare checkins graph, whose value is very close to $0$). However, I don't think this is the case. \n", + "As demonstrated in the preceding section, the Omega Coefficient can be misleading for networks with low clustering coefficients. In this case, the networks generated from friendships exhibit a low clustering coefficient and, as a result, bias the Omega Coefficient. This hypothesis of mine is reinforced by the results of other measures such as the Betweenness Centrality and the Clustering Coefficient, which suggest that the networks generated from friendships are not small-world networks.\n", "\n", - "We have seen in the previous section that the $\\omega$ coefficient can be tricked by networks that have a very low clustering coefficient, and in my opinion this is exactly what is happening here. The networks generated from the friendships have a very low clustering coefficient, and therefore they are biasing the $\\omega$ coefficient. 
This conclusion is supported by the fact the measures like the betweenness centrality and the clustering coefficient that we have shown before, suggest that the networks generated from the friendships are not small-world networks. \n", + "Moreover, the graphs represent a social network in 2010, a time when social networks were not as prevalent as they are today. Hence, it is plausible that these networks are not small-worlds. I would have no problem believing that a user is more likely to have visited the same place as a random person on the network than to be friends with them.\n", "\n", - "Furthermore, on a more heuristic level, those graphs represent a social network with data taken in 2010, a time when social networks were not as popular as they are today. Therefore, I would not be surprised if those networks are not small-worlds. \n", + "From a technical standpoint, using `niter` and `nrand` equal to 3 is insufficient to reach a definitive conclusion. However, increasing the values would have significantly prolonged the computation time of the Omega Coefficient, and reducing the number of nodes in the sample would have reduced the accuracy of the results.\n", "\n", - "On the other hand, on a more technical level, I think that using `niter` and `nrand` equal to $3$ is not enough to reach a definitive conclusion. However, choosing bigger values would have exponentially increased the time needed to compute the $\\omega$ coefficient and reducing the number of nodes in the sample would have reduced the accuracy of the results. \n", "\n", "---\n", "\n", - "To summarize the work done: this study evidences why the characterization of the small-world propriety of a real-world network is still subject of debate. Even if we have used the most reliable techniques that the literature has to offer, we still have not been able to reach a definitive conclusion and specific observations on the single networks were necessary. 
For real networks, we still have not reached the completeness (in a metaphorical way, not topological) of the theoretical models firstly proposed in the 60s by Erdős and Rényi." + "This study highlights the challenges in characterizing the small-world property of real-world networks. Despite utilizing established techniques, the results may still be inconclusive. In such cases, it is necessary to examine the specific properties of individual networks. This is exemplified by the Omega Coefficient, which can be influenced by unique characteristics of real-world networks, rendering the general results unreliable. In contrast, the results obtained from random networks are more predictable and reliable, as they do not exhibit the same complexities as real-world networks." ] }, {
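
Reviewer note on the clustering coefficient hunk above: the closed form $C = \frac{3(k-1)}{2(2k-1)}$ for a ring lattice can be checked numerically. The following is a minimal sketch in plain Python, not the notebook's own code; the helper names `ring_lattice`, `local_clustering` and `closed_form` are hypothetical and do not come from `utils.py`. It assumes the ring is large enough ($N > 3k$) that neighborhoods do not wrap around.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Adjacency sets for a ring of n nodes, each linked to its k nearest
    neighbors on each side (so every node has degree 2k)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected."""
    neighbors = adj[i]
    if len(neighbors) < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    possible = len(neighbors) * (len(neighbors) - 1) // 2
    return linked / possible


def closed_form(k):
    """Clustering coefficient of the ring lattice from the derivation."""
    return 3 * (k - 1) / (2 * (2 * k - 1))


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        adj = ring_lattice(200, k)
        measured = sum(local_clustering(adj, i) for i in adj) / len(adj)
        # The two columns should agree up to floating-point rounding.
        print(k, round(measured, 6), round(closed_form(k), 6))
```

Every node of the lattice is equivalent, so the average equals the local value at any single node; as $k$ grows the value approaches $3/4$, as stated in the text.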