small fixes, but more problems created

main
Luca Lombardo 2 years ago
parent 303ad5ea0d
commit b429ea34e5

@ -0,0 +1,73 @@
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <utility>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/small_world_generator.hpp>
#include <boost/random/linear_congruential.hpp>
using namespace std;
using namespace boost;
typedef adjacency_list<vecS, vecS, undirectedS, no_property, no_property> Graph;
typedef small_world_iterator<minstd_rand, Graph> SWGen;
vector<pair<int, int>> lattice_reference(const string& edge_list_file, int niter, bool connectivity) {
vector<pair<int, int>> edges;
int num_nodes = 0;
// Read in the edge list from the input file
ifstream in(edge_list_file);
string line;
while (getline(in, line)) {
int u, v;
// Skip blank or malformed lines instead of using uninitialized values
if (sscanf(line.c_str(), "%d\t%d", &u, &v) != 2) continue;
edges.emplace_back(u, v);
num_nodes = max(num_nodes, max(u, v));
}
// Construct the graph from the edge list
Graph g(edges.begin(), edges.end(), num_nodes + 1);
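// Create the small-world generator.
// NOTE (assumption): Boost's small_world_iterator is constructed from
// (generator, n, k, probability); with probability 0 it yields a ring lattice.
// k is taken here as the rounded-down mean degree of the input graph, and
// niter is not used by this generator.
minstd_rand gen;
const std::size_t n = num_nodes + 1;
const std::size_t k = std::max<std::size_t>(2, 2 * edges.size() / n);
SWGen sw_gen(gen, n, k, 0.0);
// Generate the lattice reference and store the resulting edge list
vector<pair<int, int>> lattice_edges;
for (; sw_gen != SWGen(); ++sw_gen) {
auto [u, v] = *sw_gen;
lattice_edges.emplace_back(u, v);
}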
// convert the vector of pairs in a .tsv file called "lattice_reference.tsv"
ofstream out("lattice_reference.tsv");
for (const auto& [u, v] : lattice_edges) {
out << u << "\t" << v << endl;
}
// return the vector of pairs
return lattice_edges;
}
// main
int main(int argc, char* argv[]) {
if (argc != 4) {
cerr << "Usage: " << argv[0] << " <edge_list_file> <niter> <connectivity>" << endl;
return 1;
}
string edge_list_file = argv[1];
int niter = atoi(argv[2]);
bool connectivity = atoi(argv[3]);
lattice_reference(edge_list_file, niter, connectivity);
return 0;
}

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

@ -790,158 +790,19 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Small-World Model"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ## The Small-World Model\n",
"\n",
"It should be clarified that real networks are not random. Their formation and development are dictated by a combination of many different processes and influences. These influencing conditions include natural limitations and processes, human considerations such as optimal performance and robustness, economic considerations, natural selection and many others. Controversies still exist regarding the measure to which random models represent real-world networks. However, in this chapter we will focus on random network models and attempt to show if their properties may still be used to study properties of real-world networks. \n",
"\n",
"Many real-world networks have many properties that cannot be explained by the ER model. One such property is the high clustering observed in many real-world networks. This led Watts and Strogatz to develop an alternative model, called the “small-world” model. Their idea was to begin with an ordered lattice, such as the \\emph{k-}ring (a ring where each site is connected to its $2k$ nearest neighbors - k from each side) or the two-dimensional lattice. For each site, each of the links emanating from it is removed with probability $\\varphi$ and is rewired to a randomly selected site in the network. A variant of this process is to add links rather than rewire, which simplifies the analysis without considerably affecting the results. The obtained network has the desirable properties of both an ordered lattice (large clustering) and a random network (small world), as we will discuss below.\n",
"\n",
"\n",
"## Clustering in a small-world network\n",
"\n",
"The simplest way to treat clustering analytically in a small-world network is to use the link addition, rather than the rewiring model. In the limit of large network size, $N \\to \\infty$, and for a fixed fraction of shortcuts $\\phi$, it is clear that the probability of forming triangle vanishes as we approach $1/N$, so the contribution of the shortcuts to the clustering is negligible. Therefore, the clustering of a small-world network is determined by its underlying ordered lattice. For example, consider a ring where each node is connected to its $k$ closest neighbors from each side. A node's number of neighbors is therefore $2k$, and thus it has $2k(2k - 1)/2 = k(2k - 1)$ pairs of neighbors. Consider a node, $i$. All of the $k$ nearest nodes on $i$'s left are connected to each other, and the same is true for the nodes on $i$'s right. This amounts to $2k(k - 1)/2 = k(k - 1)$ pairs. Now consider a node located $d$ places to the left of $k$. It is also connected to its $k$ nearest neighbors from each side. Therefore, it will be connected to $k - d$ neighbors on $i$'s right side. The total number of connected neighbor pairs is\n",
"\n",
"\\begin{equation}\n",
" k(k-1) + \\sum_{d=1}^k (k-d) = k(k-1) + \\frac{k(k-1)}{2} = \\frac{{3}{2}} k (k-1)\n",
"\\end{equation}\n",
"\n",
"and the clustering coefficient is:\n",
"\n",
"\\begin{equation}\n",
" C = \\frac{\\frac{3}{2}k(k-1)}{k(2k-1)} =\\frac{3 (k-1)}{2(2k-1)}\n",
"\\end{equation}\n",
"\n",
"For every $k > 1$, this results in a constant larger than $0$, indicating that the clustering of a small-world network does not vanish for large networks. For large values of $k$, the clustering coefficient approaches $3/4$, that is, the clustering is very high. Note that for a regular two-dimensional grid, the clustering by definition is zero, since no triangles exist. However, it is clear that the grid has a neighborhood structure.\n",
"\n",
"## Distances in a small-world network}\n",
"\n",
"The second important property of small-world networks is their small diameter, i.e., the small distance between nodes in the network. The distance in the underlying lattice behaves as the linear length of the lattice, L. Since $N \\sim L^d$ where $d$ is the lattice dimension, it follows that the distance between nodes behaves as:\n",
"\n",
"\\begin{equation}\n",
" l \\sim L \\sim N^{1/d}\n",
"\\end{equation}\n",
"\n",
"Therefore, the underlying lattice has a finite dimension, and the distances on it behave as a power law of the number of nodes, i.e., the distance between nodes is large. However, when adding even a small fraction of shortcuts to the network, this behavior changes dramatically. \n",
"\n",
"Let's try to deduce the behavior of the average distance between nodes. Consider a small-world network, with dimension d and connecting distance $k$ (i.e., every node is connected to any other node whose distance from it in every linear dimension is at most $k$). Now, consider the nodes reachable from a source node with at most $r$ steps. When $r$ is small, these are just the \\emph{r-th} nearest neighbors of the source in the underlying lattice. We term the set of these neighbors a “patch”. the radius of which is $kr$ , and the number of nodes it contains is approximately $n(r) = (2kr)d$. \n",
"\n",
"We now want to find the distance r for which such a patch will contain about one shortcut. This will allow us to consider this patch as if it was a single node in a randomly connected network. Assume that the probability for a single node to have a shortcut is $\\Phi$. To find the length for which approximately one shortcut is encountered, we need to solve for $r$ the following equation: $(2kr)^d \\Phi = 1$. The correlation length $\\xi$ defined as the distance (or linear size of a patch) for which a shortcut will be encountered with high probability is therefore,\n",
"\n",
"\\begin{equation}\n",
" \\xi = \\frac{1}{k \\Phi^{1/d}}\n",
"\\end{equation}\n",
"\n",
"Note that we have omitted the factor 2, since we are interested in the order of magnitude. Let us denote by $V(r)$ the total number of nodes reachable from a node by at most $r$ steps, and by $a(r)$, the number of nodes added to a patch in the \\emph{r-th} step. That is, $a(r) = n(r) - n(r-1)$. Thus,\n",
"\n",
"\\begin{equation}\n",
" a(r) \\sim \\frac{\\text{d} n(r)}{\\text{d} r} = 2kd(2kr)^{d-1}\n",
"\\end{equation}\n",
"\n",
"When a shortcut is encountered at the r step from a node, it leads to a new patch \\footnote{It may actually lead to an already encountered patch, and two patches may also merge after some steps, but this occurs with negligible probability when $N \\to \\infty$ until most of the network is reachable}. This new patch occurs after $r'$ steps, and therefore the number of nodes reachable from its origin is $V (r - r')$. Thus, we obtain the recursive relation\n",
"\n",
"\\begin{equation} \n",
" V(r) = \\sum_{r'=0}^r a(r') [1 + \\xi^{-d}V(r-r')]\n",
"\\end{equation}\n",
"\n",
"where the first term stands for the size of the original patch, and the second term is derived from the probability of hitting a shortcut, which is approximately $\\xi -d $ for every new node encountered. To simplify the solution of \\ref{eq:recursion}, it can be approximated by a differential equation. The sum can be approximated by an integral, and then the equation can be differentiated with respect to $r$ . For simplicity, we will concentrate here on the solution for the one-dimensional case, with $k = 1$, where $a(r) = 2$. Thus, one obtains\n",
"\n",
"\\begin{equation}\n",
" \\frac{\\text{d} V(r)}{\\text{d} r} = 2 [1 + V(r)/\\xi]\n",
"\\end{equation}\n",
"\n",
"the solution of which is:\n",
"\n",
"\\begin{equation} \n",
" V(r) = \\xi \\left(e^{2r/\\xi} -1\\right)\n",
"\\end{equation}\n",
"\n",
"For $r \\ll \\xi$, the exponent can be expanded in a power series, and one obtains $V(r) \\sim 2r = n(r)$, as expected, since usually no shortcut is encountered. For $r \\ gg \\xi$, $V(r)$. An approximation for the average distance between nodes can be obtained by equating $V(r)$ from to the total number of nodes, $V(r) = N$. This results in\n",
"\n",
"\\begin{equation}=\n",
" r \\sim \\frac{\\xi}{2} \\ln \\frac{N}{\\xi}\n",
"\\end{equation}\n",
"\n",
"As apparent from \\ref{eq:average distance}, the average distance in a small-world network behaves as the distance in a random graph with patches of size $\\xi$ behaving as the nodes of the random graph. -->\n"
]
},
{
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Detecting Small-Worldness" "# Analysis of the results"
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"As we have seen, many real technological, biological, social, and information networks fall into the broad class of _small-world_ networks, a middle ground between regular and random networks: they have high local clustering of elements, like regular networks, but also short path lengths between elements, like random networks. Membership of the small-world network class also implies that the corresponding systems have dynamic properties different from those of equivalent random or regular networks. \n", "### Distribution of Degree\n",
"\n",
"However, the existing _small-world_ definition is a categorical one, and breaks the continuum of network topologies into the three classes of regular, random, and small-world networks, with the latter being the broadest. It is unclear to what extent the real-world systems in the small-world class have common network properties and to what specific point in the \\emph{middle-ground} (between random and regular) a network generating model must be tuned to genuinely capture the topology of such systems. \n",
"\n", "\n",
"The current _state of the art_ algorithm in the field of small-world network analysis is based on the idea that small-world networks should have some topological structure, reflected by properties such as an high clustering coefficient. On the other hand, random networks (as the Erd ̋os-Rényi model) have no such structure and, usually, a low clustering coefficient. The current \\emph{state of the art} algorithms can be empirically described in the following steps:\n",
"\n", "\n",
"\n",
"* Compute the average shortest path length $L$ and the average clustering coefficient $C$ of the target system.\n",
"* Create an ensemble of random networks with the same number of nodes and edges as the target system. Usually, the random networks are generated using the Erd ̋os-Rényi model.\n",
"* Compute the average shortest path length $L_r$ and the average clustering coefficient $C_r$ of each random network in the ensemble.\n",
"* Compute the normalized average shortest path length $\\lambda := L/L_n$ and the normalized average clustering coefficient $\\gamma := C/C_n$\n",
"* If $\\lambda$ and $\\gamma$ are close to 1, then the target system is a small-world network.\n",
"\n",
"\n",
"One of the problems with this interpretations is that we have no information on how the average shortest path scales with the network size. Specifically, a small-world network is defined to be a network where the typical distance $L$ between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes $N$ in the network.\n",
"$$ L \\propto N $$\n",
"But since we are working with a real-world network, there is no such thing as \"same network with different number of nodes\". So this definition, can't be applied in this case. \n",
"\n",
"Furthermore, let's try to take another approach. We can consider a definition of small-world network that it's not directly depend of $\\gamma$ and $\\lambda$, e.g:\n",
"\n",
"> _A small-world network is a spatial network with added long-range connections_\n",
"\n",
"Then we still cannot make robust implications as to whether such a definition is fulfilled just using $\\gamma$ and $\\lambda$ (or in fact other network measures). The interpretation of many studies assumes that all networks are a realization of the Watts-Strogatz model for some rewiring probability, which is not justified at all! We know many other network models, whose realizations are entirely different from the Watts-Strogatz model. \n",
"\n",
"The above method is not robust to measurement errors. Small errors when establishing a network from measurements suffice to make, e.g., a lattice look like a small-world network. \n",
"\n",
"<!-- See \\cite{https://doi.org/10.48550/arxiv.1111.4570} and \\cite{10.3389/fnhum.2016.00096}. -->"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# open the dataframe object\n",
"analysis_results = pd.read_pickle('analysis_results.pkl')\n",
"analysis_results"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution of Degree\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The Erdős-Rényi model has traditionally been the dominant subject of study in the field of random graphs. Recently, however, several studies of real-world networks have found that the ER model fails to reproduce many of their observed properties. One of the simplest properties of a network that can be measured directly is the degree distribution, or the fraction $P(k)$ of nodes having k connections (degree $k$). A well-known result for ER networks is that the degree distribution is Poissonian,\n", "The Erdős-Rényi model has traditionally been the dominant subject of study in the field of random graphs. Recently, however, several studies of real-world networks have found that the ER model fails to reproduce many of their observed properties. One of the simplest properties of a network that can be measured directly is the degree distribution, or the fraction $P(k)$ of nodes having k connections (degree $k$). A well-known result for ER networks is that the degree distribution is Poissonian,\n",
"\n", "\n",
"\\begin{equation}\n", "\\begin{equation}\n",
@ -980,7 +841,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"for G in checkins_graphs:\n", "for G in checkins_graphs:\n",
" degree_distribution(G, log=True)" " degree_distribution(G)"
] ]
}, },
{ {
@ -994,7 +855,6 @@
] ]
}, },
{ {
"attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
@ -1011,15 +871,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# for each network, create a erdos-renyi graph with the same number of nodes and edges \n", "# for each network, create a erdos-renyi model of the original. If you want to test it with the watts-strogatz model, uncomment the code below and comment the first 2 lines of the for loop\n",
"\n", "\n",
"for graph in checkins_graphs:\n", "for graph in checkins_graphs:\n",
" G = nx.erdos_renyi_graph(graph.number_of_nodes(), graph.number_of_nodes()/graph.number_of_edges())\n", "\n",
" G.name = graph.name + \" Erdos-Renyi\"\n", " p = G.number_of_edges() / (G.number_of_nodes())\n",
" avg_degree = int(np.mean([d for n, d in G.degree()]))\n",
" G = nx.watts_strogatz_graph(G.number_of_nodes(), avg_degree, p)\n",
" G.name = graph.name + \" Watts-Strogatz\"\n",
"\n",
" # G = nx.erdos_renyi_graph(graph.number_of_nodes(), nx.density(graph))\n",
" # G.name = graph.name + \" Erdos-Renyi\"\n",
" print(G.name)\n", " print(G.name)\n",
" print(\"Number of nodes: \", G.number_of_nodes())\n", " print(\"Number of nodes: \", G.number_of_nodes())\n",
" print(\"Number of edges: \", G.number_of_edges())\n", " print(\"Number of edges: \", G.number_of_edges())\n",
" degree_distribution(G, log=False)" " degree_distribution(G, log=False)\n",
" "
] ]
}, },
{ {
@ -1028,21 +895,47 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# for each network, create a erdos-renyi model of the original graph. If you want to test it with the watts-strogatz model, uncomment the code below and comment the first 2 lines of the for loop\n",
"\n",
"for graph in friendships_graph:\n", "for graph in friendships_graph:\n",
" G = nx.erdos_renyi_graph(graph.number_of_nodes(), graph.number_of_nodes()/graph.number_of_edges())\n", "\n",
" G.name = graph.name + \" Erdos-Renyi\"\n", " p = G.number_of_edges() / (G.number_of_nodes())\n",
" avg_degree = int(np.mean([d for n, d in G.degree()]))\n",
" G = nx.watts_strogatz_graph(G.number_of_nodes(), avg_degree, p)\n",
" G.name = graph.name + \" Watts-Strogatz\"\n",
"\n",
" # G = nx.erdos_renyi_graph(graph.number_of_nodes(), nx.density(graph))\n",
" # G.name = graph.name + \" Erdos-Renyi\" \n",
"\n",
" print(G.name)\n", " print(G.name)\n",
" print(\"Number of nodes: \", G.number_of_nodes())\n", " print(\"Number of nodes: \", G.number_of_nodes())\n",
" print(\"Number of edges: \", G.number_of_edges())\n", " print(\"Number of edges: \", G.number_of_edges())\n",
" degree_distribution(G, log=False)" " degree_distribution(G, log=False)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a Poissonian distribution, as expected.\n",
"\n",
"The degree distribution alone is not enough to characterize the network. There are many other quantities, such as the degree-degree correlation (between connected nodes), the spatial correlations, the clustering coefficient, the betweenness or central-ity distribution, and the self-similarity exponents."
]
},
{ {
"attachments": {}, "attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"This is a Poissonian distribution, as expected." "## The Small-World Model\n",
"\n",
"It should be clarified that real networks are not random. Their formation and development are dictated by a combination of many different processes and influences. These influencing conditions include natural limitations and processes, human considerations such as optimal performance and robustness, economic considerations, natural selection and many others. Controversies still exist regarding the measure to which random models represent real-world networks. However, in this section we will focus on random network models and attempt to show if their properties may still be used to study properties of our real-world networks. \n",
"\n",
"Many real-world networks have many properties that cannot be explained by the ER model. One such property is the high clustering observed in many real-world networks. This led Watts and Strogatz to develop an alternative model, called the “small-world” model. Quoting their paper:\n",
"\n",
"> \"highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs\"\n",
"\n",
"Their idea was to begin with an ordered lattice, such as the $k$-ring (a ring where each site is connected to its $2k$ nearest neighbors - $k$ from each side) or the two-dimensional lattice. For each site, each of the links emanating from it is removed with probability $\\varphi$ and is rewired to a randomly selected site in the network. In other words, small-world networks have the unique ability to have specialized nodes or regions within a network while simultaneously exhibiting shared or distributed processing across all of the communicating nodes within a network. "
] ]
}, },
{ {
@ -1050,11 +943,56 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The degree distribution alone is not enough to characterize the network. There are many other quantities, such as the degree-degree correlation (between connected nodes), the spatial correlations, the clustering coefficient, the betweenness or central-ity distribution, and the self-similarity exponents.\n", "## Small-Worldness"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Given the unique processing or information transfer capabilities of small-world networks, it is vital to determine whether this is a universal property of naturally occurring networks or whether small-world properties are restricted to specialized networks. An overly liberal definition of small-worldness might miss the specific benefits of these networks\n",
"\n", "\n",
"--- \n", "> high clustering and low path length\n",
"\n",
"and obscure them with networks more closely associated with regular lattices and random networks. A possible definition of a small-world network is that it has clustering similar to a regular lattice and path length similar to a random network. However, in practice, networks are typically defined as small-world by comparing clustering and path length to those of a comparable random network _(Humphries et al., 2006)_. Unfortunately, this means that networks with very low clustering can be, and indeed are, defined as small-world. We need a method that is able to distinguish true small-world networks from those that are more closely aligned with random or lattice structures and overestimates the occurrence of small-world networks. Networks that are more similar to random or lattice structures are interesting in their own right, but they do not behave like small-world networks\n",
"\n",
"## Identifying small-world networks\n",
"\n",
"Small-world networks are distinguished from other networks by two specific properties, the first being high clustering (C) among nodes. High clustering supports specialization as local collections of strongly interconnected nodes readily share information or resources. Conceptually, clustering is quite straightforward to comprehend. In a real-world analogy, clustering represents the probability that ones friends are also friends of each other. Small-world networks also have short path lengths (L) as is commonly observed in random networks. Path length is a measure of the distance between nodes in the network, calculated as the mean of the shortest geodesic distances between all possible node pairs. Small values of $L$ ensure that information or resources easily spreads throughout the network. This property makes distributed information processing possible on technological networks and supports the six degrees of separation often reported in social networks.\n",
"\n",
"Watts and Strogatz developed a network model (WS model) that resulted in the first-ever networks with clustering close to that of a lattice and path lengths similar to those of random networks. The WS model demonstrates that random rewiring of a small percentage of the edges in a lattice results in a precipitous decrease in the path length, but only trivial reductions in the clustering. Across this rewiring probability, there is a range where the discrepancy between clustering and path length is very large, and it is in this area that the benefits of small-world networks are realized.\n",
"\n",
"### A first approach: the $\\sigma$ coefficient\n",
"\n",
"In 2006, Humphries and colleagues introduced a quantitative metric, small-world coefficient $\\sigma$, that uses a ratio of network clustering and path length compared to its random network equivalent. In this quantitative approach, $C$ and $L$ are measured against those of their equivalent derived random networks ($C_{rand}$ and $L_{rand}$, respectively) to generate the ratios $c = C/C_{rand}$ and $k = L/L_{rand}$. These ratios are then used to calculate the small-coefficient as:\n",
"$$ \\sigma = \\frac{C/C_{rand}}{L/L_{rand}} = \\frac{\\gamma}{\\sigma} $$\n",
"The conditions that must be met for a network to be classified as small-world are $C \\gg C_{rand}$ and $L \\approx L_{rand}$, which results in $\\sigma > 1$. As it turns out, a major issue with $\\sigma$ is that the clustering coefficient of the equivalent random network greatly influences the small-world coefficient. In the small-world coefficient equation, $\\sigma$ uses the relationship between $C$ and $C_{rand}$ to determine the value of $\\gamma$. Because clustering in a random network is typically extremely low (Humphries and Gurney, 2008; Watts and Strogatz, 1998) the value of $\\gamma$ can be unduly influenced by only small changes in $C_{rand}$. \n",
"\n",
"### A more solid approach: the $\\omega$ coefficient\n",
"\n",
"Given a graph with characteristic path length, $L$, and clustering, $C$, the small-world measurement, $\\omega$, is defined by comparing the clustering of the network to that of an equivalent lattice network, $C_latt$, and comparing path length to that of an equivalent random network, $L_rand$; the relationship\n",
"is simply the difference of two ratios defined as:\n",
"$$ \\omega = \\frac{L_{rand}}{L} - \\frac{C}{C_{latt}} $$\n",
"In using the clustering of an equivalent lattice network rather than a random network, this metric is less susceptible to the fluctuations seen with $C_rand$. Moreover, values of $\\gamma$ are restricted to the interval $-1$ to $1$ regardless of network size. Values close to zero are considered small world.\n",
"\n",
"Positive values indicate a graph with more random characteristics, while negative values indicate a graph with more regular, or lattice-like, characteristics.\n",
"\n",
"#### Lattice network construction\n",
"\n",
"In the paper [1] the lattice network was generated by using a modified version of the latticization algorithm (Sporns and Zwi,2004) found in the brain connectivity toolbox (Rubinov and Sporns, 2010). The procedure is based on a Markov-chain algorithm that maintains node degree and swaps edges with uniform probability; however, swaps are carried out only if the resulting matrix has entries that are closer to the main diagonal. To optimize the clustering coefficient of the lattice network, the latticization procedure is performed over several user-defined repetitions. Storing the initial adjacency matrix and its clustering coefficient, the latticization procedure is performed on the matrix. If the clustering coefficient of the resulting matrix is lower, the initial matrix is kept and latticization is performed again on the same matrix; if the clustering coefficient is higher, then the initial adjacency matrix is replaced. This latticization process is repeated until clustering is maximized. This process results in a highly clustered network with long path length approximating a lattice topology. To decrease the processing time in larger networks, a sliding window procedure was developed. Smaller sections of the matrix are sampled along the main diagonal, latticized, and reinserted into the larger matrix in a step-wise fashion.\n",
"\n",
"#### Limitations\n",
"\n",
"The length of time it takes to generate lattice networks, particularly for large networks.Although\n",
"latticization is fast in smaller networks, large networks such as functional brain networks and the Internet can take several\n",
"hours to generate and optimize. The latticization procedure described here uses an algorithm developed by Sporns and\n",
"Zwi in 2004, but the algorithm was used on much smaller datasets. \n",
"\n", "\n",
"Now let's try to compute the same analysis made before for this random models" "Furthermore, $\\omega$ is limited by networks that have very low clustering that cannot be appreciably increased, such as networks with super hubs or hierarchical networks. In hierarchical networks, the nodes are often configured in branches\n",
"that contain little to no clustering. In networks with super hubs, the network may contain a hub that has a node with\n",
"a degree that is several times in magnitude greater than the next most connected hub. In both these networks, there are\n",
"fewer configurations to increase the clustering of the network. Moreover, in a targeted assault of these networks, the topology is easily destroyed (Albert et al., 2000). Such vulnerability to attack signifies a network that may not be small-world."
] ]
} }
], ],
@ -1074,7 +1012,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.8" "version": "3.10.9"
}, },
"orig_nbformat": 4, "orig_nbformat": 4,
"vscode": { "vscode": {

@ -30,7 +30,168 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 2,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Graph</th>\n",
" <th>Number of Nodes</th>\n",
" <th>Number of Edges</th>\n",
" <th>Average Degree</th>\n",
" <th>Average Clustering Coefficient</th>\n",
" <th>log N</th>\n",
" <th>Average Shortest Path Length</th>\n",
" <th>betweenness centrality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Brightkite Checkins Graph</td>\n",
" <td>7191</td>\n",
" <td>3663807</td>\n",
" <td>1018.997914</td>\n",
" <td>0.702854</td>\n",
" <td>8.880586</td>\n",
" <td>2.411011</td>\n",
" <td>0.00022</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Gowalla Checkins Graph</td>\n",
" <td>10702</td>\n",
" <td>303104</td>\n",
" <td>56.644366</td>\n",
" <td>0.505597</td>\n",
" <td>9.278186</td>\n",
" <td>5.222903</td>\n",
" <td>0.000301</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Foursquare EU Checkins Graph</td>\n",
" <td>20282</td>\n",
" <td>7430376</td>\n",
" <td>732.706439</td>\n",
" <td>0.597097</td>\n",
" <td>9.917489</td>\n",
" <td>2.2843</td>\n",
" <td>0.000089</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Foursquare IT Checkins Graph</td>\n",
" <td>3730</td>\n",
" <td>629749</td>\n",
" <td>337.667024</td>\n",
" <td>0.683565</td>\n",
" <td>8.224164</td>\n",
" <td>2.185477</td>\n",
" <td>0.000428</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brightkite Friendship Graph</td>\n",
" <td>5928</td>\n",
" <td>34673</td>\n",
" <td>11.698043</td>\n",
" <td>0.219749</td>\n",
" <td>8.687442</td>\n",
" <td>5.052162</td>\n",
" <td>0.000448</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>(Filtered) Gowalla Friendship Graph</td>\n",
" <td>8396</td>\n",
" <td>29122</td>\n",
" <td>6.937113</td>\n",
" <td>0.217544</td>\n",
" <td>9.035511</td>\n",
" <td>4.558532</td>\n",
" <td>0.000357</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Foursquare IT Friendship Graph</td>\n",
" <td>2073</td>\n",
" <td>6217</td>\n",
" <td>5.99807</td>\n",
" <td>0.148489</td>\n",
" <td>7.636752</td>\n",
" <td>19.530752</td>\n",
" <td>0.000879</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Foursquare EU Friendship Graph</td>\n",
" <td>16491</td>\n",
" <td>59419</td>\n",
" <td>7.206234</td>\n",
" <td>0.167946</td>\n",
" <td>9.710570</td>\n",
" <td>23.713864</td>\n",
" <td>0.000272</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Graph Number of Nodes Number of Edges \\\n",
"0 Brightkite Checkins Graph 7191 3663807 \n",
"1 Gowalla Checkins Graph 10702 303104 \n",
"2 Foursquare EU Checkins Graph 20282 7430376 \n",
"3 Foursquare IT Checkins Graph 3730 629749 \n",
"4 Brightkite Friendship Graph 5928 34673 \n",
"5 (Filtered) Gowalla Friendship Graph 8396 29122 \n",
"6 Foursquare IT Friendship Graph 2073 6217 \n",
"7 Foursquare EU Friendship Graph 16491 59419 \n",
"\n",
" Average Degree Average Clustering Coefficient log N \\\n",
"0 1018.997914 0.702854 8.880586 \n",
"1 56.644366 0.505597 9.278186 \n",
"2 732.706439 0.597097 9.917489 \n",
"3 337.667024 0.683565 8.224164 \n",
"4 11.698043 0.219749 8.687442 \n",
"5 6.937113 0.217544 9.035511 \n",
"6 5.99807 0.148489 7.636752 \n",
"7 7.206234 0.167946 9.710570 \n",
"\n",
" Average Shortest Path Length betweenness centrality \n",
"0 2.411011 0.00022 \n",
"1 5.222903 0.000301 \n",
"2 2.2843 0.000089 \n",
"3 2.185477 0.000428 \n",
"4 5.052162 0.000448 \n",
"5 4.558532 0.000357 \n",
"6 19.530752 0.000879 \n",
"7 23.713864 0.000272 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# import the graphs from the saved files\n", "# import the graphs from the saved files\n",
"G_brighkite_checkins = nx.read_gpickle(os.path.join('data', 'brightkite', 'brightkite_checkins_graph.gpickle'))\n", "G_brighkite_checkins = nx.read_gpickle(os.path.join('data', 'brightkite', 'brightkite_checkins_graph.gpickle'))\n",
@ -43,9 +204,9 @@
"G_foursquareEU_friends = nx.read_gpickle(os.path.join('data', 'foursquare', 'foursquareEU_friendships_graph.gpickle'))\n", "G_foursquareEU_friends = nx.read_gpickle(os.path.join('data', 'foursquare', 'foursquareEU_friendships_graph.gpickle'))\n",
"G_foursquareIT_friends = nx.read_gpickle(os.path.join('data', 'foursquare', 'foursquareIT_friendships_graph.gpickle'))\n", "G_foursquareIT_friends = nx.read_gpickle(os.path.join('data', 'foursquare', 'foursquareIT_friendships_graph.gpickle'))\n",
"\n", "\n",
"# # open the dataframe object\n", "# open the dataframe object\n",
"# analysis_results = pd.read_pickle('analysis_results.pkl')\n", "analysis_results = pd.read_pickle('analysis_results.pkl')\n",
"# analysis_results" "analysis_results"
] ]
}, },
{ {
@ -58,11 +219,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 3,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"analysis_results = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n", "# analysis_results = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n",
"\n", "\n",
"checkins_graphs = [G_brighkite_checkins, G_gowalla_checkins, G_foursquareEU_checkins, G_foursquareIT_checkins]\n", "checkins_graphs = [G_brighkite_checkins, G_gowalla_checkins, G_foursquareEU_checkins, G_foursquareIT_checkins]\n",
"friendships_graph = [G_brighkite_friends, G_gowalla_friends, G_foursquareIT_friends, G_foursquareEU_friends]\n", "friendships_graph = [G_brighkite_friends, G_gowalla_friends, G_foursquareIT_friends, G_foursquareEU_friends]\n",
@ -74,8 +235,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Original Graphs\n", "# Random shit"
"\n"
] ]
}, },
{ {
@ -84,235 +244,144 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"for graph in graphs_all:\n", "# analysis_results_erods = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n",
" # add basic graph statistics\n", "\n",
" analysis_results = analysis_results.append(\n", "# analysis_results_ws = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n",
" {'Graph': graph.name, \n", "\n",
" 'Number of Nodes': graph.number_of_nodes(), \n", "# for graph in graphs_all:\n",
" 'log N': np.log(graph.number_of_nodes()),\n", "# print(\"\\nCreating random graph for graph: \", graph.name)\n",
" 'Number of Edges': graph.number_of_edges()}, \n", "# G_erd = create_random_graphs(graph, model='erdos', save=False)\n",
" ignore_index=True)\n", "# G_ws = create_random_graphs(graph, model='watts_strogatz', save=False)\n",
"\n", " \n",
" # add average degree\n", "# # add the basic information to the dataframe\n",
" print(\"Computing average degree for graph: \", graph.name)\n", "# analysis_results_erods = analysis_results_erods.append({\n",
" avg_deg = np.mean([d for n, d in graph.degree()])\n", "# 'Graph': G_erd.name,\n",
" analysis_results.loc[analysis_results['Graph'] == graph.name, 'Average Degree'] = avg_deg\n", "# 'Number of Nodes': G_erd.number_of_nodes(),\n",
"\n", "# 'Number of Edges': G_erd.number_of_edges(),\n",
" # add average clustering coefficient\n", "# 'log N': np.log(G_erd.number_of_nodes())\n",
" print(\"Computing average clustering coefficient for graph: \", graph.name)\n", "# }, ignore_index=True)\n",
" avg_clustering = nx.average_clustering(graph)\n", "\n",
" analysis_results.loc[analysis_results['Graph'] == graph.name, 'Average Clustering Coefficient'] = avg_clustering\n", "# # add the basic information to the dataframe\n",
"\n", "# analysis_results_ws = analysis_results_ws.append({\n",
" # add average shortest path length\n", "# 'Graph': G_ws.name,\n",
" print(\"Computing average shortest path length for graph: \", graph.name)\n", "# 'Number of Nodes': G_ws.number_of_nodes(),\n",
" average_shortest_path_length = average_shortest_path(graph)\n", "# 'Number of Edges': G_ws.number_of_edges(),\n",
" analysis_results.loc[analysis_results['Graph'] == graph.name, 'Average Shortest Path Length'] = average_shortest_path_length\n", "# 'log N': np.log(G_ws.number_of_nodes())\n",
"\n", "# }, ignore_index=True)\n",
" # add betweenness centrality\n", "\n",
" print(\"Computing betweenness centrality for graph: \", graph.name)\n", "# # compute the average degree and add it to the dataframes\n",
" betweenness_centrality = np.mean(list(betweenness_centrality_parallel(graph, 6).values()))\n", "# avg_deg_erd = np.mean([d for n, d in G_erd.degree()])\n",
" analysis_results.loc[analysis_results['Graph'] == graph.name, 'betweenness centrality'] = betweenness_centrality\n", "# avg_deg_ws = np.mean([d for n, d in G_ws.degree()])\n",
" print()\n", "# analysis_results_erods.loc[analysis_results_erods['Graph'] == G_erd.name, 'Average Degree'] = avg_deg_erd\n",
"\n", "# analysis_results_ws.loc[analysis_results_ws['Graph'] == G_ws.name, 'Average Degree'] = avg_deg_ws\n",
"\n", "\n",
"analysis_results\n", "# # compute the average clustering coefficient and add it to the dataframes\n",
"analysis_results.to_pickle('analysis_results.pkl')" "# avg_clustering_erd = average_clustering_coefficient(G_erd, k = 0.9)\n",
"# avg_clustering_ws = average_clustering_coefficient(G_ws, k = 0.9)\n",
"# analysis_results_erods.loc[analysis_results_erods['Graph'] == G_erd.name, 'Average Clustering Coefficient'] = avg_clustering_erd\n",
"# analysis_results_ws.loc[analysis_results_ws['Graph'] == G_ws.name, 'Average Clustering Coefficient'] = avg_clustering_ws\n",
"\n",
"# # compute the average shortest path length and add it to the dataframes\n",
"# average_shortest_path_length_erd = average_shortest_path(G_erd, k = 0.9)\n",
"# average_shortest_path_length_ws = average_shortest_path(G_ws, k = 0.9)\n",
"# analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'Average Shortest Path Length'] = average_shortest_path_length_erd\n",
"# analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'Average Shortest Path Length'] = average_shortest_path_length_ws\n",
"\n",
"# # compute the betweenness centrality and add it to the dataframes\n",
"# betweenness_centrality_erd = np.mean(list(betweenness_centrality_parallel(G_erd, 4, k = 0.9).values()))\n",
"# betweenness_centrality_ws = np.mean(list(betweenness_centrality_parallel(G_ws, 4, k = 0.9).values()))\n",
"# analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'betweenness centrality'] = betweenness_centrality_erd\n",
"# analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'betweenness centrality'] = betweenness_centrality_ws\n",
"\n",
"# # save memory\n",
"# del G_erd, G_ws\n",
"\n",
"# analysis_results_erods.to_pickle('analysis_results_erods.pkl')\n",
"# analysis_results_ws.to_pickle('analysis_results_ws.pkl')"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Random shit" "# Small Worldness\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have already computed the average clustering coefficient and the average shortest path length for our networks. We can save a lot of time by skipping these computations"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 6, "execution_count": 4,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [],
{
"ename": "AttributeError",
"evalue": "'NoneType' object has no attribute 'name'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[6], line 10\u001b[0m\n\u001b[1;32m 6\u001b[0m G \u001b[38;5;241m=\u001b[39m create_random_graphs(graph, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124merods\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 8\u001b[0m \u001b[38;5;66;03m# add the basic information to the dataframe\u001b[39;00m\n\u001b[1;32m 9\u001b[0m analysis_results_erods \u001b[38;5;241m=\u001b[39m analysis_results_erods\u001b[38;5;241m.\u001b[39mappend({\n\u001b[0;32m---> 10\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mGraph\u001b[39m\u001b[38;5;124m'\u001b[39m: \u001b[43mG\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m,\n\u001b[1;32m 11\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mNumber of Nodes\u001b[39m\u001b[38;5;124m'\u001b[39m: G\u001b[38;5;241m.\u001b[39mnumber_of_nodes(),\n\u001b[1;32m 12\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mNumber of Edges\u001b[39m\u001b[38;5;124m'\u001b[39m: G\u001b[38;5;241m.\u001b[39mnumber_of_edges(),\n\u001b[1;32m 13\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mlog N\u001b[39m\u001b[38;5;124m'\u001b[39m: np\u001b[38;5;241m.\u001b[39mlog(G\u001b[38;5;241m.\u001b[39mnumber_of_nodes())\n\u001b[1;32m 14\u001b[0m }, ignore_index\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 16\u001b[0m \u001b[38;5;66;03m# compute the average degree and add it to the dataframe\u001b[39;00m\n\u001b[1;32m 17\u001b[0m avg_deg \u001b[38;5;241m=\u001b[39m np\u001b[38;5;241m.\u001b[39mmean([d \u001b[38;5;28;01mfor\u001b[39;00m n, d \u001b[38;5;129;01min\u001b[39;00m G\u001b[38;5;241m.\u001b[39mdegree()])\n",
"\u001b[0;31mAttributeError\u001b[0m: 'NoneType' object has no attribute 'name'"
]
}
],
"source": [ "source": [
"analysis_results_erods = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n", "def omega(G, C_og, L_og, niter, nrand):\n",
"\n", " randMetrics = {\"C\": [], \"L\": []}\n",
"# read all the graphs gpickle files in the data/random/erdos folder. Then run the same analysis as before for this graphs\n",
"\n",
"for graph in graphs_all:\n",
" G = create_random_graphs(graph, \"erods\")\n",
"\n", "\n",
" # add the basic information to the dataframe\n", " # Calculate initial average clustering coefficient which potentially will\n",
" analysis_results_erods = analysis_results_erods.append({\n", " # get replaced by higher clustering coefficients from generated lattice\n",
" 'Graph': G.name,\n", " # reference graphs\n",
" 'Number of Nodes': G.number_of_nodes(),\n", " Cl = C_og\n",
" 'Number of Edges': G.number_of_edges(),\n",
" 'log N': np.log(G.number_of_nodes())\n",
" }, ignore_index=True)\n",
"\n", "\n",
" # compute the average degree and add it to the dataframe\n", " niter_lattice_reference = niter\n",
" avg_deg = np.mean([d for n, d in G.degree()])\n", " niter_random_reference = niter * 2\n",
" analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'Average Degree'] = avg_deg\n",
"\n", "\n",
" # compute the average clustering coefficient and add it to the dataframe\n", " for _ in range(nrand):\n",
" avg_clustering = nx.average_clustering(G)\n", " \n",
" analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'Average Clustering Coefficient'] = avg_clustering\n", " # Generate random graph\n",
" Gr = nx.random_reference(G, niter=niter_random_reference, seed=42)\n",
" randMetrics[\"L\"].append(nx.average_shortest_path_length(Gr))\n",
"\n", "\n",
" # compute the average shortest path length and add it to the dataframe\n", " # Generate lattice graph\n",
" average_shortest_path_length = average_shortest_path(G)\n", " Gl = nx.lattice_reference(G, niter=niter_lattice_reference, seed=42)\n",
" analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'Average Shortest Path Length'] = average_shortest_path_length\n",
"\n", "\n",
" # compute the betweenness centrality and add it to the dataframe\n", " # Replace old clustering coefficient, if clustering is higher in\n",
" betweenness_centrality = np.mean(list(betweenness_centrality_parallel(G, 6).values()))\n", " # generated lattice reference\n",
" analysis_results_erods.loc[analysis_results_erods['Graph'] == G.name, 'betweenness centrality'] = betweenness_centrality\n", " Cl_temp = nx.average_clustering(Gl)\n",
" if Cl_temp > Cl:\n",
" Cl = Cl_temp\n",
"\n", "\n",
" # save memory\n", " C = C_og\n",
" del G\n", " L = L_og\n",
" Lr = np.mean(randMetrics[\"L\"])\n",
"\n", "\n",
"analysis_results_erods.to_pickle('analysis_results_erods.pkl')\n", " return (Lr / L) - (C / Cl)"
"analysis_results_erods\n"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 5,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdout",
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"\tNumber of edges in the original graph: 3663807\n", "Brightkite Checkins Graph\n"
"\tNumber of edges in the random graph: 3660219\n"
]
},
{
"ename": "UnboundLocalError",
"evalue": "local variable 'G_copy' referenced before assignment",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mUnboundLocalError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[7], line 25\u001b[0m\n\u001b[1;32m 22\u001b[0m analysis_results_ws\u001b[38;5;241m.\u001b[39mloc[analysis_results_ws[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mGraph\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m==\u001b[39m G\u001b[38;5;241m.\u001b[39mname, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mAverage Clustering Coefficient\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m avg_clustering\n\u001b[1;32m 24\u001b[0m \u001b[38;5;66;03m# compute the average shortest path length and add it to the dataframe\u001b[39;00m\n\u001b[0;32m---> 25\u001b[0m average_shortest_path_length \u001b[38;5;241m=\u001b[39m \u001b[43maverage_shortest_path\u001b[49m\u001b[43m(\u001b[49m\u001b[43mG\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 26\u001b[0m analysis_results_ws\u001b[38;5;241m.\u001b[39mloc[analysis_results_ws[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mGraph\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m==\u001b[39m G\u001b[38;5;241m.\u001b[39mname, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mAverage Shortest Path Length\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m average_shortest_path_length\n\u001b[1;32m 28\u001b[0m \u001b[38;5;66;03m# compute the betweenness centrality and add it to the dataframe\u001b[39;00m\n",
"File \u001b[0;32m~/github/small-worlds/utils.py:497\u001b[0m, in \u001b[0;36maverage_shortest_path\u001b[0;34m(G, k)\u001b[0m\n\u001b[1;32m 494\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39m\\t\u001b[39;00m\u001b[39mNumber of edges after removing \u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m% o\u001b[39;00m\u001b[39mf nodes: \u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m.\u001b[39mformat((k)\u001b[39m*\u001b[39m\u001b[39m100\u001b[39m, G_copy\u001b[39m.\u001b[39mnumber_of_edges()))\n\u001b[1;32m 496\u001b[0m tmp \u001b[39m=\u001b[39m \u001b[39m0\u001b[39m\n\u001b[0;32m--> 497\u001b[0m connected_components \u001b[39m=\u001b[39m \u001b[39mlist\u001b[39m(nx\u001b[39m.\u001b[39mconnected_components(G_copy))\n\u001b[1;32m 498\u001b[0m \u001b[39m# remove all the connected components with less than 10 nodes\u001b[39;00m\n\u001b[1;32m 499\u001b[0m connected_components \u001b[39m=\u001b[39m [c \u001b[39mfor\u001b[39;00m c \u001b[39min\u001b[39;00m connected_components \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(c) \u001b[39m>\u001b[39m \u001b[39m10\u001b[39m]\n",
"\u001b[0;31mUnboundLocalError\u001b[0m: local variable 'G_copy' referenced before assignment"
] ]
} }
], ],
"source": [ "source": [
"# do the same with the watts strogatz graphs\n", "analysis_results = pd.read_pickle('analysis_results.pkl')\n",
"\n", "\n",
"analysis_results_ws = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n", "omega_results = pd.DataFrame(columns=['Graph', 'omega'])\n",
"\n", "\n",
"for graph in graphs_all:\n", "for G in checkins_graphs:\n",
" G = create_random_graphs(graph, 'watts_strogatz', save=False)\n", " print(G.name)\n",
" C_og = analysis_results.loc[analysis_results['Graph'] == G.name, 'Average Clustering Coefficient'].values[0]\n",
" L_og = analysis_results.loc[analysis_results['Graph'] == G.name, 'Average Shortest Path Length'].values[0]\n",
"\n", "\n",
" # add the basic information to the dataframe\n", " omega_value = omega(G, C_og, L_og, 2, 3)  # separate name, so the omega function is not overwritten\n",
" analysis_results_ws = analysis_results_ws.append({\n", " \n",
" omega_results = omega_results.append({\n",
" 'Graph': G.name,\n", " 'Graph': G.name,\n",
" 'Number of Nodes': G.number_of_nodes(),\n", " 'omega': omega_value\n",
" 'Number of Edges': G.number_of_edges(),\n", " }, ignore_index=True)"
" 'log N': np.log(G.number_of_nodes())\n",
" }, ignore_index=True)\n",
"\n",
" # compute the average degree and add it to the dataframe\n",
" avg_deg = np.mean([d for n, d in G.degree()])\n",
" analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'Average Degree'] = avg_deg\n",
"\n",
" # compute the average clustering coefficient and add it to the dataframe\n",
" avg_clustering = nx.average_clustering(G)\n",
" analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'Average Clustering Coefficient'] = avg_clustering\n",
"\n",
" # compute the average shortest path length and add it to the dataframe\n",
" average_shortest_path_length = average_shortest_path(G)\n",
" analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'Average Shortest Path Length'] = average_shortest_path_length\n",
"\n",
" # compute the betweenness centrality and add it to the dataframe\n",
" betweenness_centrality = np.mean(list(betweenness_centrality_parallel(G, 6).values()))\n",
" analysis_results_ws.loc[analysis_results_ws['Graph'] == G.name, 'betweenness centrality'] = betweenness_centrality\n",
"\n",
" # save memory\n",
" del G\n",
"\n",
"analysis_results_ws.to_pickle('analysis_results_ws.pkl')\n",
"analysis_results_ws"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"G = nx.watts_strogatz_graph(1000, 4, 0.1)\n",
"adj = nx.to_scipy_sparse_array(G)\n",
"# print info about the graph and the matrix\n",
"print(\"Number of nodes: \", G.number_of_nodes())\n",
"print(\"Number of edges: \", G.number_of_edges())\n",
"print(\"Average degree: \", np.mean([d for n, d in G.degree()]))\n",
"print(\"Average clustering coefficient: \", nx.average_clustering(G))\n",
"print(\"Average shortest path length: \", nx.average_shortest_path_length(G))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import scipy.sparse as sp\n",
"\n",
"# randomly swap edges, but keep the degree of each node the same (i.e. the degree sequence is preserved)\n",
"def random_swap_edges(adj, nswap=1, max_tries=100):\n",
" # use numpy and scipy to speed up the process\n",
" adj = sp.csr_matrix(adj)\n",
" n, m = adj.shape \n",
" assert n == m # make sure the adjacency matrix is square\n",
" adj_triu = sp.triu(adj) # only consider the upper triangular part of the adjacency matrix\n",
" adj_tuple = sp.find(adj_triu) # get the indices and values of the non-zero elements\n",
" adj_edges = np.array(list(zip(adj_tuple[0], adj_tuple[1]))) # get the edges\n",
" adj_data = adj_tuple[2] # get the edge weights\n",
" nnz = adj_edges.shape[0] # number of non-zero elements\n",
" assert nnz == adj_data.shape[0] # make sure the number of edges and edge weights are the same\n",
" for _ in range(nswap): # repeat nswap times\n",
" # choose random edges to swap\n",
" edge_idx = np.random.choice(nnz, size=2, replace=False) # choose two random edges\n",
" edge1 = adj_edges[edge_idx[0]] # get the first edge\n",
" edge2 = adj_edges[edge_idx[1]] # get the second edge\n",
" # make sure the edges are not self-loops and not already connected\n",
" if edge1[0] == edge2[0] or edge1[0] == edge2[1] or edge1[1] == edge2[0] or edge1[1] == edge2[1] or adj[edge1[0], edge2[1]] or adj[edge2[0], edge1[1]]: \n",
" continue # if the edges are self-loops or already connected, try again\n",
" # swap the edges\n",
" adj[edge1[0], edge1[1]] = 0 \n",
" adj[edge2[0], edge2[1]] = 0 \n",
" adj[edge1[0], edge2[1]] = 1\n",
" adj[edge2[0], edge1[1]] = 1\n",
" # update adj_edges and adj_data\n",
" adj_edges[edge_idx[0]] = [edge1[0], edge2[1]]\n",
" adj_edges[edge_idx[1]] = [edge2[0], edge1[1]]\n",
" adj_data[edge_idx[0]] = 1\n",
" adj_data[edge_idx[1]] = 1\n",
" return adj\n",
"\n",
"adj_swapped = random_swap_edges(adj, nswap=1)"
] ]
}, },
{ {
@ -321,14 +390,17 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# create a new graph from the swapped adjacency matrix\n", "for G in friendships_graph:\n",
"G_swapped = nx.from_scipy_sparse_matrix(adj_swapped)\n", " print(G.name)\n",
"# print info about the graph and the matrix\n", " C_og = analysis_results.loc[analysis_results['Graph'] == G.name, 'Average Clustering Coefficient'].values[0]\n",
"print(\"Number of nodes: \", G_swapped.number_of_nodes())\n", " L_og = analysis_results.loc[analysis_results['Graph'] == G.name, 'Average Shortest Path Length'].values[0]\n",
"print(\"Number of edges: \", G_swapped.number_of_edges())\n", "\n",
"print(\"Average degree: \", np.mean([d for n, d in G_swapped.degree()]))\n", " omega_value = omega(G, C_og, L_og, 2, 3)  # separate name, so the omega function is not overwritten\n",
"print(\"Average clustering coefficient: \", nx.average_clustering(G_swapped))\n", " \n",
"print(\"Average shortest path length: \", nx.average_shortest_path_length(G_swapped))" " omega_results = omega_results.append({\n",
" 'Graph': G.name,\n",
" 'omega': omega_value\n",
" }, ignore_index=True)"
] ]
} }
], ],
@ -348,7 +420,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.8" "version": "3.10.9"
}, },
"orig_nbformat": 4, "orig_nbformat": 4,
"vscode": { "vscode": {

Binary file not shown.
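
The small-world coefficient that the notebook's `omega` function works towards is ω = Lr/L − C/Cl, where Lr is the mean path length over the random references and Cl is the largest clustering coefficient seen among the lattice references (starting from the graph's own value). The following is only a minimal sketch of that arithmetic in plain Python, assuming the reference values have already been computed elsewhere; the name `small_world_omega` and all numbers are made up for illustration and are not part of the repository.

```python
from statistics import mean


def small_world_omega(C, L, lattice_clusterings, random_path_lengths):
    """Return omega = Lr / L - C / Cl for precomputed reference values.

    C, L: clustering coefficient and average shortest path of the graph.
    lattice_clusterings: clustering coefficients of the lattice references.
    random_path_lengths: average shortest paths of the random references.
    """
    # Cl starts from the graph's own clustering and is replaced only by
    # a higher value coming from a lattice reference.
    Cl = max([C, *lattice_clusterings])
    Lr = mean(random_path_lengths)
    return (Lr / L) - (C / Cl)


# Illustrative, made-up inputs: graph name -> (C, L).
measured = {"graph A": (0.5, 2.0), "graph B": (0.2, 5.0)}

rows = []
for name, (C_og, L_og) in measured.items():
    # Store the result under a name other than the function's,
    # so the function stays callable on the next iteration.
    omega_value = small_world_omega(C_og, L_og, [0.4, 0.8], [1.0, 3.0])
    rows.append({"Graph": name, "omega": omega_value})

print(rows)
```

Collecting the rows in a list and building the table once at the end is a design choice of this sketch; it avoids growing a DataFrame row by row.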

@ -18,7 +18,7 @@ It should be clarified that real networks are not random. Their formation and de
\nd The simplest way to treat clustering analytically in a small-world network is to use the link-addition model, rather than the rewiring model. In the limit of large network size, $N \to \infty$, and for a fixed fraction of shortcuts $\phi$, it is clear that the probability of forming a triangle vanishes as $1/N$, so the contribution of the shortcuts to the clustering is negligible. Therefore, the clustering of a small-world network is determined by its underlying ordered lattice. For example, consider a ring where each node is connected to its $k$ closest neighbors from each side. A node's number of neighbors is therefore $2k$, and thus it has $2k(2k - 1)/2 = k(2k - 1)$ pairs of neighbors. Consider a node, $i$. All of the $k$ nearest nodes on $i$'s left are connected to each other, and the same is true for the nodes on $i$'s right. This amounts to $2k(k - 1)/2 = k(k - 1)$ pairs. Now consider a node located $d$ places to the left of $i$. It is also connected to its $k$ nearest neighbors from each side. Therefore, it will be connected to $k - d$ neighbors on $i$'s right side. The total number of connected neighbor pairs is \nd The simplest way to treat clustering analytically in a small-world network is to use the link-addition model, rather than the rewiring model. In the limit of large network size, $N \to \infty$, and for a fixed fraction of shortcuts $\phi$, it is clear that the probability of forming a triangle vanishes as $1/N$, so the contribution of the shortcuts to the clustering is negligible. Therefore, the clustering of a small-world network is determined by its underlying ordered lattice. For example, consider a ring where each node is connected to its $k$ closest neighbors from each side. A node's number of neighbors is therefore $2k$, and thus it has $2k(2k - 1)/2 = k(2k - 1)$ pairs of neighbors. Consider a node, $i$. All of the $k$ nearest nodes on $i$'s left are connected to each other, and the same is true for the nodes on $i$'s right. 
This amounts to $2k(k - 1)/2 = k(k - 1)$ pairs. Now consider a node located $d$ places to the left of $i$. It is also connected to its $k$ nearest neighbors from each side. Therefore, it will be connected to $k - d$ neighbors on $i$'s right side. The total number of connected neighbor pairs is
\begin{equation} \begin{equation}
k(k-1) + \sum_{d=1}^k (k-d) = k(k-1) + \frac{k(k-1)}{2} = \frac{{3}{2}} k (k-1) k(k-1) + \sum_{d=1}^k (k-d) = k(k-1) + \frac{k(k-1)}{2} = \frac{3}{2} k (k-1)
\end{equation} \end{equation}
\nd and the clustering coefficient is: \nd and the clustering coefficient is:
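
Dividing the $\frac{3}{2} k(k-1)$ connected pairs by the $k(2k-1)$ neighbor pairs gives $3(k-1)/\bigl(2(2k-1)\bigr)$. The count can be checked by brute force on a small ring. The sketch below is illustrative only: it builds the ring lattice with plain Python sets, and the helper names (`ring_lattice`, `connected_neighbor_pairs`, `local_clustering`) are hypothetical, not taken from the repository.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Adjacency sets of a ring of n nodes, each node linked to its
    k nearest neighbors on each side (degree 2k)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_neighbor_pairs(adj, i):
    """Number of pairs of neighbors of i that are linked to each other."""
    return sum(1 for u, v in combinations(sorted(adj[i]), 2) if v in adj[u])


def local_clustering(adj, i):
    """Connected neighbor pairs divided by all neighbor pairs of i."""
    deg = len(adj[i])
    return connected_neighbor_pairs(adj, i) / (deg * (deg - 1) / 2)


n, k = 50, 3  # n is taken much larger than k so the ring does not wrap onto itself
adj = ring_lattice(n, k)

# Brute-force count versus the closed form (3/2) k (k - 1).
print(connected_neighbor_pairs(adj, 0), 3 * k * (k - 1) // 2)

# Brute-force clustering versus 3 (k - 1) / (2 (2k - 1)).
print(local_clustering(adj, 0), 3 * (k - 1) / (2 * (2 * k - 1)))
```

Note that $k$ here counts neighbors on one side, so a node's degree is $2k$; written in terms of the degree $\delta = 2k$, the same expression reads $3(\delta - 2)/\bigl(4(\delta - 1)\bigr)$.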

@ -509,41 +509,63 @@ def average_shortest_path(G: nx.Graph, k=None) -> float:
def average_clustering_coefficient(G: nx.Graph, k=None) -> float: def average_clustering_coefficient(G: nx.Graph, k=None) -> float:
""" """
This function takes in input a networkx graph and returns the average clustering coefficient of the graph. This works also for disconnected graphs. This function takes in input a networkx graph and returns the average clustering coefficient of the graph. This works also for disconnected graphs.
Parameters
----------
G : networkx graph
The graph to compute the average clustering coefficient of.
k : float, optional
Fraction of nodes (between 0 and 1) to remove at random before computing the coefficient. If k is None, the average clustering coefficient is computed on the whole graph.
Returns
-------
float
The average clustering coefficient of the graph.
Raises
------
ValueError
If k is not between 0 and 1
"""
if k is not None and (k < 0 or k > 1):
raise ValueError("k must be between 0 and 1")
elif k is None:
return nx.average_clustering(G)
else:
G_copy = G.copy()
G_copy.remove_nodes_from(random.sample(list(G_copy.nodes()), int((k)*G_copy.number_of_nodes())))
print("\tNumber of nodes after removing {}% of nodes: {}" .format((k)*100, G_copy.number_of_nodes()))
return nx.average_clustering(G_copy)
# ------------------------------------------------------------------------# Parameters
----------
G : networkx graph
The graph to compute the average clustering coefficient of.
k : float, optional
Fraction of nodes (between 0 and 1) to remove at random before computing the coefficient. If k is None, the average clustering coefficient is computed on the whole graph.
Returns
-------
float
The average clustering coefficient of the graph.
Raises
------
ValueError
If k is not between 0 and 1
"""
if k is not None and (k < 0 or k > 1):
raise ValueError("k must be between 0 and 1")
elif k is None:
return nx.average_clustering(G)
else:
G_copy = G.copy()
G_copy.remove_nodes_from(random.sample(list(G_copy.nodes()), int((k)*G_copy.number_of_nodes())))
print("\tNumber of nodes after removing {}% of nodes: {}" .format((k)*100, G_copy.number_of_nodes()))
return nx.average_clustering(G_copy)
def generalized_average_clustering_coefficient(G: nx.Graph) -> float:
"""
Generalized definition of the average clustering coefficient of a graph. It better applies to small world networks and it's way more efficient than the average_clustering_coefficient function with the standard definition of the clustering coefficient.
Parameters
----------
G : networkx graph
The graph to compute the generalized average clustering coefficient of.
Returns
-------
float
The generalized average clustering coefficient of the graph.
"""
C = 0
for node in G.nodes():
k = G.degree(node)
C += (3*(k-1))/(2*(2*k - 1))
return C/G.number_of_nodes()
# ------------------------------------------------------------------------#
def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph: def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph:
@ -601,52 +623,4 @@ def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph:
return G_random return G_random
# elif model == "regular":
# G_random = nx.random_regular_graph(1, G.number_of_nodes())
# print("\tNumber of edges in the original graph: {}" .format(G.number_of_edges()))
# print("\tNumber of edges in the random graph: {}" .format(G_random.number_of_edges()))
# G_random.name = G.name + "regular"
# if save:
# # check if the folder exists, otherwise create it
# if not os.path.exists(os.path.join('data', 'random', 'regular')):
# os.makedirs(os.path.join('data', 'random', 'regular'))
# nx.write_gpickle(G_random, os.path.join('data', 'random', 'regular', "regular_" + str(G.number_of_nodes()) + "_" + str(G_random.number_of_edges()) + ".gpickle"))
# print("\tThe file graph has been saved in the folder data/random/regular with the syntax regular_n_nodes_n_edges.gpickle")
# return G_random
# elif model == "reference":
# G_random = nx.random_reference(G)
# print("\tNumber of edges in the original graph: {}" .format(G.number_of_edges()))
# print("\tNumber of edges in the random graph: {}" .format(G_random.number_of_edges()))
# G_random.name = G.name + "reference"
# if save:
# # check if the folder exists, otherwise create it
# if not os.path.exists(os.path.join('data', 'random', 'reference')):
# os.makedirs(os.path.join('data', 'random', 'reference'))
# nx.write_gpickle(G_random, os.path.join('data', 'random', 'reference', "reference_" + str(G.number_of_nodes()) + "_" + str(G_random.number_of_edges()) + ".gpickle"))
# print("\tThe file graph has been saved in the folder data/random/reference with the syntax reference_n_nodes_n_edges.gpickle")
# return G_random
# #lattice
# elif model == "lattice":
# G_random = nx.lattice_reference(G, 1)
# print("\tNumber of edges in the original graph: {}" .format(G.number_of_edges()))
# print("\tNumber of edges in the random graph: {}" .format(G_random.number_of_edges()))
# G_random.name = G.name + "lattice"
# if save:
# # check if the folder exists, otherwise create it
# if not os.path.exists(os.path.join('data', 'random', 'lattice')):
# os.makedirs(os.path.join('data', 'random', 'lattice'))
# nx.write_gpickle(G_random, os.path.join('data', 'random', 'lattice', "lattice_" + str(G.number_of_nodes()) + "_" + str(G_random.number_of_edges()) + ".gpickle"))
# print("\tThe file graph has been saved in the folder data/random/lattice with the syntax lattice_n_nodes_n_edges.gpickle")
# return G_random
