Small changes, still work to do

main
Luca Lombardo 2 years ago
parent ab99e446d0
commit 51d21ae1a1

@ -23,8 +23,6 @@
"import geopandas as gpd\n",
"import multiprocessing\n",
"\n",
"\n",
"\n",
"# ignore warnings\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
@ -369,7 +367,9 @@
" sep='\\t', \n",
" header=False, \n",
" index=False, \n",
" columns=['user id', 'location id'])"
" columns=['user id', 'location id'])\n",
"\n",
"# I prefer not to delete the full dataset, since it's bad practice in my opinion. If you want to delete it, uncomment the following line"
]
},
{
@ -433,10 +433,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This is still a bit too much, to help us in the next sections, let's take a subset of the European are"
"This is still a bit too much, to help us in the next sections, let's take a subset of the European area"
]
},
{
@ -471,7 +472,7 @@
"gdf_gowalla.plot(marker='o', color='red', markersize=1)\n",
"\n",
"df_gowalla = gdf_gowalla\n",
"print(\"Number of unique users in the UE area: \", len(df_gowalla['user id'].unique()))\n",
"print(\"Number of unique users in the EU area: \", len(df_gowalla['user id'].unique()))\n",
"\n",
"# remove from memory the geopandas dataframe, it was only used for plotting\n",
"del gdf_gowalla"
@ -496,7 +497,11 @@
" sep='\\t', \n",
" header=False, \n",
" index=False, \n",
" columns=['user id', 'location id'])"
" columns=['user id', 'location id'])\n",
"\n",
"# I prefer not to delete the full dataset, since it's bad practice in my opinion. If you want to delete it, uncomment the following line\n",
"\n",
"# os.remove(os.path.join('test_data', 'brightkite', 'brightkite_checkins_full.txt'))"
]
},
{
@ -597,7 +602,11 @@
" sep='\\t',\n",
" header=False,\n",
" index=False,\n",
" columns=['user id', 'venue id'])"
" columns=['user id', 'venue id'])\n",
"\n",
"# I prefer not to delete the full dataset, since it's bad practice in my opinion. If you want to delete it, uncomment the following line\n",
"\n",
"# os.remove(os.path.join('test_data', 'foursquare', 'foursquare_checkins_full.txt'))"
]
},
{
@ -702,7 +711,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 84831/84831 [00:00<00:00, 285874.40it/s]\n"
"100%|██████████| 84831/84831 [00:00<00:00, 276398.10it/s]\n"
]
},
{
@ -718,7 +727,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 31095/31095 [00:00<00:00, 329181.10it/s]\n"
"100%|██████████| 31095/31095 [00:00<00:00, 302198.41it/s]\n"
]
},
{
@ -734,7 +743,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 40650/40650 [00:00<00:00, 131337.54it/s]\n"
"100%|██████████| 40650/40650 [00:00<00:00, 108484.63it/s]\n"
]
},
{
@ -885,7 +894,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@ -1034,7 +1043,7 @@
}
],
"source": [
"analysis_results = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality'], index=None)\n",
"analysis_results = pd.DataFrame(columns=['Graph', 'Number of Nodes', 'Number of Edges', 'Average Degree', 'Average Clustering Coefficient', 'log N', 'Average Shortest Path Length', 'betweenness centrality', 'omega-coefficient'], index=None)\n",
"\n",
"for graph in graphs_all:\n",
" analysis_results = analysis_results.append(\n",
@ -16432,7 +16441,7 @@
"\n",
"### A more solid approach: the $\\omega$ coefficient\n",
"\n",
"Given a graph with characteristic path length, $L$, and clustering, $C$, the small-world measurement, $\\omega$, is defined by comparing the clustering of the network to that of an equivalent lattice network, $C_latt$, and comparing path length to that of an equivalent random network, $L_rand$; the relationship\n",
"Given a graph with characteristic path length, $L$, and clustering, $C$, the small-world measurement, $\\omega$, is defined by comparing the clustering of the network to that of an equivalent lattice network, $C_{latt}$, and comparing path length to that of an equivalent random network, $L_rand$; the relationship\n",
"is simply the difference of two ratios defined as:\n",
"$$ \\omega = \\frac{L_{rand}}{L} - \\frac{C}{C_{latt}} $$\n",
"In using the clustering of an equivalent lattice network rather than a random network, this metric is less susceptible to the fluctuations seen with $C_rand$. Moreover, values of $\\gamma$ are restricted to the interval $-1$ to $1$ regardless of network size. Values close to zero are considered small world.\n",
@ -16450,6 +16459,372 @@
"\n",
"Furthermore, $\\omega$ is limited by networks that have very low clustering that cannot be appreciably increased, such as networks with 'super hubs' or hierarchical networks. In hierarchical networks, the nodes are often configured in branches that contain little to no clustering. In networks with super hubs, the network may contain a hub that has a node with a degree that is several times in magnitude greater than the next most connected hub. In both these networks, there are fewer configurations to increase the clustering of the network. Moreover, in a targeted assault of these networks, the topology is easily destroyed (Albert et al., 2000). Such vulnerability to attack signifies a network that may not be small-world."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Omega coefficient computation\n",
"\n",
"The computation of the omega coefficient, a measure of small-worldness in networks, is a time-consuming process. To efficiently compare the clustering coefficient and the shortest path length, we constructed both the lattice reference network and the random reference network multiple times. However, given the limited resources and the utilization of Python, a strategy was employed to reduce computation time.\n",
"\n",
"This strategy consisted of:\n",
"\n",
"1. Generating a random sample of the network\n",
"2. Performing a specified number of rewiring operations per edge to compute the equivalent random graph\n",
"3. Computing the average clustering coefficient and average shortest path length for a specified number of random graphs and then averaging the results\n",
"4. Calculating the omega coefficient for the random sample with formula stated above\n",
" \n",
"Despite the aforementioned sampling technique, the computation of the omega coefficient remained computationally intensive. To mitigate over-sampling and potential bias, the computation was performed on a subset of the network with cardinality $|N|/2$. Additionally, both the number of rewiring operations per edge and the number of random graphs were set to $3$.\n",
"\n",
"Even with these optimizations, the computation of the omega coefficient required several days to complete. The computation was executed on a remote server and the results can be accessed in the form of a pandas dataframe. The results of the computation on the $3$ networks are as follows:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Graph</th>\n",
" <th>Number of Nodes</th>\n",
" <th>Number of Edges</th>\n",
" <th>Average Degree</th>\n",
" <th>Average Clustering Coefficient</th>\n",
" <th>log N</th>\n",
" <th>Average Shortest Path Length</th>\n",
" <th>betweenness centrality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Brightkite Checkins Graph</td>\n",
" <td>6493</td>\n",
" <td>292973</td>\n",
" <td>90.242723</td>\n",
" <td>0.713999</td>\n",
" <td>8.778480</td>\n",
" <td>3.013369</td>\n",
" <td>0.000534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Gowalla Checkins Graph</td>\n",
" <td>3073</td>\n",
" <td>62790</td>\n",
" <td>40.865604</td>\n",
" <td>0.548372</td>\n",
" <td>8.030410</td>\n",
" <td>3.508031</td>\n",
" <td>0.001277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Foursquare Checkins Graph</td>\n",
" <td>2324</td>\n",
" <td>246702</td>\n",
" <td>212.30809</td>\n",
" <td>0.65273</td>\n",
" <td>7.751045</td>\n",
" <td>2.186112</td>\n",
" <td>0.000938</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brightkite Friendship Graph</td>\n",
" <td>5420</td>\n",
" <td>14690</td>\n",
" <td>5.420664</td>\n",
" <td>0.218571</td>\n",
" <td>8.597851</td>\n",
" <td>5.231807</td>\n",
" <td>0.000664</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>(Filtered) Gowalla Friendship Graph</td>\n",
" <td>2294</td>\n",
" <td>5548</td>\n",
" <td>4.836966</td>\n",
" <td>0.234293</td>\n",
" <td>7.738052</td>\n",
" <td>5.396488</td>\n",
" <td>0.001331</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Foursquare Friendship Graph</td>\n",
" <td>1397</td>\n",
" <td>5323</td>\n",
" <td>7.620616</td>\n",
" <td>0.183485</td>\n",
" <td>7.242082</td>\n",
" <td>6.45841</td>\n",
" <td>0.001531</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Graph Number of Nodes Number of Edges \\\n",
"0 Brightkite Checkins Graph 6493 292973 \n",
"1 Gowalla Checkins Graph 3073 62790 \n",
"2 Foursquare Checkins Graph 2324 246702 \n",
"3 Brightkite Friendship Graph 5420 14690 \n",
"4 (Filtered) Gowalla Friendship Graph 2294 5548 \n",
"5 Foursquare Friendship Graph 1397 5323 \n",
"\n",
" Average Degree Average Clustering Coefficient log N \\\n",
"0 90.242723 0.713999 8.778480 \n",
"1 40.865604 0.548372 8.030410 \n",
"2 212.30809 0.65273 7.751045 \n",
"3 5.420664 0.218571 8.597851 \n",
"4 4.836966 0.234293 7.738052 \n",
"5 7.620616 0.183485 7.242082 \n",
"\n",
" Average Shortest Path Length betweenness centrality \n",
"0 3.013369 0.000534 \n",
"1 3.508031 0.001277 \n",
"2 2.186112 0.000938 \n",
"3 5.231807 0.000664 \n",
"4 5.396488 0.001331 \n",
"5 6.45841 0.001531 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analysis_results = pd.read_pickle('analysis_results.pkl')\n",
"analysis_results"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In the repository there is a python program `omega_sampled_server.py` that can be used to compute the omega coefficient for a network as described above. You can run it as follows:\n",
"\n",
"```python\n",
"./omega_sampled_server.py graph k niter nrand\n",
"\n",
"# Example:\n",
"./omega_sampled_server.py gowalla 0.5 3 3\n",
"```\n",
"\n",
"Where: \n",
"\n",
"- `graph` is the name of the graph\n",
"- `k` Percentage of nodes to be remove\n",
"- `niter` Number of rewiring operations per edge\n",
"- `nrand` Number of random graphs to be generated\n",
"\n",
"For further details run `./omega_sampled_server.py --help`\n",
"\n",
"> **NOTE:** This are slow operations, do not try to run them with higher values of k, niter or nrand. The computation of this networks with k=0.5, niter=3 and nrand=3 requires from 3 to 10 days to complete. If you want to test it out, you can use the `gowalla` graph with k=0.1, niter=1 and nrand=1.\n",
"\n",
"The advantage of using an external script rather then a block in the notebook is the ease of parallelization. You can run more scripts in parallel for different datasets. This can easily be automated with a bash script. I won't report the code since it's note relevant to the topic of this project.\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Are our networks small-world?\n",
"\n",
"There are multiple factors to take into consideration. Let's try to recap what we know about the networks we are working with:\n",
"\n",
"- Degree distribution\n",
"- Average clustering coefficient\n",
"- Average shortest path length\n",
"- Betweenness centrality\n",
"- Omega coefficient\n",
"\n",
"## Degree distribution\n",
"\n",
"The degree distribution of a real-world network can characterize the small-world property by showing a balance between the number of highly connected nodes (high degree) and the number of less connected nodes (low degree). A network with a small-world property will have a few highly connected nodes (hubs) and a large number of nodes with a relatively low number of connections. This creates a balance between the number of highly connected nodes and the number of less connected nodes, which allows for efficient information flow and rapid spreading of information throughout the network. Additionally, the degree distribution of a small-world network will typically follow a power-law distribution, with a few highly connected nodes and a large number of less connected nodes, further emphasizing the small-world property.\n",
"\n",
"As we have seen from the sections before, the distribution presented is far form Poissonian, and very close to a power law. However, the degree distribution alone is not enough to state that a real-world network is a social network because it does not take into account the specific relationships and interactions between the nodes in the network. A random network can also have a similar degree distribution, but the relationships between the nodes would be different from those in a social network.\n",
"\n",
"For example, a random network could be generated by randomly connecting nodes together without considering any specific relationships between them. In this case, the degree distribution may be similar to that of a social network, but the relationships between the nodes would be different.\n",
"\n",
"Additionally, to recreate this degree distribution with a random network, you can use the Barabasi-Albert model. This model generates a random network with a power-law degree distribution, which is similar to the degree distribution found in many real-world networks, including social networks. This model simulates the growth process of a network, where new nodes are added to the network and they preferentially connect to the existing nodes that have a high degree, this leads to a power-law degree distribution which is similar to the degree distribution of social networks.\n",
"\n",
"## Betweenness centrality\n",
"\n",
"The betweenness centrality of a node in a network measures the number of times that node acts as a bridge or intermediary between other nodes in the network. In a small-world network, nodes have a high betweenness centrality because they often act as intermediaries between distant nodes, allowing for short paths and efficient communication between distant parts of the network. Therefore, a high degree of betweenness centrality in a network can be used to characterize its small-world propriety.\n",
"\n",
"An high value of betweenness centrality means that the node in question acts as a bridge or intermediary between many other nodes in the network. This means that it connects many different groups of nodes together, making it an important node for communication and information flow in the network.\n",
"\n",
"To determine if the average betweenness centrality of a network is high or not we can compare it with the theoretical values of random networks. As the betweenness centrality is a measure of how much a node is used as a bridge between other nodes, random networks tend to have a low value of betweenness centrality. If the average betweenness centrality of your network is higher than the theoretical values of a random network, it can be considered a high value and therefore the network is more likely to be a small-world network.\n",
"\n",
"Let's test it out with our networks:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating a random graph with the Erdos-Renyi model Brightkite Checkins Graph\n",
"Number of edges in the original graph: 292973\n",
"Number of edges in the random graph: 293689\n",
"Random graph created for Brightkite Checkins Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 1299\n",
"\tNumber of edges after removing 80.0% of nodes: 11902\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.0013599876353935505\n",
"\n",
"Creating a random graph with the Erdos-Renyi model Gowalla Checkins Graph\n",
"Number of edges in the original graph: 62790\n",
"Number of edges in the random graph: 62950\n",
"Random graph created for Gowalla Checkins Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 615\n",
"\tNumber of edges after removing 80.0% of nodes: 2476\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.003776806412685814\n",
"\n",
"Creating a random graph with the Erdos-Renyi model Foursquare Checkins Graph\n",
"Number of edges in the original graph: 246702\n",
"Number of edges in the random graph: 246397\n",
"Random graph created for Foursquare Checkins Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 465\n",
"\tNumber of edges after removing 80.0% of nodes: 9834\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.002003085581852006\n",
"\n",
"Creating a random graph with the Erdos-Renyi model Brightkite Friendship Graph\n",
"Number of edges in the original graph: 14690\n",
"Number of edges in the random graph: 14641\n",
"Random graph created for Brightkite Friendship Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 1084\n",
"\tNumber of edges after removing 80.0% of nodes: 598\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.00017627409365578086\n",
"\n",
"Creating a random graph with the Erdos-Renyi model Gowalla Friendship Graph\n",
"Number of edges in the original graph: 5548\n",
"Number of edges in the random graph: 5463\n",
"Random graph created for Gowalla Friendship Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 459\n",
"\tNumber of edges after removing 80.0% of nodes: 243\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.0001254691117717444\n",
"\n",
"Creating a random graph with the Erdos-Renyi model Foursquare Friendship Graph\n",
"Number of edges in the original graph: 5323\n",
"Number of edges in the random graph: 5412\n",
"Random graph created for Foursquare Friendship Graph Starting computation of betweenness centrality...\n",
"\tNumber of nodes after removing 80.0% of nodes: 280\n",
"\tNumber of edges after removing 80.0% of nodes: 215\n",
"\tBetweenness centrality for Erdos-Renyi random graph: 0.012060305672512683\n",
"\n"
]
}
],
"source": [
"# for each graph, create a new random graphs using the create_random_graphs function. Save in a dictionary the results\n",
"\n",
"erdos_renyi_graphs = {}\n",
"for graph in graphs_all:\n",
" G = create_random_graphs(graph, model='erdos', save = False)\n",
" print(\"Random graph created for \", graph.name, \"Starting computation of betweenness centrality...\")\n",
" betweenness_centrality = np.mean(list(betweenness_centrality_parallel(G, 6, k = 0.8).values()))\n",
" print(\"\\tBetweenness centrality for Erdos-Renyi random graph: \", betweenness_centrality)\n",
" erdos_renyi_graphs[graph.name] = betweenness_centrality\n",
" print(\"\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Brightkite Checkins Graph': 0.0013599876353935505,\n",
" 'Gowalla Checkins Graph': 0.003776806412685814,\n",
" 'Foursquare Checkins Graph': 0.002003085581852006,\n",
" 'Brightkite Friendship Graph': 0.00017627409365578086,\n",
" 'Gowalla Friendship Graph': 0.0001254691117717444,\n",
" 'Foursquare Friendship Graph': 0.012060305672512683}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"erdos_renyi_graphs"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"ename": "IndexError",
"evalue": "index 0 is out of bounds for axis 0 with size 0",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[19], line 10\u001b[0m\n\u001b[1;32m 5\u001b[0m fig \u001b[39m=\u001b[39m go\u001b[39m.\u001b[39mFigure()\n\u001b[1;32m 7\u001b[0m \u001b[39mfor\u001b[39;00m graph \u001b[39min\u001b[39;00m graphs_all:\n\u001b[1;32m 8\u001b[0m fig\u001b[39m.\u001b[39madd_trace(go\u001b[39m.\u001b[39mBar(\n\u001b[1;32m 9\u001b[0m x\u001b[39m=\u001b[39m[graph\u001b[39m.\u001b[39mname],\n\u001b[0;32m---> 10\u001b[0m y\u001b[39m=\u001b[39m[analysis_results\u001b[39m.\u001b[39;49mloc[analysis_results[\u001b[39m'\u001b[39;49m\u001b[39mGraph\u001b[39;49m\u001b[39m'\u001b[39;49m] \u001b[39m==\u001b[39;49m graph\u001b[39m.\u001b[39;49mname, \u001b[39m'\u001b[39;49m\u001b[39mbetweenness centrality\u001b[39;49m\u001b[39m'\u001b[39;49m]\u001b[39m.\u001b[39;49mvalues[\u001b[39m0\u001b[39;49m]],\n\u001b[1;32m 11\u001b[0m name\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mOriginal graph\u001b[39m\u001b[39m'\u001b[39m,\n\u001b[1;32m 12\u001b[0m marker_color\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mrgb(55, 83, 109)\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 13\u001b[0m ))\n\u001b[1;32m 14\u001b[0m fig\u001b[39m.\u001b[39madd_trace(go\u001b[39m.\u001b[39mBar(\n\u001b[1;32m 15\u001b[0m x\u001b[39m=\u001b[39m[graph\u001b[39m.\u001b[39mname],\n\u001b[1;32m 16\u001b[0m y\u001b[39m=\u001b[39m[erdos_renyi_graphs[graph\u001b[39m.\u001b[39mname]],\n\u001b[1;32m 17\u001b[0m name\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mRandom graph\u001b[39m\u001b[39m'\u001b[39m,\n\u001b[1;32m 18\u001b[0m marker_color\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mrgb(26, 118, 255)\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 19\u001b[0m ))\n",
"\u001b[0;31mIndexError\u001b[0m: index 0 is out of bounds for axis 0 with size 0"
]
}
],
"source": [
"# plot with plotly an histogram of the betweenness centrality of the random graphs. For each graph (x-axis) we have two bars: one for the betweenness centrality of the original graph and one for the betweenness centrality of the random graph\n",
"\n",
"import plotly.graph_objects as go\n",
"\n",
"fig = go.Figure()\n",
"\n",
"for graph in graphs_all:\n",
" fig.add_trace(go.Bar(\n",
" x=[graph.name],\n",
" y=[analysis_results.loc[analysis_results['Graph'] == graph.name, 'betweenness centrality'].values[0]],\n",
" name='Original graph',\n",
" marker_color='rgb(55, 83, 109)'\n",
" ))\n",
" fig.add_trace(go.Bar(\n",
" x=[graph.name],\n",
" y=[erdos_renyi_graphs[graph.name]],\n",
" name='Random graph',\n",
" marker_color='rgb(26, 118, 255)'\n",
" ))\n"
]
}
],
"metadata": {
@ -16468,7 +16843,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]"
"version": "3.10.9"
},
"orig_nbformat": 4,
"vscode": {

@ -28,9 +28,8 @@ def random_sample(graph, k):
return G_connected
if __name__ == "__main__":
# use argparse to take as input the name of the graph, the options are "foursquare", "gowalla" and "brightkite"
parser = argparse.ArgumentParser()
parser.add_argument("graph", help="Name of the graph to be used. Options are 'foursquare', 'gowalla' and 'brightkite'")
parser.add_argument("graph", help="Name of the graph to be used. Options are 'checkins-foursquare', 'checkins-gowalla', 'checkins-brightkite', 'friends-foursquare', 'friends-gowalla', 'friends-brightkite'")
parser.add_argument("k", help="Percentage of nodes to be sampled. Needs to be a float between 0 and 1")
parser.add_argument("niter", help="Number of rewiring per edge. Needs to be an integer. Default is 5")
parser.add_argument("nrand", help="Number of random graphs. Needs to be an integer. Default is 5")
@ -46,8 +45,12 @@ if __name__ == "__main__":
print("No input for nrand. Setting it to default value: 5")
args.nrand = 5
# create the graph. G = create_graph_from_checkins('name') where name is the input argument of graph
G = create_graph_from_checkins(str(args.graph))
# the dataset name is the part of the input string after the dash (e.g. 'checkins-foursquare' -> 'foursquare')
name = args.graph.split('-')[1]
if 'checkins' in args.graph:
G = create_graph_from_checkins(name)
elif 'friends' in args.graph:
G = create_friendships_graph(name)
G.name = str(args.graph) + " Checkins Graph"
# sample the graph

@ -501,7 +501,7 @@ def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph:
G : nx.Graph
The original graph.
model : str
The model to use to generate the random graphs. It can be one of the following: "erdos", "barabasi", "watts_strogatz", "newman_watts_strog
The model to use to generate the random graphs. It can be one of the following: "erdos", "watts_strogatz"
save: bool
If True, the random graph is saved in the folder data/random/model
@ -516,8 +516,9 @@ def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph:
if model == "erdos":
G_random = nx.erdos_renyi_graph(G.number_of_nodes(), nx.density(G))
print("\tNumber of edges in the original graph: {}" .format(G.number_of_edges()))
print("\tNumber of edges in the random graph: {}" .format(G_random.number_of_edges()))
print("Creating a random graph with the Erdos-Renyi model {}" .format(G.name))
print("Number of edges in the original graph: {}" .format(G.number_of_edges()))
print("Number of edges in the random graph: {}" .format(G_random.number_of_edges()))
G_random.name = G.name + " Erdos-Renyi"
if save:
@ -534,8 +535,8 @@ def create_random_graphs(G: nx.Graph, model = None, save = True) -> nx.Graph:
p = G.number_of_edges() / (G.number_of_nodes())
avg_degree = int(np.mean([d for n, d in G.degree()]))
G_random = nx.watts_strogatz_graph(G.number_of_nodes(), avg_degree, p)
print("\tNumber of edges in the original graph: {}" .format(G.number_of_edges()))
print("\tNumber of edges in the random graph: {}" .format(G_random.number_of_edges()))
print("Number of edges in the original graph: {}" .format(G.number_of_edges()))
print("Number of edges in the random graph: {}" .format(G_random.number_of_edges()))
G_random.name = G.name + " Watts-Strogatz"
if save:
