{
|
|
"cells": [
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%reload_ext autoreload\n",
|
|
"\n",
|
|
"import os\n",
|
|
"import zipfile\n",
|
|
"import wget\n",
|
|
"import networkx as nx\n",
|
|
"from main import *\n",
|
|
"import pandas as pd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Discovering the datasets"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To perform our analysis, we will use the following datasets:\n",
|
|
"\n",
|
|
"- **Brightkite**\n",
|
|
"- **Gowalla**\n",
|
|
"- **Foursquare**\n",
|
|
"\n",
|
|
"We can download the datasets using the function `download_dataset` from the `utils` module. It will download the datasets in the `data` folder, organized in sub-folders in the following way:\n",
|
|
"\n",
|
|
"```\n",
|
|
"data/\n",
|
|
"├── brightkite\n",
|
|
"│ ├── loc-brightkite_edges.txt.gz\n",
|
|
"│ ├── loc-brightkite_totalCheckins.txt.gz\n",
|
|
"├── foursquare\n",
|
|
"│ ├── loc-gowalla_edges.txt.gz\n",
|
|
"│ ├── loc-gowalla_totalCheckins.txt.gz\n",
|
|
"└── gowalla\n",
|
|
" ├── dataset_ubicomp2013_checkins.txt\n",
|
|
" ├── dataset_ubicomp2013_tags.txt\n",
|
|
" └── dataset_ubicomp2013_tips.txt\n",
|
|
"```\n",
|
|
"\n",
|
|
"If any of the datasets is already downloaded, it will not be downloaded again. For futher details about the function below, please refer to the `utils` module."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"The brightkite dataset is already downloaded and extracted as .txt file, if you want to download again the .gz file with this function, delete the .txt files in the folder\n",
|
|
"The gowalla dataset is already downloaded and extracted as .txt file, if you want to download again the .gz file with this function, delete the .txt files in the folder\n",
|
|
"The foursquare dataset is already downloaded and extracted as .txt file, if you want to download again the .gz file with this function, delete the .txt files in the folder\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"download_datasets()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's have a deeper look at them.\n",
|
|
"\n",
|
|
"## Brightkite\n",
|
|
"\n",
|
|
"[Brightkite](http://www.brightkite.com/) was once a location-based social networking service provider where users shared their locations by checking-in. The friendship network was collected using their public API. The network was originally directed but the authors of the dataset have constructed a network with undirected edges when there is a friendship in both ways. They also have also collected a total of `4491143` checking of these users over the period of Apr. 2008 - Oct. 2010.\n",
|
|
"\n",
|
|
"Here is an example of check-in information"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>user</th>\n",
|
|
" <th>check-in time</th>\n",
|
|
" <th>latitude</th>\n",
|
|
" <th>longitude</th>\n",
|
|
" <th>location_id</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-17T01:48:53Z</td>\n",
|
|
" <td>39.747652</td>\n",
|
|
" <td>-104.992510</td>\n",
|
|
" <td>88c46bf20db295831bd2d1718ad7e6f5</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-16T06:02:04Z</td>\n",
|
|
" <td>39.891383</td>\n",
|
|
" <td>-105.070814</td>\n",
|
|
" <td>7a0f88982aa015062b95e3b4843f9ca2</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-16T03:48:54Z</td>\n",
|
|
" <td>39.891077</td>\n",
|
|
" <td>-105.068532</td>\n",
|
|
" <td>dd7cd3d264c2d063832db506fba8bf79</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-14T18:25:51Z</td>\n",
|
|
" <td>39.750469</td>\n",
|
|
" <td>-104.999073</td>\n",
|
|
" <td>9848afcc62e500a01cf6fbf24b797732f8963683</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-14T00:21:47Z</td>\n",
|
|
" <td>39.752713</td>\n",
|
|
" <td>-104.996337</td>\n",
|
|
" <td>2ef143e12038c870038df53e0478cefc</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" user check-in time latitude longitude \\\n",
|
|
"0 0 2010-10-17T01:48:53Z 39.747652 -104.992510 \n",
|
|
"1 0 2010-10-16T06:02:04Z 39.891383 -105.070814 \n",
|
|
"2 0 2010-10-16T03:48:54Z 39.891077 -105.068532 \n",
|
|
"3 0 2010-10-14T18:25:51Z 39.750469 -104.999073 \n",
|
|
"4 0 2010-10-14T00:21:47Z 39.752713 -104.996337 \n",
|
|
"\n",
|
|
" location_id \n",
|
|
"0 88c46bf20db295831bd2d1718ad7e6f5 \n",
|
|
"1 7a0f88982aa015062b95e3b4843f9ca2 \n",
|
|
"2 dd7cd3d264c2d063832db506fba8bf79 \n",
|
|
"3 9848afcc62e500a01cf6fbf24b797732f8963683 \n",
|
|
"4 2ef143e12038c870038df53e0478cefc "
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"brightkite_path = os.path.join(\"data\", \"brightkite\", \"loc-brightkite_totalCheckins.txt\")\n",
|
|
"Brightkite_df = pd.read_csv(brightkite_path, sep=\"\\t\", header=None, names=[\"user\", \"check-in time\", \"latitude\", \"longitude\", \"location_id\"])\n",
|
|
"\n",
|
|
"Brightkite_df.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Gowalla\n",
|
|
"\n",
|
|
"Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and was collected using their public API. The authors have collected a total of `6442890` check-ins of these users over the period of Feb. 2009 - Oct. 2010.\n",
|
|
"\n",
|
|
"Here is an example of check-in information"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>user</th>\n",
|
|
" <th>check-in time</th>\n",
|
|
" <th>latitude</th>\n",
|
|
" <th>longitude</th>\n",
|
|
" <th>location_id</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-19T23:55:27Z</td>\n",
|
|
" <td>30.235909</td>\n",
|
|
" <td>-97.795140</td>\n",
|
|
" <td>22847</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-18T22:17:43Z</td>\n",
|
|
" <td>30.269103</td>\n",
|
|
" <td>-97.749395</td>\n",
|
|
" <td>420315</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-17T23:42:03Z</td>\n",
|
|
" <td>30.255731</td>\n",
|
|
" <td>-97.763386</td>\n",
|
|
" <td>316637</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-17T19:26:05Z</td>\n",
|
|
" <td>30.263418</td>\n",
|
|
" <td>-97.757597</td>\n",
|
|
" <td>16516</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>2010-10-16T18:50:42Z</td>\n",
|
|
" <td>30.274292</td>\n",
|
|
" <td>-97.740523</td>\n",
|
|
" <td>5535878</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" user check-in time latitude longitude location_id\n",
|
|
"0 0 2010-10-19T23:55:27Z 30.235909 -97.795140 22847\n",
|
|
"1 0 2010-10-18T22:17:43Z 30.269103 -97.749395 420315\n",
|
|
"2 0 2010-10-17T23:42:03Z 30.255731 -97.763386 316637\n",
|
|
"3 0 2010-10-17T19:26:05Z 30.263418 -97.757597 16516\n",
|
|
"4 0 2010-10-16T18:50:42Z 30.274292 -97.740523 5535878"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"gowalla_path = os.path.join(\"data\", \"gowalla\", \"loc-gowalla_totalCheckins.txt\")\n",
|
|
"\n",
|
|
"Gowalla_df = pd.read_csv(gowalla_path, sep=\"\\t\", header=None, names=[\"user\", \"check-in time\", \"latitude\", \"longitude\", \"location_id\"])\n",
|
|
"\n",
|
|
"Gowalla_df.head() "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Foursquare\n",
|
|
"\n",
|
|
"[Foursquare](https://foursquare.com/) is a location-based social networking website where users share their locations by checking-in. This dataset includes long-term (about 10 months) check-in data in New York city and Tokyo collected from Foursquare from 12 April 2012 to 16 February 2013. It contains two files in tsv format. Each file contains 8 columns, which are:\n",
|
|
"\n",
|
|
"1. User ID (anonymized)\n",
|
|
"2. Venue ID (Foursquare)\n",
|
|
"3. Venue category ID (Foursquare)\n",
|
|
"4. Venue category name (Foursquare)\n",
|
|
"5. Latitude\n",
|
|
"6. Longitude\n",
|
|
"7. Timezone offset in minutes (The offset in minutes between when this check-in occurred and the same time in UTC)\n",
|
|
"8. UTC time\n",
|
|
"\n",
|
|
"Here is an example of check-in information from the New York dataset:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>UserID</th>\n",
|
|
" <th>VenueID</th>\n",
|
|
" <th>CategoryID</th>\n",
|
|
" <th>CategoryName</th>\n",
|
|
" <th>Latitude</th>\n",
|
|
" <th>Longitude</th>\n",
|
|
" <th>Timezone offset in minutes</th>\n",
|
|
" <th>UTC time</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>470</td>\n",
|
|
" <td>49bbd6c0f964a520f4531fe3</td>\n",
|
|
" <td>4bf58dd8d48988d127951735</td>\n",
|
|
" <td>Arts & Crafts Store</td>\n",
|
|
" <td>40.719810</td>\n",
|
|
" <td>-74.002581</td>\n",
|
|
" <td>-240</td>\n",
|
|
" <td>Tue Apr 03 18:00:09 +0000 2012</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>979</td>\n",
|
|
" <td>4a43c0aef964a520c6a61fe3</td>\n",
|
|
" <td>4bf58dd8d48988d1df941735</td>\n",
|
|
" <td>Bridge</td>\n",
|
|
" <td>40.606800</td>\n",
|
|
" <td>-74.044170</td>\n",
|
|
" <td>-240</td>\n",
|
|
" <td>Tue Apr 03 18:00:25 +0000 2012</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>69</td>\n",
|
|
" <td>4c5cc7b485a1e21e00d35711</td>\n",
|
|
" <td>4bf58dd8d48988d103941735</td>\n",
|
|
" <td>Home (private)</td>\n",
|
|
" <td>40.716162</td>\n",
|
|
" <td>-73.883070</td>\n",
|
|
" <td>-240</td>\n",
|
|
" <td>Tue Apr 03 18:02:24 +0000 2012</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>395</td>\n",
|
|
" <td>4bc7086715a7ef3bef9878da</td>\n",
|
|
" <td>4bf58dd8d48988d104941735</td>\n",
|
|
" <td>Medical Center</td>\n",
|
|
" <td>40.745164</td>\n",
|
|
" <td>-73.982519</td>\n",
|
|
" <td>-240</td>\n",
|
|
" <td>Tue Apr 03 18:02:41 +0000 2012</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>87</td>\n",
|
|
" <td>4cf2c5321d18a143951b5cec</td>\n",
|
|
" <td>4bf58dd8d48988d1cb941735</td>\n",
|
|
" <td>Food Truck</td>\n",
|
|
" <td>40.740104</td>\n",
|
|
" <td>-73.989658</td>\n",
|
|
" <td>-240</td>\n",
|
|
" <td>Tue Apr 03 18:03:00 +0000 2012</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" UserID VenueID CategoryID \\\n",
|
|
"0 470 49bbd6c0f964a520f4531fe3 4bf58dd8d48988d127951735 \n",
|
|
"1 979 4a43c0aef964a520c6a61fe3 4bf58dd8d48988d1df941735 \n",
|
|
"2 69 4c5cc7b485a1e21e00d35711 4bf58dd8d48988d103941735 \n",
|
|
"3 395 4bc7086715a7ef3bef9878da 4bf58dd8d48988d104941735 \n",
|
|
"4 87 4cf2c5321d18a143951b5cec 4bf58dd8d48988d1cb941735 \n",
|
|
"\n",
|
|
" CategoryName Latitude Longitude Timezone offset in minutes \\\n",
|
|
"0 Arts & Crafts Store 40.719810 -74.002581 -240 \n",
|
|
"1 Bridge 40.606800 -74.044170 -240 \n",
|
|
"2 Home (private) 40.716162 -73.883070 -240 \n",
|
|
"3 Medical Center 40.745164 -73.982519 -240 \n",
|
|
"4 Food Truck 40.740104 -73.989658 -240 \n",
|
|
"\n",
|
|
" UTC time \n",
|
|
"0 Tue Apr 03 18:00:09 +0000 2012 \n",
|
|
"1 Tue Apr 03 18:00:25 +0000 2012 \n",
|
|
"2 Tue Apr 03 18:02:24 +0000 2012 \n",
|
|
"3 Tue Apr 03 18:02:41 +0000 2012 \n",
|
|
"4 Tue Apr 03 18:03:00 +0000 2012 "
|
|
]
|
|
},
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"foursquare_NYC_path = ny = os.path.join(\"data\", \"foursquare\", \"dataset_TSMC2014_NYC.txt\")\n",
|
|
"foursquare_TKY_path = ny = os.path.join(\"data\", \"foursquare\", \"dataset_TSMC2014_TKY.txt\")\n",
|
|
"\n",
|
|
"foursquare_NYC_df = pd.read_csv(foursquare_NYC_path, sep=\"\\t\", header=None, names=[\"UserID\", \"VenueID\", \"CategoryID\", \"CategoryName\", \"Latitude\", \"Longitude\", \"Timezone offset in minutes\", \"UTC time\"], encoding=\"utf-8\", encoding_errors=\"ignore\")\n",
|
|
"\n",
|
|
"foursquare_NYC_df.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# remove from memory, they were created only for aesthetic purposes in the notebook\n",
|
|
"\n",
|
|
"del Brightkite_df\n",
|
|
"del Gowalla_df\n",
|
|
"del foursquare_NYC_df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Building the networks"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We are asked to construct the networks for the three datasets as un undirected grah $M = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. The nodes represent the users and the edges indicates that two individuals visited the same location at least once.\n",
|
|
"\n",
|
|
"We can use the fucntion create_graph from the `utils` module to create the networks. It takes as input the path to an edge list file and returns a networkx graph object. For further details about the function below, please refer to the `utils` module."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"Brightkite_G = friendships_graph(\"brightkite\")\n",
|
|
"Gowalla_G = friendships_graph(\"gowalla\")\n",
|
|
"Foursquare_G = foursquare_checkins_graph(\"NYC\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we can have a look at the number of nodes and edges in each network."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>dataset</th>\n",
|
|
" <th>nodes</th>\n",
|
|
" <th>edges</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>brightkite</td>\n",
|
|
" <td>58228</td>\n",
|
|
" <td>214078</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>gowalla</td>\n",
|
|
" <td>196591</td>\n",
|
|
" <td>950327</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>foursquare</td>\n",
|
|
" <td>1083</td>\n",
|
|
" <td>282405</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" dataset nodes edges\n",
|
|
"0 brightkite 58228 214078\n",
|
|
"1 gowalla 196591 950327\n",
|
|
"2 foursquare 1083 282405"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"dataset = [\"brightkite\", \"gowalla\", \"foursquare\"]\n",
|
|
"nodes = [len(Brightkite_G.nodes()), len(Gowalla_G.nodes()), len(Foursquare_G.nodes())]\n",
|
|
"edges = [len(Brightkite_G.edges()), len(Gowalla_G.edges()), len(Foursquare_G.edges())]\n",
|
|
"\n",
|
|
"df = pd.DataFrame({\"dataset\": dataset, \"nodes\": nodes, \"edges\": edges})\n",
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"As we can see, the foursquare dataset has a very small number of nodes. Even tho it has 227428 check-ins, the unique users (the nodes) are only 1083. The Tokyo dataset is about 2 times bigger, with 537703 check-ins and 2294 nodes. Since we are in the same order of magnitude, we will focus on the New York dataset, in the style of a classic Hollywood movie about aliens invasions."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Analysis of the structure of the networks"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3.10.6 64-bit",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.6"
|
|
},
|
|
"orig_nbformat": 4,
|
|
"vscode": {
|
|
"interpreter": {
|
|
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
|
|
}
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|