Then we can generate the final filtered file `FilmFiltrati.txt`, which has only two columns: `nconst` and `primaryName`.
Then we can generate the final filtered file `FilmFiltrati.txt`, which has only two columns: `tconst` and `primaryName`.
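As a rough illustration of this filtering step, here is a minimal C++ sketch that keeps two named columns of a tab-separated file with a header row. The helper names `split_tsv` and `keep_two_columns` are hypothetical, and we assume plain TSV input with no quoting. The column names are parameters, so the same code covers either `nconst` or `tconst` alongside `primaryName`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Split one line of a tab-separated file into its fields.
std::vector<std::string> split_tsv(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t'))
        fields.push_back(field);
    return fields;
}

// Hypothetical helper: copy only the columns named `a` and `b` (located via the
// header row) from `in` to `out`, tab-separated. Returns the number of data rows written.
std::size_t keep_two_columns(std::istream& in, std::ostream& out,
                             const std::string& a, const std::string& b) {
    std::string line;
    if (!std::getline(in, line)) return 0;  // empty input
    const std::vector<std::string> header = split_tsv(line);
    std::size_t ia = header.size(), ib = header.size();  // "not found" sentinels
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == a) ia = i;
        if (header[i] == b) ib = i;
    }
    if (ia == header.size() || ib == header.size()) return 0;  // a column is missing
    out << a << '\t' << b << '\n';
    std::size_t rows = 0;
    while (std::getline(in, line)) {
        const std::vector<std::string> f = split_tsv(line);
        if (ia >= f.size() || ib >= f.size()) continue;  // skip short or malformed rows
        out << f[ia] << '\t' << f[ib] << '\n';
        ++rows;
    }
    return rows;
}
```

For example, with an `std::ifstream` opened on the source TSV as `in` and an `std::ofstream` on `FilmFiltrati.txt` as `out`, `keep_two_columns(in, out, "tconst", "primaryName")` would write the two-column file in a single pass.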
---
---
@@ -348,7 +348,7 @@ The crucial point of the algorithm is the definition of the lower bounds, that i
What we are changing in this code is that, since $L=0$ is never updated, we do not need to define it: we simply loop over every vertex, in whatever order the map yields them. We do not need to define `Q` either, because we visit every vertex anyway and the order does not matter.
What we are changing in this code is that, since $L=0$ is never updated, we do not need to define it: we simply loop over every vertex, in whatever order the map yields them. We do not need to define `Q` either, because we visit every vertex anyway and the order does not matter.
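A minimal sketch of this simplification, under stated assumptions: the graph is a hypothetical `unordered_map` adjacency list, farness is taken to be the average BFS distance to the other reachable vertices (the document's exact normalization may differ), and isolated vertices are skipped. There is no queue of candidates: every vertex is visited in the map's own iteration order, and a sorted insert keeps the `k` lowest farness values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical adjacency list: vertex id -> neighbour ids.
using Graph = std::unordered_map<int, std::vector<int>>;

// Plain BFS from `source`. Returns (r, sum of distances), where r = |R| counts
// the vertices reachable from `source`, including `source` itself.
std::pair<int, long long> bfs_reach(const Graph& g, int source) {
    std::unordered_map<int, int> dist{{source, 0}};
    std::queue<int> q;
    q.push(source);
    long long sum_distances = 0;
    while (!q.empty()) {
        const int u = q.front();
        q.pop();
        const int du = dist.at(u);
        sum_distances += du;
        const auto it = g.find(u);
        if (it == g.end()) continue;
        for (int v : it->second)
            if (dist.emplace(v, du + 1).second)  // true only on first discovery
                q.push(v);
    }
    return {static_cast<int>(dist.size()), sum_distances};
}

// Visit every vertex in the map's iteration order (no queue of candidates)
// and keep the k vertices with the lowest farness, sorted ascending.
std::vector<std::pair<int, double>> top_k_closeness(const Graph& g, std::size_t k) {
    std::vector<std::pair<int, double>> top;
    top.reserve(k + 1);  // insert first, then trim back to k
    for (const auto& entry : g) {
        const int v = entry.first;
        const auto [r, sum_distances] = bfs_reach(g, v);
        if (r <= 1) continue;  // isolated vertex: farness would be infinite
        // Assumed farness: average distance to the other reachable vertices.
        const double farness = static_cast<double>(sum_distances) / (r - 1);
        // Sorted insert: before the first element whose farness is >= ours.
        const auto pos = std::find_if(top.begin(), top.end(),
            [farness](const std::pair<int, double>& p) { return p.second >= farness; });
        top.emplace(pos, v, farness);
        if (top.size() > k) top.pop_back();
    }
    return top;
}
```

Because every vertex is processed exactly once, the visiting order can only change the result when several vertices tie at the k-th farness value.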
#### Multi-threaded BFS
#### Multi-threaded implementation
Since we are working on a web-scale graph, multi-threading was a must. First, we define a `vector<thread>` and a mutex to prevent simultaneous accesses to the `top_actors` vector. Then we preallocate the number of threads we want to use.
Since we are working on a web-scale graph, multi-threading was a must. First, we define a `vector<thread>` and a mutex to prevent simultaneous accesses to the `top_actors` vector. Then we preallocate the number of threads we want to use.
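The pattern described above (a `vector<thread>` plus one mutex guarding `top_actors`) can be sketched as follows. This is a hypothetical driver, not the project's code: we assume vertices are numbered `0..n-1`, that a caller-supplied `farness` callback stands in for the per-vertex BFS (and is safe to call concurrently), and that work is split by striding vertex ids across threads. The expensive BFS runs outside the lock; only the sorted insert into the shared list is serialized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical driver: thread t handles vertices t, t + num_threads, t + 2*num_threads, ...
std::vector<std::pair<int, double>> parallel_top_k(
        int n, std::size_t k, unsigned num_threads,
        const std::function<double(int)>& farness) {
    std::vector<std::pair<int, double>> top_actors;  // ascending farness
    top_actors.reserve(k + 1);                       // insert first, then trim back to k
    std::mutex top_actors_mutex;                     // guards every access to top_actors
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            for (int v = static_cast<int>(t); v < n; v += static_cast<int>(num_threads)) {
                const double f = farness(v);  // expensive part runs without the lock
                // Locked until the end of this loop iteration, then released automatically.
                const std::lock_guard<std::mutex> lock(top_actors_mutex);
                const auto pos = std::find_if(top_actors.begin(), top_actors.end(),
                    [f](const std::pair<int, double>& p) { return p.second >= f; });
                top_actors.emplace(pos, v, f);
                if (top_actors.size() > k) top_actors.pop_back();
            }
        });
    }
    for (auto& th : threads) th.join();  // wait for every worker before reading the result
    return top_actors;
}
```

Using `lock_guard` instead of calling `lock()` directly means the mutex is released at the end of each iteration even if the insert throws, so no explicit `unlock()` is needed.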
top_actors.reserve(k + 1); // We need exactly k items, no more and no less.
top_actors.reserve(k + 1); // We need exactly k items, no more and no less.
vector<thread> threads;
vector<thread> threads;
mutex top_actors_mutex; // To prevent simultaneous accesses to top_actors
mutex top_actors_mutex; // The threads write to top_actors, so another thread reading top_actors at the same time may find it in an invalid state (if the read happens while the other thread is still writing)
int r = 0; // |R|, where R is the set of vertices reachable from our vertex
int r = 0; // |R|, where R is the set of vertices reachable from our vertex
long long int sum_distances = 0; // Sum of the distances to other nodes
long long int sum_distances = 0; // Sum of the distances to other nodes
int prev_distance = 0; // Previous distance, to see when we get to a deeper level of the BFS
int prev_distance = 0; // Previous distance, to see when we get to a deeper level of the BFS
q.push(make_pair(actor_id, 0));
q.push(make_pair(actor_id, 0)); // This vertex, which is at distance 0
enqueued[actor_id] = true;
enqueued[actor_id] = true;
bool skip = false;
bool skip = false;
while (!q.empty()) {
while (!q.empty()) {
auto [bfs_actor_id, distance] = q.front();
auto [bfs_actor_id, distance] = q.front(); // Take the element at the front of the queue
q.pop();
q.pop();
// Try to set a lower bound on the farness
// Try to set a lower bound on the farness
if (distance > prev_distance) {
if (distance > prev_distance) {
const lock_guard<mutex> top_actors_lock(top_actors_mutex); // Acquire ownership of the mutex, wait if another thread already owns it. Release the mutex when destroyed.
top_actors_mutex.lock(); // Acquire ownership of the mutex, wait if another thread already owns it
if (top_actors.size() == k) { // We are in the first item of the next exploration level
if (top_actors.size() == k) { // We are in the first item of the next exploration level
// We assume r = A.size(), the maximum possible value
// We assume r = A.size(), the maximum possible value
// Insert the actor into top_actors, before the first element whose farness is >= our actor's (i.e. sorted insert)
const lock_guard<mutex> top_actors_lock(top_actors_mutex); // Acquire ownership of the mutex, wait if another thread already owns it. Release the mutex when destroyed.
top_actors_mutex.lock();// Acquire ownership of the mutex, wait if another thread already owns it
vector<pair<int, double>> harmonic(const size_t k) { // I CAN'T MANAGE TO INVERT THE ARGUMENT OF THE SUM
vector<pair<int, double>> harmonic(const size_t k) { //
vector<pair<int, double>> top_actors; // Each pair is (actor_index, harmonic centrality).
vector<pair<int, double>> top_actors; // Each pair is (actor_index, harmonic centrality).
top_actors.reserve(k + 1); // We need exactly k items, no more and no less.
top_actors.reserve(k + 1); // We need exactly k items, no more and no less.
@@ -316,15 +302,16 @@ vector<pair<int, double>> harmonic(const size_t k) { // I CAN'T INVERT
q.pop();
q.pop();
// Try to set an upper bound on the centrality
// Try to set an upper bound on the centrality
if (distance > prev_distance) {
if (distance > prev_distance) {
const lock_guard<mutex> top_actors_lock(top_actors_mutex); // Acquire ownership of the mutex, wait if another thread already owns it. Release the mutex when destroyed.
top_actors_mutex.lock(); // Acquire ownership of the mutex, wait if another thread already owns it
if (top_actors.size() == k) { // We are in the first item of the next exploration level
if (top_actors.size() == k) { // We are in the first item of the next exploration level
// The adjacent vertices have distance +1 w.r.t. the current vertex
// The adjacent vertices have distance +1 with respect to the current vertex
q.push(make_pair(adj_actor_id, distance + 1));
q.push(make_pair(adj_actor_id, distance + 1));
enqueued[adj_actor_id] = true;
enqueued[adj_actor_id] = true;
}
}
@@ -349,17 +336,16 @@ vector<pair<int, double>> harmonic(const size_t k) { // I CAN'T INVERT
double harmonic_centrality = sum_reverse_distances;
double harmonic_centrality = sum_reverse_distances;
if (!isfinite(harmonic_centrality))
if (!isfinite(harmonic_centrality))
continue;
continue;
// Insert the actor into top_actors, before the first element whose farness is >= our actor's (i.e. sorted insert)
const lock_guard<mutex> top_actors_lock(top_actors_mutex); // Acquire ownership of the mutex, wait if another thread already owns it. Release the mutex when destroyed.
top_actors_mutex.lock();// Acquire ownership of the mutex, wait if another thread already owns it