minor changes

main
Luca Lombardo 3 years ago
parent 4153d47bfa
commit 7ce8e7162d

@ -22,9 +22,10 @@ In this section we are going to analyze the performance of the algorithm in func
\centering
\includegraphics[width=12cm]{actors_time.png}
\caption{\emph{CPU time} in relation to the \texttt{MIN\textunderscore ACTORS} variable}
\label{fig:actors_time}
\end{figure}
\nd In figure \ref{fig:actors_time} only the \emph{CPU time} (divided by the number of threads) is taken into consideration. The \emph{system time}, however, is in the order of a few minutes in the worst case.
@ -37,19 +38,20 @@ We want to analyze how truthful our results are while varying \texttt{MIN\textun
\lstinputlisting[language=Python]{code/closeness_analysis.py}
\nd Visualizing the results, we obtain the matrix shown in figure \ref{fig:matrix-a}.
\begin{figure}[h!]
\centering
\includegraphics[width=11.5cm]{Figure_1.png}
\caption{Discrepancy of the results on the actors graph as a function of the minimum number of movies required to be considered as a node}
\label{fig:matrix-a}
\end{figure}
\nd As expected, the matrix in figure \ref{fig:matrix-a} is symmetric and the elements on the diagonal are all equal to zero. We can clearly see that with a lower value of \texttt{MIN\textunderscore ACTORS} the results are more precise: the discrepancy is 14\% with \texttt{MIN\textunderscore ACTORS=10}, while it is 39\% with \texttt{MIN\textunderscore ACTORS=70}. \s
\nd This is what we obtain by comparing the top-$k$ results when $k=100$. It would be interesting to see how much the discrepancy changes with different values of $k$. However, choosing a lower value of $k$ would not be useful for this type of analysis: since we are looking at the elements that two lists do not have in common, short lists would give results biased by statistical fluctuations. \s
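\nd The comparison of two top-$k$ lists can be sketched as follows. This is a minimal, hypothetical Python version and not the content of \texttt{code/closeness\textunderscore analysis.py}. It makes two assumptions. First, the discrepancy is the fraction of the $k$ entries of one list that do not appear in the other. Second, each value of \texttt{MIN\textunderscore ACTORS} maps to a ranked list of node identifiers. Under these assumptions the matrix is symmetric with a zero diagonal, as observed above.

\begin{lstlisting}[language=Python]
def discrepancy(top_a, top_b):
    """Fraction of the k entries of top_a missing from top_b (assumed metric)."""
    k = len(top_a)
    if k == 0 or k != len(top_b):
        raise ValueError("both lists must be non-empty and have the same length k")
    common = len(set(top_a) & set(top_b))
    return 1 - common / k


def discrepancy_matrix(rankings):
    """rankings: hypothetical dict {MIN_ACTORS value: top-k list of node ids}."""
    keys = sorted(rankings)
    matrix = [[discrepancy(rankings[a], rankings[b]) for b in keys] for a in keys]
    return keys, matrix


# Toy example with k = 4: two of the four entries differ
keys, matrix = discrepancy_matrix({10: ["a", "b", "c", "d"], 70: ["a", "b", "e", "f"]})
print(keys, matrix)  # [10, 70] [[0.0, 0.5], [0.5, 0.0]]
\end{lstlisting}

\nd With such a metric, a single different element among $k=2$ entries already gives a discrepancy of 50\%. This is why small values of $k$ produce unreliable comparisons.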
\textsc{To do: tests with k=500 and k=1000}
\textsc{To run the tests with other values of k I need a server. To be decided}
\s
\newpage
@ -60,12 +62,13 @@ As seen during the analysis of the actors graph in \ref{actors-graph}, varying t
\subsubsection{Time of execution}
As seen in \ref{time-actors}, we are going to analyze the performance of the algorithm as a function of different values of \texttt{VOTES}. Low values of this variable lead to an exponential growth of the cardinality of the node and edge sets and, as we know, a bigger graph requires more operations. The results are shown in figure \ref{fig:moves_time}.
\begin{figure}[h!]
\centering
\includegraphics[width=12cm]{movies_time.png}
\caption{\emph{CPU time} in relation to the \texttt{VOTES} variable}
\label{fig:moves_time}
\end{figure}
\newpage
@ -73,16 +76,17 @@ As seen in \ref{time-actors} we are going to analyze the performance of the algo
\subsubsection{Discrepancy of the results}
All the observations made before are still valid for this case, so I won't repeat them for brevity. As done before (figure \ref{fig:matrix-a}), we are going to use a matrix to visualize and analyze the results.
\s
% \lstinputlisting[language=c++]{code/closeness_analysis_2.py}
\nd This gives us the matrix in figure \ref{fig:matrix-b}:
\begin{figure}[H]
\centering
\includegraphics[width=13cm]{Figure_2.png}
\caption{Discrepancy of the results on the movie graph as a function of the minimum number of votes required to be considered as a node}
\label{fig:matrix-b}
\end{figure}
\newpage
\lstinputlisting[language=Python]{code/closeness_analysis_2.py}

@ -20,6 +20,7 @@ For the second one we do the opposite thing: we define an undirected graph $G=(V
\item the undirected edges in $E$ link two movies if they have an actor or actress in common.
\end{itemize}
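\nd The construction of the movie graph can be sketched in Python. This is a hypothetical illustration, not the project's actual code. The input format (a dictionary mapping each movie to its number of votes and its cast) and the function name \texttt{build\textunderscore movie\textunderscore graph} are assumptions. Movies below the \texttt{VOTES} threshold are discarded first. Then every pair of remaining movies that share an actor or actress is linked.

\begin{lstlisting}[language=Python]
from collections import defaultdict
from itertools import combinations


def build_movie_graph(movies, min_votes):
    """movies: hypothetical dict {movie_id: (votes, list of actor ids)}.
    Returns an undirected adjacency dict {movie_id: set of neighbouring movie_ids}."""
    # Keep only the movies with at least min_votes votes (the VOTES threshold)
    kept = {m: cast for m, (votes, cast) in movies.items() if votes >= min_votes}
    # Invert the relation: actor -> set of kept movies in which they appear
    movies_of = defaultdict(set)
    for m, cast in kept.items():
        for actor in cast:
            movies_of[actor].add(m)
    # Link every pair of movies that share at least one actor or actress
    adj = {m: set() for m in kept}
    for films in movies_of.values():
        for u, v in combinations(films, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


movies = {
    "m1": (1000, ["a1", "a2"]),
    "m2": (800, ["a2"]),
    "m3": (50, ["a1"]),   # below the threshold, dropped
    "m4": (900, ["a3"]),  # no shared actor, isolated node
}
print(build_movie_graph(movies, min_votes=500))  # {'m1': {'m2'}, 'm2': {'m1'}, 'm4': set()}
\end{lstlisting}

\nd An actor or actress appearing in $f$ of the kept movies contributes up to $f(f-1)/2$ edges. This helps explain why lowering \texttt{VOTES} makes the edge set grow so quickly.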
\clearpage
\subsection{The Problem}
Since we are dealing with a web-scale network, any brute-force algorithm would take years to finish. The main difficulty is the computation of the distances $d(v,w)$ in \eqref{closeness}. This is a well-known problem, the \emph{All Pairs Shortest Paths} (APSP) problem. \s
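\nd To see why the exact computation is so expensive, consider the naive approach on an unweighted graph. It runs one breadth-first search (BFS) per node, for a total cost of $O(|V|(|V|+|E|))$. The Python sketch below is only illustrative. Since \eqref{closeness} is not restated here, it assumes the simple variant $c(v) = (r(v)-1)/\sum_{w} d(v,w)$. Here $r(v)$ is the number of nodes reachable from $v$ (including $v$) and the sum runs over those nodes. The adjacency-dictionary format is also an assumption.

\begin{lstlisting}[language=Python]
from collections import deque


def distances_from(adj, source):
    """BFS on an unweighted graph: {node: hop distance} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def exact_closeness(adj):
    """Brute force: one BFS per node, O(|V| * (|V| + |E|)) time overall."""
    scores = {}
    for v in adj:
        dist = distances_from(adj, v)
        total = sum(dist.values())
        # Assumed definition: (reachable nodes - 1) / sum of distances; 0 for isolated nodes
        scores[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Path a - b - c: the middle node is the most central one
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(exact_closeness(path))  # {'a': 0.6666666666666666, 'b': 1.0, 'c': 0.6666666666666666}
\end{lstlisting}

\nd On a graph with millions of nodes and edges, these $|V|$ BFS visits are what make brute force infeasible at IMDb scale.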

Binary file not shown.

@ -16,13 +16,15 @@ For the actors graph, we take the subset $S$ as the actors and actresses with at
\s \nd \qrcode{https://lukefleed.xyz/imdb-graph.html}
\end{center}
\begin{figure}[H]
\centering
\includegraphics[width=12cm]{Screenshot.png}
\caption{\emph{The collaboration network of the actors and actresses with more than 100 movies on the IMDb network}}
\label{fig:imdb-a-network}
\end{figure}
The result, shown in figure \ref{fig:imdb-a-network}, is extremely interesting. We can clearly see that this graph is characterized by different (and sometimes isolated) communities. The nodes in each of them are actors and actresses of the same nationality. Some very big clusters, such as the \emph{Bollywood} one, are almost isolated: due to cultural and linguistic differences, those actors have almost never collaborated with anyone outside their country. \s
A visual analysis of this graph reflects some of the properties that we saw during the analysis of the results. Let's take the biggest cluster, the Bollywood one. Even though it is very dense and its nodes have a lot of links, none of them ever appeared in our top-$k$ results during testing. This happens because of the properties of closeness centrality, the measure we are taking into consideration. It can be seen as the ability of a node to spread information efficiently through the graph. The Bollywood nodes, however, are efficient at spreading information only within their own community, since they do not collaborate with nodes of other clusters. \s
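\nd The effect can be reproduced on a toy graph. The sketch below assumes a closeness normalized by the fraction of reachable nodes, $c(v) = \frac{(r(v)-1)^2}{(n-1)\sum_{w} d(v,w)}$. This is the Wasserman--Faust correction, also used by NetworkX's \texttt{closeness\textunderscore centrality} with \texttt{wf\textunderscore improved=True}. The definition in \eqref{closeness} may differ. The graph is hypothetical. A star of 7 nodes stands for the well-connected part of the network, and a separate clique of 4 nodes plays the role of a dense but isolated community.

\begin{lstlisting}[language=Python]
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def normalized_closeness(adj):
    """Assumed formula: (r - 1)^2 / ((n - 1) * sum of distances),
    where r counts the nodes reachable from v (v included)."""
    n = len(adj)
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        r, total = len(dist), sum(dist.values())
        scores[v] = (r - 1) ** 2 / ((n - 1) * total) if total > 0 else 0.0
    return scores


# Hypothetical graph: star centred on "h" with leaves l1..l6
adj = {"h": set()}
for i in range(1, 7):
    leaf = f"l{i}"
    adj["h"].add(leaf)
    adj[leaf] = {"h"}
# ...plus a disconnected, fully connected cluster b1..b4
clique = [f"b{i}" for i in range(1, 5)]
for b in clique:
    adj[b] = set()
for u, v in combinations(clique, 2):
    adj[u].add(v)
    adj[v].add(u)

scores = normalized_closeness(adj)
print(max(scores, key=scores.get), round(scores["b1"], 2))  # h 0.3
\end{lstlisting}

\nd Inside the clique every node is at distance 1 from all the others. Yet its score (0.3) is lower than that of the star's center (0.6) and even of its leaves (about 0.33). This mirrors why the Bollywood nodes never reach the top-$k$.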
@ -39,12 +41,13 @@ The methodology used for this graph is basically the same of \ref{actors-graph-v
\s
\begin{figure}[H]
\centering
\includegraphics[width=12cm]{movie-graph.png}
\caption{\emph{The network of the movies with more than 500 votes in the IMDb database}}
\label{fig:imdb-m-network}
\end{figure}
Even if at first sight the graph in figure \ref{fig:imdb-m-network} may seem completely different from the previous one, it is not. There are no evident communities, but some areas are denser than others. If we zoom in on one of those areas, we can see that the movies are often related: the movies of a popular saga are very close to each other in this graph. It is easy to find some big neighborhoods, such as the MCU (Marvel Cinematic Universe) one. \s
\nd Since we are considering roughly the thousand most popular nodes, these movies mostly come from the Hollywood scene, so it makes sense that there are no isolated clusters.
