visualization section started
parent 329a1f84bc
commit 7ad5e0a366
import numpy as np
import pandas as pd

# Load the top-movies list for each value of i (only the second column of each file)
dfs = {
    i: pd.read_csv(f"top_movies_{i:02d}_c.txt", sep='\t', usecols=[1], names=["movie"])
    for i in [500, 1000, 5000, 10000, 25000, 50000, 75000, 100000]}
sets = {i: set(df["movie"]) for i, df in dfs.items()}

# diff[i, j] = number of movies in list i that are missing from list j
diff = []
for i in sets.keys():
    diff.append([len(sets[i]) - len(sets[i] & sets[j]) for j in sets.keys()])
diff = np.array(diff, dtype=float)
# Normalize by the size of the first list
diff /= len(next(iter(sets.values())))
\section{Visualization of the graphs}
Graphs are fascinating structures, and visualizing them can give us a deeper understanding of their properties. To do so, however, we need to make some sacrifices: we are dealing with millions of nodes, and displaying all of them would be impossible, especially on a web page, as in my case. \s

\nd For each graph we need to find a small subset of nodes $S \subset V$ (on the order of 1000 nodes) to display. As far as possible, this subset should include the nodes that are ``important'' in the graph. \s
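
The selection step can be sketched in a few lines of Python. This is only an illustration under an assumption: here the ``importance'' of a node is approximated by its degree, a cheap proxy, and the helpers \texttt{top\_k\_subset} and \texttt{induced\_edges} are hypothetical names, not part of the original code. The second helper keeps only the edges with both endpoints in $S$, i.e. the subgraph induced by $S$.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_k_subset(edges: Iterable[tuple[Hashable, Hashable]], k: int = 1000) -> set:
    """Hypothetical helper: the k nodes with the highest degree (degree = assumed importance)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node for node, _ in degree.most_common(k)}


def induced_edges(edges, subset: set) -> list:
    """Hypothetical helper: keep only the edges whose endpoints both lie in `subset`."""
    return [(u, v) for u, v in edges if u in subset and v in subset]


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e")]
    S = top_k_subset(edges, k=3)
    print(sorted(S), induced_edges(edges, S))
```

On a real graph, degree could be swapped for any other centrality measure without changing the structure of the sketch.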

\nd This whole section is implemented in Python using the \texttt{pyvis} library, whose goal is to provide a Python-based approach to constructing and visualizing network graphs in the same space. A pyvis network can be customized on a per-node or per-edge basis: nodes can be given colors, sizes, labels, and other metadata. Each graph is interactive, allowing the dragging, hovering, and selection of nodes and edges, and its layout algorithm can be tweaked to experiment with the rendering of larger graphs. The library is designed as a wrapper around the popular JavaScript \texttt{visJS} library.
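
A minimal sketch of how a pyvis network might be built and exported, assuming the nodes come with a numeric weight (for instance a movie count) used to scale their size. The helpers \texttt{scale\_sizes} and \texttt{render\_html}, the size range, and the choice of the Barnes-Hut layout are illustrative assumptions, not the original implementation. \texttt{pyvis} is imported inside the function so the sizing helper works even without it installed.

```python
def scale_sizes(weights: dict, min_size: float = 10.0, max_size: float = 40.0) -> dict:
    """Hypothetical helper: linearly map node weights to pyvis node sizes."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    return {n: min_size + (w - lo) / span * (max_size - min_size) for n, w in weights.items()}


def render_html(edges, weights: dict, path: str = "graph.html"):
    """Hypothetical helper: build an interactive pyvis network and save it as HTML.

    Every endpoint in `edges` must be a key of `weights`, since pyvis requires
    nodes to be added before the edges that reference them.
    """
    from pyvis.network import Network  # optional dependency: pip install pyvis

    net = Network(height="750px", width="100%")
    for node, size in scale_sizes(weights).items():
        # Per-node customization: label, size and a tooltip shown on hover
        net.add_node(node, label=str(node), size=size, title=f"weight: {weights[node]}")
    for u, v in edges:
        net.add_edge(u, v)
    net.barnes_hut()  # physics-based layout; its parameters can be tuned for larger graphs
    net.write_html(path)
    return net
```

Opening the resulting HTML file in a browser gives the drag, hover, and select interactions described above.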

\subsection{Actors Graph}
For the actors graph we choose the subset $S$ as the set of actors with at least 100 movies in their career. We can immediately deduce that this subset will mostly consist of actors and actresses of a certain age. However, as we have seen, a high number of movies is a good estimator of closeness centrality. It is important to keep in mind that the graph only shows the relations between nodes in this subset, i.e.\ the subgraph induced by $S$. This means that even an actor with 100 movies in their career may have just a few edges in this graph. We can see it as the collaboration network of the most prolific actors and actresses. \s
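
The construction of this subgraph can be sketched as follows, assuming the data is available as a dictionary mapping each actor to the set of movies they appeared in, and assuming two actors are linked when they share at least one movie. Both the input format and the helper names \texttt{prolific\_actors} and \texttt{collaboration\_edges} are hypothetical. The threshold is a parameter (100 in the text). Actors outside $S$ are never considered, which is why a prolific actor can end up with few edges.

```python
from collections import defaultdict
from itertools import combinations


def prolific_actors(actor_movies: dict[str, set[str]], min_movies: int = 100) -> set[str]:
    """Hypothetical helper: actors with at least `min_movies` titles."""
    return {a for a, movies in actor_movies.items() if len(movies) >= min_movies}


def collaboration_edges(actor_movies: dict[str, set[str]], subset: set[str]) -> set[tuple[str, str]]:
    """Hypothetical helper: edges (a, b), a < b, between actors of `subset` sharing a movie.

    Co-stars outside `subset` are ignored, so the result is the subgraph induced by `subset`.
    """
    cast = defaultdict(set)  # movie -> actors of the subset appearing in it
    for actor in subset:
        for movie in actor_movies[actor]:
            cast[movie].add(actor)
    edges = set()
    for actors in cast.values():
        edges.update(combinations(sorted(actors), 2))
    return edges
```

For example, with a threshold of 3, an actor with a single movie is excluded even if they co-starred with a member of $S$, and the corresponding edge disappears.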

\nd An interactive version is available at the web page below. It takes a few seconds to render, and it is better viewed on a computer than on a smartphone. \s

\textsc{Interactive version}: \url{https://lukefleed.xyz/imdb-graph.html}

\begin{center}
\s \nd \qrcode{https://lukefleed.xyz/imdb-graph.html}
\end{center}

\begin{figure}[H]
\centering
\includegraphics[width=13cm]{Screenshot.png}
\caption{The collaboration network of the actors and actresses with at least 100 movies in the IMDb network}
\label{imdb-a-network}
\end{figure}