MIE 451/1513 Lab and Assignment 5: Social Network Analysis solution




Rate this product

1 The Twitter Data
We will work on the data of tweets collected in one month of 2009.
The data is stored in a csv files. Each tweet is a line with three fields: time, user and text of
the tweet. Here is a snippet:
2009-06-11 16:59:45, ibbored, amberback #squarespace does? Hot damn. Now I want
to win more.
Twitter users abbreviate topics with hash-tags (#squarespace) and mention others with @ sign
(answerback). In this assignment, you will need to extract records like user names and hash-tags
from the text, and analyze different patterns of communication in the given dataset.
2 In the Introductory lab
In this introductory lab, we will build a mention graph for a given hash-tag, and explore different
ways to visualize graphs, and incorporate additional information.
We will cover graph creation, manipulation, and analysis using NetworkX, and visualization
using Plotly.
3 Main Assignment (Assessed in Code Review Session)
Q1. Choose a hash-tag
You are required to choose a hash-tag to perform the social network analysis on. The chosen hashtag should be unique and not shared with the any of your classmates. In order to keep track on the
hash-tags being used by the other students, please refer to the discussion board on Piazza. Once
you choose a hash-tag, please make sure it is still available and then post your chosen hash-tag on
the discussion board.
• The first one to post a hash-tag will be the one to perform the analysis on this hash-tag.
• It is your responsibility to make sure the hash-tag is still available when you post it to Piazza.
• Make sure the hash-tag will be suitable for the analysis required in this assignment.
• If you want to change the hash-tag, please post another message declaring the old hash-tag
(now available), and the new one (now allocated to you). As before, it is your responsibility
to make sure the new hash-tag is available.
Q2. Build a Mention Graph
In this question you are required to build a mention graph for your chosen hash-tag. The mention
graph is the mention relations between users. In this graph, each user is viewed as a node. If a
user Alice mentions another user Bob in her tweet(s) with the @ sign, then an undirected edge
connects Alice and Bob. The edge weight is the number of mentions in the tweets.
(a) How many nodes and how many edges in your mention graph?
(b) Build a histogram of the graph nodes’ degree (i.e., the degree distribution of the graph).
What can you learn from the degree distribution?
(c) Provide a list of top 5 edges with highest weights (edges are identified be the two nodes they
connects, e.g., hnode1, node2i).
(d) Provide a visualization of the mention graph in which the edge color reflects its weight (i.e.,
the number of mentions).
Q3. Content Analysis
In the this question you are asked to perform a basic content analysis on your chosen hash-tag.
(a) Analyze the most common words in all the tweets with the chosen hash-tag, and provide a
basic description of the main themes.
(b) In the visualization of the mention graph, add hover information for the nodes, describing
the 3 most common words for this user. Add any other hover information that may help you
understand the social network better.
Q4. Centrality Analysis
In this question, your need to analyze centrality of users in the mention graph.
Here is a list of networkx functions that calculate the different centrality measures:
Note that PageRank lives in a different place in networkx:
https://networkx.github.io/documentation/stable/reference/algorithms/link analysis.html
(a) Choose two centrality measures and calculate the centrality of the nodes on your graph based
on each of the measures.
(b) Provide a visualization that demonstrates the centrality of each node using a visual property
(size, color, etc) for each of the centrality measures.
(c) Identify the key players in the mentioned graph based on the centrality measures.
1. Are the results similar or different? Explain what can be the reason for the observed
similarity or difference.
2. What centrality measure produced a more meaningful interpretation? Why?
Q5. Connectivity Patterns
In this question, you will analyze the cliques in a graph. A clique in an undirected graph is a subset
of the nodes, such that every two different nodes are adjacent (directly connected with an edge).
Here is a list of networkx functions that calculate the different measures related to cliques in
graph: https://networkx.github.io/documentation/stable/reference/algorithms/clique.html
(a) Calculate two or more of the following measures to analyze cliques in your mention graph:
1. Number of maximal cliques in your graph.
2. The graph’s clique number (size of the largest clique in the graph).
3. Number of maximal cliques for each node
4. Size of the largest maximal clique containing each given node.
(b) Provide some insights on the connectivity patterns of your mention graph based on the information calculated in (a). What was the largest clique? How large was it? What else do
your clique properties and cliques themselves tell you about the social network?
4 Helpful Links
4.1 NetworkX Documentation
4.2 Plotly