Project 2 – Social Networks solution

$30.00

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (4 votes)

The purpose of this mini-project1
is to learn the importance of sparse linear algebra algorithms to solve fundamental
questions in social network analyses. We will use the coauthor graph from the Householder Meeting and the social
network of friendships from Zachary’s karate club [2]. These two graphs are among the first examples where matrix
methods were used in computational social network analysis.
The Householder XII Meeting
In June, 1993, the UCLA Lake Arrowhead Conference Center in the San Bernardino Mountains, 90 miles east of Los
Angeles, was the site of the Householder XII Symposium on Numerical Linear Algebra, organized by Gene Golub2
and Tony Chan. Nick Trefethen posted a flip chart with Gene Golub’s name in the center. He invited everyone present
to add their name to the chart and draw lines connecting their name with the names of all their coauthors. The diagram
grew denser throughout the week. At the end it was a graph with 104 vertices (or people) and 211 edges. John Gilbert
entered the names and coauthor connections into Matlab, creating an adjacency matrix A. Using Julia, we can start
by retrieving the names and matrix stored in the PARC archive (housegraph.mat), which is available on iCorsi.
using MAT, SparseArrays, Plots, SymRCM
using Graphs, GraphPlot, Cairo, Fontconfig, Compose
include(“drawit.jl”)
file = matread(“housegraph.mat”);
The variable who shows which variables are stored in the data file.
who = collect(keys(file));
julia> who
117-element Vector{String}:
“Sameh”
“Elden”
“nameperm”
“Borges”
“Greenbaum”
“Arioli”
“Chandrasekaran”

and name shows the participants in the order they were added to the list:
name = file[“name”];
julia> name
104-element Vector{String}:
“Golub”
“Wilkinson”
“TChan”
1This document is originally based on a blog from Cleve Moler, who wrote a fantastic blog post about the Lake Arrowhead graph, and John
Gilbert, who initially created the coauthor graph from the 1993 Householder Meeting. You can find more information at http://blogs.
mathworks.com/cleve/2013/06/10/lake-arrowhead-coauthor-graph/. Most of this assignment is derived from this
archived work.
2http://en.wikipedia.org/wiki/Gene_H._Golub
1
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 1: [Left] The coauthor matrix from the meeting (created with spy(A), [Right] The coauthor circle from the
meeting.
“He”
“Varah”
“Kenney”
“Ashby”

A = file[“A”];
julia> size(A)
(104, 104)
Matrix A is 104-by-104 and symmetric. Elements Ai,j and Aj,i are both equal to one if the people associated with
indices i and j are coauthors and equal to zero otherwise. Matrix A is sparse, since most of the attendees are not
coauthors and, thus, their corresponding entries have value 0. In order to check the matrix sparsity – i.e., the fraction
of nonzero entries over the total number of elements – we can use the following command:
sparsity = nnz(A)/prod(size(A));
julia> sparsity
0.0486
As we can observe, only roughly 5% of the entries of matrix A are different from 0 or, in other words, only 5%
of all the possible pairs of attendees actually identify coauthors. In the left panel of Figure 1, we show a graphical
representation of matrix A by means of a spy plot. The latter shows the location of the non-zeros, with Gene Golub
in the first row/column, followed by the other authors in the order in which they appeared on Trefethen’s flip chart.
Most Prolific Author
As the adjacency matrix is loaded from the archive, the most prolific coauthor is already in the first column. It was
not a surprise to the conference participants that the most prolific coauthor is Gene Golub, one of the organizers.
m = findmax(sum(A, dims=1))[2][2];
julia> m
1
julia> name[m]
“Golub”
2
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 2: [Left] The reordered coauthor matrix P APT
from the meeting, [Right] The reordered coauthor circle.
Circular plot of all the Authors
In the right panel of Figure 1, we created a circular plot with Golub in the center, the other authors around the
circumference, and edges connecting the coauthors. If we place the authors around the circumference in the order we
retrieve them from the chart, the edges would cross the circle pretty much randomly, as shown in Figure 1.
m = findmax(sum(A, dims=1))[2][2];
drawit(A, m, name, “authors.png”)
Reorder the Authors
We want to rearrange the authors so that the coauthor connections are as close as possible to the circumference of
the circle. This corresponds to a symmetric permutation of matrix A, aimed at minimizing, or at least reducing, its
bandwidth. In 1969, Elizabeth Cuthill and John McKee described a heuristic for reordering the rows and columns of
a matrix to reduce its bandwidth, now known as the Reverse Cuthill McKee ordering.
r = symrcm(A[2:end, 2:end]);
prcm = vcat(1, r .+ 1);
spy(A[prcm, prcm])
drawit(A[prcm, prcm], m, name, “authors_reorder.png”)
Figure 2 shows the impact of the reordering P on both matrix P APT
and on the circular plot.
Spectral Graph Partitioning and the Laplacian with Julia
In this segment, we will plant an artificial partition in a graph, and then use the eigenvector v2, which is associated
to the second smallest eigenvalue λ2, to find it. The eigenvector v2 (or Fiedler vector) plays an important role in
graph partitioning. Its entries indicate a partitioning of the connectivity graph of the matrix it is associated with. In
practice, this means that all indices corresponding to vector entries larger than zero belong to one set and all indices
corresponding to vector entries smaller than zero belong to the other. The arising partition minimises the number of
edges between the two sets. In the rest of this section, we will walk you through the necessary steps and present a
practical application. If you would like to read more about this topic, a starting reference can be Fiedlers theory of
spectral graph partitioning, by Slininger, available on iCorsi. We will start by generating a dataset. In this example,
we will be a little more ambitious and use a larger number of vertices.
using PrettyTables
using LinearAlgebra
n = 1000;
3
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 3: An almost dense artificial dataset.
To plant an artificial partition in our test dataset, we need to determine the vertices in each of the two groups. To do
this, we randomly generate a permutation of all the vertices and select one set as group 1 and the rest as group 2. The
variable gs controls the size of the first group.
x = shuffle(collect(1:n));
gs = 450;
group1 = x[1:gs];
group2 = x[gs+1:end];
Now, we need to decide on the probabilities of edges within each group and between the two groups. Because we are
planting a partition, the probabilities of edges between the two groups should be much lower than the probability of
edges within each group. Here, we suppose that group 1 is slightly more connected than group 2.
p_group1 = 0.5;
p_group2 = 0.4;
p_between = 0.1;
With these probabilities in hand, we can construct our graph. The last few operations symmetrize the matrix.
A = sparse(zeros(n, n));
A[group1, group1] = rand(gs, gs) .< p_group1;
A[group2, group2] = rand(n-gs, n-gs) .< p_group2;
A[group1, group2] = rand(gs, n-gs) .< p_between;
Next, let us try to visualize the partition with the spy command:
spy(A);
The results are reported in Figure 3. While some might claim to see a partition in this data, we could argue that it
is not particularly evident. Now, let us investigate what the eigenvector v2 tells us about this graph. We will use the
eigen command in Julia to compute all eigenvectors and eigenvalues, and then store them as columns of matrix V
and as diagonal entries of matrix D, respectively.
A = triu(A, 1);
A = A + A’;
deg = sum(A, dims=1);
L = diagm(vec(deg)) – A;
D, V = eigen(L);
D = diagm(D);
julia> D[2,2]
46.469
The second smallest eigenvalue λ2 is greater than 0 if and only if we are considering a connected graph, and its
magnitude gives us an indication on how well-connected the graph is. To visualize our findings, let us plot the
associated eigenvector v2, corresponding to the column V (:, 2) of the matrix defined above.
4
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 4: [left] The second smallest eigenvector, [right] The second smallest sorted eigenvector.
plot(V[:, 2])
The result is shown in the left part of Figure 4. The picture is not very useful in helping us understand how the graph
is connected. However, we can try to sort the entries of v2 and see if we can extract more information. The result is
shown in the right part of Figure 4.
plot(sort(V[:, 2]))
Now, this picture is much more informative than the previous one! We can notice a large gap in the middle of the
sorted values. Interestingly enough, the number of points to the right of the gap is the same as gs, the size of our
planted group. Let us see what happens when we permute the vertices of the graph according to this ordering.
p = sortperm(V[:, 2]);
A = A[p, p];
In Figure 5 we can finally clearly observe our partitioning. In the spectral analysis of a graph, the second smallest
eigenvalue λ2 is very helpful in finding a partitioning that splits the graph into two subgraphs, while simultaneously
minimizing the edges between these two subgraphs.
Figure 5: The results after the partitioning of our artificial dataset.
5
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
1. Solve the following Social Networks problems:
(i) Reading [0 points]: Read section 5.7 on Permutations and ordering strategies, pp. 122-127 from the book A
First Course in Numerical Methods.
1. The Reverse Cuthill McKee Ordering [10 points]: Load matrix ”A SymPosDef.mat” from the dataset. You
can reorder (permute) the matrix using the “Reverse Cuthill McKee Ordering” via the Julia SymRCM function
symrcm(), see SymRCM Julia package documentation for more information. Visualize both the original and
Reverse Cuthill McKee permuted matrix and comment on what you observe. Compute the Cholesky factors
of both the original matrix and of the permuted matrix. Visualize the Cholesky factors and comment on the
number of nonzeros.
2. Sparse Matrix Factorization [20 points]: Let A ∈ R
n×n be symmetric and positive definite, with entries Aij
defined as follows:
Aij =



1, if i = 1 or i = n or j = 1 or j = n and i 6= j
n + i − 1, if i = j
0, otherwise
(1)
Please note that the increasingly larger values on the diagonal are necessary to ensure the positive definiteness
of matrix A. Answer the following questions:
a) Construct matrix A for the case n = 10 and explicitly write down its entries. How many non-zero elements
does it have?
b) We now want to derive a general formula to compute the number of non-zero entries. Show that, for a
given matrix A ∈ R
n×n with this structure, the number of non-zero elements is 5n − 6.
c) Write a function A construct(), which takes as input n and returns, as output, the matrix A defined
in Eq. 1 and its number of non-zero elements nz. Test your function in a script ex3c.jl for n = 10 and
compare your results with those you obtained in point (a). Furthermore, within the same script, visualise
the non-zero structure of matrix A by using the command spy().
d) Using again the spy() command, visualize side by side the original matrix A and the result of the
Cholesky factorization (cholesky() in Julia).
e) Explain why, for n = 100, 000, using cholesky() to solve Ax = b for a given right-hand-side vector
b would be problematic. Are there ways to mitigate this issue?
(ii) Reading [0 points]: Read the introductory section 1 (pp. 1 − 9), and section 3.3 (pp. 28 − 33) on Spectral
Bisection from the report ”Graph Partitioning: A survey” by Ulrich Elsner. This reference will be useful also
for the next assignment, where we will consider various graph partitioning techniques in more detail.
3. Degree Centrality [5 points]: In graph theory and network analysis, centrality refers to indicators which identify the most important vertices within a graph. Applications include identifying the most influential person(s)
in a social network, key infrastructure nodes in the Internet or urban networks, and super spreaders of disease.
Here we are interested in the Degree centrality, which is conceptually simple. It is defined as the number of
links incident upon a node (i.e., the number of vertices that a node has). The degree centrality of a vertex v, for
a given graph G := (V, E) with |V | vertices and |E| edges, is defined as the numbers of edges of vertex v.
Compute the degree centrality for the top 5 authors. Include them in an ordered list, and show the authors, the
their coauthors and the degree centrality.
6
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 6: The social network of a karate club at a US university. Image from M. E. J. Newman and M. Girvan, Phys.
Rev. E 69,026113(2004)
4. The Connectivity of the Coauthors [5 points]: How many coauthors have the authors in common? Think
about a general procedure that allows you to compute the list of common coauthors of two authors and express
it in matrix notation. Use the formula you derived to compute the common coauthors of the pairs (Golub,
Moler), (Golub, Saunders), and (TChan, Demmel). Who are these common coauthors? Report their names.
5. PageRank of the Coauthor Graph [5 points]: Compute the PageRank value (e.g., by using a modified version
of pagerank.jl from Project 1) for all authors and provide a graph of all authors in descending order
according to the PageRank. Include your script in the submission.
6. Zachary’s karate club: social network of friendships between 34 members [40 points]: Figure 6 shows the
social network of a karate club at a US university in the 1970s. At the end of the study, a conflict between club
members arose. As a consequence, the club formally split into two separate organizations (white circles vs.
grey squares). Please use the adjacency matrix karate.txt and answer the following numerical questions.
a) Write a Julia code that ranks the five nodes with the largest degree centrality. What are their degrees?
b) Rank the five nodes with the largest eigenvector centrality.
3 What are their (properly normalized) eigenvector centralities?
c) Are the rankings in (a) and (b) identical? Briefly explain the similarities and differences.
d) Use spectral graph partitioning to find a near-optimal split of the network into two groups of 16 and 18
nodes, respectively. List the nodes belonging to the two groups. How does spectral bisection compare to
the real split observed by Zachary (see Fig. 6)?
Quality of the code and of the report [15 Points]
The highest possible score of each project is 85 points and up to 15 additional points can be awarded based on the
quality of your report and code (maximum possible grade: 100 points). Your report should be a coherent document,
structured according to the template provided on iCorsi. If there are theoretical questions, provide a complete and
detailed answer. All figures must have a caption and must be of sufficient quality (include them either as .eps or .pdf).
If you made a particular choice in your implementation that might be out of the ordinary, clearly explain it in the
3Eigenvector centrality is defined by the principle introduced in Project 1. The eigenvector returned by the PageRank algorithm corresponds
to the weights associated with each node.
7
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
report. The code you submit must be self-contained and executable, and must include the set-up for all the results that
you obtained and listed in your report. It has to be readable and, if particularly complicated, well-commented.
Additional notes and submission details
Summarize your results and experiments for all exercises by writing an extended LaTeX report, by using the template provided on iCorsi (https://www.icorsi.ch/course/view.php?id=14666). Submit your gzipped
archive file (tar, zip, etc.) – named project number lastname firstname.zip/tgz – on iCorsi (see [NC
2022] Project 2 – Submission Social Networks) before the deadline. Submission by email or through any other channel will not be considered. Late submissions will not be graded and will result in a score of 0 points for that project.
You are allowed to discuss all questions with anyone you like, but: (i) your submission must list anyone you discussed
problems with and (ii) you must write up your submission independently. Please remember that plagiarism will result
in a harsh penalization for all involved parties.
In-class assistance
If you experience difficulties in solving the problems above, please join us in class either on Tuesdays (16:30-18:15)
or on Wednesdays (14:30-16:15) in room C1.03.
References
[1] “A First Course on Numerical Methods”, C. Greif, U. Ascher, SIAM books, pp. 122-127.
[2] The social network of a karate club at a US university, M. E. J. Newman and M. Girvan, Phys. Rev. E 69,026113
(2004) pp. 219-229.
8