Suppose you had a graph between sets of nouns in which an arc A -> B indicates that the set of nouns A is a more specific
case of the set of nouns B. Such a graph would allow us to reason about similarities and differences. The graph below is
an example of what this might look like.
Measuring the semantic relatedness of two nouns. Semantic relatedness refers to the degree to which two concepts are
related. Measuring semantic relatedness is a challenging problem. For example, you consider George W. Bush and John
F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It
might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However,
both George W. Bush and Eric Arthur Blair (aka George Orwell) are famous communicators and, therefore, closely
WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use
extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
Given a directed acyclic graph, find the shortest path between any two nodes, the length of the path, and the shortest
Shortest common ancestor. An ancestral path between two vertices (v and w) in a rooted DAG is a directed path
from v to a common ancestor x, together with a directed path from w to the same ancestor x. A shortest ancestral path is
an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest
common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex.
Note also that an ancestral path is a path, but not a directed path.
We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of
vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B.
Additional performance requirements. For full credit, the methods length() and ancestor() should take time
proportional to the number of vertices and edges reachable from the argument vertices (or better). For example, to
compute the shortest common ancestor of v and w in the digraph below, your algorithm should examine only the
highlighted vertices and edges.
Every ancestor of a node v (in the above example) needs to know (1) the length of the shortest path to it from v (2) the
predecessor responsible for that shortest path. If you do a breadth first traversal from v and recorded the best length, you
wouldn’t have to keep changing the length of the best path. Storing the predecessor help you it recreating the best path.
BONUS feature: Outcast detection. Given a list of nouns x1, x2, …, xn, which noun is the least related to the others? To
identify an outcast, compute the sum of the lengths between each noun and every other one:
di = length(xi, x1) + length (xi, x2) + … + length xi, xn)
and return a noun xt for which dt is maximum. Note that because length (xi, xi) = 0, it will not contribute to the sum.
The input will be the number of nodes, number of edges, and each edge (A-> B is represented by A B)
a. Allow the user to select which file he/she will use for input:digraph1.txt, digraph2,txt, or digraph3.txt. The first two
files are provided. digraph3.txt is one you make up.
b. All sets of nodes will be entered by a list of node numbers followed by a negative number.
c. Allow the user to select from the following commands:
i. (10 points) Given two nodes in the graph, find the shortest common ancestor, shortest ancestral path, and
ii. (10 points) Given two subsets of nodes, find the shortest common ancestor, shortest ancestral path, and
iii. (5 points) BONUS: Given a set of nodes find the outcast
Here are the two provided input files and their associated graphs.
ADDITIONAL BONUS (10 points) If you have time, try doing this for actual noun sets. See the instructions on
the assignment page.