CS228 Programming Assignment #4: Exact Inference

1 Overview
In Programming Assignment 1, you implemented a rudimentary inference engine that could correctly answer queries over small networks. Unfortunately, in Programming Assignments 2 and 3, your inference engine got passed over in favor of third-party inference engines. It’s time to bring your inference engine back! We have provided you with the correct code for Assignment 1’s ComputeMarginal and all of the functions that it calls, as well as a genetic inheritance network for a 9-person pedigree from Assignment 2. At your own risk, try finding the marginal of variable 1 in this network by running the following code:
load('PA4Sample.mat', 'NinePersonPedigree');
ComputeMarginal(1, NinePersonPedigree, []);
As you can see, the brute force inference engine that you implemented in Programming Assignment 1 is unable to handle anything larger than tiny networks; its running time is proportional to the number of entries in the joint distribution over the entire network, which even in this small 9-person example is on the order of $10^7$. Where brute force inference fails, however, other forms of exact inference can still be very efficient. We will explore clique tree message passing in this assignment, and by its end, you will have created an inference engine powerful enough to handle probabilistic queries and find MAP assignments over the genetic inheritance networks from Programming Assignment 2 and the OCR networks from Programming Assignment 3, respectively.
2 Belief Propagation
Let’s start by taking a look at the belief propagation algorithm, specialized to exact inference over clique trees:
1. Construct a clique tree from a given set of factors $\Phi$.
2. Assign each factor $\phi_k \in \Phi$ to a clique $C_{\alpha(k)}$ such that $\text{Scope}[\phi_k] \subseteq C_{\alpha(k)}$, where $\alpha(k)$ returns the index of the clique to which $\phi_k$ is assigned.
3. Compute initial potentials $\psi_i(C_i) = \prod_{k : \alpha(k) = i} \phi_k$.
4. Designate an arbitrary clique as the root, and pass messages upwards from the leaves towards the root clique.
5. Pass messages from the root down towards the leaves.
6. Compute the beliefs for each clique: $\beta_i(C_i) = \psi_i \times \prod_{k \in \mathcal{N}_i} \delta_{k \to i}$, where $\mathcal{N}_i$ is the set of neighbors of clique $i$ and $\delta_{k \to i}$ is the message from clique $k$ to clique $i$.
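As a roadmap, these steps map onto the functions in this assignment roughly as follows. This is a minimal MATLAB sketch: F and E are hypothetical names for a factor list and an evidence vector, and the two-argument CreateCliqueTree signature is an assumption based on the description in Section 3.

% Steps 1-3: build the tree, assign factors to cliques, and compute the
% initial potentials (CreateCliqueTree is provided and calls your
% ComputeInitialPotentials).
P = CreateCliqueTree(F, E);
% Steps 4-6: pass messages up and then down, and compute the final
% beliefs (the second argument, isMax = 0, selects sum-product).
P = CliqueTreeCalibrate(P, 0);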
As an example, consider the following genetic inheritance network:
Figure 1: A pedigree of 6 people and its corresponding Bayesian network.
From the list of factors corresponding to this network, we can create the following clique tree:
Figure 2: One possible clique tree for the genotype network above. (Figure not reproduced; its nine cliques are {1,3,4}, {1,2,3}, {2,5,6}, {3,9}, {4,10}, {5,11}, {6,12}, {1,7}, and {2,8}, with the clique {1,2,3} designated as the root.)
Next, we initialize the clique potentials in this tree by assigning each of the original factors to a clique. Arbitrarily choosing the clique with variables 1, 2, and 3 to be the root, we can start passing messages up from the leaves to the root, and then down from the root to the leaves. In this clique tree, there are 9 cliques, so 2·(9−1) = 16 messages suffice to correctly compute all beliefs. In Figure 2, the dashed lines show the messages from the leaves to the root, and the dotted lines show the messages from the root to the leaves. Finally, we can use the calibrated clique beliefs to answer probabilistic queries on the original network. In the subsequent sections we will work our way through the steps of this algorithm, implementing two variants: sum-product message passing and max-sum message passing.
3 Clique Tree Construction
In this and future assignments, a CliqueTree is a struct with the following fields:
• cliqueList: This is a struct array of cliques. The representation of a clique is exactly like that of a factor (which you have seen in Assignments 1-3). The .val for each clique i should be initialized to its initial potential $\psi_i$.
• edges: This is an adjacency matrix representing edges between cliques in the clique tree. It is an n×n matrix, where n is the number of cliques. If clique i and clique j share an edge, then .edges(i,j) and .edges(j,i) are both 1; else, they are 0.
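For concreteness, here is a hypothetical toy instance of this representation (not from the assignment data): two cliques over variables {1,2} and {2,3}, all binary, joined by a single edge, with uniform potentials.

% Two cliques, each represented like a factor (.var, .card, .val):
P.cliqueList(1) = struct('var', [1 2], 'card', [2 2], 'val', ones(1, 4));
P.cliqueList(2) = struct('var', [2 3], 'card', [2 2], 'val', ones(1, 4));
% 2x2 adjacency matrix: cliques 1 and 2 share an edge.
P.edges = [0 1; 1 0];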
To perform clique tree message passing, we first need to take in a list of factors and construct an appropriate clique tree. We have provided a CreateCliqueTree function that creates a clique tree using a modified variable elimination procedure, as described in the Clique Trees and VE lecture. CreateCliqueTree relies on the ComputeInitialPotentials function, which you will have to write:
• ComputeInitialPotentials.m (10 points): This function should take in a set of factors (possibly already reduced by evidence, which is handled earlier in CreateCliqueTree) and a clique tree skeleton corresponding to those factors, and return a clique tree with the .val fields correctly filled in. Concretely, it should assign each factor to a clique, and for each clique do a factor product over all factors assigned to it. For debugging, you can look at the sample input clique tree InitPotential.INPUT and the sample output clique tree InitPotential.RESULT in PA4Sample.mat.
The clique tree skeleton that CreateCliqueTree produces satisfies both family preservation and the running intersection property, so you will not have to worry about writing error-checking / input-validation code.
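A minimal sketch of one possible implementation follows. It assumes the skeleton C has fields C.nodes (a cell array listing each clique's variables), C.factorList, and C.edges, and that FactorProduct from Assignment 1 treats an empty factor as the identity; treat these field names as assumptions, not a specification.

function P = ComputeInitialPotentials(C)
% Sketch only: assign each factor to the first clique that contains its
% scope (family preservation guarantees one exists), then take the
% factor product over each clique's assigned factors.
N = length(C.nodes);
P.edges = C.edges;
P.cliqueList = repmat(struct('var', [], 'card', [], 'val', []), N, 1);
K = length(C.factorList);
alpha = zeros(K, 1);
for k = 1:K
    for i = 1:N
        if all(ismember(C.factorList(k).var, C.nodes{i}))
            alpha(k) = i;
            break;
        end
    end
end
for i = 1:N
    psi = struct('var', [], 'card', [], 'val', []);
    for k = find(alpha == i)'
        psi = FactorProduct(psi, C.factorList(k));
    end
    % A full solution must also handle cliques with no assigned factors
    % and reorder variables to match C.nodes{i}.
    P.cliqueList(i) = psi;
end
end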
4 Message Passing in a Clique Tree
4.1 Message Ordering
With our correctly-initialized clique tree in hand, we now need to come up with a correct message passing order. Recall that a clique is ready to transmit a message upstream toward the root after it has received all of the messages from its downstream neighbors, and vice versa; it is ready to transmit a message downstream after it has received its upstream message. (Since we're working with clique trees, each clique will only receive one upstream message.) More generally, we say that a clique $C_i$ is ready to transmit to its neighbor $C_j$ when $C_i$ has received messages from all of its neighbors except from $C_j$. In clique tree message passing, each message is passed exactly once. The easiest way to come up with a valid message passing order is to write the following function:
• GetNextCliques.m (10 points): This function looks at all the messages that have been passed so far, and returns indices i and j such that clique $C_i$ is ready to pass a message to its neighbor $C_j$. Do not return i and j if $C_i$ has already passed a message to $C_j$. For debugging, you can look at the sample input clique trees GetNextC.INPUTx and the sample output clique trees GetNextC.RESULTx in PA4Sample.mat.
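A minimal sketch, assuming messages is an N×N struct array in which messages(i,j).var is empty exactly when the message from clique i to clique j has not yet been sent:

function [i, j] = GetNextCliques(P, messages)
N = length(P.cliqueList);
for i = 1:N
    for j = 1:N
        % Consider only unsent messages along actual edges.
        if P.edges(i, j) == 1 && isempty(messages(i, j).var)
            % Clique i is ready if it has heard from every neighbor except j.
            neighbors = setdiff(find(P.edges(:, i))', j);
            if all(arrayfun(@(k) ~isempty(messages(k, i).var), neighbors))
                return;
            end
        end
    end
end
i = 0; j = 0;   % no message left to send: every message has been passed
end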
Note that there are computationally cheaper but slightly more involved methods of obtaining the correct message passing order over the tree. The main computational bottleneck in our message passing scheme does not lie in calculating the message passing ordering, though, so this has little effect in practice.
4.2 Sum-Product Message Passing
We can now begin calibrating the clique tree. Recall that a clique tree is calibrated if all pairs of adjacent cliques are calibrated. Two adjacent cliques $C_i$ and $C_j$ are calibrated if they agree on the marginals over their shared variables, i.e.,

$$\sum_{C_i - S_{i,j}} \beta_i(C_i) = \sum_{C_j - S_{i,j}} \beta_j(C_j) = \mu_{i,j}(S_{i,j}) \quad \forall S_{i,j}$$

Write the following function:
• CliqueTreeCalibrate.m (20 points): This function should perform clique tree calibration by running the sum-product message passing algorithm. It takes in an uncalibrated CliqueTree, calibrates it (i.e., sets the .val fields of the cliqueList to the final beliefs $\beta_i(C_i)$), and returns it. To avoid numerical underflow, normalize each message as it is passed, such that it sums to 1. For consistency with our autograder, do not normalize the initial potentials nor the final cluster beliefs. This function also takes in an isMax flag that toggles between sum-product and max-sum message passing. For this part, it will always be set to 0. For debugging, you can look at SumProdCalibrate.INPUT and SumProdCalibrate.RESULT in PA4Sample.mat.
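A minimal sketch of the sum-product case, assuming GetNextCliques returns i = j = 0 once every message has been sent (a convention from the sketch above, not something this handout guarantees):

function P = CliqueTreeCalibrate(P, isMax)
% Sketch of isMax = 0 only; messages(i,j) holds the message from i to j.
N = length(P.cliqueList);
messages = repmat(struct('var', [], 'card', [], 'val', []), N, N);
[i, j] = GetNextCliques(P, messages);
while i ~= 0
    % Multiply clique i's initial potential by all incoming messages
    % except the one from the target clique j, then sum out everything
    % outside the sepset between cliques i and j.
    msg = P.cliqueList(i);
    for k = setdiff(find(P.edges(:, i))', j)
        msg = FactorProduct(msg, messages(k, i));
    end
    sepset = intersect(P.cliqueList(i).var, P.cliqueList(j).var);
    msg = FactorMarginalization(msg, setdiff(msg.var, sepset));
    msg.val = msg.val / sum(msg.val);   % normalize each message as it is passed
    messages(i, j) = msg;
    [i, j] = GetNextCliques(P, messages);
end
% Final beliefs: each initial potential times all of its incoming
% messages (the beliefs themselves are not normalized).
for i = 1:N
    for k = find(P.edges(:, i))'
        P.cliqueList(i) = FactorProduct(P.cliqueList(i), messages(k, i));
    end
end
end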
Finally, let’s bring together all the steps described above to compute the exact marginals for the variables in our network.
• ComputeExactMarginalsBP.m (15 points): This function should take a network, a set of initial factors, and a vector of evidence, and compute the marginal probability distribution for each individual variable in the network. A network is defined by a set of factors; you should be able to extract all the network information that you need from the list of factors. Marginals for every variable should be normalized at the end, since they represent valid probability distributions. As before, this function takes in an isMax flag, which should be set to 0 for now (at this point, you need to write the function only for isMax=0). For debugging, you can look at ExactMarginal.INPUT and ExactMarginal.RESULT in PA4Sample.mat and use Evidence = [] (the sample case makes this assumption). You may also try different values for Evidence if you like.
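A minimal sketch of the sum-product case, assuming variables are numbered 1 through N and that CreateCliqueTree takes the factor list and the evidence vector:

function M = ComputeExactMarginalsBP(F, E, isMax)
% Sketch of isMax = 0 only: calibrate the tree, then marginalize each
% variable out of any calibrated clique that contains it.
P = CliqueTreeCalibrate(CreateCliqueTree(F, E), isMax);
vars = unique([F.var]);
M = repmat(struct('var', [], 'card', [], 'val', []), length(vars), 1);
for v = vars
    for i = 1:length(P.cliqueList)
        if ismember(v, P.cliqueList(i).var)
            m = FactorMarginalization(P.cliqueList(i), ...
                setdiff(P.cliqueList(i).var, v));
            m.val = m.val / sum(m.val);   % normalize into a distribution
            M(v) = m;
            break;
        end
    end
end
end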
Congratulations! You have successfully implemented an efficient exact inference algorithm. Now you can use the factors for the genotype network from PA2 and run inference using your own inference engine. You can compare the marginals that you get to the ones from the inference engines that were provided in PA2. To test your inference engine on the genotype network shown in the figure above, run
load('PA4Sample.mat', 'SixPersonPedigree');
ComputeExactMarginalsBP(SixPersonPedigree, [], 0);
This 6-person network is small enough that we can use the brute-force implementation from Assignment 1 to double-check the results. Use
ComputeMarginal(1, SixPersonPedigree, []);
for instance. Play around with observing evidence, and make sure that it works!
4.3 Max-Sum Message Passing
A typical OCR task could be to scan in a hand-written manuscript, decipher the writing, and give others a typed version of the manuscript that they can easily read. In such a scenario, we would want to output the most likely text corresponding to a set of images, instead of calculating marginal probabilities over individual characters. To do this, we will use max-sum message passing to calculate the MAP assignment. Our aim is to output the most probable instantiation of the variables in the network instead of solving the marginals for each of them. As mentioned in lecture, we will be working in log-space for this part of the assignment; the probability of any single assignment in a large enough network, even the MAP assignment, is typically low enough to cause numerical issues if we do not work in log-space. Note that this algorithm is nearly identical to sum-product message passing, with the sums replaced by maxima and the products replaced by sums in the definitions. You should therefore have to write relatively little code. Just as we used FactorMarginalization in the previous algorithm, let's start by writing a function FactorMaxMarginalization:
• FactorMaxMarginalization.m (10 points): Similar to FactorMarginalization (but with sums replaced by maxima), this function takes in a factor and a set of variables to marginalize out. For each assignment to the remaining variables, it finds the maximum factor value over all possible assignments to the marginalized variables. For debugging, in PA4Sample.mat, the sample input factor is FactorMax.INPUT1, the sample input variables are in FactorMax.INPUT2, and the sample output factor is FactorMax.RESULT.
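A minimal sketch, assuming the IndexToAssignment and AssignmentToIndex helpers from Assignment 1 and that at least one variable remains after marginalization:

function B = FactorMaxMarginalization(A, V)
% Remove the variables in V from A, keeping the max over their values.
[B.var, mapB] = setdiff(A.var, V);   % variables that remain
B.card = A.card(mapB);
B.val = -inf(1, prod(B.card));       % -inf so any real value beats the default
% Map every assignment of A onto its reduced assignment in B.
assignments = IndexToAssignment(1:length(A.val), A.card);
indxB = AssignmentToIndex(assignments(:, mapB), B.card);
for i = 1:length(A.val)
    B.val(indxB(i)) = max(B.val(indxB(i)), A.val(i));
end
end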
With this, modify CliqueTreeCalibrate to do max-sum message passing when isMax=1:
• CliqueTreeCalibrate.m (20 points): This function should perform clique tree calibration by running the max-sum message passing algorithm when the isMax flag is set to 1. It takes in an uncalibrated CliqueTree, does a log-transform of the values in the factors/cliques, max-calibrates it (i.e., sets the .val fields of the cliqueList to the final beliefs $\beta_i(C_i)$), and returns it in log-space. We are working in log-space, so do not normalize each message as it is passed. For consistency with our autograder, do not normalize the initial potentials nor the final cluster beliefs. This function takes in an isMax flag that toggles between sum-product and max-sum message passing. For this part, it will always be set to 1, but make sure that it still does sum-product message passing correctly when isMax=0. In sum-product message passing, we accomplished the product part of sum-product by running FactorProduct. You might want to write a similar helper function, FactorSum, that accomplishes the sum part of max-sum message passing. You can use if statements involving the isMax flag to decide when to run FactorProduct and when to run FactorSum. For debugging, you may look at MaxSumCalibrate.INPUT and MaxSumCalibrate.RESULT in PA4Sample.mat.
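Since FactorSum is only suggested here, not provided, the following is a minimal sketch mirroring the structure of Assignment 1's FactorProduct, with the per-assignment product replaced by a sum of log-values; it again assumes the IndexToAssignment and AssignmentToIndex helpers:

function C = FactorSum(A, B)
% Add two log-space factors over the union of their scopes.
if isempty(A.var), C = B; return; end
if isempty(B.var), C = A; return; end
C.var = union(A.var, B.var);
[~, mapA] = ismember(A.var, C.var);
[~, mapB] = ismember(B.var, C.var);
C.card = zeros(1, length(C.var));
C.card(mapA) = A.card;
C.card(mapB) = B.card;
% For each joint assignment, look up the matching entries in A and B.
assignments = IndexToAssignment(1:prod(C.card), C.card);
indxA = AssignmentToIndex(assignments(:, mapA), A.card);
indxB = AssignmentToIndex(assignments(:, mapB), B.card);
C.val = A.val(indxA) + B.val(indxB);   % sum of log-values, not a product
end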
Now we are ready to compute the max-marginals for the variables in our network. Modify ComputeExactMarginalsBP.m to support max-marginal calculation:
• ComputeExactMarginalsBP.m (5 points): This function should take a network, a set of initial factors, and a vector of evidence, and compute the max-marginal for each individual variable in the network (including variables for which there is evidence). Max-marginals for every variable should not be normalized at the end. Leave the max-marginals in log-space; do not re-exponentiate them. As before, this function takes in an isMax flag, which will be set to 1 now; make sure it still works for computing the (non-max) marginals when it is set to 0. For debugging, you can look at MaxMarginals.INPUT and MaxMarginals.RESULT in PA4Sample.mat.
Finally, write a function that takes the max-marginals and extracts the MAP assignment:
• MaxDecoding.m (10 points): This function should take a list of max-marginals returned by ComputeExactMarginalsBP and return a vector A, with A(i) the value of the ith variable in the MAP assignment. You may assume that the max-marginals passed into this function will be unambiguous (so no backtracking etc. is necessary for decoding). In PA4Sample.mat, the sample input is MaxDecoded.INPUT, and the sample output is MaxDecoded.RESULT.
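Under the unambiguity assumption above, decoding reduces to taking the argmax of each max-marginal; a minimal sketch, assuming M(i) is the max-marginal of variable i:

function A = MaxDecoding(M)
% Pick the maximizing value of each max-marginal; ties are assumed absent.
A = zeros(1, length(M));
for i = 1:length(M)
    [~, A(i)] = max(M(i).val);
end
end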
To test out your MAP estimation pipeline, let’s try it on a sample word from Assignment 3. Run the following commands:
load('PA4Sample.mat', 'OCRNetworkToRun');
maxMarginals = ComputeExactMarginalsBP(OCRNetworkToRun, [], 1);
MAPAssignment = MaxDecoding(maxMarginals);
PredictedWord = DecodedMarginalsToChars(MAPAssignment)
You should see the word mornings!
5 Conclusion
Congratulations! You have successfully built an inference engine that can take relatively large networks, such as the OCR network and the genetic inheritance network, and run exact inference on them. Give yourself a pat on the back: in just 4 weeks, you have written end-to-end programs that can do credit scoring, genetic counseling, and OCR. However, the powers of exact inference are limited. Programming Assignment 5 will teach you more about that, and explore approximate inference methods that will let us scale to significantly larger networks. Stay tuned!