## Description

In this project, you will implement the following three algorithms and test their performance on the 10 datasets available on MS Teams. Each dataset has three files: “.ts.data”

(training data); “.valid.data” (validation data) and “.test.data” (test data).

## 1. Tree Bayesian networks. (20 points)

Use the Chow-Liu algorithm to learn the

structure and parameters of the Bayesian network. Use 1-Laplace smoothing to

ensure that you don’t have any zeros when computing the mutual information as

well as zero probabilities in the model. See section 2 in [Meila and Jordan, 2001].

An implementation of Chow-Liu tree is provided in the code base provided to you.

## 2. Mixtures of Tree Bayesian networks using EM. (40 points)

The model is

defined as follows. We have one latent variable having k values and each mixture

component is a Tree Bayesian network. Thus, the distribution over the observed

variables, denoted by X (variables in the data) is given by:

P(X = x) = X

k

i=1

piTi(X = x)

where pi

is the probability of the i-th mixture component and Ti

is the distribution represented by the i-th Tree Bayesian network.

Write code to learn the structure and parameters of the model using the EM-algorithm (in the M-step each

mixture component is learned using the Chow-Liu algorithm). Run the EM algorithm until convergence or until 50 iterations whichever is earlier. See section 3 in

[Meila and Jordan, 2001]. Use the following values for k ∈ {2, 5, 10, 20}. Test performance using the “test set.”

In the code provided, see file MIXTURE CLT.py, you have to write two functions

“learn(..)” and “computeLL(…)”

## 3. Mixtures of Tree Bayesian networks using Random Forests. (40 points)

The model is defined as above (see Item (3)). Learn the structure and parameters of the model using the following Random-Forests style approach. Given two

hyper-parameters (k, r), generate k sets of Bootstrap samples and learn the i-th Tree

Bayesian network using the i-th set of the Bootstrap samples by randomly setting

exactly r mutual information scores to 0 (as before use the Chow-Liu algorithm with

r mutual information scores set to 0 to learn the structure and parameters of the

Tree Bayesian network).

Select k and r using the validation set and use 1-Laplace

smoothing. You can either set pi = 1/k for all i or use any reasonable method (reasonable method is extra credit). Describe your (reasonable) method precisely in your

report. Does it improve over the baseline approach that uses pi = 1/k.

## 4. (Extra credit, 10 points)

Describe an algorithm (you don’t have to implement the

algorithm) for learning mixtures of tree Bayesian networks using the gradient Boosting approach. As a starting point, you can refer to a paper by Rosset and Segal

[Rosset and Segal, 2002]. Your description should be formal enough so that a graduate student with knowledge of machine learning and basic math should be able to

implement and understand your algorithm.

Report Test-set Log-Likelihood (LL) score on the 10 datasets available on MS Teams.

For EM and Random Forests (since they are randomized algorithms), choose the hyperparameters (k and r) using the validation set and then run the algorithms 5 times and

report the average and standard deviation. Can you rank the algorithms in terms of accuracy (measured using test set LL) based on your experiments? Comment on why you

think the ranking makes sense.

#### What to turn in:

• Your report in PDF format describing your experimental evaluation.

• Code along with a Readme file on how to compile and run your code.

• Note: Please submit a single zip file containing your report and your code (rar/gzip/tar

and other formats are not allowed).

#### References

[Meila and Jordan, 2001] Meila, M. and Jordan, M. I. (2001). Learning with mixtures of

trees. Journal of Machine Learning Research, 1:1–48.

[Rosset and Segal, 2002] Rosset, S. and Segal, E. (2002). Boosting density estimation.

In Proceedings of the 15th International Conference on Neural Information Processing

Systems, NIPS’02, page 657–664, Cambridge, MA, USA. MIT Press.

2