ASSIGNMENT III: Generative modeling (SH2150)

Diffusion models have become one of the most exciting ideas in modern generative AI,
and at their core is a beautiful interplay between randomness and structure described by
stochastic differential equations (SDEs). In this assignment, we dive into that world: from
exploring classical processes like Brownian motion and the Ornstein–Uhlenbeck dynamics,
to simulating trajectories with Euler–Maruyama, to building neural networks that learn
how data is gradually corrupted and then reconstructed. By moving from simple 1D SDEs
all the way to UNet-based image models, we get to experience how mathematical theory,
numerical simulation, and deep learning come together to form the foundation of today’s
powerful diffusion and flow-based generative models.
Your tasks are to
• Get familiar with SDEs
• Set up a training/inference pipeline for generative models
• Sample from a 2D distribution using a flow model
• Define a UNet architecture for image generation
• Generate images from a distribution of your choice
1. The theoretical basis for diffusion models is the theory of stochastic differential equations (SDEs). An SDE is a differential equation symbolically of the form
dXt = ut(Xt) dt + σt dWt, X0 = x0, (1)
where Xt is the trajectory, ut is the vector field or drift coefficient, σt is the diffusion coefficient determining the amount of noise, and Wt is a Brownian motion. The term σt dWt is what gives rise to the stochasticity (randomness) in the system; without it, the system reduces to an ordinary differential equation (ODE), dXt/dt = ut(Xt).
(a) (1p) Different choices of ut yield different stochastic processes that can be used to model many different physical phenomena. The Ornstein-Uhlenbeck (OU) process is given by ut(Xt) = −θXt, σt = σ, for some constants θ, σ. It can be used to model massive particles under the influence of friction. Show that the OU process is equivalent to Langevin dynamics, defined by the drift coefficient
ut(Xt) = (σt²/2) ∇ log p(Xt), σt = σ,
when p(Xt) = N(0, σ²/(2θ)).
(b) (1p) Being able to simulate SDEs will be needed to sample new data points from generative models. The simplest solver for SDEs is the Euler-Maruyama method, defined as
Xt+h = Xt + h ut(Xt) + √h σt ϵt, ϵt ∼ N(0, I), (2)
where h is the step size. This reduces to the Euler method for ODEs when σt = 0. Implement the Euler-Maruyama method.
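As a minimal sketch (not a required interface), an Euler-Maruyama stepper for a scalar SDE could look as follows in NumPy; the names euler_maruyama, drift and sigma are placeholders chosen here, not prescribed by the assignment.

import numpy as np

def euler_maruyama(drift, sigma, x0, n_steps, T=1.0, rng=None):
    """Simulate one trajectory of dX_t = u_t(X_t) dt + sigma_t dW_t on [0, T].

    drift(t, x) and sigma(t) are callables, x0 is the initial state.
    Returns the time grid and the simulated states (length n_steps + 1 each).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = T / n_steps
    ts = np.linspace(0.0, T, n_steps + 1)
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for i in range(n_steps):
        eps = rng.standard_normal()  # eps_t ~ N(0, 1)
        xs[i + 1] = xs[i] + h * drift(ts[i], xs[i]) + np.sqrt(h) * sigma(ts[i]) * eps
    return ts, xs

With sigma returning 0, this reduces to the plain Euler method for the corresponding ODE.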
(c) (1p) Simulate 10 Brownian motions, defined by σt = σ, ut = 0, X0 = 0. What happens for different values of σ?
(d) (2p) Simulate the OU process defined above. Run it with a variety of different initial points X0. Pay attention to the ratio σ²/(2θ), and comment on the convergence behavior of the solutions. Are they approaching a particular point or a distribution?
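Assuming a stepper like euler_maruyama from the sketch above, parts (c) and (d) can be driven along these lines; θ, σ and the initial points below are arbitrary example values.

import numpy as np

theta, sigma_const = 1.0, 0.5
rng = np.random.default_rng(0)

# (c) ten Brownian motions: zero drift, constant sigma, X0 = 0
brownian = [euler_maruyama(lambda t, x: 0.0, lambda t: sigma_const, 0.0, 1000, rng=rng)
            for _ in range(10)]

# (d) the OU process started from several different initial points
ou = [euler_maruyama(lambda t, x: -theta * x, lambda t: sigma_const, x0, 1000, T=10.0, rng=rng)
      for x0 in (-3.0, -1.0, 0.0, 1.0, 3.0)]

Plotting the trajectories (e.g. with matplotlib) makes the convergence behavior easy to inspect.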
2. A diffusion model is an SDE where the vector field ut is parametrized by a learnable neural network u^θ_t. If there is no diffusion term, the SDE reduces to an ODE and we get a so-called flow model.
The network u^θ_t is trained on data points X1 ∼ pdata that have been corrupted with varying amounts of noise, corresponding to different time steps Xt, t ∈ [0, 1], of the SDE/ODE. By simulating the SDE forwards from t = 0 to t = 1 with our trained vector field, starting from pure noise X0 ∼ pnoise, we can generate new data samples from pdata.
In a common type of generative model, called a denoising diffusion model, the conditional path (which describes how a data point z is corrupted) is given by
xt = αt z + βt ϵ, ϵ ∼ N(0, I),
where αt, βt are continuously differentiable and monotonic noise schedulers with α0 = β1 = 0 and α1 = β0 = 1. This implies xt ∼ pt(· | z) = N(αt z, βt² I).
(a) (1p) Implement linear noise schedulers αt = t, βt = 1 − t.
(b) (1p) Implement the conditional path, which, given a data point z and a time t, returns the corrupted data point xt = αt z + βt ϵ.
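A possible sketch of the linear schedulers and the conditional path, assuming PyTorch tensors with a leading batch dimension; the function names alpha, beta and conditional_path are illustrative choices.

import torch

def alpha(t):        # alpha_t = t
    return t

def beta(t):         # beta_t = 1 - t
    return 1.0 - t

def conditional_path(z, t):
    """Corrupt a batch of data points z at times t: x_t = alpha_t * z + beta_t * eps."""
    eps = torch.randn_like(z)
    t = t.view(-1, *([1] * (z.dim() - 1)))  # broadcast t over the data dimensions
    return alpha(t) * z + beta(t) * eps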
(c) (1p) Define a 2D toy distribution (e.g. a Gaussian mixture) pdata. Simulate the corruption process on 1000 samples from pdata for t = 0, 0.25, 0.50, 0.75, 1, and plot one 2D histogram per time point.
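One way to set this up, assuming the conditional_path sketch above and matplotlib; the mixture means and plot layout below are arbitrary choices, not part of the assignment.

import torch
import matplotlib.pyplot as plt

def sample_pdata(n):
    """Toy 2D distribution: a mixture of four Gaussians at the corners of a square."""
    means = torch.tensor([[-2.0, -2.0], [-2.0, 2.0], [2.0, -2.0], [2.0, 2.0]])
    idx = torch.randint(0, 4, (n,))
    return means[idx] + 0.3 * torch.randn(n, 2)

z = sample_pdata(1000)
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, t in zip(axes, [0.0, 0.25, 0.5, 0.75, 1.0]):
    xt = conditional_path(z, torch.full((z.shape[0],), t))
    ax.hist2d(xt[:, 0].numpy(), xt[:, 1].numpy(), bins=50)
    ax.set_title(f"t = {t}")
plt.show()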
(d) (1p) Modify the noise schedulers so that they are non-linear (while still satisfying the requirements stated above). Then generate a plot analogous to the one in (c), using the same data points z corrupted under this new schedule, and discuss how the change in scheduling influences the corruption process.
3. There are many ways to train generative models. In this case, we want to minimize the conditional flow matching loss, given by
L(θ) = E_{t∼U[0,1], z∼pdata, x∼pt(·|z)} [ ‖u^θ_t(x) − ut(x | z)‖² ].
(a) (1p) The conditional vector field is given by
ut(x | z) = (α̇t − (β̇t/βt) αt) z + (β̇t/βt) x.
Find a simplified expression for ut(xt | z) when xt is drawn from the conditional path pt(· | z), and implement it.
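For orientation, one way the substitution is often carried out (a sketch to check against your own derivation, not the official solution): solving the conditional path for the noise gives ϵ = (xt − αt z)/βt, and inserting this into the conditional vector field yields

\[
u_t(x_t \mid z)
  = \Big(\dot\alpha_t - \frac{\dot\beta_t}{\beta_t}\,\alpha_t\Big) z + \frac{\dot\beta_t}{\beta_t}\,x_t
  = \dot\alpha_t z + \frac{\dot\beta_t}{\beta_t}\,(x_t - \alpha_t z)
  = \dot\alpha_t z + \dot\beta_t\,\epsilon .
\]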
(b) (1p) Implement an MLP architecture that takes (x, t) ∈ R³ as input and outputs the estimated vector field u^θ_t(xt) ∈ R².
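A PyTorch sketch of one such MLP; the activation, the simple concatenation of x and t, and the class name VectorFieldMLP are illustrative choices rather than requirements.

import torch
import torch.nn as nn

class VectorFieldMLP(nn.Module):
    """u_t^theta: maps (x, t) in R^2 x R to an estimated vector field value in R^2."""
    def __init__(self, hidden_dim=64, n_hidden=4):
        super().__init__()
        layers, in_dim = [], 3               # 2 data dimensions + 1 time dimension
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.SiLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 2))
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t.view(-1, 1)], dim=-1))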
(c) (1p) Implement a training loop according to Algorithm 1, and train an MLP with 4 hidden layers of dimension 64 on the 2D toy dataset. Choose noise schedulers αt, βt according to your liking (as long as they fulfill the needed criteria), and state what you used. Does the loss converge? (A sketch of such a loop is given after Algorithm 1 below.)
(d) (1p) Let the number of time steps be nt = 1000, and sample 300 realizations from pdata using the Euler method (σt = 0) applied to the ODE parametrized by your trained vector field. Plot a 2D scatter plot of 300 corrupted data points according to the true conditional path for t = 0, 0.25, 0.50, 0.75, 1. Provide a similar plot for the 300 generated data points, together with a plot of the simulated trajectories Xt. How does the choice of nt affect the sampling?
Algorithm 1: Conditional flow matching
Require: data set with samples from pdata, vector field u^θ_t
for each batch do
    Sample data point z ∼ pdata.
    Sample random time t ∼ U[0, 1].
    Sample noise ϵ ∼ N(0, I).
    Set xt = αt z + βt ϵ.
    Compute loss L(θ) = ‖u^θ_t(xt) − ut(xt | z)‖².
    Update model parameters θ using a gradient step on L(θ).
end
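A training loop in the spirit of Algorithm 1 might look like the sketch below. It assumes the VectorFieldMLP, sample_pdata and linear schedulers sketched earlier, for which the simplified conditional vector field reduces to z − ϵ; the batch size, learning rate and number of steps are arbitrary placeholders.

import torch

model = VectorFieldMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5000):
    z = sample_pdata(256)                    # z ~ p_data
    t = torch.rand(z.shape[0])               # t ~ U[0, 1]
    eps = torch.randn_like(z)                # eps ~ N(0, I)
    tt = t.view(-1, 1)
    xt = tt * z + (1.0 - tt) * eps           # linear schedulers: alpha_t = t, beta_t = 1 - t
    target = z - eps                         # u_t(x_t | z) = alpha_dot * z + beta_dot * eps
    loss = ((model(xt, t) - target) ** 2).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

For part 3(d), the same model can be sampled by starting from X0 ∼ N(0, I) and repeatedly applying the Euler update Xt+h = Xt + h · u^θ_t(Xt) with h = 1/nt.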
4. Let us now turn to image generation. To handle high-dimensional image data, we need a different architecture than an MLP to parameterize our vector field. We will use the famous UNet architecture, with some modifications to allow for the time embedding (the network must be informed about the current time t, and for CNNs this is not as straightforward as for MLPs, where we simply feed it in as an additional input).
(a) (1p) Select a few data points z from your choice of image data set, and provide a plot of the corresponding corrupted data points xt for time points t = 0, 0.25, 0.50, 0.75, 1.
(b) (3p) Implement the residual layer of the UNet architecture and add the pre-defined time embedding according to Figure 1. Train a vector field over 5000 epochs on the image data with a batch size of 250, using your UNet.
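A sketch of one possible residual layer with additive time conditioning, loosely following the structure indicated in Figure 1; the exact placement of the time-embedding MLP, the kernel sizes and the class name ResidualLayer are assumptions to adapt to your own setup.

import torch
import torch.nn as nn

class ResidualLayer(nn.Module):
    """Residual block: two SiLU + BatchNorm + Conv stages with an additive time embedding."""
    def __init__(self, channels, time_dim):
        super().__init__()
        self.block1 = nn.Sequential(nn.SiLU(), nn.BatchNorm2d(channels),
                                    nn.Conv2d(channels, channels, 3, padding=1))
        self.block2 = nn.Sequential(nn.SiLU(), nn.BatchNorm2d(channels),
                                    nn.Conv2d(channels, channels, 3, padding=1))
        self.time_mlp = nn.Sequential(nn.SiLU(), nn.Linear(time_dim, channels))

    def forward(self, x, t_emb):
        h = self.block1(x)
        h = h + self.time_mlp(t_emb)[:, :, None, None]  # broadcast the time embedding over H and W
        h = self.block2(h)
        return x + h                                     # residual connection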
(c) (1p) Sample a couple of images from the image distribution by solving the flow ODE dXt = u^θ_t(Xt) dt, using your trained UNet as the drift term.
(d) (1p) Play around with different noise schedulers and comment on how they affect the sample quality.
5. So far we have only trained flow models (no diffusion term). Via the Fokker-Planck equation, it can be shown that the following SDE,
dXt = [ u^θ_t(Xt) + (σt²/2) ∇ log pt(Xt) ] dt + σt dWt, (3)
has the same probability paths pt as the flow ODE dXt/dt = u^θ_t(Xt). Hence, to convert the flow model into a diffusion model, we have to learn the second drift term ∇ log pt(Xt), called the score function. This can be done by training a network s^θ_t using conditional score matching:
L(θ) = E_{t∼U[0,1], z∼pdata, x∼pt(·|z)} [ ‖s^θ_t(x) − ∇ log pt(x | z)‖² ].
[Figure 1: block diagram of the UNet. An initial convolution is followed by three encoders, a bottleneck, three decoders with residual connections to the corresponding encoders, and a final convolution; the inputs are xt and t, and the output is u^θ_t(xt). The detailed view of the residual layer shows two SiLU + BatchNorm + Conv stages, with the time t embedded via an MLP.]
Figure 1: UNet architecture together with a detailed overview of the residual layer. Each encoder consists of two residual layers followed by downsampling. Each decoder consists of upsampling followed by two residual layers. The bottleneck module consists of three residual layers.
(a) (3p) Derive an explicit formula for the conditional score function ∇ log pt(x | z) from the conditional path, and implement it. Then train a score network s^θ_t on the image dataset using the conditional score matching loss.
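As a hint to check your own calculation against (a sketch, assuming the Gaussian conditional path above): since pt(x | z) = N(αt z, βt² I), the log-density is quadratic in x, and differentiating it gives

\[
\nabla_x \log p_t(x \mid z)
  = \nabla_x \left( -\frac{\lVert x - \alpha_t z \rVert^2}{2\beta_t^2} + \text{const} \right)
  = -\,\frac{x - \alpha_t z}{\beta_t^2} .
\]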
(b) (3p) Sample new data using the diffusion model by combining the trained vector field u^θ_t and the score network s^θ_t, and simulate the SDE in (3). How does the value of σt affect the samples?
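A minimal Euler-Maruyama sampler for the SDE in (3), assuming trained networks u_model and s_model that take a batch of states and a batch of times (as in the sketches above); the step count and constant σt below are arbitrary placeholders.

import torch

@torch.no_grad()
def sample_sde(u_model, s_model, shape, n_steps=1000, sigma_fn=lambda t: 0.5):
    """Simulate dX_t = [u_t(X_t) + sigma_t^2/2 * s_t(X_t)] dt + sigma_t dW_t from t = 0 to t = 1."""
    x = torch.randn(shape)                   # X_0 ~ p_noise
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0],), i * h)
        sig = sigma_fn(i * h)
        drift = u_model(x, t) + 0.5 * sig ** 2 * s_model(x, t)
        x = x + h * drift + (h ** 0.5) * sig * torch.randn_like(x)
    return x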
Final question: Did you use an AI tool (other than the machine learning models you trained in this exercise) for anything other than information searching when solving this problem set? If so, please write a brief statement of how you used AI.
Total number of points: 25
Motivate your answers wherever applicable.
Good luck!