Project 5 Results

Part A: The Power of Diffusion Models!

Part 0: Setup

For these and all following results, I use a random seed of 180. The generated images are not of the highest quality, but they tend to match their text prompts well. I didn't notice a large difference in output quality between 10 and 20 inference steps.

Centered Image

Inference Steps = 20

Inference Steps = 10

Part 1: Sampling Loops

1.1 Implementing the Forward Process

Centered Image

Campanile with Noise at Different Timesteps
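
For reference, the forward process noises a clean image as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where eps is standard Gaussian noise. The following is a minimal, dependency-free sketch of that formula, not the code that produced the figure above. The linear beta schedule, the function names, and the four-value toy "image" are assumptions for illustration; a real run would use the schedule that ships with the pretrained model.

```python
import math
import random

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an ASSUMED linear beta schedule."""
    alphas_cumprod = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod

def forward(x0, t, alphas_cumprod, rng):
    """Return (x_t, eps) with x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * p + math.sqrt(1.0 - abar) * e for p, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(180)           # same seed value as the rest of the write-up
alphas_cumprod = make_alphas_cumprod()
x0 = [0.5, -0.25, 1.0, 0.0]        # toy flattened "image" (hypothetical values)
for t in (250, 500, 750):
    xt, _ = forward(x0, t, alphas_cumprod, rng)
    print(t, [round(v, 3) for v in xt])
```

Larger timesteps have a smaller abar_t, so the signal term shrinks and the noise term dominates, which matches the progressively noisier images in the figure.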

1.2 Classical Denoising

Centered Image

Noisy and Gaussian-Denoised Side by Side

1.3 One-Step Denoising

Low Quality Image 1

Original, Noisy, One-Step Denoised
Timestep 250

Low Quality Image 2

Original, Noisy, One-Step Denoised
Timestep 500

Low Quality Image 3

Original, Noisy, One-Step Denoised
Timestep 750
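
One-step denoising inverts the forward-process formula: given a noise estimate eps_hat for the noisy image x_t, the clean image is estimated as x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t). Below is a small sketch of that step only. The function name and the toy numbers are hypothetical, and the true noise is used in place of a UNet prediction so that the example runs on its own.

```python
import math

def estimate_clean_image(xt, eps_hat, abar_t):
    """Solve x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0."""
    return [
        (x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
        for x, e in zip(xt, eps_hat)
    ]

abar_t = 0.5                       # hypothetical cumulative alpha at some timestep
x0 = [0.2, -0.4, 0.9]              # toy clean "image"
eps = [1.0, -0.5, 0.25]            # toy noise; stands in for the model's estimate
xt = [math.sqrt(abar_t) * p + math.sqrt(1.0 - abar_t) * e for p, e in zip(x0, eps)]

x0_hat = estimate_clean_image(xt, eps, abar_t)
print([round(v, 6) for v in x0_hat])
```

With a perfect noise estimate the clean image is recovered exactly. In practice the estimate degrades as the timestep grows, which is consistent with the timestep 750 result being further from the original than the timestep 250 result.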

1.4 Iterative Denoising

Centered Image

Iterative Steps

Centered Image

Original, Iteratively Denoised, One-Step Denoised, Gaussian Blurred

1.5 Diffusion Model Sampling

Centered Image

5 Sampled Images

1.6 Classifier-Free Guidance (CFG)

Centered Image

5 CFG Sampled Images
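
Classifier-free guidance combines a conditional and an unconditional noise estimate as eps = eps_uncond + scale * (eps_cond - eps_uncond), where a scale above 1 pushes the sample toward the prompt. A minimal sketch of that combination follows; the function name, the toy values, and the scale of 7 are assumptions for illustration rather than the settings used for the samples above.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Extrapolate from the unconditional estimate toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]   # toy unconditional noise estimate
eps_cond = [1.0, 3.0]     # toy prompt-conditioned noise estimate

print(cfg_noise(eps_uncond, eps_cond, 0.0))  # reduces to the unconditional estimate
print(cfg_noise(eps_uncond, eps_cond, 1.0))  # reduces to the conditional estimate
print(cfg_noise(eps_uncond, eps_cond, 7.0))  # extrapolates past the conditional estimate
```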

1.7 Image-to-Image Translation

Centered Image

Starting with Prompt "a high quality photo" to Campanile test image

Centered Image

Starting with Prompt "a high quality photo" to Golden Gate test image

Centered Image

Starting with Prompt "a high quality photo" to MacBook test image

1.7.1 Hand-Drawn and Web Images

Larger Image

Starting with Prompt "a high quality photo" to LeBron Test Image

Smaller Image

Original LeBron Test Image

Larger Image

Starting with Prompt "a high quality photo" to Hand-Drawn House Test Image

Smaller Image

Original Hand-Drawn House Test Image

1.7.2 Inpainting

Larger Image

Original, Mask, To Replace

Smaller Image

Inpainted Campanile
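
A common way to inpaint with a diffusion model is to overwrite, after every denoising step, all pixels outside the mask with a freshly noised copy of the original image: x_t = mask * x_t + (1 - mask) * forward(x_orig, t). The sketch below shows only that blending step. It assumes a mask value of 1 marks the region to regenerate and 0 marks pixels to keep; the function name and the toy values are hypothetical.

```python
def keep_known_pixels(xt, noised_original, mask):
    """Keep generated content where mask == 1 and the noised original elsewhere."""
    return [
        m * x + (1.0 - m) * o
        for x, o, m in zip(xt, noised_original, mask)
    ]

xt = [0.9, 0.8, 0.7, 0.6]               # current sample from the denoising loop
noised_original = [0.1, 0.2, 0.3, 0.4]  # original image noised to the same timestep
mask = [0.0, 1.0, 1.0, 0.0]             # regenerate only the two middle pixels

print(keep_known_pixels(xt, noised_original, mask))
```

Because the unmasked pixels are reset at every step, only the masked region is free to change, so the final image agrees with the original everywhere outside the mask.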

1.7.3 Text-Conditional Image-to-image Translation

Centered Image

"a rocket ship" to Campanile
"a photo of a man" to LeBron
"an oil painting of a snowy mountain village" to Hand-Drawn House

1.8 Visual Anagrams

High Quality Image 1

An Oil Painting of an Old Man

High Quality Image 2

An Oil Painting of People around a Campfire

High Quality Image 1

Amalfi Coast

High Quality Image 2

Hipster Bartender

High Quality Image 1

Waterfall

High Quality Image 2

Snowy Mountain Village
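
A visual anagram can be produced by averaging two noise estimates at each step: one for the upright image with the first prompt, and one for the flipped image with the second prompt, flipped back before averaging. The following sketch illustrates that combination on a 2x2 toy image. The `toy_predictor` function is a hypothetical stand-in for the diffusion UNet, and all names here are assumptions for illustration.

```python
def flip_vertical(image):
    """Flip a 2D image (a list of rows) upside down."""
    return [row[:] for row in image[::-1]]

def anagram_noise(xt, predict_noise, prompt_up, prompt_down):
    """Average the upright estimate with the un-flipped estimate of the flipped image."""
    eps_up = predict_noise(xt, prompt_up)
    eps_down = flip_vertical(predict_noise(flip_vertical(xt), prompt_down))
    return [
        [(a + b) / 2.0 for a, b in zip(row_up, row_down)]
        for row_up, row_down in zip(eps_up, eps_down)
    ]

def toy_predictor(image, prompt):
    # Hypothetical stand-in for the UNet: scales the input by a prompt-dependent factor.
    scale = 1.0 if prompt == "an oil painting of an old man" else 3.0
    return [[scale * v for v in row] for row in image]

xt = [[1.0, 2.0], [3.0, 4.0]]
eps = anagram_noise(
    xt,
    toy_predictor,
    "an oil painting of an old man",
    "an oil painting of people around a campfire",
)
print(eps)
```

The averaged estimate is then used in the usual denoising update, so each step moves the image toward both prompts at once, one in each orientation.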

1.9 Hybrid Images

High Quality Image 1

Hybrid Image of a skull and a waterfall

High Quality Image 2

Hybrid Image of a skull and a waterfall

High Quality Image 1

Hybrid Image of "a lithogram of Batman's face" and "a lithograph of a scene of bats flying", seed=180

High Quality Image 2

Hybrid Image of "a lithogram of Thanos's face" and "a lithogram of three purple flowers", seed=7
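
Hybrid images can be generated by mixing frequency bands of two noise estimates: the low frequencies of the estimate for one prompt plus the high frequencies of the estimate for the other. The sketch below shows that mixing on a 1D signal. The box blur is a simplified stand-in for the Gaussian blur one would normally apply to 2D noise estimates, and the function names and toy values are hypothetical.

```python
def box_blur(signal, radius=1):
    """Low-pass filter: average each sample with its neighbours (edges use fewer)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

def hybrid_noise(eps_far, eps_near, radius=1):
    """Low frequencies of eps_far plus high frequencies of eps_near."""
    low = box_blur(eps_far, radius)
    blurred_near = box_blur(eps_near, radius)
    high = [e - b for e, b in zip(eps_near, blurred_near)]
    return [l + h for l, h in zip(low, high)]

eps_far = [0.0, 1.0, 2.0, 3.0, 4.0]    # toy estimate for the prompt seen from afar
eps_near = [1.0, -1.0, 1.0, -1.0, 1.0]  # toy estimate for the prompt seen up close
print([round(v, 3) for v in hybrid_noise(eps_far, eps_near)])
```

A useful sanity check is that feeding the same estimate in for both prompts returns it unchanged, since the low-pass and high-pass parts sum back to the original signal.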

Part B: Diffusion Models from Scratch!

Part 1: Training a Single-Step Denoising UNet

Unconditioned UNet

Centered Image

Denoising

Unconditioned Training Losses

Denoised Results on Digits from Test set after 1 Epoch

Denoised Results on Digits from Test set after 5 Epochs

Centered Image

Out of Distribution Testing

Time-Conditioned UNet

Time-Conditioned Training Losses

Sampling results for Time-Conditioned after 5 Epochs

Sampling results for Time-Conditioned after 20 Epochs

Class-Conditioned UNet

Class-Conditioned Training Losses

Sampling results for Class-Conditioned after 5 Epochs

Sampling results for Class-Conditioned after 20 Epochs