References:
1. https://zhuanlan.zhihu.com/p/573984443
2. https://zhangzhenhu.github.io/blog/aigc
3. https://zhuanlan.zhihu.com/p/599160988
Diffusion probabilistic models
1. Diffusion probabilistic model
2. Denoising diffusion probabilistic model (DDPM)
3. Score-based interpretation (score-based DDPM)
4. Three equivalent formulations of diffusion models
5. Improved Denoising Diffusion Probabilistic Models (IDDPM)
6. References
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. 2015. arXiv:1503.03585.
Calvin Luo. Understanding diffusion models: a unified perspective. 2022. arXiv:2208.11970.
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. 2020. arXiv:2006.11239.
Diederik P. Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. 2022. arXiv:2107.00630.
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. 2019. arXiv:1907.05600.
Denoising Diffusion Implicit Models (DDIM)
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. 2022. arXiv:2010.02502.
Score-based generative models
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. 2019. arXiv:1907.05600.
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. 2021. arXiv:2011.13456.
Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 2005.
Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. 2020. arXiv:2006.09011.
Conditional (guided) diffusion models
Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. 2021. arXiv:2105.05233.
Calvin Luo. Understanding diffusion models: a unified perspective. 2022. arXiv:2208.11970.
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. 2022. arXiv:2207.12598.
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. 2022. arXiv:2112.10741.
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. 2022. arXiv:2204.06125.
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. 2022. arXiv:2205.11487.
Stable diffusion model
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. 2021. arXiv:2112.10752.
DDPM already produces images of very high quality, but it has a drawback: the tensor it operates on has the same shape as the image, with each element corresponding one-to-one to a pixel, which is why DDPM is called a pixel-space generative model. Consequently, generating a high-resolution image means working with a very large tensor, which demands substantial GPU (hardware) resources, both compute and memory; training is correspondingly expensive. This high cost has greatly limited its adoption in consumer applications.
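As a rough illustration of the scaling problem, the snippet below (illustrative image sizes only, not figures from any paper) estimates how the size of the input tensor alone grows with resolution:

```python
# Back-of-envelope size of the pixel-space tensor a DDPM must denoise at
# every sampling step (illustrative numbers; real memory use is dominated
# by the U-Net activations, which scale with this input).
def pixel_tensor_elements(height: int, width: int, channels: int = 3) -> int:
    return height * width * channels

for size in (64, 256, 512, 1024):
    n = pixel_tensor_elements(size, size)
    # float32 = 4 bytes per element
    print(f"{size}x{size}: {n:,} elements ≈ {n * 4 / 2**20:.2f} MiB per image")
```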
Latent diffusion models
In 2021, the Computer Vision and Learning research group at Ludwig Maximilian University of Munich (formerly the Computer Vision group at Heidelberg University), known as the CompVis group, published the paper High-Resolution Image Synthesis with Latent Diffusion Models [1], which addresses this problem. The main improvements are (see the sketch after this list):
- Introduce an autoencoder that first compresses and encodes the original input; the diffusion model is then applied to the encoded (latent) vector.
- Add an attention mechanism inside the UNet to handle the conditioning variables.
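The toy PyTorch sketch below shows how these two pieces fit together. Every module, layer size, and the 77-token / 16-dimensional condition tensor are made-up placeholders for illustration, not the CompVis implementation:

```python
import torch
import torch.nn as nn

class ToyAutoencoder(nn.Module):
    """Compresses a 3x256x256 image to a 4x32x32 latent and decodes it back (8x downsampling)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 4, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

class ToyLatentDenoiser(nn.Module):
    """Stand-in for the denoising UNet: predicts noise on the latent, attending to a condition."""
    def __init__(self, latent_ch: int = 4, cond_dim: int = 16):
        super().__init__()
        self.proj_in = nn.Conv2d(latent_ch, 64, 1)
        # Cross-attention: latent positions are queries, condition tokens are keys/values.
        self.attn = nn.MultiheadAttention(64, num_heads=4, kdim=cond_dim, vdim=cond_dim,
                                          batch_first=True)
        self.proj_out = nn.Conv2d(64, latent_ch, 1)

    def forward(self, z_t, t, cond):
        b, _, h, w = z_t.shape                               # timestep t is ignored in this toy
        x = self.proj_in(z_t).flatten(2).transpose(1, 2)     # (b, h*w, 64)
        x, _ = self.attn(query=x, key=cond, value=cond)      # conditioning via cross-attention
        return self.proj_out(x.transpose(1, 2).reshape(b, 64, h, w))

vae = ToyAutoencoder()
denoiser = ToyLatentDenoiser()

image = torch.randn(1, 3, 256, 256)      # pixel-space input
cond = torch.randn(1, 77, 16)            # e.g. text-embedding tokens (made-up shape)

z0 = vae.encoder(image)                  # diffusion now runs on a 4x32x32 latent, not on pixels
z_t = z0 + torch.randn_like(z0)          # crude stand-in for the forward noising step
eps_pred = denoiser(z_t, t=torch.tensor([10]), cond=cond)
recon = vae.decoder(z0)                  # decoder maps latents back to pixel space
print(z0.shape, eps_pred.shape, recon.shape)
```

The point is only that the denoising network sees a 4x32x32 latent instead of a 3x256x256 image, and that the condition enters through cross-attention rather than being concatenated to the input.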