DDIMCache: An enhanced text-to-image diffusion model on mobile devices

Wu Qifeng

Kybernetika (2024)

Volume: 60, Issue: 6, pages 819-833
ISSN: 0023-5954

Abstract

On June 11, 2024, OpenAI announced a collaboration with Apple to deeply integrate the ChatGPT generative language model into Apple's product lineup. With support from various generative AI models, devices such as smartphones will become more intelligent. The text-to-image diffusion model, known for its stable and superior generative capabilities, has gained wide recognition in image generation and is likely to play a crucial role on mobile devices. However, the large size and complex architecture of diffusion models result in high computational cost and slow execution. As a result, diffusion models require high-end GPUs or cloud-based inference, and the latter often raises personal privacy and data security concerns. This paper presents a multiplicative effect joint optimization method for complex models such as diffusion models, enabling efficient execution on mobile devices. The method integrates multiple optimization strategies and leverages their interactions so that their gains compound, improving overall performance. Building on this approach, we introduce DDIMCache, an enhanced text-to-image diffusion model. DDIMCache maintains image generation quality while achieving optimal speed, generating 512×512 images in approximately 6 seconds. This provides powerful image generation capabilities and an enhanced user experience for mobile users. In addition, as a foundation model, Stable Diffusion supports further applications such as image editing, inpainting, style transfer, and super-resolution, all of which can have a significant impact. The ability to run the model entirely on a mobile device, without an internet connection, opens up many further possibilities.
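
The abstract does not spell out how the "multiplicative effect" is computed. One plausible reading, offered here only as an illustrative sketch, is that each optimization strategy contributes its own speedup factor and that independent factors multiply rather than add. The function name `combined_latency`, the strategy labels, and all numbers below are hypothetical and are not taken from the paper.

```python
from math import prod


def combined_latency(baseline_seconds: float, speedups: dict[str, float]) -> float:
    """Return the latency after applying speedup factors that compound multiplicatively.

    Assumption (not from the paper): the strategies are independent, so the
    overall speedup is the product of the individual factors.
    """
    if any(factor <= 0 for factor in speedups.values()):
        raise ValueError("speedup factors must be positive")
    return baseline_seconds / prod(speedups.values())


# Hypothetical strategies and factors, chosen only to make the arithmetic visible.
hypothetical_speedups = {"strategy_a": 2.0, "strategy_b": 1.5, "strategy_c": 2.0}

# Overall factor is 2.0 * 1.5 * 2.0 = 6.0, so a 60 s baseline becomes 10 s.
print(combined_latency(60.0, hypothetical_speedups))
```

Under this reading, three modest factors of 2.0, 1.5, and 2.0 give a sixfold overall speedup, whereas adding their individual gains would suggest far less. In practice the strategies may interact, so the true combined factor can be above or below the simple product.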

How to cite

Qifeng, Wu. "DDIMCache: An enhanced text-to-image diffusion model on mobile devices." Kybernetika 60.6 (2024): 819-833. <http://eudml.org/doc/299883>.

@article{Qifeng2024,
author = {Qifeng, Wu},
journal = {Kybernetika},
keywords = {diffusion model; text-to-image; mobile devices},
language = {eng},
number = {6},
pages = {819-833},
publisher = {Institute of Information Theory and Automation AS CR},
title = {DDIMCache: An enhanced text-to-image diffusion model on mobile devices},
url = {http://eudml.org/doc/299883},
volume = {60},
year = {2024},
}

TY - JOUR
AU - Qifeng, Wu
TI - DDIMCache: An enhanced text-to-image diffusion model on mobile devices
JO - Kybernetika
PY - 2024
PB - Institute of Information Theory and Automation AS CR
VL - 60
IS - 6
SP - 819
EP - 833
LA - eng
KW - diffusion model; text-to-image; mobile devices
UR - http://eudml.org/doc/299883
ER -
