GitHub - huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
Extracto
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch - GitHub - huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generat...
Contenido
More precisely,
- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see src/diffusers/pipelines). Check this overview to see all supported pipelines and their corresponding official papers.
- Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see src/diffusers/schedulers).
- Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see src/diffusers/models).
- Training examples to show how to train the most popular diffusion model tasks (see examples, e.g. unconditional-image-generation).
Installation
With pip
pip install --upgrade diffusers # should install diffusers 0.2.4With conda
conda install -c conda-forge diffusers
Contributing
We
- See Good first issues for general opportunities to contribute
- See New model/pipeline to contribute exciting new diffusion models / diffusion pipelines
- See New scheduler
Also, say . We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or
just hang out
Quickstart
In order to get started, we recommend taking a look at two notebooks:
New Stable Diffusion is now fully compatible with diffusers!
Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.
You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the model card, read the license and tick the checkbox if you agree. You have to be a registered user in
Text-to-Image generation with Stable Diffusion
# make sure you're logged in with `huggingface-cli login` from torch import autocast from diffusers import StableDiffusionPipeline pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True) pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" with autocast("cuda"): image = pipe(prompt).images[0]
Note: If you don't want to use the token, you can also simply download the model weights
(after having accepted the license) and pass
the path to the local folder to the StableDiffusionPipeline.
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
Assuming the folder is stored locally under ./stable-diffusion-v1-4, you can also run stable diffusion
without requiring an authentication token:
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" with autocast("cuda"): image = pipe(prompt).images[0]
If you are limited by GPU memory, you might want to consider using the model in fp16 as
well as chunking the attention computation.
The following snippet should result in less than 4GB VRAM.
pipe = StableDiffusionPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, use_auth_token=True ) pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" pipe.enable_attention_slicing() with autocast("cuda"): image = pipe(prompt).images[0]
Finally, if you wish to use a different scheduler, you can simply instantiate
it before the pipeline and pass it to from_pretrained.
from diffusers import LMSDiscreteScheduler lms = LMSDiscreteScheduler( beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear" ) pipe = StableDiffusionPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, scheduler=lms, use_auth_token=True ) pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" with autocast("cuda"): image = pipe(prompt).images[0] image.save("astronaut_rides_horse.png")
Image-to-Image text-guided generation with Stable Diffusion
The StableDiffusionImg2ImgPipeline lets you pass a text prompt and an initial image to condition the generation of new images.
from torch import autocast import requests import torch from PIL import Image from io import BytesIO from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" model_id_or_path = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionImg2ImgPipeline.from_pretrained( model_id_or_path, revision="fp16", torch_dtype=torch.float16, use_auth_token=True ) # or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 # and pass `model_id_or_path="./stable-diffusion-v1-4"` without having to use `use_auth_token=True`. pipe = pipe.to(device) # let's download an initial image url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" response = requests.get(url) init_image = Image.open(BytesIO(response.content)).convert("RGB") init_image = init_image.resize((768, 512)) prompt = "A fantasy landscape, trending on artstation" with autocast("cuda"): images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images images[0].save("fantasy_landscape.png")
You can also run this example on colab
In-painting using Stable Diffusion
The StableDiffusionInpaintPipeline lets you edit specific parts of an image by providing a mask and text prompt.
from io import BytesIO from torch import autocast import torch import requests import PIL from diffusers import StableDiffusionInpaintPipeline def download_image(url): response = requests.get(url) return PIL.Image.open(BytesIO(response.content)).convert("RGB") img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" model_id_or_path = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionInpaintPipeline.from_pretrained( model_id_or_path, revision="fp16", torch_dtype=torch.float16, use_auth_token=True ) # or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 # and pass `model_id_or_path="./stable-diffusion-v1-4"` without having to use `use_auth_token=True`. pipe = pipe.to(device) prompt = "a cat sitting on a bench" with autocast("cuda"): images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images images[0].save("cat_on_bench.png")
Tweak prompts reusing seeds and latents
You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. This notebook shows how to do it step by step. You can also run it in Google Colab .
For more details, check out the Stable Diffusion notebook
and have a look into the release notes.
Examples
There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using DiffusionPipelines and Google Colab) and interactive web-tools.
Running Code
If you want to run the code yourself
# !pip install diffusers transformers from diffusers import DiffusionPipeline model_id = "CompVis/ldm-text2im-large-256" # load model and scheduler ldm = DiffusionPipeline.from_pretrained(model_id) # run pipeline in inference (sample random noise and denoise) prompt = "A painting of a squirrel eating a burger" images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images # save images for idx, image in enumerate(images): image.save(f"squirrel-{idx}.png")
# !pip install diffusers from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline model_id = "google/ddpm-celebahq-256" # load model and scheduler ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference # run pipeline in inference (sample random noise and denoise) image = ddpm().images # save image image[0].save("ddpm_generated_image.png")
Other Notebooks:
Web Demos
If you just want to play around with some web demos, you can try out the following
| Model | Hugging Face Spaces |
|---|---|
| Text-to-Image Latent Diffusion | |
| Faces generator | |
| DDPM with different schedulers | |
| Conditional generation from sketch | |
| Composable diffusion |
Definitions
Models: Neural network that models
Figure from DDPM paper (https://arxiv.org/abs/2006.11239).
Schedulers: Algorithm class for both inference and training. The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training. Examples: DDPM, DDIM, PNDM, DEIS
Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239).
Diffusion Pipeline: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ... Examples: Glide, Latent-Diffusion, Imagen, DALL-E 2
Figure from ImageGen (https://imagen.research.google/).
Philosophy
- Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. E.g., the provided schedulers are separated from the provided models and provide well-commented code that can be read alongside the original paper.
- Diffusers is modality independent and focuses on providing pretrained models and tools to build systems that generate continous outputs, e.g. vision and audio.
- Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are Glide and Latent Diffusion.
In the works
For the first release,
- Diffusers for audio
- Diffusers for reinforcement learning (initial work happening in #105).
- Diffusers for video generation
- Diffusers for molecule generation (initial work happening in #54)
A few pipeline components are already being worked on, namely:
- BDDMPipeline for spectrogram-to-sound vocoding
- GLIDEPipeline to support OpenAI's GLIDE model
- Grad-TTS for text to audio generation / conditional audio generation
We want diffusers to be a toolbox useful for diffusers models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a GitHub issue mentioning what you would like to see.
Credits
This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:
- @CompVis' latent diffusion models library, available here
- @hojonathanho original DDPM implementation, available here as well as the extremely useful translation into PyTorch by @pesser, available here
- @ermongroup's DDIM implementation, available here.
- @yang-song's Score-VE and Score-VP implementations, available here
We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here as well as @crowsonkb and @rromb for useful discussions and insights.
Fuente: GitHub
