Dreambooth: One-Click Generation of the Portraits You Want [Fun with Huawei Cloud]

HouYanSong, posted on 2024/05/11 03:20:03
[Abstract] Dreambooth is a technique released by Google that fine-tunes a diffusion model by injecting a custom subject into it, so that the subject can then be rendered in different scenes. This article demonstrates fine-tuning Stable Diffusion on a custom dataset in AI Gallery to generate the portraits you want with one click!

Dreambooth: One-Click Generation of the Portraits You Want


[Figure: Stable Diffusion architecture: CLIP text encoder, U-Net, and autoencoder (VAE) decoder]

1. Preparation

First, prepare 3 to 10 portrait photos, preferably showing the subject from different angles. Here I collected 5 photos of Zhuang Dafei (庄达菲) from the web as input:

[Figure: collage of the five input photos]
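
DreamBooth tends to work best when the instance images are clean, square crops of the subject. Below is a minimal preprocessing sketch with Pillow that center-crops and resizes the photos to the 512x512 resolution used later for training; the folder names are assumptions, not part of the original case:

import os
from PIL import Image

src_dir = "./raw_photos"        # hypothetical folder holding the downloaded photos
dst_dir = "./instance_images"   # hypothetical folder for the processed training images
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    # center-crop to a square, then resize to the 512x512 training resolution
    side = min(img.size)
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512), Image.LANCZOS)
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".jpg"), quality=95)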

2. Running the Case

This case requires a Pytorch-2.0.1 GPU-V100 instance or above. Click "Run in ModelArts" to open the case in a Notebook and try it with one click:

[Screenshot: the Run in ModelArts entry for this case]

3. Model Training

First download the code and model and set up the runtime environment, then download the original dataset archive wh11e.zip, replace its images with your own, and upload the archive:

[Screenshot: replacing the dataset images and uploading the archive in the notebook]
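
The swap can also be scripted inside the notebook instead of re-uploading by hand. A minimal sketch, assuming the archive keeps the name wh11e.zip and the training script reads instance images from ./training_data/wh11e (the exact paths in the case may differ):

import zipfile
from glob import glob

# unpack your own photos over the original wh11e instance images (paths are assumptions)
with zipfile.ZipFile("wh11e.zip") as zf:
    zf.extractall("./training_data/wh11e")

# confirm the 3-10 portrait photos are in place
print(glob("./training_data/wh11e/*"))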

The training configuration and parameters can be left unchanged; training takes about 10 minutes.
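
For reference, the file passed via --concepts_list pairs the instance images with an instance prompt (containing the unique identifier wh11e) and a class prompt used for prior-preservation images. The key names in the sketch below follow the widely used community DreamBooth training script; the exact paths and prompts are assumptions based on the wh11e example, so check the JSON shipped with the case:

import json

concepts_list = [
    {
        "instance_prompt":   "photo of wh11e person",   # prompt containing the unique identifier
        "class_prompt":      "photo of a person",       # generic prompt for prior-preservation images
        "instance_data_dir": "./training_data/wh11e",   # the 3-10 portrait photos
        "class_data_dir":    "./training_data/person"   # class images generated automatically
    }
]

with open("./training_data/concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)

With the data and the concepts configuration in place, launch training: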

# --pretrained_model_name_or_path: path to the base model; here the offline SD 1.5 weights downloaded earlier
# --pretrained_vae_name_or_path: path to the VAE; here the offline vae-ft-mse weights
# --output_dir: output directory
# --resolution: training resolution
# --save_sample_prompt: prompt used for the saved sample images
# --concepts_list: path to the concepts JSON configuration

!python3 ./tools/train_dreambooth.py \
  --pretrained_model_name_or_path=$model_sd \
  --pretrained_vae_name_or_path="vae-ft-mse" \
  --output_dir=$output_dir \
  --revision="fp16" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=777 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=$learning_rate \
  --lr_scheduler="constant" \
  --lr_warmup_steps=80 \
  --num_class_images=$num_class_images \
  --sample_batch_size=4 \
  --max_train_steps=$max_num_steps \
  --save_interval=10000 \
  --save_sample_prompt="a photo of wh11e person" \
  --concepts_list="./training_data/concepts_list.json"
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/accelerate/accelerator.py:401: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Loading pipeline components...: 100%|█████████████| 6/6 [00:05<00:00,  1.04it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
05/11/2024 02:05:57 - INFO - __main__ - Number of class images to sample: 60.
Generating class images: 100%|██████████████████| 15/15 [01:07<00:00,  4.52s/it]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/extras/CUPTI/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: CUDA version lower than 11 are currenlty not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: Detected CUDA version 102
CUDA SETUP: Loading binary /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda102_nocublaslt.so...
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/diffusers/configuration_utils.py:239: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Caching latents: 100%|██████████████████████████| 60/60 [00:04<00:00, 14.85it/s]
05/11/2024 02:07:12 - INFO - __main__ - ***** Running training *****
05/11/2024 02:07:12 - INFO - __main__ -   Num examples = 60
05/11/2024 02:07:12 - INFO - __main__ -   Num batches each epoch = 60
05/11/2024 02:07:12 - INFO - __main__ -   Num Epochs = 9
05/11/2024 02:07:12 - INFO - __main__ -   Instantaneous batch size per device = 1
05/11/2024 02:07:12 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
05/11/2024 02:07:12 - INFO - __main__ -   Gradient Accumulation steps = 1
05/11/2024 02:07:12 - INFO - __main__ -   Total optimization steps = 500
Steps: 100%|█████████████| 500/500 [02:59<00:00,  2.94it/s, loss=0.287, lr=1e-6]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 89.05it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .

Generating samples: 100%|█████████████████████████| 4/4 [00:06<00:00,  1.69s/it]
[*] Weights saved at dreambooth_wh11e/500
Steps: 100%|█████████████| 500/500 [03:17<00:00,  2.53it/s, loss=0.287, lr=1e-6]

View the samples produced by the model:

import os
from natsort import natsorted
from glob import glob

# Locate the most recent checkpoint directory written by training
saved_weights_dir = natsorted(glob(output_dir + os.sep + '*'))[-1]

saved_weights_dir
'dreambooth_wh11e/500'
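
The training script also saves sample images next to the weights. A minimal sketch for previewing them in the notebook, assuming the samples land in a samples/ subfolder of the checkpoint directory (check the actual layout of dreambooth_wh11e/500):

from PIL import Image
from IPython.display import display

# preview every sample image saved with the final checkpoint
for path in natsorted(glob(saved_weights_dir + os.sep + 'samples' + os.sep + '*.png')):
    display(Image.open(path))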

[Figure: sample images generated at training step 500]

4. Model Inference

Run the Gradio app and modify the input prompt to generate portraits of the subject in different scenes:

import torch
import numpy as np
import gradio as gr
from diffusers import StableDiffusionPipeline

# Load the fine-tuned weights from the saved checkpoint
pipe = StableDiffusionPipeline.from_pretrained(saved_weights_dir, torch_dtype=torch.float16)
# Move the pipeline to the GPU
pipe = pipe.to('cuda')
pipe.enable_attention_slicing() # enable attention slicing to save GPU memory
pipe.enable_xformers_memory_efficient_attention() # enable xFormers memory-efficient attention to save GPU memory
# Switch to the DDIM scheduler
from diffusers import DDIMScheduler
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

negative_prompt = "bad anatomy, ugly, deformed, disfigured, distorted face, poorly drawn hands, poorly drawn face, poorly drawn feet, blurry, low quality, low definition, lowres, out of frame, out of image, cropped, cut off, signature, watermark"
num_samples = 1
guidance_scale = 7.5
num_inference_steps = 30
height = 512
width = 512

def generate_image(prompt, steps):
    image = pipe(prompt,
                 output_type='numpy',
                 negative_prompt=negative_prompt,
                 height=height, width=width,
                 num_images_per_prompt=num_samples,
                 num_inference_steps=steps,
                 guidance_scale=guidance_scale
                 ).images
    image = np.uint8(image[0] * 255)
    return image

with gr.Blocks() as demo:
    gr.HTML("""<h1 align="center">Dreambooth</h1>""")
    with gr.Tab("Generate Image"):
        with gr.Row():
            with gr.Column():
                text_input = gr.Textbox(value="a photo of wh11e person", label="prompts", lines=4)
                steps = gr.Slider(30, 50, step=1, label="steps")
                gr.Examples(
                examples=[
                    ["face portrait of wh11e in the snow, realistic, hd, vivid, sunset"],
                    ["photo of wh11e person, closeup, mountain fuji in the background, natural lighting"],
                    ["photo of wh11e person in the desert, closeup, pyramids in the background, natural lighting, frontal face"]
                ],
                inputs=[text_input]
            )
            image_output = gr.Image(height=400, width=400)
        
    image_button = gr.Button("submit")
    image_button.click(generate_image, [text_input, steps], [image_output])
    
demo.launch(share=True)
prompt = ["photo of wh11e person, closeup, mountain fuji in the background, natural lighting", # wh11e的照片,特写,背景是富士山,自然光
          "photo of wh11e person in the desert, closeup, pyramids in the background, natural lighting, frontal face", # wh11e的照片,特写,背景是金字塔,自然光,正脸
          "photo of wh11e person in the forest, natural lighting, frontal face", # wh11e的照片,背景是森林,自然光,正脸
          "photo of wh11e person as an astronaut, natural lighting, frontal face, closeup, starry sky in the background", # wh11e的照片,作为宇航员,自然光,正脸,特写,背景是星空
          "face portrait of wh11e in the snow, realistic, hd, vivid, sunset", # wh11e在雪地里的人像,逼真,高清,生动,日落
          "digital painting of wh11e in the snow, realistic, hd, vivid, sunset", # wh11e在雪地里的数字油画,逼真,高清,生动,日落
          "watercolor painting of wh11e person, realistic, blue and orange tones", # wh11e的水彩画,逼真,蓝色和橙色调
          "digital painting of wh11e person, hyperrealistic, fantasy, Surrealist, painted by Alphonse Mucha", # wh11e的数字油画,超逼真,幻想,超现实主义,阿方斯·缪夏绘制
          "painting of wh11e person in star wars, realistic, 4k ultra hd, blue and red tones", # wh11e在星球大战中的画作,逼真,4k超高清,蓝色和红色调
          "photo of wh11e person, in an armor, realistic, visible face, colored, detailed face, ultra detailed, natural lighting", # wh11e的照片,穿着盔甲,逼真,可见脸,彩色,详细的脸,超详细,自然光
          "photo of wh11e person, cyberpunk, vivid, realistic, 4k ultra hd", # wh11e的照片,赛博朋克,生动,逼真,4k超高清
          "a painting of wh11e person, realistic, by Van Gogh,", # wh11e的画作,逼真,由梵高绘制
          ]
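
Besides the interactive Gradio app, the prompt list above can also be run as a batch and the results saved to disk; a minimal sketch reusing the pipeline and settings defined earlier (the output folder name is arbitrary):

import os

os.makedirs("outputs", exist_ok=True)
for i, p in enumerate(prompt):
    image = pipe(p,
                 negative_prompt=negative_prompt,
                 height=height, width=width,
                 num_inference_steps=num_inference_steps,
                 guidance_scale=guidance_scale).images[0]
    image.save(f"outputs/portrait_{i:02d}.png")   # save each generated portrait as a PNG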

[Screenshot: portraits generated in the Gradio app with different prompts]
