Open-Sora 1.0 Text-to-Video [Exploring Huawei Cloud]

HouYanSong · published 2024/04/29 00:11:04
[Abstract] Not long ago, OpenAI's Sora shot to fame with its stunning video generation results, standing out among a crowd of text-to-video models and drawing worldwide attention. Since then, the Colossal-AI team has released Open-Sora 1.0, a new open-source solution that covers the entire training pipeline, including data processing, all training details, and model checkpoints, joining hands with AI enthusiasts around the world to advance a new era of video creation.


Video Demo

Let's take a look at what Open-Sora 1.0 can actually generate.


[Figure: schematic of the STDiT model architecture]


For example, we can generate a video about the sea. With a simple input prompt, we ask Open-Sora to generate a video of a sea turtle swimming leisurely through a coral reef.

Prompt: A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world.

Hands-On Walkthrough

Now we can generate the video we want with one click in the AI Gallery. The steps are as follows:

1. Run the case

This case requires a Pytorch-1.8 GPU-V100 flavor or higher. Click here to jump to the Notebook page, then click Run in ModelArts to launch it with one click. If you do not have an account yet, you will need to sign in and complete real-name verification first.


2. Download the code and models

import os
import moxing as mox  # MoXing: the ModelArts SDK for parallel file transfer with OBS

# Download the Open-Sora source code
if not os.path.exists('Open-Sora'):
    mox.file.copy_parallel('obs://modelbox-course/Open-Sora', 'Open-Sora')

# Download the pretrained STDiT checkpoint
if not os.path.exists('Open-Sora/opensora/models/pretrained-model'):
    mox.file.copy_parallel('obs://modelbox-course/pretrained-model', 'Open-Sora/opensora/models/pretrained-model')

# Download the sd-vae-ft-ema video autoencoder weights
if not os.path.exists('Open-Sora/opensora/models/sd-vae-ft-ema'):
    mox.file.copy_parallel('obs://modelbox-course/sd-vae-ft-ema', 'Open-Sora/opensora/models/sd-vae-ft-ema')

# Download the t5-v1_1-xxl text encoder weights
if not os.path.exists('Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl'):
    mox.file.copy_parallel('obs://modelbox-course/t5-v1_1-xxl', 'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl')

# Download the frpc binary that Gradio uses to create public share links
if not os.path.exists('/home/ma-user/work/frpc_linux_amd64'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/frpc_linux_amd64', '/home/ma-user/work/frpc_linux_amd64')

# Download a patched t5.py (copied into the installed opensora package before inference)
if not os.path.exists('/home/ma-user/work/t5.py'):
    mox.file.copy_parallel('obs://modelbox-course/t5.py', '/home/ma-user/work/t5.py')

Now we quietly wait for everything to finish downloading...

INFO:root:Using MoXing-v2.1.6.879ab2f4-879ab2f4
INFO:root:Using OBS-Python-SDK-3.20.9.1
INFO:root:List OBS time cost: 0.95 seconds.
INFO:root:Copy parallel total time cost: 2.19 seconds.
INFO:root:List OBS time cost: 0.02 seconds.
INFO:root:Copy parallel total time cost: 21.72 seconds.
INFO:root:List OBS time cost: 0.24 seconds.
INFO:root:Copy parallel total time cost: 3.29 seconds.
INFO:root:List OBS time cost: 0.04 seconds.
INFO:root:Copy parallel total time cost: 39.24 seconds.
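
As an optional sanity check (not part of the original notebook), you can list the model directories to confirm everything arrived:

import os

# print the contents of each model directory, or MISSING if a download failed
for path in ['Open-Sora/opensora/models/pretrained-model',
             'Open-Sora/opensora/models/sd-vae-ft-ema',
             'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl']:
    print(path, '->', os.listdir(path) if os.path.exists(path) else 'MISSING')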

3. Configure the runtime environment

This case requires Python 3.10.10 or later, so we first create a virtual environment:

!/home/ma-user/anaconda3/bin/conda clean -i
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
import json
import os

# Kernel spec: tell Jupyter to launch this env's python -m ipykernel,
# with the env's bin directory placed first on PATH
data = {
    "display_name": "python-3.10.10",
    "env": {
        "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
    },
    "language": "python",
    "argv": [
        "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ]
}

# Register the kernel by writing kernel.json into Jupyter's kernels directory
if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
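
As an optional check (not in the original notebook), you can confirm the kernel was registered:

!jupyter kernelspec list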
conda env list
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
# conda environments:
#
base                  *  /home/ma-user/anaconda3
python-3.10.10           /home/ma-user/anaconda3/envs/python-3.10.10
python-3.7.10            /home/ma-user/anaconda3/envs/python-3.7.10


Note: you may need to restart the kernel to use updated packages.

Once the environment has been created, wait a moment or refresh the page, then click the kernel selector in the upper-right corner and choose python-3.10.10.
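
The version shown below can be checked from a cell running on the new kernel; presumably a command like this produced it:

!python -V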

Python 3.10.10

Confirm that the Python version is 3.10.10, and make sure the GPU has at least 32 GB of memory.
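
The GPU check below is standard nvidia-smi output, presumably produced by:

!nvidia-smi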

Sun Apr 28 18:28:30 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:00:0D.0 Off |                    0 |
| N/A   31C    P0    26W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

After that, install the remaining dependency packages...
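
The install cells themselves are omitted here; a hedged sketch of what they would look like, assuming (as the later logs confirm) that the opensora package ends up in the new env's site-packages and that Gradio is available:

%cd Open-Sora
# install PyTorch first, then the opensora package itself from source
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install torch torchvision
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install -v .
# Gradio is needed for the web demo in step 5
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install gradio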

4. Generate videos

Modify the configuration file:

%%writefile configs/opensora/inference/16x256x256_test.py
num_frames = 16
fps = 24 // 3  # 8 fps; 16 frames ≈ 2 seconds of video
image_size = (256, 256)

# Define model
model = dict(
    type="STDiT-XL/2",
    space_scale=0.5,
    time_scale=1.0,
    enable_flashattn=False,
    enable_layernorm_kernel=False,
    from_pretrained="./opensora/models/pretrained-model/OpenSora-v1-HQ-16x256x256.pth",
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",
    model_max_length=120,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3, # or None
)
dtype = "fp16"

# Others
batch_size = 1
seed = 42
prompt_path = "./assets/texts/t2v_samples.txt"
save_dir = "./outputs/samples/"

Run the commands below. The first one applies the patched t5.py downloaded earlier to the installed opensora package; the second runs inference. The generated videos are saved in the Open-Sora/outputs folder, where you can view one at random:

!cp /home/ma-user/work/t5.py /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py
!torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256_test.py
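
To preview a generated clip inline, an optional snippet like this works in Jupyter (not part of the original notebook; it assumes the working directory is the Open-Sora repo root):

import os, random
from IPython.display import Video

# pick a random .mp4 from the save_dir configured above and embed it in the notebook
samples_dir = 'outputs/samples'
clips = [f for f in os.listdir(samples_dir) if f.endswith('.mp4')]
Video(os.path.join(samples_dir, random.choice(clips)), embed=True, width=512)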


5. Gradio App

Rewrite the inference code to provide a Gradio interface:

%%writefile scripts/inference-gradio.py
import os
import gradio as gr

import torch
import colossalai
import torch.distributed as dist
from mmengine.runner import set_random_seed

from opensora.datasets import save_sample
from opensora.registry import MODELS, SCHEDULERS, build_module
from opensora.utils.config_utils import parse_configs
from opensora.utils.misc import to_torch_dtype
from opensora.acceleration.parallel_states import set_sequence_parallel_group
from colossalai.cluster import DistCoordinator

def main():
    # ======================================================
    # 1. cfg and init distributed env
    # ======================================================
    cfg = parse_configs(training=False)
    print(cfg)

    # init distributed
    colossalai.launch_from_torch({})
    coordinator = DistCoordinator()

    if coordinator.world_size > 1:
        set_sequence_parallel_group(dist.group.WORLD) 
        enable_sequence_parallelism = True
    else:
        enable_sequence_parallelism = False

    # ======================================================
    # 2. runtime variables
    # ======================================================
    torch.set_grad_enabled(False)
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = to_torch_dtype(cfg.dtype)
    set_random_seed(seed=cfg.seed)

    # ======================================================
    # 3. build model & load weights
    # ======================================================
    # 3.1. build model
    input_size = (cfg.num_frames, *cfg.image_size)
    vae = build_module(cfg.vae, MODELS)
    latent_size = vae.get_latent_size(input_size)
    text_encoder = build_module(cfg.text_encoder, MODELS, device=device)  # T5 must be fp32
    model = build_module(
        cfg.model,
        MODELS,
        input_size=latent_size,
        in_channels=vae.out_channels,
        caption_channels=text_encoder.output_dim,
        model_max_length=text_encoder.model_max_length,
        dtype=dtype,
        enable_sequence_parallelism=enable_sequence_parallelism,
    )
    text_encoder.y_embedder = model.y_embedder  # hack for classifier-free guidance

    # 3.2. move to device & eval
    vae = vae.to(device, dtype).eval()
    model = model.to(device, dtype).eval()

    # 3.3. build scheduler
    scheduler = build_module(cfg.scheduler, SCHEDULERS)

    # 3.4. support for multi-resolution
    model_args = dict()
    if cfg.multi_resolution:
        image_size = cfg.image_size
        hw = torch.tensor([image_size], device=device, dtype=dtype).repeat(cfg.batch_size, 1)
        ar = torch.tensor([[image_size[0] / image_size[1]]], device=device, dtype=dtype).repeat(cfg.batch_size, 1)
        model_args["data_info"] = dict(ar=ar, hw=hw)

    # ======================================================
    # 4. inference
    # ======================================================
    # 4.1.inference code
    @torch.no_grad()
    def run_inference(prompt_text):
        save_dir = cfg.save_dir
        torch.cuda.empty_cache()
        print("Prompt:", prompt_text)
        os.makedirs(save_dir, exist_ok=True)
        samples = scheduler.sample(
            model,
            text_encoder,
            z_size=(vae.out_channels, *latent_size),
            prompts=[prompt_text],
            device=device,
            additional_args=model_args,
        )
        samples = vae.decode(samples.to(dtype))
        save_path = os.path.join(save_dir, "sample")
        saved_path = save_sample(samples[0], fps=cfg.fps, save_path=save_path)
        return saved_path

    # 4.2. clear input
    def reset_user_input():
        return gr.update(value='')
    # 4.3. gradio app
    with gr.Blocks() as demo:
        gr.HTML("""<h1 align="center">Open-Sora Text-to-Video</h1>""")
        with gr.Row():
            with gr.Column():
                prompt_text = gr.Textbox(label="Prompt", placeholder="Describe your video here", lines=4)
                submit_button = gr.Button("Generate video")
            with gr.Column():
                output_video = gr.Video(width=512, height=512)
            submit_button.click(run_inference, [prompt_text], [output_video], show_progress=True)
            submit_button.click(reset_user_input, [], [prompt_text])
        gr.Examples(
            examples=[
                ["A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life."],
                ["A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest."],
                ["A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world."]
            ],
            inputs=[prompt_text]
        )
    demo.queue().launch(share=True, inbrowser=True)
        
if __name__ == "__main__":
    main()
Writing scripts/inference-gradio.py

Run the command below; once it starts successfully, click the link after Running on public URL to try it out!
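
One caveat: Gradio builds the public share URL through an frp tunnel binary, which is presumably why frpc_linux_amd64 was downloaded earlier. If the public URL fails to appear, the binary likely needs to be copied into the gradio package directory under the versioned name gradio expects and made executable; a hedged sketch (the _v0.2 suffix depends on your gradio release):

# place the frp client where gradio looks for it, then make it executable
!cp /home/ma-user/work/frpc_linux_amd64 /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
!chmod +x /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2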

!torchrun --standalone --nproc_per_node 1 scripts/inference-gradio.py configs/opensora/inference/16x256x256_test.py

屏幕截图 2024-04-28 194903.png

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.

/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel

  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")

Config (path: configs/opensora/inference/16x256x256_test.py): {'num_frames': 16, 'fps': 8, 'image_size': (256, 256), 'model': {'type': 'STDiT-XL/2', 'space_scale': 0.5, 'time_scale': 1.0, 'enable_flashattn': False, 'enable_layernorm_kernel': False, 'from_pretrained': './opensora/models/pretrained-model/OpenSora-v1-HQ-16x256x256.pth'}, 'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': './opensora/models/sd-vae-ft-ema', 'micro_batch_size': 4}, 'text_encoder': {'type': 't5', 'from_pretrained': './opensora/models/text_encoder/t5-v1_1-xxl', 'model_max_length': 120}, 'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0, 'cfg_channel': 3}, 'dtype': 'fp16', 'batch_size': 1, 'seed': 42, 'prompt_path': './assets/texts/t2v_samples.txt', 'save_dir': './outputs/samples/', 'multi_resolution': False}

/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/initialize.py:48: UserWarning: `config` is deprecated and will be removed soon.

  warnings.warn("`config` is deprecated and will be removed soon.")

[04/28/24 19:47:12] INFO     colossalai - colossalai - INFO: /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/initialize.py:67 launch
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1

Loading checkpoint shards:   0%|                          | 0/2 [00:00<?, ?it/s]/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()

  return self.fget.__get__(instance, owner)()

Loading checkpoint shards: 100%|██████████████████| 2/2 [00:33<00:00, 16.80s/it]

Missing keys: ['pos_embed', 'pos_embed_temporal']

Unexpected keys: []

Running on local URL:  http://127.0.0.1:7860

Running on public URL: https://b041abff94f403b77b.gradio.live



This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

Prompt: A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life.

100%|█████████████████████████████████████████| 100/100 [00:30<00:00,  3.33it/s]

Saved to ./outputs/samples/sample.mp4

Reference: Open-Sora: Revealing Complete Model Parameters, Training Details, and Everything for Sora-like Video Generation Models (hpc-ai.com)




