Open-Sora 1.0 Text-to-Video [Hands-On with Huawei Cloud]
Not long ago, OpenAI's Sora went viral on the strength of its stunning video generation results, standing out from a crowd of text-to-video models and drawing worldwide attention. Shortly afterwards, the Colossal-AI team released an open-source alternative, Open-Sora 1.0, which covers the entire training pipeline, including data processing, all training details, and model checkpoints, joining hands with AI enthusiasts around the world to advance a new era of video creation.
Video Demo
Let's take a look at what Open-Sora 1.0 actually generates.
(Figure: STDiT model architecture)
For example, we can generate a video about the sea. With a simple input prompt, we ask Open-Sora to generate a video of a sea turtle swimming leisurely through a coral reef.
Prompt: A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world.
Hands-On Walkthrough
Now we can generate the video we want with one click in AI Gallery. The steps are as follows:
1. Run the case
This case requires a Pytorch-1.8 GPU-V100 flavor or higher. Click here to jump to the Notebook page, then click Run in ModelArts to launch it with one click. If you do not have an account, you will need to sign in and complete real-name verification first.
2. Download the code and models
import os
import moxing as mox

# Pull the Open-Sora source, pretrained checkpoint, VAE, T5 text encoder,
# Gradio frp client, and a patched t5.py from OBS
if not os.path.exists('Open-Sora'):
    mox.file.copy_parallel('obs://modelbox-course/Open-Sora', 'Open-Sora')
if not os.path.exists('Open-Sora/opensora/models/pretrained-model'):
    mox.file.copy_parallel('obs://modelbox-course/pretrained-model', 'Open-Sora/opensora/models/pretrained-model')
if not os.path.exists('Open-Sora/opensora/models/sd-vae-ft-ema'):
    mox.file.copy_parallel('obs://modelbox-course/sd-vae-ft-ema', 'Open-Sora/opensora/models/sd-vae-ft-ema')
if not os.path.exists('Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl'):
    mox.file.copy_parallel('obs://modelbox-course/t5-v1_1-xxl', 'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl')
if not os.path.exists('/home/ma-user/work/frpc_linux_amd64'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/frpc_linux_amd64', '/home/ma-user/work/frpc_linux_amd64')
if not os.path.exists('/home/ma-user/work/t5.py'):
    mox.file.copy_parallel('obs://modelbox-course/t5.py', '/home/ma-user/work/t5.py')
Now wait patiently for all the downloads to finish...
INFO:root:Using MoXing-v2.1.6.879ab2f4-879ab2f4
INFO:root:Using OBS-Python-SDK-3.20.9.1
INFO:root:List OBS time cost: 0.95 seconds.
INFO:root:Copy parallel total time cost: 2.19 seconds.
INFO:root:List OBS time cost: 0.02 seconds.
INFO:root:Copy parallel total time cost: 21.72 seconds.
INFO:root:List OBS time cost: 0.24 seconds.
INFO:root:Copy parallel total time cost: 3.29 seconds.
INFO:root:List OBS time cost: 0.04 seconds.
INFO:root:Copy parallel total time cost: 39.24 seconds.
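If you want to double-check that everything landed where the later steps expect it, a quick sanity check over the same paths used above:

import os

# Spot-check the copied code and model directories
for path in [
    'Open-Sora',
    'Open-Sora/opensora/models/pretrained-model',
    'Open-Sora/opensora/models/sd-vae-ft-ema',
    'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl',
]:
    print(path, '->', 'OK' if os.path.exists(path) else 'missing')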
3. Configure the runtime environment
This case requires Python 3.10.10 or later, so we first create a virtual environment:
!/home/ma-user/anaconda3/bin/conda clean -i
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
import json
import os

# Register the new conda env as a Jupyter kernel named "python-3.10.10"
data = {
    "display_name": "python-3.10.10",
    "env": {
        "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
    },
    "language": "python",
    "argv": [
        "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ]
}

if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
conda env list
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
# conda environments:
#
base * /home/ma-user/anaconda3
python-3.10.10 /home/ma-user/anaconda3/envs/python-3.10.10
python-3.7.10 /home/ma-user/anaconda3/envs/python-3.7.10
Note: you may need to restart the kernel to use updated packages.
Once the environment is created, wait a moment or refresh the page, then click the kernel selector in the upper-right corner and choose python-3.10.10.
Python 3.10.10
Confirm that the Python version is 3.10.10 and that at least 32 GB of GPU memory is available:
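The outputs above and below were presumably produced by cells like the following (the exact commands are an assumption, inferred from the printed results):

# Should report Python 3.10.10 once the new kernel is active
!python -V
# A V100 flavor exposes roughly 32 GB of GPU memory
!nvidia-smi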
Sun Apr 28 18:28:30 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:00:0D.0 Off | 0 |
| N/A 31C P0 26W / 250W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Then install the remaining dependencies...
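The original elides the exact package list. As a rough sketch, following the upstream Open-Sora README's `pip install -v .` step (treat the details as an assumption), installation into the new environment could look like:

# Switch into the repository and install Open-Sora plus its dependencies
# into the python-3.10.10 env (exact steps are an assumption)
%cd Open-Sora
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install -v .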
4. Generate videos
Modify the configuration file:
%%writefile configs/opensora/inference/16x256x256_test.py
num_frames = 16
fps = 24 // 3
image_size = (256, 256)

# Define model
model = dict(
    type="STDiT-XL/2",
    space_scale=0.5,
    time_scale=1.0,
    enable_flashattn=False,
    enable_layernorm_kernel=False,
    from_pretrained="./opensora/models/pretrained-model/OpenSora-v1-HQ-16x256x256.pth",
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",
    model_max_length=120,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

# Others
batch_size = 1
seed = 42
prompt_path = "./assets/texts/t2v_samples.txt"
save_dir = "./outputs/samples/"
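Prompts are read from prompt_path, one per line. To try your own text, you can overwrite that file before running inference; a minimal sketch:

%%writefile assets/texts/t2v_samples.txt
A serene underwater scene featuring a sea turtle swimming through a coral reef.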
Copy the patched t5.py downloaded in step 2 over the installed one, then run the inference command. The generated videos are saved in the Open-Sora/outputs folder; pick one at random to view:
!cp /home/ma-user/work/t5.py /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py
!torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256_test.py
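To preview a generated clip inline in the notebook, a sketch like this should work (the exact file name under outputs/samples/ is an assumption; list the directory to confirm):

from IPython.display import Video

# Embed one generated sample in the notebook (file name is an assumption)
Video("outputs/samples/sample_0.mp4", embed=True, width=512)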
5. Gradio App
Rewrite the inference code to provide a Gradio interface:
%%writefile scripts/inference-gradio.py
import os

import gradio as gr
import torch
import colossalai
import torch.distributed as dist
from mmengine.runner import set_random_seed

from opensora.datasets import save_sample
from opensora.registry import MODELS, SCHEDULERS, build_module
from opensora.utils.config_utils import parse_configs
from opensora.utils.misc import to_torch_dtype
from opensora.acceleration.parallel_states import set_sequence_parallel_group
from colossalai.cluster import DistCoordinator


def main():
    # ======================================================
    # 1. cfg and init distributed env
    # ======================================================
    cfg = parse_configs(training=False)
    print(cfg)

    # init distributed
    colossalai.launch_from_torch({})
    coordinator = DistCoordinator()

    if coordinator.world_size > 1:
        set_sequence_parallel_group(dist.group.WORLD)
        enable_sequence_parallelism = True
    else:
        enable_sequence_parallelism = False

    # ======================================================
    # 2. runtime variables
    # ======================================================
    torch.set_grad_enabled(False)
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = to_torch_dtype(cfg.dtype)
    set_random_seed(seed=cfg.seed)

    # ======================================================
    # 3. build model & load weights
    # ======================================================
    # 3.1. build model
    input_size = (cfg.num_frames, *cfg.image_size)
    vae = build_module(cfg.vae, MODELS)
    latent_size = vae.get_latent_size(input_size)
    text_encoder = build_module(cfg.text_encoder, MODELS, device=device)  # T5 must be fp32
    model = build_module(
        cfg.model,
        MODELS,
        input_size=latent_size,
        in_channels=vae.out_channels,
        caption_channels=text_encoder.output_dim,
        model_max_length=text_encoder.model_max_length,
        dtype=dtype,
        enable_sequence_parallelism=enable_sequence_parallelism,
    )
    text_encoder.y_embedder = model.y_embedder  # hack for classifier-free guidance

    # 3.2. move to device & eval
    vae = vae.to(device, dtype).eval()
    model = model.to(device, dtype).eval()

    # 3.3. build scheduler
    scheduler = build_module(cfg.scheduler, SCHEDULERS)

    # 3.4. support for multi-resolution
    model_args = dict()
    if cfg.multi_resolution:
        image_size = cfg.image_size
        hw = torch.tensor([image_size], device=device, dtype=dtype).repeat(cfg.batch_size, 1)
        ar = torch.tensor([[image_size[0] / image_size[1]]], device=device, dtype=dtype).repeat(cfg.batch_size, 1)
        model_args["data_info"] = dict(ar=ar, hw=hw)

    # ======================================================
    # 4. inference
    # ======================================================
    # 4.1. inference code
    @torch.no_grad()
    def run_inference(prompt_text):
        save_dir = cfg.save_dir
        torch.cuda.empty_cache()
        print("Prompt:", prompt_text)
        os.makedirs(save_dir, exist_ok=True)
        samples = scheduler.sample(
            model,
            text_encoder,
            z_size=(vae.out_channels, *latent_size),
            prompts=[prompt_text],
            device=device,
            additional_args=model_args,
        )
        samples = vae.decode(samples.to(dtype))
        save_path = os.path.join(save_dir, "sample")
        saved_path = save_sample(samples[0], fps=cfg.fps, save_path=save_path)
        return saved_path

    # 4.2. clear input
    def reset_user_input():
        return gr.update(value='')

    # 4.3. gradio app
    with gr.Blocks() as demo:
        gr.HTML("""<h1 align="center">Open-Sora Text-to-Video</h1>""")
        with gr.Row():
            with gr.Column():
                prompt_text = gr.Textbox(label="Prompt", placeholder="Describe your video here", lines=4)
                submit_button = gr.Button("Generate video")
            with gr.Column():
                output_video = gr.Video(width=512, height=512)
        submit_button.click(run_inference, [prompt_text], [output_video], show_progress=True)
        submit_button.click(reset_user_input, [], [prompt_text])
        gr.Examples(
            examples=[
                ["A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life."],
                ["A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest."],
                ["A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world."]
            ],
            inputs=[prompt_text]
        )

    demo.queue().launch(share=True, inbrowser=True)


if __name__ == "__main__":
    main()
Writing scripts/inference-gradio.py
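Note that the share link created by launch(share=True) relies on Gradio's frp tunnel client, which is why frpc_linux_amd64 was downloaded in step 2. In restricted environments it typically has to be placed inside the gradio package manually; a sketch (the "_v0.2" version suffix depends on your Gradio release and is an assumption here):

import os
import shutil
import stat

import gradio

# Copy the frp client into the gradio package directory and make it
# executable (the "_v0.2" suffix is version-dependent -- an assumption)
dst = os.path.join(os.path.dirname(gradio.__file__), "frpc_linux_amd64_v0.2")
shutil.copy("/home/ma-user/work/frpc_linux_amd64", dst)
os.chmod(dst, os.stat(dst).st_mode | stat.S_IEXEC)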
Run the command. Once it launches successfully, click the link after Running on public URL to try it out!
!torchrun --standalone --nproc_per_node 1 scripts/inference-gradio.py configs/opensora/inference/16x256x256_test.py
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
Config (path: configs/opensora/inference/16x256x256_test.py): {'num_frames': 16, 'fps': 8, 'image_size': (256, 256), 'model': {'type': 'STDiT-XL/2', 'space_scale': 0.5, 'time_scale': 1.0, 'enable_flashattn': False, 'enable_layernorm_kernel': False, 'from_pretrained': './opensora/models/pretrained-model/OpenSora-v1-HQ-16x256x256.pth'}, 'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': './opensora/models/sd-vae-ft-ema', 'micro_batch_size': 4}, 'text_encoder': {'type': 't5', 'from_pretrained': './opensora/models/text_encoder/t5-v1_1-xxl', 'model_max_length': 120}, 'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0, 'cfg_channel': 3}, 'dtype': 'fp16', 'batch_size': 1, 'seed': 42, 'prompt_path': './assets/texts/t2v_samples.txt', 'save_dir': './outputs/samples/', 'multi_resolution': False}
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/initialize.py:48: UserWarning: `config` is deprecated and will be removed soon.
warnings.warn("`config` is deprecated and will be removed soon.")
[04/28/24 19:47:12] INFO colossalai - colossalai - INFO: /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/initialize.py:67 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:33<00:00, 16.80s/it]
Missing keys: ['pos_embed', 'pos_embed_temporal']
Unexpected keys: []
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://b041abff94f403b77b.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Prompt: A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life.
100%|█████████████████████████████████████████| 100/100 [00:30<00:00, 3.33it/s]
Saved to ./outputs/samples/sample.mp4