Detailed steps for running CodeGeeX model inference on Huawei Cloud Ascend and deploying it to the Huawei Ascend NPU
CodeGeeX4-ALL-9B is the open-source release of the latest CodeGeeX4 series, an all-round open-source multilingual code generation model from THUDM. It is continually trained on top of earlier CodeGeeX models and significantly improves code generation capability. The main goal of this article is to port the CodeGeeX project to Huawei's Ascend NPU, Kunpeng CPU, and OpenEuler operating system, and to make sure it runs correctly and performs well on these platforms.
CodeGeeX4-ALL-9B repository: https://github.com/THUDM/CodeGeeX4
I. Verification on Huawei Cloud ECS
1. Purchase a Huawei Cloud Virtual Private Cloud (VPC) and an Elastic Cloud Server (ECS)
For a detailed tutorial, see: the procedure for creating a Huawei Cloud Elastic Cloud Server (ECS).
Skip this step if you already have one.
2. Environment setup
The Python version is 3.9.
# Install packages
pip install torch==2.5.1 accelerate==1.1.1 tiktoken==0.3.3 transformers==4.39.0
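Optionally, before pulling the code, you can make sure the packages import cleanly; a minimal sanity check (nothing model-specific is assumed here):
```python
# Optional sanity check: confirm the core packages import and report their versions
import torch, transformers, accelerate, tiktoken
print(torch.__version__, transformers.__version__, accelerate.__version__)
```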
Pull the code:
https://github.com/THUDM/CodeGeeX4
The model cannot be fetched from Hugging Face here, so use ModelScope to obtain the model weights instead.
Download them into the current directory via ModelScope:
pip install modelscope
modelscope download --model ZhipuAI/codegeex4-all-9b --local_dir .
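If you prefer downloading from Python instead of the CLI, ModelScope also exposes snapshot_download; a minimal sketch (the on-disk layout under cache_dir may differ from the CLI's --local_dir, so use the returned path when loading the model):
```python
from modelscope import snapshot_download

# Download the weights through the Python API; model_dir is the local path they end up in
model_dir = snapshot_download('ZhipuAI/codegeex4-all-9b', cache_dir='.')
print(model_dir)  # pass this path to from_pretrained() in the scripts below
```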
3. Prepare a test
Create a test file run.py, copy the code below into it, and run it with python run.py:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the GPU if available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and model from the local directory downloaded via ModelScope
tokenizer = AutoTokenizer.from_pretrained("./codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "./codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

# Build a chat-formatted prompt, generate, and print only the newly generated tokens
inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
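Generating up to 2048 new tokens can take a while, especially on CPU. If you would rather watch the output stream in as it is produced, transformers' TextStreamer can be passed to generate; an optional sketch reusing the tokenizer, model, and inputs defined above:
```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```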
Generated output (results may vary slightly):
4. Build a local demo application
4.1 Check the elastic public IP and open the port
After checking, you get the IP address used to access the web application. Here it is http://1.94.197.211:7860/.
4.2 Build the web UI with Gradio
pip install gradio
4.3 Modify the run.py code as follows:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("./codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "./codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

def codegee_inference(text):
    # Run one round of chat-style generation; return "error" if anything fails
    try:
        inputs = tokenizer.apply_chat_template([{"role": "user", "content": text}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=2048)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]
            return tokenizer.decode(outputs[0], skip_special_tokens=True)
    except Exception:
        return "error"

# Create the Gradio interface
iface = gr.Interface(
    fn=codegee_inference,
    inputs="text",
    outputs="text",
    title="CodeGeeX4 Model Demo",
    description="Enter a prompt; the model is called and the result is displayed."
)

# Launch the service
iface.launch(server_name="0.0.0.0", server_port=7860)
```
Run run.py with nohup (for example, nohup python run.py > run.log 2>&1 &), then open http://1.94.197.211:7860/ in a browser to reach the service page. You can enter the following prompt for the model to complete (note: inference takes a fairly long time, please be patient):
Write a quick sort implementation in Python
Result:
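Besides the browser UI, the running service can also be called from a script. A sketch using the gradio_client package, assuming the default /predict endpoint that a single-function gr.Interface exposes:
```python
# pip install gradio_client
from gradio_client import Client

# Connect to the running demo and submit a prompt
client = Client("http://1.94.197.211:7860/")
result = client.predict("write a quick sort in python", api_name="/predict")
print(result)
```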
II. Deploying to the Ascend NPU
Ascend environment:
Chip type: Ascend 910B3
CANN version: CANN 7.0.1.5
Driver version: 23.0.6
Operating system: Huawei Cloud EulerOS 2.0
1. Check the NPU hardware information
npu-smi info
If the Health status is OK, the NPU and CANN are running normally.
2. Run the code in the NPU environment
Pull the code:
git clone https://github.com/THUDM/CodeGeeX4.git
Download the model weights:
pip install modelscope
modelscope download --model ZhipuAI/codegeex4-all-9b --local_dir .
Environment configuration: create a Python 3.9 environment and activate it:
conda create -n py39 python==3.9
conda activate py39
Go to the CodeGeeX4/local_mode directory and install the dependencies:
pip install -r requirement.txt
vim run.py
Add the following code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu

# Fail fast if no NPU is visible
if not torch.npu.is_available():
    raise RuntimeError("NPU is not available. Please check your environment.")

# Select the NPU device
device = torch.device("npu")
# device = "cuda" if torch.cuda.is_available() else "cpu"

# Replace the paths below with the directory containing the downloaded model weights
tokenizer = AutoTokenizer.from_pretrained("path/to/codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=1024)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
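For reference, the transfer_to_npu import above is torch_npu's automatic migration shim, which redirects torch.cuda calls to the NPU; under that assumption, the commented-out CUDA-style device selection would also end up on the NPU. A small illustrative sketch:
```python
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu  # redirects torch.cuda.* calls to the NPU

# With the shim imported, CUDA-style device selection is expected to map onto the NPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("selected device:", device)  # expected to print "cuda" on a healthy NPU node
```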
2.1 Install the torch_npu plugin
pip install torch==2.1.0 torch_npu==2.1.0.post8
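After installing, it is worth confirming that PyTorch can actually see the NPU before loading the 9B model; a minimal check:
```python
import torch
import torch_npu

# Verify the NPU is visible and run a tiny tensor operation on it
print("NPU available:", torch.npu.is_available(), "| device count:", torch.npu.device_count())
x = torch.randn(2, 3).to("npu")
print((x * 2).cpu())
```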
Run the code:
python run.py
Output:
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 7.38it/s]
Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays. A large array is partitioned into two arrays one of which holds values smaller than the specified value, say pivot, based on which the partition is made and the other array holds values greater than the pivot value.
Here is a simple implementation of Quick Sort in Python:
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]
        left = [x for x in arr if x < pivot]
        middle = [x for x in arr if x == pivot]
        right = [x for x in arr if x > pivot]
        return quick_sort(left) + middle + quick_sort(right)

# Test the function
print(quick_sort([3,6,8,10,1,2,1]))
```
In this code, the function `quick_sort` takes a list `arr` as input. If the length of the list is less than or equal to 1, it returns the list as it is already sorted. Otherwise, it selects a pivot (in this case, the middle element of the list), and creates three lists: `left` (elements less than the pivot), `middle` (elements equal to the pivot), and `right` (elements greater than the pivot). It then recursively sorts the `left` and `right` lists and concatenates them with the `middle` list to get the sorted list.