Detailed steps for running CodeGeeX model inference on Huawei Cloud Ascend and deploying it to the Huawei Ascend NPU
CodeGeeX4-ALL-9B is the open-source release of the latest CodeGeeX4 series, an all-round open-source multilingual code generation model from THUDM. It is continually trained on top of earlier CodeGeeX models and significantly improves code generation capability. The main goal of this article is to port the CodeGeeX project to Huawei's Ascend NPU, Kunpeng CPU, and OpenEuler operating system, and to make sure it runs correctly and performs well on these platforms.
CodeGeeX4-ALL-9B repository: https://github.com/THUDM/CodeGeeX4
I. Verification on Huawei Cloud ECS
1. Purchase a Huawei Cloud Virtual Private Cloud (VPC) and an Elastic Cloud Server (ECS)
For a detailed tutorial, see: the procedure for creating a Huawei Cloud Elastic Cloud Server (ECS).
Skip this step if you already have one.
2. Environment setup
The Python version is 3.9.
# Install packages
pip install torch==2.5.1 accelerate==1.1.1 tiktoken==0.3.3 transformers==4.39.0
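Optionally, before pulling the code, you can make sure the packages import cleanly; a minimal sanity check (nothing model-specific is assumed here):
```python
# Optional sanity check: confirm the core packages import and report their versions
import torch, transformers, accelerate, tiktoken
print(torch.__version__, transformers.__version__, accelerate.__version__)
```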
Pull the code:
https://github.com/THUDM/CodeGeeX4
The model cannot be fetched from Hugging Face here, so use ModelScope to obtain the model weights instead.
Download them into the current directory via ModelScope:
pip install modelscope
modelscope download --model ZhipuAI/codegeex4-all-9b --local_dir .
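If you prefer downloading from Python instead of the CLI, ModelScope also exposes snapshot_download; a minimal sketch (the on-disk layout under cache_dir may differ from the CLI's --local_dir, so use the returned path when loading the model):
```python
from modelscope import snapshot_download

# Download the weights through the Python API; model_dir is the local path they end up in
model_dir = snapshot_download('ZhipuAI/codegeex4-all-9b', cache_dir='.')
print(model_dir)  # pass this path to from_pretrained() in the scripts below
```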
3. Prepare a test
Create a test file run.py, copy the code below into it, and run it with python run.py:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the GPU if available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and model from the local directory downloaded via ModelScope
tokenizer = AutoTokenizer.from_pretrained("./codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "./codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

# Build a chat-formatted prompt, generate, and print only the newly generated tokens
inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
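Generating up to 2048 new tokens can take a while, especially on CPU. If you would rather watch the output stream in as it is produced, transformers' TextStreamer can be passed to generate; an optional sketch reusing the tokenizer, model, and inputs defined above:
```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```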
Generated output (results may vary slightly):
4. Build a local demo application
4.1 Check the elastic public IP and open the port
After checking, you get the IP address used to access the web application. Here it is http://1.94.197.211:7860/.
4.2 Build the web UI with Gradio
pip install gradio
4.3 Modify the run.py code as follows:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("./codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "./codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

def codegee_inference(text):
    # Run one round of chat-style generation; return "error" if anything fails
    try:
        inputs = tokenizer.apply_chat_template([{"role": "user", "content": text}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=2048)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]
            return tokenizer.decode(outputs[0], skip_special_tokens=True)
    except Exception:
        return "error"

# Create the Gradio interface
iface = gr.Interface(
    fn=codegee_inference,
    inputs="text",
    outputs="text",
    title="CodeGeeX4 Model Demo",
    description="Enter a prompt; the model is called and the result is displayed."
)

# Launch the service
iface.launch(server_name="0.0.0.0", server_port=7860)
```
Run run.py with nohup (for example, nohup python run.py > run.log 2>&1 &), then open http://1.94.197.211:7860/ in a browser to reach the service page. You can enter the following prompt for the model to complete (note: inference takes a fairly long time, please be patient):
Write a quick sort implementation in Python
Result:
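Besides the browser UI, the running service can also be called from a script. A sketch using the gradio_client package, assuming the default /predict endpoint that a single-function gr.Interface exposes:
```python
# pip install gradio_client
from gradio_client import Client

# Connect to the running demo and submit a prompt
client = Client("http://1.94.197.211:7860/")
result = client.predict("write a quick sort in python", api_name="/predict")
print(result)
```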
II. Deploying to the Ascend NPU
Ascend environment:
Chip type: Ascend 910B3
CANN version: CANN 7.0.1.5
Driver version: 23.0.6
Operating system: Huawei Cloud EulerOS 2.0
1. Check the NPU hardware information
npu-smi info
If the Health status is OK, the NPU and CANN are running normally.
2. Run the code in the NPU environment
Pull the code:
git clone https://github.com/THUDM/CodeGeeX4.git
Download the model weights:
pip install modelscope
modelscope download --model ZhipuAI/codegeex4-all-9b --local_dir .
Environment configuration: create a Python 3.9 environment and activate it:
conda create -n py39 python==3.9
conda activate py39
Go to the CodeGeeX4/local_mode directory and install the dependencies:
pip install -r requirement.txt
vim run.py
Add the following code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu

# Fail fast if no NPU is visible
if not torch.npu.is_available():
    raise RuntimeError("NPU is not available. Please check your environment.")

# Select the NPU device
device = torch.device("npu")
# device = "cuda" if torch.cuda.is_available() else "cpu"

# Replace the paths below with the directory containing the downloaded model weights
tokenizer = AutoTokenizer.from_pretrained("path/to/codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=1024)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
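For reference, the transfer_to_npu import above is torch_npu's automatic migration shim, which redirects torch.cuda calls to the NPU; under that assumption, the commented-out CUDA-style device selection would also end up on the NPU. A small illustrative sketch:
```python
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu  # redirects torch.cuda.* calls to the NPU

# With the shim imported, CUDA-style device selection is expected to map onto the NPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("selected device:", device)  # expected to print "cuda" on a healthy NPU node
```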
2.1 Install the torch_npu plugin
pip install torch==2.1.0 torch_npu==2.1.0.post8
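After installing, it is worth confirming that PyTorch can actually see the NPU before loading the 9B model; a minimal check:
```python
import torch
import torch_npu

# Verify the NPU is visible and run a tiny tensor operation on it
print("NPU available:", torch.npu.is_available(), "| device count:", torch.npu.device_count())
x = torch.randn(2, 3).to("npu")
print((x * 2).cpu())
```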
Run the code:
python run.py
Output:
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 7.38it/s]
Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays. A large array is partitioned into two arrays one of which holds values smaller than the specified value, say pivot, based on which the partition is made and the other array holds values greater than the pivot value.
Here is a simple implementation of Quick Sort in Python:
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]
        left = [x for x in arr if x < pivot]
        middle = [x for x in arr if x == pivot]
        right = [x for x in arr if x > pivot]
        return quick_sort(left) + middle + quick_sort(right)

# Test the function
print(quick_sort([3,6,8,10,1,2,1]))
```
In this code, the function `quick_sort` takes a list `arr` as input. If the length of the list is less than or equal to 1, it returns the list as it is already sorted. Otherwise, it selects a pivot (in this case, the middle element of the list), and creates three lists: `left` (elements less than the pivot), `middle` (elements equal to the pivot), and `right` (elements greater than the pivot). It then recursively sorts the `left` and `right` lists and concatenates them with the `middle` list to get the sorted list.