- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

如何用华为云ModelArts平台玩转Llama2

码上开花_Lancer 发表于 2023/09/08 11:59:44 2023/09/08

【摘要】 MetaAI开源了Llama2模型，我只想说一句：“「MetaAI改名叫OpenAI吧！」”Llama2不仅开源了预训练模型，而且还开源了利用对话数据SFT后的Llama2-Chat模型，并对Llama2-Chat模型的微调进行了详细的介绍。开源模型目前有7B、13B、70B三种尺寸，预训练阶段使用了2万亿Token，SFT阶段使用了超过10w数据，人类偏好数据超过100w。![image....

MetaAI开源了Llama2模型，我只想说一句：“「MetaAI改名叫OpenAI吧！」”

Llama2不仅开源了预训练模型，而且还开源了利用对话数据SFT后的Llama2-Chat模型，并对Llama2-Chat模型的微调进行了详细的介绍。

开源模型目前有7B、13B、70B三种尺寸，预训练阶段使用了2万亿Token，SFT阶段使用了超过10w数据，人类偏好数据超过100w。

发布不到一周的Llama 2，已经在研究社区爆火，一系列性能评测、在线试用的demo纷纷出炉。

就连OpenAI联合创始人Karpathy用C语言实现了对Llama 2婴儿模型的推理。

既然Llama 2现已人人可用，那么如何在华为云上去微调实现更多可能的应用呢？

打开华为云的ModelArts 创建notebook,首先需要下载数据集上传到OBS对象存储空间中，再通过命令copy到本地。

数据集地址：https://huggingface.co/datasets/samsum

打开obs上传下载的数据集：

import moxing as mox
mox.file.copy_parallel('obs://xxx/llma2_samsum_data','data')

1. 下载模型

克隆Meta的Llama推理存储库（包含下载脚本）：

!git clone https://github.com/facebookresearch/llama.git

Cloning into 'llama'...
remote: Enumerating objects: 353, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 353 (delta 1), reused 3 (delta 0), pack-reused 346
Receiving objects: 100% (353/353), 1.07 MiB | 2.09 MiB/s, done.
Resolving deltas: 100% (186/186), done.

然后运行下载脚本：

!bash download.sh

在这里，你只需要下载7B模型就可以了。

2. 将模型转换为Hugging Face支持的格式

!pip install git https://github.com/huggingface/transformerscd transformerspython convert_llama_weights_to_hf.py \ --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7B

现在，我们得到了一个Hugging Face模型，可以利用Hugging Face库进行微调了！

3. 运行微调笔记本：

克隆Llama-recipies存储库：

!git clone https://github.com/facebookresearch/llama-recipes.git

然后，在你喜欢的notebook界面中打开quickstart.ipynb文件，并运行整个notebook。

（此处，使用的是Jupyter lab）：

!pip install jupyterlabjupyter lab # in the repo you want to work in

为了适应转换后的实际模型路径，确保将以下一行更改为：

model_id="./models_hf/7B"

最后，一个经过Lora微调的模型就完成了。

4. 在微调的模型上进行推理

当前，问题在于Hugging Face只保存了适配器权重，而不是完整的模型。所以我们需要将适配器权重加载到完整的模型中。

导入库：

import torchfrom transformers
import LlamaForCausalLM, LlamaTokenizerfrom peft import PeftModel, PeftConfig

加载分词器和模型：

model_id="./models_hf/7B"tokenizer = LlamaTokenizer.from_pretrained(model_id)model =LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)

从训练后保存的位置加载适配器：

model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")

运行推理：

eval_prompt = """Summarize this dialog:A: Hi Tom, are you busy tomorrow’s afternoon?B: I’m pretty sure I am. What’s up?A: Can you go with me to the animal shelter?.B: What do you want to do?A: I want to get a puppy for my son.B: That will make him so happy.A: Yeah, we’ve discussed it many times. I think he’s ready now.B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)A: I'll get him one of those little dogs.B: One that won't grow up too big;-)A: And eat too much;-))B: Do you know which one he would like?A: Oh, yes, I took him there last Monday. He showed me one that he really liked.B: I bet you had to drag him away.A: He wanted to take it home right away ;-).B: I wonder what he'll name it.A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))---Summary:"""
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()with torch.no_grad(): print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

LLM Engine微调更便捷

如果你想用自己的数据对Llama 2微调，该如何做？

创办Scale AI初创公司的华人CEO Alexandr Wang表示，自家公司开源的LLM Engine，能够用最简单方法微调Llama 2。

Scale AI的团队在一篇博文中，具体介绍了Llama 2的微调方法。

from llmengine import FineTuneresponse = FineTune.create( model="llama-2-7b", training_file="s3://my-bucket/path/to/training-file.csv",)
print(response.json())

数据集

在如下示例中，Scale使用了Science QA数据集。

这是一个由多项选择题组成的流行数据集，每个问题可能有文本上下文和图像上下文，并包含支持解决方案的详尽解释和讲解。

Science QA的示例

目前，LLM Engine支持对「提示完成对」进行微调。首先，需要将Science QA数据集转换为支持的格式，一个包含两列的CSV：prompt和response 。

在开始之前，请安装所需的依赖项。

!pip install datasets==2.13.1 smart_open[s3]==5.2.1 pandas==1.4.4

可以从Hugging Face加载数据集，并观察数据集的特征。

from datasets import load_datasetfrom smart_open import smart_openimport pandas as pd
dataset = load_dataset('derek-thomas/ScienceQA')dataset['train'].features

提供Science QA示例的常用格式是：

Context: A baby wants to know what is inside of a cabinet. Her hand applies a force to the door, and the door opens.Question: Which type of force from the baby's hand opens the cabinet door?Options: (A) pull (B) pushAnswer: A.

由于Hugging Face数据集中options的格式是「可能答案的列表」，需要通过添加枚举前缀，将此列表转换为上面的示例格式。

choice_prefixes = [chr(ord('A') + i) for i in range(26)] # A-Zdef format_options(options, choice_prefixes): return ' '.join([f'({c}) {o}' for c, o in zip(choice_prefixes, options)])

现在，编写格式化函数，将这个数据集中的单个样本转换为输入模型的prompt和response 。

def format_prompt(r, choice_prefixes): 
    options = format_options(r['choices'], choice_prefixes) 
    return f'''Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions:{options}\nAnswer:'''
def format_response(r, choice_prefixes):
    return choice_prefixes[r['answer']]

最后，构建数据集。

请注意，Science QA中的某些示例只有上下文图像。（如下演示中会跳过这些示例，因为Llama-2纯粹是一种语言模型，并且不能接受图像输入。）

def convert_dataset(ds): 
    prompts = [format_prompt(i, choice_prefixes) for i in ds if i['hint'] != '']
    labels = [format_response(i, choice_prefixes) for i in ds if i['hint'] != ''] 
    df = pd.DataFrame.from_dict({'prompt': prompts, 'response': labels}) 
     return df

LLM Engine支持使用「预训练和验证数据集」来进行训练。假如你只提供训练集，LLM Engine会从数据集中随机拆分10%内容进行验证。

因为拆分数据集可以防止模型过度拟合训练数据，不会导致在推理期间实时数据泛化效果不佳。

另外，这些数据集文件必须存储在可公开访问的URL中，以便LLM Engine可以读取。对于此示例，Scale将数据集保存到s3。

并且，还在Github Gist中公开了预处理训练数据集和验证数据集。你可以直接用这些链接替换train_url和val_url 。

train_url = 's3://...'val_url = 's3://...'df_train = convert_dataset(dataset['train'])with smart_open(train_url, 'wb') as f: df_train.to_csv(f)df_val = convert_dataset(dataset['validation'])with smart_open(val_url, 'wb') as f:df_val.to_csv(f)

现在，可以通过LLM Engine API开始微调。

微调
首先，需要安装LLM Engine。

!pip install scale-llm-engine

接下来，你需要设置Scale API密钥。按照README的说明获你唯一的API密钥。

高级用户还可以按照自托管LLM Engine指南进行操作，由此就不需要Scale API密钥。

import os
os.environ['SCALE_API_KEY'] = 'xxx'

一旦你设置好一切，微调模型只需要一个API的调用。

在此，Scale选择了Llama-2的70亿参数版本，因为它对大多数用例来说已经足够强大了。

from llmengine import FineTuneresponse = FineTune.create( model="llama-2-7b", training_file=train_url, validation_file=val_url, hyperparameters={ 'lr':2e-4, }, suffix='science-qa-llama')run_id = response.fine_tune_id

通过run_id ，你可以监控工作状态，并获取每个epoch的实时更新指标，比如训练和验证损失。

Science QA是一个大型数据集，因此训练可能需要一两个小时才能完成。

while True: job_status = FineTune.get(run_id).status # Returns one of `PENDING`, `STARTED`, `SUCCESS`, `RUNNING`, # `FAILURE`, `CANCELLED`, `UNDEFINED` or `TIMEOUT` print(job_status) if job_status == 'SUCCESS': break time.sleep(60)#Logs for completed or running jobs can be fetched withlogs = FineTune.get_events(run_id)

推理与评估

完成微调后，你可以开始对任何输入生成响应。但是，在此之前，确保模型存在，并准备好接受输入。

ft_model = FineTune.get(run_id).fine_tuned_model

不过，你的第一个推理结果可能需要几分钟才能输出。之后，推理过程就会加快。

一起评估下在Science QA上微调的Llama-2模型的性能。

import pandas as pd
#Helper a function to get outputs for fine-tuned model with retriesdef 
get_output(prompt: str, num_retry: int = 5): 
for _ in range(num_retry):
    try: response = Completion.create( model=ft_model, prompt=prompt, max_new_tokens=1, temperature=0.01 ) 
            return response.output.text.strip()
     except Exception as e: print(e) 
           return ""
#Read the test datatest = pd.read_csv(val_url)
test["prediction"] = test["prompt"].apply(get_output)
print(f"Accuracy: {(test['response'] == test['prediction']).mean() * 100:.2f}%")

微调后的Llama-2能够达到82.15%的准确率，已经相当不错了。

那么，这个结果与Llama-2基础模型相比如何？

由于预训练模型没有在这些数据集上进行微调，因此需要在提示中提供一个示例，以便模型学会遵从我们期望的回复格式。

另外，我们还可以看到与微调类似大小的模型MPT-7B相比的情况。

在Science QA上微调Llama-2，其性能增益有26.59%的绝对差异！

此外，由于提示长度较短，使用微调模型进行推理比使用少样本提示更便宜。这种微调Llama-27B模型也优于1750亿参数模型GPT-3.5。

可以看到，Llama-2模型在微调和少样本提示设置中表现都优于MPT，充分展示了它作为基础模型和可微调模型的优势。

此外，Scale还使用LLM Engine微调和评估LLAMA-2在GLUE（一组常用的NLP基准数据集）的几个任务上的性能。

现在，任何人都可以释放微调模型的真正潜力，并见证强大的AI生成回复的魔力。

我发现虽然Huggingface在transformers方面构建了一个出色的库，但他们的指南对于普通用户来说往往过于复杂。

参考资料：
https://brev.dev/blog/fine-tuning-llama-2
https://scale.com/blog/fine-tune-llama-2

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

如何用华为云ModelArts平台玩转Llama2

1. 下载模型

2. 将模型转换为Hugging Face支持的格式

3. 运行微调笔记本：

4. 在微调的模型上进行推理

数据集

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

如何用华为云ModelArts平台玩转Llama2

1. 下载模型

2. 将模型转换为Hugging Face支持的格式

3. 运行微调笔记本：

4. 在微调的模型上进行推理

数据集

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品