昇腾910B部署Qwen2-7B-Instruct进行流式输出【pytorch框架】NPU推理

目录

  • 前情提要
    • torch_npu框架
    • mindsport框架
    • mindnlp框架
  • 下载模型
    • 国外
    • 国内
  • 环境设置
  • 代码适配(非流式)
    • Main
    • Branch
    • 结果展示
  • 代码适配(流式)

前情提要

torch_npu框架

官方未适配
在这里插入图片描述

mindsport框架

官方未适配
在这里插入图片描述

mindnlp框架

官方适配了,但是速度非常非常慢,10秒一个字
在这里插入图片描述

下载模型

国外

Hugging FaceHugging Face

国内

在这里插入图片描述modelscope

环境设置

pip install transformers==4.39.2
pip3 install torch==2.1.0
pip3 install torch-npu==2.1.0.post4
pip3 install accelerate==0.24.1
pip3 install transformers-stream-generator==0.0.5

代码适配(非流式)

Main

import torch
import torch_npu
import os
import platform
torch_device = "npu:1" # 0~7
torch.npu.set_device(torch.device(torch_device))
torch.npu.set_compile_mode(jit_compile=False)
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)
from transformers import AutoModelForCausalLM, AutoTokenizer
# device = "cuda" # the device to load the model onto
DEFAULT_CKPT_PATH = '/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct'
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_CKPT_PATH,
    torch_dtype=torch.float16,
    device_map=torch_device
).npu().eval()
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_CKPT_PATH)
while True:
    prompt = input("user:")
    if prompt == "exit":
        break
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(torch_device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Qwen2-7B-Instruct:",response)

Branch

找到自己虚拟环境

which python

我的是/root/anaconda3/envs/sakura/bin/python
找到/lib/python3.9/site-packages/transformers/generation/utils.py示例:

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py

找到第2708行,注释掉2708行~2712行
在2709行添加

next_token_scores = outputs.logits[:, -1, :]

示例:
在这里插入图片描述
出错就是在这里,如果进行了pre-process distribution,就会报错

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/logits_process.py:455: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
Traceback (most recent call last):
  File "/root/Qwen_test.py", line 63, in <module>
    generated_ids = model.generate(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 2736, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: Sync:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158 NPU error, error code is 507018
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
E39999: Inner Error!
E39999: 2024-07-02-14:14:50.735.070  An exception occurred during AICPU execution, stream_id:23, task_id:2750, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
        TraceBack (most recent call last):
        rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]


DEVICE[1] PID[864803]:
EXCEPTION TASK:
  Exception info:TGID=864803, model id=65535, stream id=23, stream phase=SCHEDULE, task id=2750, task type=aicpu kernel, recently received task id=2750, recently send task id=2749, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2024-07-02-14:14:50.091.974, function=proc_aicpu_task_done, line=970, error code=0x2a
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999: 2024-07-02-14:14:50.743.702  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
        TraceBack (most recent call last):
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.745.695  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.747.300  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.814.377  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.816.023  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.817.628  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.819.236  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.820.843  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.822.422  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)

结果展示

最后运行Main文件
在这里插入图片描述

代码适配(流式)

未完待续

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mfbz.cn/a/774613.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

add_metrology_object_generic 添加测量模型对象。找两条直线,并计算两条线的夹角和两个线的总长度,转换成毫米单位

*添加测量模型对象 *将测量对象添加到测量模型中 *算子参数&#xff1a; *    MeasureHandle&#xff1a;输入测量模型的句柄&#xff1b; *    Shape&#xff1a;输入要测量对象的类型&#xff1b;默认值&#xff1a;‘circle’&#xff0c;参考值&#xff1a;‘circl…

淘宝扭蛋机小程序:打造新的扭蛋体验

扭蛋机行业近年来发展非常迅速&#xff0c;呈现出了明显的增长势头&#xff0c;深受年轻消费者的青睐。当下在消费市场中&#xff0c;年轻人占据了很大的份额&#xff0c;这也推动了扭蛋机市场的发展。如今&#xff0c;扭蛋机也正在向多个方向发展&#xff0c;不再局限于线下扭…

如何利用代理IP打造热门文章

作为内容创作者&#xff0c;我们都知道&#xff0c;有时候地理限制和访问障碍可能会成为我们获取新鲜素材和优质信息的障碍。使用代理IP&#xff0c;正是突破这些限制的好方法&#xff01; 1. 无缝获取全球视野 如果你还在苦恼看不到其他地区的热点文章&#xff0c;你可以尝试…

保障信息资产:ISO 27001信息安全管理体系的重要性

在当今数字化和全球化的时代&#xff0c;信息安全已经成为企业成功和持续发展的关键因素之一。随着信息技术的快速发展和互联网的普及&#xff0c;企业面临着越来越多的信息安全威胁和挑战&#xff0c;如数据泄露、网络攻击、恶意软件等。为了有效应对这些威胁&#xff0c;企业…

docker集群部署主从mysql

搭建一个mysql集群&#xff0c;1主2从&#xff0c;使用docker容器 一、创建docker的mysql镜像 下次补上&#xff0c;因为现在很多网络不能直接pull&#xff0c;操作下次补上。 二、创建mysql容器 创建容器1 docker run -it -d --name mysql_1 -p 7001:3306 --net mynet --…

Astro新前端框架首次体验

Astro新前端框架首次体验 1、什么是Astro Astro是一个静态网站生成器的前端框架&#xff0c;它提供了一种新的开发方式和更好的性能体验&#xff0c;帮助开发者更快速地构建现代化的网站和应用程序。 简单来说就是&#xff1a;Astro这个是一个网站生成器&#xff0c;可以直接…

生成式人工智能如何改变软件开发:助手还是取代者?

生成式人工智能如何改变软件开发&#xff1a;助手还是取代者&#xff1f; 生成式人工智能&#xff08;AIGC&#xff09;正在引领软件开发领域的技术变革。从代码生成、错误检测到自动化测试&#xff0c;AI工具在提高开发效率的同时&#xff0c;也引发了对开发者职业前景的讨论…

标贝语音识别在智能会议系统的应用案例

语音识别是指将语音信号转换成文本或者其他数字信号形式的过程&#xff0c;随着人工智能在人们日常工作生活中的普及&#xff0c;语音识别技术也被广泛的应用在智能家居、智能会议、智能客服、智能驾驶等领域&#xff0c;以语音识别技术在智能会议系统中的应用为例&#xff0c;…

【读点论文】基于二维伽马函数的光照不均匀图像自适应校正算法

基于二维伽马函数的光照不均匀图像自适应校正算法 摘 要:提出了一种基于二维伽马函数的光照不均匀图像自适应校正算法.利用多尺度高斯函数提取出场景的光照分量,然后构造了一种二维伽马函数,并利用光照分量的分布特性调整二维伽马函数的参数,降低光照过强区域图像的亮度值,提高…

服务器U盘安装Centos 7时提示Warning:/dev/root does not exist

这是没有找到正确的镜像路径导致的&#xff0c;我们可以在命令行输入ls /dev看一下有哪些盘符 像图中红色圈起来的就是我插入U盘的盘符&#xff0c;大家的输几盘可能做了多个逻辑盘&#xff0c;这种情况下就可以先将U盘拔掉再ls /dev看一下和刚才相比少了那两个盘符&#xff0c…

Redis高级篇之最佳实践

Redis高级篇之最佳实践 今日内容 Redis键值设计批处理优化服务端优化集群最佳实践 1、Redis键值设计 1.1、优雅的key结构 Redis的Key虽然可以自定义&#xff0c;但最好遵循下面的几个最佳实践约定&#xff1a; 遵循基本格式&#xff1a;[业务名称]:[数据名]:[id]长度不超过…

空调计费系统是什么,你知道吗

空调计费系统是一种通过对使用空调的时间和能源消耗进行监测和计量来进行费用计算的系统。它广泛应用于各种场所&#xff0c;如家庭、办公室、商场等&#xff0c;为用户提供了方便、准确的能源使用管理和费用控制。 可实现功能 智能计费&#xff1a;中央空调分户计费系统通过智…

光电液位传感器在宠物洗澡机的应用

光电液位传感器在宠物洗澡机中的应用&#xff0c;为洗澡机的智能化管理提供了重要支持和保障。这种先进的传感技术不仅提升了设备的操作便捷性&#xff0c;还大幅度提高了洗澡过程的安全性和效率。 宠物洗澡机作为宠物护理的重要设备&#xff0c;其水位的控制至关重要。光电液…

SD16S1Y 符合GB2312标准16X16点阵汉字库芯片IC

一般概述 SD16S1Y是一款内含16x16点阵的汉字库芯片&#xff0c;支持GB2312国标简体汉字(含有国家信标委 合法授权)、ASCII字符。排列格式为竖置横排。用户通过字符内码&#xff0c;利用本手册提供的方法计算出 该字符点阵在芯片中的地址&#xff0c;可从该地址连续读出字…

STM32/GD32驱动步进电机芯片TM2160

文章目录 官方概要简单介绍整体架构流程 官方概要 TMC2160是一款带SPI接口的大功率步进电机驱动IC。它具有业界最先进的步进电机驱动器&#xff0c;具有简单的步进/方向接口。采用外部晶体管&#xff0c;可实现高动态、高转矩驱动。基于TRINAMICs先进的spreadCycle和stealthCh…

以太网协议介绍——UDP

注&#xff1a;需要先了解一些以太网的背景知识&#xff0c;方便更好理解UDP协议、 以太网基础知识一 以太网基础知识二 UDP协议 UDP即用户数据报协议&#xff0c;是一种面向无连接的传输层协议&#xff0c;属于 TCP/IP 协议簇的一种。UDP具有消耗资源少、通信效率高等优点&a…

第二届计算机、视觉与智能技术国际会议(ICCVIT 2024)

随着科技的飞速发展&#xff0c;计算机、视觉与智能技术已成为推动现代社会进步的重要力量。为了汇聚全球顶尖专家学者&#xff0c;共同探讨这一领域的最新研究成果和前沿技术&#xff0c;第二届计算机、视觉与智能技术国际会议&#xff08;ICCVIT 2024&#xff09;将于2024年1…

从海上长城到数字防线:视频技术在海域边防现代化中的创新应用

随着全球化和科技发展的加速&#xff0c;海域安全问题日益凸显其重要性。海域边防作为国家安全的第一道防线&#xff0c;其监控和管理面临着诸多挑战。近年来&#xff0c;视频技术的快速发展为海域边防场景提供了新的解决方案&#xff0c;其高效、实时、远程的监控特点极大地提…

【稳定检索/投稿优惠】2024年教育、人文发展与艺术国际会议(EHDA 2024)

2024 International Conference on Education, Humanities Development and Arts 2024年教育、人文发展与艺术国际会议 【会议信息】 会议简称&#xff1a;EHDA 2024 大会时间&#xff1a;点击查看 截稿时间&#xff1a;点击查看 大会地点&#xff1a;中国北京 会议官网&#…

云微客短视频矩阵全域营销,更高效的获客引流方式!

在抖音这样一个拥有海量用户和内容的短视频平台上&#xff0c;单一账号往往难以覆盖我们的客户群体&#xff0c;甚至于每天发布四五条视频&#xff0c;所引发的流量也是微乎其微的。在竞争如此激烈的市场环境中&#xff0c;商家企业无不想方设法追求更高效的获客引流方式&#…