MindIE-ICU
Test Document
# 310P running vllm-ascend reports [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

## Overview

This document is a technical tutorial generated from a post on the Ascend community forum.

**Original link**: https://www.hiascend.com/forum/thread-0259190364947368222-1-1.html

**Generated at**: 2025-08-27 10:33:56

---

## Problem Description

The on-screen log is as follows:

INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated
WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend...

## Related Code

### Code Example 1

```
INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated
WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:09,246 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:11,712 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:24 [config.py:841] This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
INFO 08-12 01:17:24 [config.py:1472] Using max model len 40960
INFO 08-12 01:17:24 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 08-12 01:17:24 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
INFO 08-12 01:17:24 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66
INFO 08-12 01:17:24 [utils.py:336] Adjusted ACL graph batch sizes for Qwen3ForCausalLM model (layers: 28): 67 → 66 sizes
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:28,842 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:32,924 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:32 [core.py:526] Waiting for init message from front-end.
INFO 08-12 01:17:32 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='Qwen/Qwen3-0.6B', speculative_config=None, tokenizer='Qwen/Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=40960, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
[rank0]:[W812 01:17:43.665372666 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME.
(function operator())
INFO 08-12 01:17:54 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 08-12 01:18:01 [model_runner_v1.py:1745] Starting to load model Qwen/Qwen3-0.6B...
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:18:06,622 - modelscope - INFO - Got 1 files, start to download ...
Downloading [model.safetensors]: 100%|█████████████████████████████████████████| 1.40G/1.40G [12:46<00:00, 1.96MB/s]
Processing 1 items: 100%|█████████████████████████████████████████████████████████| 1.00/1.00 [12:46<00:00, 767s/it]
2025-08-12 01:30:53,281 - modelscope - INFO - Download model 'Qwen/Qwen3-0.6B' successfully.
2025-08-12 01:30:53,282 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s]
INFO 08-12 01:30:53 [default_loader.py:272] Loading weights took 0.34 seconds
INFO 08-12 01:30:54 [model_runner_v1.py:1777] Loading model weights took 1.1202 GB
INFO 08-12 01:31:08 [backends.py:508] Using cache directory: /root/.cache/vllm/torch_compile_cache/c878ede1af/rank_0_0/backbone for vLLM's torch.compile
INFO 08-12 01:31:08 [backends.py:519] Dynamo bytecode transform time: 7.62 s
INFO 08-12 01:31:11 [backends.py:193] Compiling a graph for general shape takes 1.66 s
.INFO 08-12 01:31:18 [monitor.py:34] torch.compile takes 9.28 s in total
INFO 08-12 01:31:19 [worker_v1.py:181] Available memory: 16910831104, total memory: 22573076480
INFO 08-12 01:31:19 [kv_cache_utils.py:716] GPU KV cache size: 147,328 tokens
INFO 08-12 01:31:19 [kv_cache_utils.py:720] Maximum concurrency for 40,960 tokens per request: 3.60x
[WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158:Feature is not supportted and the possible cause is that driver and firmware packages do not match.
[WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:161:Feature is not supportted and the possible cause is that driver and firmware packages do not match.
ERROR 08-12 01:31:20 [core.py:586] EngineCore failed to start.
ERROR 08-12 01:31:20 [core.py:586] Traceback (most recent call last):
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
ERROR 08-12 01:31:20 [core.py:586]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
ERROR 08-12 01:31:20 [core.py:586]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 82, in __init__
ERROR 08-12 01:31:20 [core.py:586]     self._initialize_kv_caches(vllm_config)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 169, in _initialize_kv_caches
ERROR 08-12 01:31:20 [core.py:586]     self.model_executor.initialize_from_config(kv_cache_configs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 66, in initialize_from_config
ERROR 08-12 01:31:20 [core.py:586]     self.collective_rpc("compile_or_warm_up_model")
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 08-12 01:31:20 [core.py:586]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
ERROR 08-12 01:31:20 [core.py:586]     return func(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 253, in compile_or_warm_up_model
ERROR 08-12 01:31:20 [core.py:586]     self.model_runner.capture_model()
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2064, in capture_model
ERROR 08-12 01:31:20 [core.py:586]     self._dummy_run(num_tokens)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 08-12 01:31:20 [core.py:586]     return func(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1663, in _dummy_run
ERROR 08-12 01:31:20 [core.py:586]     hidden_states = model(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
ERROR 08-12 01:31:20 [core.py:586]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
ERROR 08-12 01:31:20 [core.py:586]     model_output = self.forward(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
ERROR 08-12 01:31:20 [core.py:586]     def forward(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
ERROR 08-12 01:31:20 [core.py:586]     return fn(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
ERROR 08-12 01:31:20 [core.py:586]     return self._wrapped_call(self, *args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
ERROR 08-12 01:31:20 [core.py:586]     raise e
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
ERROR 08-12 01:31:20 [core.py:586]     return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "<eval_with_key>.58", line 234, in forward
ERROR 08-12 01:31:20 [core.py:586]     submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_ = None
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 192, in __call__
ERROR 08-12 01:31:20 [core.py:586]     with torch.npu.graph(aclgraph, pool=self.graph_pool):
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 310, in __enter__
ERROR 08-12 01:31:20 [core.py:586]     self.npu_graph.capture_begin(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 210, in capture_begin
ERROR 08-12 01:31:20 [core.py:586]     super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
ERROR 08-12 01:31:20 [core.py:586] RuntimeError: status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE INTERNAL ASSERT FAILED at "build/CMakeFiles/torch_npu.dir/compiler_depend.ts":162, please report a bug to PyTorch.
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 590, in run_engine_core
    raise e
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 82, in __init__
    self._initialize_kv_caches(vllm_config)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 169, in _initialize_kv_caches
    self.model_executor.initialize_from_config(kv_cache_configs)
  File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 66, in initialize_from_config
    self.collective_rpc("compile_or_warm_up_model")
  File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
    return func(*args, **kwargs)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 253, in compile_or_warm_up_model
    self.model_runner.capture_model()
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2064, in capture_model
    self._dummy_run(num_tokens)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1663, in _dummy_run
    hidden_states = model(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
    hidden_states = self.model(input_ids, positions, intermediate_tensors,
  File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
    model_output = self.forward(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
    def forward(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
    raise e
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.58", line 234, in forward
    submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_ = None
  File "/vllm-workspace/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 192, in __call__
    with torch.npu.graph(aclgraph, pool=self.graph_pool):
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 310, in __enter__
    self.npu_graph.capture_begin(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 210, in capture_begin
    super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
RuntimeError: status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE INTERNAL ASSERT FAILED at "build/CMakeFiles/torch_npu.dir/compiler_depend.ts":162, please report a bug to PyTorch.
Traceback (most recent call last):
  File "/workspace/test.py", line 9, in <module>
    llm = LLM(model="Qwen/Qwen3-0.6B")
  File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 501, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config
    return cls(vllm_config=vllm_config,
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 101, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 75, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 503, in __init__
    super().__init__(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines
    wait_for_engine_startup(
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.
Failed core proc(s): {}
[ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```
Traceback (most recent call last): File "/workspace/test.py", line 9, in <module> llm = LLM(model="Qwen/Qwen3-0.6B") File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__ self.llm_engine = LLMEngine.from_engine_args( File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 501, in from_engine_args return engine_cls.from_vllm_config( File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config return cls(vllm_config=vllm_config, File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 101, in __init__ self.engine_core = EngineCoreClient.make_client( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 75, in make_client return SyncMPClient(vllm_config, executor_class, log_stats) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 503, in __init__ super().__init__( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__ with launch_core_engines(vllm_config, executor_class, File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__ next(self.gen) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines wait_for_engine_startup( File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup raise RuntimeError("Engine core initialization failed. " RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```

### Code example 3

```
root@k8s-master:/workspace# python collect_env.py Collecting environment information... PyTorch version: 2.5.1 Is debug build: False OS: Ubuntu 22.04.5 LTS (aarch64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: Could not collect CMake version: version 4.0.3 Libc version: glibc-2.35 Python version: 3.10.17 (main, May 27 2025, 01:33:16) [GCC 11.4.0] (64-bit runtime) Python platform: Linux-4.19.90-2003.4.0.0036.oe1.aarch64-aarch64-with-glibc2.35 CPU: Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 64 On-line CPU(s) list: 0-63 Vendor ID: HiSilicon Model name: Kunpeng-920 Model: 0 Thread(s) per core: 1 Core(s) per cluster: 32 Socket(s): - Cluster(s): 2 Stepping: 0x1 CPU max MHz: 2600.0000 CPU min MHz: 200.0000 BogoMIPS: 200.00 Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm L1d cache: 4 MiB (64 instances) L1i cache: 4 MiB (64 instances) L2 cache: 32 MiB (64 instances) L3 cache: 64 MiB (2 instances) NUMA node(s): 2 NUMA node0 CPU(s): 0-31 NUMA node1 CPU(s): 32-63 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Spec store bypass: Not affected Vulnerability Spectre v1: Mitigation; __user pointer sanitization Vulnerability Spectre v2: Not affected Vulnerability Tsx async abort: Not affected Versions of relevant libraries: [pip3] numpy==1.26.4 [pip3] pyzmq==27.0.0 [pip3] torch==2.5.1 [pip3] torch-npu==2.5.1.post1.dev20250619 [pip3] torchvision==0.20.1 [pip3] transformers==4.52.4 [conda] Could not collect vLLM Version: 0.9.2 vLLM Ascend Version: 0.9.2rc1 ENV Variables: ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240 ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1 ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0 ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1 ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0 ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32 ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5 VLLM_USE_MODELSCOPE=true ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0 ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest ATB_COMPARE_TILING_EVERY_KERNEL=0 ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/: ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3 ATB_RUNNER_POOL_SIZE=64 ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0 ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest ATB_MATMUL_SHUFFLE_K_ENABLE=1 ATB_LAUNCH_KERNEL_WITH_TILING=1 ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1 ATB_HOST_TILING_BUFFER_BLOCK_NUM=128 ATB_SHARE_MEMORY_NAME_SUFFIX= TORCH_DEVICE_BACKEND_AUTOLOAD=1 PYTORCH_NVML_BASED_CUDA_CHECK=1 TORCHINDUCTOR_COMPILE_THREADS=1 NPU: +--------------------------------------------------------------------------------------------------------+ | npu-smi 25.0.rc1.1 Version: 25.0.rc1.1 | +-------------------------------+-----------------+------------------------------------------------------+ | NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) | | Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) | +===============================+=================+======================================================+ | 32 310P3 | OK | NA 45 0 / 0 | | 0 0 | 0000:02:00.0 | 0 1850 / 21527 | +===============================+=================+======================================================+ | 96 310P3 | OK | NA 46 0 / 0 | | 0 1 | 0000:04:00.0 | 0 1851 / 21527 | +===============================+=================+======================================================+ | 32800 310P3 | OK | NA 47 0 / 0 | | 0 2 | 0000:82:00.0 | 0 1855 / 21527 | +===============================+=================+======================================================+ 
| 32896 310P3 | OK | NA 50 0 / 0 | | 0 3 | 0000:85:00.0 | 0 1850 / 21527 | +===============================+=================+======================================================+ +-------------------------------+-----------------+------------------------------------------------------+ | NPU Chip | Process id | Process name | Process memory(MB) | +===============================+=================+======================================================+ | No running processes found in NPU 32 | +===============================+=================+======================================================+ | No running processes found in NPU 96 | +===============================+=================+======================================================+ | No running processes found in NPU 32800 | +===============================+=================+======================================================+ | No running processes found in NPU 32896 | +===============================+=================+======================================================+ CANN: package_name=Ascend-cann-toolkit version=8.1.RC1 innerversion=V100R001C21SPC001B238 compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21] arch=aarch64 os=linux path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
```

## Full content

The on-screen log was as follows:

INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins: INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:09,246 - modelscope - INFO - Target directory already exists, skipping creation. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:11,712 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:24 [config.py:841] This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'. INFO 08-12 01:17:24 [config.py:1472] Using max model len 40960 INFO 08-12 01:17:24 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=8192. INFO 08-12 01:17:24 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode INFO 08-12 01:17:24 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66 INFO 08-12 01:17:24 [utils.py:336] Adjusted ACL graph batch sizes for Qwen3ForCausalLM model (layers: 28): 67 66 sizes Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:28,842 - modelscope - INFO - Target directory already exists, skipping creation. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:32,924 - modelscope - INFO - Target directory already exists, skipping creation. INFO 08-12 01:17:32 [core.py:526] Waiting for init message from front-end. INFO 08-12 01:17:32 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='Qwen/Qwen3-0.6B', speculative_config=None, tokenizer='Qwen/Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=40960, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} [rank0]:[W812 01:17:43.665372666 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator()) INFO 08-12 01:17:54 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 08-12 01:18:01 [model_runner_v1.py:1745] Starting to load model Qwen/Qwen3-0.6B... 
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:18:06,622 - modelscope - INFO - Got 1 files, start to download ... Downloading [model.safetensors]: 100%|| 1.40G/1.40G [12:46<00:00, 1.96MB/s] Processing 1 items: 100%|| 1.00/1.00 [12:46<00:00, 767s/it] 2025-08-12 01:30:53,281 - modelscope - INFO - Download model 'Qwen/Qwen3-0.6B' successfully. 2025-08-12 01:30:53,282 - modelscope - INFO - Target directory already exists, skipping creation. Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s] INFO 08-12 01:30:53 [default_loader.py:272] Loading weights took 0.34 seconds INFO 08-12 01:30:54 [model_runner_v1.py:1777] Loading model weights took 1.1202 GB INFO 08-12 01:31:08 [backends.py:508] Using cache directory: /root/.cache/vllm/torch_compile_cache/c878ede1af/rank_0_0/backbone for vLLM's torch.compile INFO 08-12 01:31:08 [backends.py:519] Dynamo bytecode transform time: 7.62 s INFO 08-12 01:31:11 [backends.py:193] Compiling a graph for general shape takes 1.66 s .INFO 08-12 01:31:18 [monitor.py:34] torch.compile takes 9.28 s in total INFO 08-12 01:31:19 [worker_v1.py:181] Available memory: 16910831104, total memory: 22573076480 INFO 08-12 01:31:19 [kv_cache_utils.py:716] GPU KV cache size: 147,328 tokens INFO 08-12 01:31:19 [kv_cache_utils.py:720] Maximum concurrency for 40,960 tokens per request: 3.60x [WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158:Feature is not supportted and the possible cause is that driver and firmware packages do not match. [WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:161:Feature is not supportted and the possible cause is that driver and firmware packages do not match. ERROR 08-12 01:31:20 [core.py:586] EngineCore failed to start. 
The EngineCore traceback that followed (prefixed with ERROR 08-12 01:31:20 [core.py:586]), the Process EngineCore_0 traceback, and the final traceback from /workspace/test.py are identical, line for line, to the ones reproduced in the code examples above. The run ended with:

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

The system environment reported by collect_env.py is identical to Code example 3 above: PyTorch 2.5.1 with torch-npu 2.5.1.post1.dev20250619, vLLM 0.9.2 with vLLM Ascend 0.9.2rc1, CANN toolkit 8.1.RC1, driver npu-smi 25.0.rc1.1, and four Atlas 310P3 NPUs on a Kunpeng-920 (aarch64) host.
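Since the crash happens inside `torch.npu.graph(...)` while vLLM warms the model up by capturing ACL graphs, one way to test the driver/firmware hypothesis is to start the engine with graph capture disabled and see whether eager execution works. The sketch below reuses the post's failing script with vLLM's standard `enforce_eager` switch; that this avoids the crash on this particular 310P3 setup is an assumption, not something confirmed in the thread.

```python
# Sketch: the post's test.py with ACL graph capture disabled.
# enforce_eager=True tells vLLM to skip graph capture entirely, so the
# torch.npu.graph capture path that asserts above is never entered.
# (Assumed workaround - not confirmed in the original thread.)
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-0.6B", enforce_eager=True)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Why did the engine fail to start?"], params)
print(outputs[0].outputs[0].text)
```

The same switch exists on the serving command line (`vllm serve Qwen/Qwen3-0.6B --enforce-eager`). Eager mode trades some decode throughput for skipping capture; if the assertion still fires with capture disabled, the driver/firmware mismatch warning in the log is the next thing to chase.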
---

## Technical summary

Based on the content above, the key technical points are:

1. **Problem type**: engine startup failure. vLLM Ascend crashes while warming up the model: ACL graph capture (`torch.npu.graph` → `capture_begin`) hits the internal assertion `status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE`, so the vLLM engine core never finishes initializing.
2. **Stack involved**: vLLM 0.9.2 / vLLM Ascend 0.9.2rc1, PyTorch 2.5.1 + torch-npu, CANN 8.1.RC1, Atlas 310P3 NPUs on a Kunpeng-920 (aarch64) host.
3. **Root-cause indication**: immediately before the assertion, the log warns "Feature is not supportted and the possible cause is that driver and firmware packages do not match", which points at the graph-capture feature being unavailable on this driver/firmware combination rather than at the model itself. The post is a question and records no confirmed fix; the sketch above shows one way to rule the graph-capture path out.

## Related resources

- Ascend community: https://www.hiascend.com/
- Ascend forum: https://www.hiascend.com/forum/

---

*This document was generated automatically and is for reference only. If in doubt, refer to the original post.*
yg9538
August 27, 2025, 11:02