# MindIE fails to start DeepSeek-R1-Distill-Qwen-32B (Ascend Forum)

## Overview

This document is a technical tutorial generated from a post on the Ascend community forum.

**Original link**: https://www.hiascend.com/forum/thread-02117188700572290225-1-1.html

**Generated**: 2025-08-27 10:33:58

---

## Problem Description

Environment:

- NPU: Atlas 300I Duo × 4
- OS: openEuler 22.03 LTS (arm64)
- MindIE image: mindie:2.0.RC2-800I-A2-py311-openeuler24.03-lts
- Driver version: 24.1.0.1
- Firmware version: 7.5.0.5.220

Host `npu-smi`:

```
No running processes found in NPU 1
No running processes found in NPU 2
No running processes found in NPU 4
No running processes found in NPU 5
```

`npu-smi` inside the container:

```
No running processes found in NPU 0
No running processes found in NPU 32
No running processes found in NPU 32768
No running processes found in NPU 32800
```

`/dev/davinci*` inside the container:

```
davinci0 davinci1 davinci2 davinci3 davinci4 davinci5 davinci6 davinci7 davinci_manager
```

Launch command:

```shell
docker run -it -d --net=host --shm-size=1g \
    --privileged \
    --name ai \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /data/npu/modelscope:/path-to-weights:ro \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC2-300I-Duo-py311-openeuler24.03-lts bash
```

The model config.json and MindIE config.json are reproduced in full in the Full Content section below.
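The thread's two configuration files pin down the key numbers: `worldSize` 2, `npuDeviceIds` `[[0, 1]]`, 40 attention heads, 8 KV heads. A minimal sanity check over those values can be sketched as follows; the divisibility rules are generic tensor-parallel assumptions, not an official MindIE validation step:

```python
# Hypothetical consistency check over values quoted in this thread.
# The divisibility rules below are common tensor-parallel assumptions,
# NOT an official MindIE validator.
model_cfg = {            # from the model's config.json
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "hidden_size": 5120,
}
mindie_model_cfg = {     # from MindIE config.json -> ModelConfig
    "worldSize": 2,
    "npuDeviceIds": [[0, 1]],
}

world_size = mindie_model_cfg["worldSize"]
devices = mindie_model_cfg["npuDeviceIds"][0]

# Each model instance should see exactly worldSize devices.
assert len(devices) == world_size, "worldSize must equal the device count"
# Attention heads (and KV heads) are typically sharded across ranks.
assert model_cfg["num_attention_heads"] % world_size == 0
assert model_cfg["num_key_value_heads"] % world_size == 0
print("config values are mutually consistent for worldSize =", world_size)
```

If these assertions fail, `worldSize` and `npuDeviceIds` are worth re-checking before digging further into the startup logs.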
## Related Images

### Image 1

![cke_166.png](https://fileserver.developer.huaweicloud.com/FileServer/getFile/cmtybbs/3b4/b77/b8c/482ecb08413b4b77b8c2a04191392446.20250724014823.92434149821804185904910506930492:20250827020745:2400:E3F6D9A575A6AF1A37F4B7ABBCC7D05A8CF16B0CF2558D3A99821E50AAA2E79E.png)

**Image description**: cke_166.png

## Full Content

The environment details, `npu-smi` output, and launch command repeat what is shown above; the full configuration files follow.

Model config.json:

```json
{
  "architectures": ["Qwen2ForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.43.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
```

MindIE config.json:

```json
{
  "Version": "1.0.0",
  "ServerConfig": {
    "ipAddress": "127.0.0.1",
    "managementIpAddress": "127.0.0.2",
    "port": 1040,
    "managementPort": 1041,
    "metricsPort": 1042,
    "allowAllZeroIpListening": false,
    "maxLinkNum": 1000,
    "httpsEnabled": false,
    "fullTextEnabled": false,
    "tlsCaPath": "security/ca/",
    "tlsCaFile": ["ca.pem"],
    "tlsCert": "security/certs/server.pem",
    "tlsPk": "security/keys/server.key.pem",
    "tlsPkPwd": "security/pass/key_pwd.txt",
    "tlsCrlPath": "security/certs/",
    "tlsCrlFiles": ["server_crl.pem"],
    "managementTlsCaFile": ["management_ca.pem"],
    "managementTlsCert": "security/certs/management/server.pem",
    "managementTlsPk": "security/keys/management/server.key.pem",
    "managementTlsPkPwd": "security/pass/management/key_pwd.txt",
    "managementTlsCrlPath": "security/management/certs/",
    "managementTlsCrlFiles": ["server_crl.pem"],
    "kmcKsfMaster": "tools/pmt/master/ksfa",
    "kmcKsfStandby": "tools/pmt/standby/ksfb",
    "inferMode": "standard",
    "interCommTLSEnabled": true,
    "interCommPort": 1121,
    "interCommTlsCaPath": "security/grpc/ca/",
    "interCommTlsCaFiles": ["ca.pem"],
    "interCommTlsCert": "security/grpc/certs/server.pem",
    "interCommPk": "security/grpc/keys/server.key.pem",
    "interCommPkPwd": "security/grpc/pass/key_pwd.txt",
    "interCommTlsCrlPath": "security/grpc/certs/",
    "interCommTlsCrlFiles": ["server_crl.pem"],
    "openAiSupport": "vllm",
    "tokenTimeout": 600,
    "e2eTimeout": 600,
    "distDPServerEnabled": false
  },
  "BackendConfig": {
    "backendName": "mindieservice_llm_engine",
    "modelInstanceNumber": 1,
    "npuDeviceIds": [[0, 1]],
    "tokenizerProcessNumber": 8,
    "multiNodesInferEnabled": false,
    "multiNodesInferPort": 1120,
    "interNodeTLSEnabled": true,
    "interNodeTlsCaPath": "security/grpc/ca/",
    "interNodeTlsCaFiles": ["ca.pem"],
    "interNodeTlsCert": "security/grpc/certs/server.pem",
    "interNodeTlsPk": "security/grpc/keys/server.key.pem",
    "interNodeTlsPkPwd": "security/grpc/pass/mindie_server_key_pwd.txt",
    "interNodeTlsCrlPath": "security/grpc/certs/",
    "interNodeTlsCrlFiles": ["server_crl.pem"],
    "interNodeKmcKsfMaster": "tools/pmt/master/ksfa",
    "interNodeKmcKsfStandby": "tools/pmt/standby/ksfb",
    "ModelDeployConfig": {
      "maxSeqLen": 2560,
      "maxInputTokenLen": 2048,
      "truncation": false,
      "ModelConfig": [
        {
          "modelInstanceType": "Standard",
          "modelName": "qwen",
          "modelWeightPath": "/path-to-weights",
          "worldSize": 2,
          "cpuMemSize": 5,
          "npuMemSize": -1,
          "backendType": "atb",
          "trustRemoteCode": false
        }
      ]
    },
    "ScheduleConfig": {
      "templateType": "Standard",
      "templateName": "Standard_LLM",
      "cacheBlockSize": 128,
      "maxPrefillBatchSize": 50,
      "maxPrefillTokens": 8192,
      "prefillTimeMsPerReq": 150,
      "prefillPolicyType": 0,
      "decodeTimeMsPerReq": 50,
      "decodePolicyType": 0,
      "maxBatchSize": 200,
      "maxIterTimes": 512,
      "maxPreemptCount": 0,
      "supportSelectBatch": false,
      "maxQueueDelayMicroseconds": 5000
    }
  }
}
```

## Startup Log

```txt
[root@bm-node03 bin]# ./mindieservice_daemon
[WARNING] Check path: config.json failed, by:Check path: config.json failed, by:owner id diff: current process user id is 0, file owner id is 1000
The old environment variable ATB_LOG_TO_STDOUT will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible.
The old environment variable ATB_LOG_LEVEL will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible.
The old environment variable MINDIE_LLM_LOG_LEVEL will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible.
The old environment variable MINDIE_LLM_PYTHON_LOG_TO_STDOUT will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible.
The old environment variable MINDIE_LLM_LOG_TO_STDOUT will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible.
The old environment variable MINDIE_LLM_PYTHON_LOG_PATH will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_PATH as soon as possible.
The old environment variable ATB_LOG_TO_FILE will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible.
The old environment variable MINDIE_LLM_PYTHON_LOG_LEVEL will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible.
The old environment variable MINDIE_LLM_PYTHON_LOG_TO_FILE will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible.
The old environment variable MINDIE_LLM_LOG_TO_FILE will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible.
The old environment variable OCK_LOG_LEVEL will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible.
The old environment variable OCK_LOG_TO_STDOUT will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible.
Log default log dir is ~/mindie/log, your can use env MINDIE_LOG_PATH to change log saving dir.
Log default log dir is ~/mindie/log, your can use env MINDIE_LOG_PATH to change log saving dir.
Log default log dir is ~/mindie/log, your can use env MINDIE_LOG_PATH to change log saving dir.
[msservice_profiler] [PID:3180] [DEBUG] [ReadEnable:344] profile enable_: false
[msservice_profiler] [PID:3182] [DEBUG] [ReadEnable:344] profile enable_: false
[msservice_profiler] [PID:3180] [DEBUG] [ReadAclTaskTime:372] profile enableAclTaskTime_: false
[msservice_profiler] [PID:3182] [DEBUG] [ReadAclTaskTime:372] profile enableAclTaskTime_: false
[... the block of deprecation warnings above is repeated several more times; omitted ...]
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[... the TBE error above appears 8 times in total ...]
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Daemon is killing...
Killed
```

Follow-up replies from the thread:

- "Tried that; it didn't help."
- "What error does it report?"
- "The error message is posted below:"

```txt
[2025-07-24 14:50:20,668] torch.distributed.run: [WARNING]
[2025-07-24 14:50:20,668] torch.distributed.run: [WARNING] *****************************************
[2025-07-24 14:50:20,668] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2025-07-24 14:50:20,668] torch.distributed.run: [WARNING] *****************************************
The old environment variable ATB_LOG_TO_STDOUT will be deprecated on 2025/12/31.
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible.
[... the same block of deprecation warnings repeats; the post is truncated here ...]
```
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_PATH will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_PATH as soon as possible. The old environment variable ATB_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable OCK_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable OCK_LOG_TO_STDOUT will be deprecated on 2025/12/31. 
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_PATH will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_PATH as soon as possible. The old environment variable ATB_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable OCK_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable OCK_LOG_TO_STDOUT will be deprecated on 2025/12/31. 
Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable ATB_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_STDOUT will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_STDOUT as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_PATH will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_PATH as soon as possible. The old environment variable ATB_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable MINDIE_LLM_PYTHON_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable MINDIE_LLM_LOG_TO_FILE will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_TO_FILE as soon as possible. The old environment variable OCK_LOG_LEVEL will be deprecated on 2025/12/31. Please use the new environment variable MINDIE_LOG_LEVEL as soon as possible. The old environment variable OCK_LOG_TO_STDOUT will be deprecated on 2025/12/31. 
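The warnings above all point at the same migration. A pre-launch setup using the unified variables might look like the sketch below; only the variable names come from the warnings themselves, while the level, flags, and log path are illustrative assumptions:

```shell
# Replace the deprecated per-component variables (ATB_LOG_*, MINDIE_LLM_*,
# OCK_LOG_*) with the unified MINDIE_LOG_* family before starting the service.
# The values below are illustrative, not recommendations.
export MINDIE_LOG_LEVEL=INFO              # replaces ATB_LOG_LEVEL, MINDIE_LLM_LOG_LEVEL, OCK_LOG_LEVEL, ...
export MINDIE_LOG_TO_STDOUT=1             # replaces ATB_LOG_TO_STDOUT, MINDIE_LLM_LOG_TO_STDOUT, ...
export MINDIE_LOG_TO_FILE=1               # replaces ATB_LOG_TO_FILE, MINDIE_LLM_LOG_TO_FILE, ...
export MINDIE_LOG_PATH="$HOME/mindie/log" # replaces MINDIE_LLM_PYTHON_LOG_PATH (path is an assumption)

# Unset the old variables, or the startup warnings will keep appearing.
unset ATB_LOG_LEVEL ATB_LOG_TO_STDOUT ATB_LOG_TO_FILE \
      MINDIE_LLM_LOG_LEVEL MINDIE_LLM_LOG_TO_STDOUT MINDIE_LLM_LOG_TO_FILE \
      MINDIE_LLM_PYTHON_LOG_LEVEL MINDIE_LLM_PYTHON_LOG_TO_STDOUT \
      MINDIE_LLM_PYTHON_LOG_TO_FILE MINDIE_LLM_PYTHON_LOG_PATH \
      OCK_LOG_LEVEL OCK_LOG_TO_STDOUT
```

Setting these in the shell (or Dockerfile/compose environment) that launches `mindieservice_daemon` or `run_pa.py` is enough; the processes inherit them.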
The run then proceeds normally through CPU binding and model load before failing in warm-up:

```
[2025-07-24 14:50:31,572] [925] [281473776835520] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 1, device_id: 1, numa_id: 0, shard_devices: [0, 1, 2, 3], cpus: [0, 1, ..., 23]
[2025-07-24 14:50:31,576] [925] [281473776835520] [llmmodels] [INFO] [cpu_binding.py-280] : process 925, new_affinity is [6, 7, 8, 9, 10, 11], cpu count 6
[2025-07-24 14:50:31,715] [926] [281473398946752] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 2, device_id: 2, numa_id: 0, shard_devices: [0, 1, 2, 3], cpus: [0, 1, ..., 23]
[2025-07-24 14:50:31,718] [926] [281473398946752] [llmmodels] [INFO] [cpu_binding.py-280] : process 926, new_affinity is [12, 13, 14, 15, 16, 17], cpu count 6
[2025-07-24 14:50:32,115] [924] [281473518373824] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 0, device_id: 0, numa_id: 0, shard_devices: [0, 1, 2, 3], cpus: [0, 1, ..., 23]
[2025-07-24 14:50:32,118] [924] [281473518373824] [llmmodels] [INFO] [cpu_binding.py-280] : process 924, new_affinity is [0, 1, 2, 3, 4, 5], cpu count 6
[2025-07-24 14:50:32,233] [927] [281473191488448] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 3, device_id: 3, numa_id: 0, shard_devices: [0, 1, 2, 3], cpus: [0, 1, ..., 23]
[2025-07-24 14:50:32,236] [927] [281473191488448] [llmmodels] [INFO] [cpu_binding.py-280] : process 927, new_affinity is [18, 19, 20, 21, 22, 23], cpu count 6
[2025-07-24 14:50:32,791] [924] [281473518373824] [llmmodels] [INFO] [model_runner.py-154] : model_runner.quantize: None, model_runner.kv_quant_type: None, model_runner.fa_quant_type: None, model_runner.dtype: torch.float16
[2025-07-24 14:50:41,554] [927] [281473191488448] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set
[2025-07-24 14:50:41,810] [927] [281473191488448] [llmmodels] [INFO] [flash_causal_qwen2.py-152] : >>>> qwen_QwenDecoderModel is called.
(the same two lines appear for ranks 2, 0 and 1, processes 926, 924 and 925)
[2025-07-24 14:50:42,454] [924] [281473518373824] [llmmodels] [INFO] [model_runner.py-176] : init tokenizer done
[2025-07-24 14:50:59,004] [924] [281473518373824] [llmmodels] [INFO] [model_runner.py-269] : model: FlashQwen2ForCausalLM(
  (rotary_embedding): PositionRotaryEmbedding()
  (attn_mask): AttentionMask()
  (transformer): FlashQwenModel(
    (wte): TensorParallelEmbedding()
    (h): ModuleList(
      (0-47): 48 x FlashQwenLayer(
        (attn): FlashQwenAttention(
          (rotary_emb): PositionRotaryEmbedding()
          (c_attn): TensorParallelColumnLinear( (linear): FastLinear() )
          (c_proj): TensorParallelRowLinear( (linear): FastLinear() )
        )
        (mlp): QwenMLP(
          (act): SiLU()
          (w2_w1): TensorParallelColumnLinear( (linear): FastLinear() )
          (c_proj): TensorParallelRowLinear( (linear): FastLinear() )
        )
        (ln_1): QwenRMSNorm()
        (ln_2): QwenRMSNorm()
      )
    )
    (ln_f): QwenRMSNorm()
  )
  (lm_head): TensorParallelHead( (linear): FastLinear() )
)
[2025-07-24 14:51:00,588] [926] [281473398946752] [llmmodels] [INFO] [cache.py-153] : kv cache will allocate 0.052734375GB memory
[2025-07-24 14:51:00,993] [927] [281473191488448] [llmmodels] [INFO] [cache.py-153] : kv cache will allocate 0.052734375GB memory
[2025-07-24 14:51:01,921] [924] [281473518373824] [llmmodels] [INFO] [run_pa.py-131] : hbm_capacity(GB): 43.2421875, init_memory(GB): 9.080859374254942
[2025-07-24 14:51:01,921] [924] [281473518373824] [llmmodels] [INFO] [run_pa.py-546] : pa_runner: PARunner(model_path=/path-to-weights/, input_text=None,
    max_position_embeddings=None, max_input_length=1024, max_output_length=20, max_prefill_tokens=-1, load_tokenizer=True, enable_atb_torch=False,
    max_prefill_batch_size=None, max_batch_size=1, dtype=torch.float16, block_size=128,
    model_config=ModelConfig(num_heads=10, num_kv_heads=2, num_kv_heads_origin=8, head_size=128, k_head_size=128, v_head_size=128, num_layers=48,
    device=npu:0, dtype=torch.float16, soc_info=NPUSocInfo(soc_name='', soc_version=202, need_nz=True, matmul_nd_nz=False),
    kv_quant_type=None, fa_quant_type=None,
    mapping=Mapping(world_size=4, rank=0, num_nodes=1, pp_rank=0, pp_groups=[[0], [1], [2], [3]], micro_batch_size=1,
    attn_dp_groups=[[0], [1], [2], [3]], attn_tp_groups=[[0, 1, 2, 3]], attn_inner_sp_groups=[[0], [1], [2], [3]],
    attn_o_proj_tp_groups=[[0], [1], [2], [3]], mlp_tp_groups=[[0, 1, 2, 3]], moe_ep_groups=[[0], [1], [2], [3]], moe_tp_groups=[[0, 1, 2, 3]]),
    cla_share_factor=1, model_type=qwen2, nz_cache=False), max_memory=46430945280,
[2025-07-24 14:51:01,922] [924] [281473518373824] [llmmodels] [INFO] [run_pa.py-244] : ---------------begin warm_up---------------
[2025-07-24 14:51:01,922] [924] [281473518373824] [llmmodels] [INFO] [cache.py-153] : kv cache will allocate 0.052734375GB memory
[2025-07-24 14:51:01,926] [924] [281473518373824] [llmmodels] [INFO] [generate.py-1055] : ------total req num: 1, infer start--------
[2025-07-24 14:51:02,086] [925] [281473776835520] [llmmodels] [INFO] [cache.py-153] : kv cache will allocate 0.052734375GB memory
[2025-07-24 14:51:03,170] [924] [281473518373824] [llmmodels] [INFO] [flash_causal_qwen2.py-556] : <<<<<<< ori k_caches[0].shape=torch.Size([9, 16, 128, 16])
[2025-07-24 14:51:03,172] [924] [281473518373824] [llmmodels] [INFO] [flash_causal_qwen2.py-560] : <<<<<<<after transdata k_caches[0].shape=torch.Size([9, 16, 128, 16])
[2025-07-24 14:51:03,173] [924] [281473518373824] [llmmodels] [INFO] [flash_causal_qwen2.py-580] : >>>>>>id of kcache is 281470384633424 id of vcache is 281470384633520
[2025-07-24 14:51:03,173] [924] [281473518373824] [llmmodels] [INFO] [flash_causal_lm.py-498] : flash_causal_lm reset: True
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/Ascend/atb-models/examples/run_pa.py", line 547, in <module>
    pa_runner.warm_up()
  File "/usr/local/Ascend/atb-models/examples/run_pa.py", line 273, in warm_up
    generate_req(req_list, self.model, self.max_batch_size, self.max_prefill_tokens, self.cache_manager)
  File "/usr/local/Ascend/atb-models/examples/server/generate.py", line 1143, in generate_req
    generate_token_with_clocking(model, cache_manager, batch)
  File "/usr/local/Ascend/atb-models/examples/server/generate.py", line 810, in generate_token_with_clocking
    res = generate_token(model, cache_manager, input_batch_in)
  File "/usr/local/Ascend/atb-models/examples/server/generate.py", line 587, in generate_token
    logits = model.forward(
  File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 297, in forward
    res = self.model.forward(**kwargs)
  File "/usr/local/Ascend/atb-models/atb_llm/models/base/flash_causal_lm.py", line 493, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 282, in init_ascend_weight
    self.acl_encoder_operation.set_param(json.dumps({**encoder_param}))
RuntimeError: External Comm Manager: Create the hccl communication group failed. export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to see more details. Default log path is $HOME/atb/log.
(an identical traceback is raised by the second failing rank)
[2025-07-24 14:51:03,333] [925] [281473776835520] [llmmodels] [INFO] [flash_causal_qwen2.py-560] : <<<<<<<after transdata k_caches[0].shape=torch.Size([9, 16, 128, 16])
[ERROR] 2025-07-24-14:51:14 (PID:926, Device:2, RankID:-1) ERR99999 UNKNOWN application exception
[ERROR] 2025-07-24-14:51:14 (PID:927, Device:3, RankID:-1) ERR99999 UNKNOWN application exception
[2025-07-24 14:51:25,678] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 924 closing signal SIGTERM
[2025-07-24 14:51:25,678] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 925 closing signal SIGTERM
[2025-07-24 14:51:26,042] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 2 (pid: 926) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib64/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib64/python3.11/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/usr/local/lib64/python3.11/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/usr/local/lib64/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib64/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
examples.run_pa FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2025-07-24_14:51:25
  host      : bm-node03
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 927)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-07-24_14:51:25
  host      : bm-node03
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 926)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!   (repeated)
```
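As a sanity check on the log before the failure: the "kv cache will allocate 0.052734375GB" line can be reproduced from the logged ModelConfig (num_layers=48, per-rank num_kv_heads=2, head_size=128, block_size=128, torch.float16) and the 9 blocks visible in `k_caches[0].shape`. The back-of-the-envelope sketch below assumes the usual paged layout of one K and one V tensor per layer; that layout is our assumption, the numbers are from the log:

```python
# Reproduce the logged per-rank kv-cache size from the logged config values.
# Layout assumption: per layer, one K and one V tensor, each holding
# num_blocks * block_size token slots of num_kv_heads * head_size fp16 values.
num_blocks = 9        # k_caches[0].shape = [9, 16, 128, 16] -> 9 blocks
block_size = 128      # tokens per block (PARunner config)
num_kv_heads = 2      # per-rank KV heads (world_size=4, num_kv_heads_origin=8)
head_size = 128
num_layers = 48
bytes_per_elem = 2    # torch.float16

kv_bytes = (num_blocks * block_size * num_kv_heads * head_size
            * bytes_per_elem * 2 * num_layers)  # * 2 for K and V
kv_gib = kv_bytes / 2**30
print(kv_gib)  # → 0.052734375, matching the cache.py log line
```

The exact match suggests the warm-up allocation itself is fine; the crash comes from HCCL group creation, not from memory sizing.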
```
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```

Solution (from the replies): enable BAR-space copy for all cards, then restart the OS for the setting to take effect. See https://support.huawei.com/enterprise/zh/doc/EDOC1100468889/788a553e?idPath=23710424|251366513|254884019|261408772|252764743
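If the HCCL group-creation error recurs after the BAR-space change, the RuntimeError text itself names the switches for collecting more detail. A minimal sketch, using exactly the variables the error message suggests:

```shell
# Switches suggested by the RuntimeError for diagnosing the failed
# HCCL communication-group creation.
export ASDOPS_LOG_LEVEL=ERROR
export ASDOPS_LOG_TO_STDOUT=1
# Re-run the failing workload in this shell, then also inspect the
# default ATB log directory, $HOME/atb/log, for details.
```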
yg9538
2025年8月27日 11:06