MindIE-ICU
Test Document
# 310P running vllm-ascend reports [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

## Overview

This document is a technical tutorial generated from a post on the Ascend community forum.

**Original link**: https://www.hiascend.com/forum/thread-0259190364947368222-1-1.html

**Generated at**: 2025-08-27 10:33:56

---

## Problem Description

The on-screen log is as follows:

INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated
WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend...

## Related Code

### Code Example 1

```
INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated
WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:09,246 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:11,712 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:24 [config.py:841] This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
INFO 08-12 01:17:24 [config.py:1472] Using max model len 40960
INFO 08-12 01:17:24 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 08-12 01:17:24 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
INFO 08-12 01:17:24 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66
INFO 08-12 01:17:24 [utils.py:336] Adjusted ACL graph batch sizes for Qwen3ForCausalLM model (layers: 28): 67 → 66 sizes
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:28,842 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:17:32,924 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:32 [core.py:526] Waiting for init message from front-end.
INFO 08-12 01:17:32 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='Qwen/Qwen3-0.6B', speculative_config=None, tokenizer='Qwen/Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=40960, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
[rank0]:[W812 01:17:43.665372666 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME.
(function operator())
INFO 08-12 01:17:54 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 08-12 01:18:01 [model_runner_v1.py:1745] Starting to load model Qwen/Qwen3-0.6B...
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
2025-08-12 01:18:06,622 - modelscope - INFO - Got 1 files, start to download ...
Downloading [model.safetensors]: 100%|█████████████████████████████████████████| 1.40G/1.40G [12:46<00:00, 1.96MB/s]
Processing 1 items: 100%|█████████████████████████████████████████████████████████| 1.00/1.00 [12:46<00:00, 767s/it]
2025-08-12 01:30:53,281 - modelscope - INFO - Download model 'Qwen/Qwen3-0.6B' successfully.
2025-08-12 01:30:53,282 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s]
INFO 08-12 01:30:53 [default_loader.py:272] Loading weights took 0.34 seconds
INFO 08-12 01:30:54 [model_runner_v1.py:1777] Loading model weights took 1.1202 GB
INFO 08-12 01:31:08 [backends.py:508] Using cache directory: /root/.cache/vllm/torch_compile_cache/c878ede1af/rank_0_0/backbone for vLLM's torch.compile
INFO 08-12 01:31:08 [backends.py:519] Dynamo bytecode transform time: 7.62 s
INFO 08-12 01:31:11 [backends.py:193] Compiling a graph for general shape takes 1.66 s
.INFO 08-12 01:31:18 [monitor.py:34] torch.compile takes 9.28 s in total
INFO 08-12 01:31:19 [worker_v1.py:181] Available memory: 16910831104, total memory: 22573076480
INFO 08-12 01:31:19 [kv_cache_utils.py:716] GPU KV cache size: 147,328 tokens
INFO 08-12 01:31:19 [kv_cache_utils.py:720] Maximum concurrency for 40,960 tokens per request: 3.60x
[WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158:Feature is not supportted and the possible cause is that driver and firmware packages do not match.
[WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:161:Feature is not supportted and the possible cause is that driver and firmware packages do not match.
ERROR 08-12 01:31:20 [core.py:586] EngineCore failed to start.
ERROR 08-12 01:31:20 [core.py:586] Traceback (most recent call last):
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
ERROR 08-12 01:31:20 [core.py:586]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
ERROR 08-12 01:31:20 [core.py:586]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 82, in __init__
ERROR 08-12 01:31:20 [core.py:586]     self._initialize_kv_caches(vllm_config)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 169, in _initialize_kv_caches
ERROR 08-12 01:31:20 [core.py:586]     self.model_executor.initialize_from_config(kv_cache_configs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 66, in initialize_from_config
ERROR 08-12 01:31:20 [core.py:586]     self.collective_rpc("compile_or_warm_up_model")
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 08-12 01:31:20 [core.py:586]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
ERROR 08-12 01:31:20 [core.py:586]     return func(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 253, in compile_or_warm_up_model
ERROR 08-12 01:31:20 [core.py:586]     self.model_runner.capture_model()
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2064, in capture_model
ERROR 08-12 01:31:20 [core.py:586]     self._dummy_run(num_tokens)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 08-12 01:31:20 [core.py:586]     return func(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1663, in _dummy_run
ERROR 08-12 01:31:20 [core.py:586]     hidden_states = model(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
ERROR 08-12 01:31:20 [core.py:586]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
ERROR 08-12 01:31:20 [core.py:586]     model_output = self.forward(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
ERROR 08-12 01:31:20 [core.py:586]     def forward(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
ERROR 08-12 01:31:20 [core.py:586]     return fn(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
ERROR 08-12 01:31:20 [core.py:586]     return self._wrapped_call(self, *args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
ERROR 08-12 01:31:20 [core.py:586]     raise e
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
ERROR 08-12 01:31:20 [core.py:586]     return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-12 01:31:20 [core.py:586]     return self._call_impl(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-12 01:31:20 [core.py:586]     return forward_call(*args, **kwargs)
ERROR 08-12 01:31:20 [core.py:586]   File "<eval_with_key>.58", line 234, in forward
ERROR 08-12 01:31:20 [core.py:586]     submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_ = None
ERROR 08-12 01:31:20 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 192, in __call__
ERROR 08-12 01:31:20 [core.py:586]     with torch.npu.graph(aclgraph, pool=self.graph_pool):
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 310, in __enter__
ERROR 08-12 01:31:20 [core.py:586]     self.npu_graph.capture_begin(
ERROR 08-12 01:31:20 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 210, in capture_begin
ERROR 08-12 01:31:20 [core.py:586]     super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
ERROR 08-12 01:31:20 [core.py:586] RuntimeError: status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE INTERNAL ASSERT FAILED at "build/CMakeFiles/torch_npu.dir/compiler_depend.ts":162, please report a bug to PyTorch.
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 590, in run_engine_core
    raise e
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 82, in __init__
    self._initialize_kv_caches(vllm_config)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 169, in _initialize_kv_caches
    self.model_executor.initialize_from_config(kv_cache_configs)
  File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 66, in initialize_from_config
    self.collective_rpc("compile_or_warm_up_model")
  File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
    return func(*args, **kwargs)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 253, in compile_or_warm_up_model
    self.model_runner.capture_model()
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2064, in capture_model
    self._dummy_run(num_tokens)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1663, in _dummy_run
    hidden_states = model(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
    hidden_states = self.model(input_ids, positions, intermediate_tensors,
  File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
    model_output = self.forward(*args, **kwargs)
  File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
    def forward(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
    raise e
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.58", line 234, in forward
    submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_k_norm_parameters_weight_ = None
  File "/vllm-workspace/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 192, in __call__
    with torch.npu.graph(aclgraph, pool=self.graph_pool):
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 310, in __enter__
    self.npu_graph.capture_begin(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/graphs.py", line 210, in capture_begin
    super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
RuntimeError: status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE INTERNAL ASSERT FAILED at "build/CMakeFiles/torch_npu.dir/compiler_depend.ts":162, please report a bug to PyTorch.
Traceback (most recent call last):
  File "/workspace/test.py", line 9, in <module>
    llm = LLM(model="Qwen/Qwen3-0.6B")
  File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 501, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config
    return cls(vllm_config=vllm_config,
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 101, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 75, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 503, in __init__
    super().__init__(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines
    wait_for_engine_startup(
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.
Failed core proc(s): {}
[ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```
Traceback (most recent call last): File "/workspace/test.py", line 9, in <module> llm = LLM(model="Qwen/Qwen3-0.6B") File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__ self.llm_engine = LLMEngine.from_engine_args( File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 501, in from_engine_args return engine_cls.from_vllm_config( File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config return cls(vllm_config=vllm_config, File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 101, in __init__ self.engine_core = EngineCoreClient.make_client( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 75, in make_client return SyncMPClient(vllm_config, executor_class, log_stats) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 503, in __init__ super().__init__( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__ with launch_core_engines(vllm_config, executor_class, File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__ next(self.gen) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines wait_for_engine_startup( File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup raise RuntimeError("Engine core initialization failed. " RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```

### Code example 3

```
root@k8s-master:/workspace# python collect_env.py Collecting environment information... PyTorch version: 2.5.1 Is debug build: False OS: Ubuntu 22.04.5 LTS (aarch64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: Could not collect CMake version: version 4.0.3 Libc version: glibc-2.35 Python version: 3.10.17 (main, May 27 2025, 01:33:16) [GCC 11.4.0] (64-bit runtime) Python platform: Linux-4.19.90-2003.4.0.0036.oe1.aarch64-aarch64-with-glibc2.35 CPU: Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 64 On-line CPU(s) list: 0-63 Vendor ID: HiSilicon Model name: Kunpeng-920 Model: 0 Thread(s) per core: 1 Core(s) per cluster: 32 Socket(s): - Cluster(s): 2 Stepping: 0x1 CPU max MHz: 2600.0000 CPU min MHz: 200.0000 BogoMIPS: 200.00 Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm L1d cache: 4 MiB (64 instances) L1i cache: 4 MiB (64 instances) L2 cache: 32 MiB (64 instances) L3 cache: 64 MiB (2 instances) NUMA node(s): 2 NUMA node0 CPU(s): 0-31 NUMA node1 CPU(s): 32-63 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Spec store bypass: Not affected Vulnerability Spectre v1: Mitigation; __user pointer sanitization Vulnerability Spectre v2: Not affected Vulnerability Tsx async abort: Not affected Versions of relevant libraries: [pip3] numpy==1.26.4 [pip3] pyzmq==27.0.0 [pip3] torch==2.5.1 [pip3] torch-npu==2.5.1.post1.dev20250619 [pip3] torchvision==0.20.1 [pip3] transformers==4.52.4 [conda] Could not collect vLLM Version: 0.9.2 vLLM Ascend Version: 0.9.2rc1 ENV Variables: ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240 ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1 ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0 ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1 ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0 ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32 ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5 VLLM_USE_MODELSCOPE=true ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0 ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest ATB_COMPARE_TILING_EVERY_KERNEL=0 ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/: ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3 ATB_RUNNER_POOL_SIZE=64 ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0 ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest ATB_MATMUL_SHUFFLE_K_ENABLE=1 ATB_LAUNCH_KERNEL_WITH_TILING=1 ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1 ATB_HOST_TILING_BUFFER_BLOCK_NUM=128 ATB_SHARE_MEMORY_NAME_SUFFIX= TORCH_DEVICE_BACKEND_AUTOLOAD=1 PYTORCH_NVML_BASED_CUDA_CHECK=1 TORCHINDUCTOR_COMPILE_THREADS=1 NPU: +--------------------------------------------------------------------------------------------------------+ | npu-smi 25.0.rc1.1 Version: 25.0.rc1.1 | +-------------------------------+-----------------+------------------------------------------------------+ | NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) | | Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) | +===============================+=================+======================================================+ | 32 310P3 | OK | NA 45 0 / 0 | | 0 0 | 0000:02:00.0 | 0 1850 / 21527 | +===============================+=================+======================================================+ | 96 310P3 | OK | NA 46 0 / 0 | | 0 1 | 0000:04:00.0 | 0 1851 / 21527 | +===============================+=================+======================================================+ | 32800 310P3 | OK | NA 47 0 / 0 | | 0 2 | 0000:82:00.0 | 0 1855 / 21527 | +===============================+=================+======================================================+ 
| 32896 310P3 | OK | NA 50 0 / 0 | | 0 3 | 0000:85:00.0 | 0 1850 / 21527 | +===============================+=================+======================================================+ +-------------------------------+-----------------+------------------------------------------------------+ | NPU Chip | Process id | Process name | Process memory(MB) | +===============================+=================+======================================================+ | No running processes found in NPU 32 | +===============================+=================+======================================================+ | No running processes found in NPU 96 | +===============================+=================+======================================================+ | No running processes found in NPU 32800 | +===============================+=================+======================================================+ | No running processes found in NPU 32896 | +===============================+=================+======================================================+ CANN: package_name=Ascend-cann-toolkit version=8.1.RC1 innerversion=V100R001C21SPC001B238 compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21] arch=aarch64 os=linux path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
```

## Full content

The on-screen log was as follows:

INFO 08-12 01:17:01 [__init__.py:39] Available plugins for group vllm.platform_plugins: INFO 08-12 01:17:01 [__init__.py:41] - ascend -> vllm_ascend:register INFO 08-12 01:17:01 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 08-12 01:17:01 [__init__.py:235] Platform plugin ascend is activated WARNING 08-12 01:17:02 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") INFO 08-12 01:17:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 08-12 01:17:05 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 08-12 01:17:05 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:09,246 - modelscope - INFO - Target directory already exists, skipping creation. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:11,712 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 08-12 01:17:24 [config.py:841] This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'. INFO 08-12 01:17:24 [config.py:1472] Using max model len 40960 INFO 08-12 01:17:24 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=8192. INFO 08-12 01:17:24 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode INFO 08-12 01:17:24 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66 INFO 08-12 01:17:24 [utils.py:336] Adjusted ACL graph batch sizes for Qwen3ForCausalLM model (layers: 28): 67 66 sizes Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:28,842 - modelscope - INFO - Target directory already exists, skipping creation. Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:17:32,924 - modelscope - INFO - Target directory already exists, skipping creation. INFO 08-12 01:17:32 [core.py:526] Waiting for init message from front-end. INFO 08-12 01:17:32 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='Qwen/Qwen3-0.6B', speculative_config=None, tokenizer='Qwen/Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=40960, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} [rank0]:[W812 01:17:43.665372666 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator()) INFO 08-12 01:17:54 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 08-12 01:18:01 [model_runner_v1.py:1745] Starting to load model Qwen/Qwen3-0.6B... 
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B 2025-08-12 01:18:06,622 - modelscope - INFO - Got 1 files, start to download ... Downloading [model.safetensors]: 100%|| 1.40G/1.40G [12:46<00:00, 1.96MB/s] Processing 1 items: 100%|| 1.00/1.00 [12:46<00:00, 767s/it] 2025-08-12 01:30:53,281 - modelscope - INFO - Download model 'Qwen/Qwen3-0.6B' successfully. 2025-08-12 01:30:53,282 - modelscope - INFO - Target directory already exists, skipping creation. Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.99it/s] INFO 08-12 01:30:53 [default_loader.py:272] Loading weights took 0.34 seconds INFO 08-12 01:30:54 [model_runner_v1.py:1777] Loading model weights took 1.1202 GB INFO 08-12 01:31:08 [backends.py:508] Using cache directory: /root/.cache/vllm/torch_compile_cache/c878ede1af/rank_0_0/backbone for vLLM's torch.compile INFO 08-12 01:31:08 [backends.py:519] Dynamo bytecode transform time: 7.62 s INFO 08-12 01:31:11 [backends.py:193] Compiling a graph for general shape takes 1.66 s .INFO 08-12 01:31:18 [monitor.py:34] torch.compile takes 9.28 s in total INFO 08-12 01:31:19 [worker_v1.py:181] Available memory: 16910831104, total memory: 22573076480 INFO 08-12 01:31:19 [kv_cache_utils.py:716] GPU KV cache size: 147,328 tokens INFO 08-12 01:31:19 [kv_cache_utils.py:720] Maximum concurrency for 40,960 tokens per request: 3.60x [WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158:Feature is not supportted and the possible cause is that driver and firmware packages do not match. [WARN]operator(),build/CMakeFiles/torch_npu.dir/compiler_depend.ts:161:Feature is not supportted and the possible cause is that driver and firmware packages do not match. ERROR 08-12 01:31:20 [core.py:586] EngineCore failed to start. 
The EngineCore traceback that followed (prefixed with ERROR 08-12 01:31:20 [core.py:586]), the Process EngineCore_0 traceback, and the final traceback from /workspace/test.py are identical, line for line, to the ones reproduced in the code examples above. The run ended with:

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [ERROR] 2025-08-12-01:31:20 (PID:909, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

The system environment reported by collect_env.py is identical to Code example 3 above: PyTorch 2.5.1 with torch-npu 2.5.1.post1.dev20250619, vLLM 0.9.2 with vLLM Ascend 0.9.2rc1, CANN toolkit 8.1.RC1, driver npu-smi 25.0.rc1.1, and four Atlas 310P3 NPUs on a Kunpeng-920 (aarch64) host.
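Since the crash happens inside `torch.npu.graph(...)` while vLLM warms the model up by capturing ACL graphs, one way to test the driver/firmware hypothesis is to start the engine with graph capture disabled and see whether eager execution works. The sketch below reuses the post's failing script with vLLM's standard `enforce_eager` switch; that this avoids the crash on this particular 310P3 setup is an assumption, not something confirmed in the thread.

```python
# Sketch: the post's test.py with ACL graph capture disabled.
# enforce_eager=True tells vLLM to skip graph capture entirely, so the
# torch.npu.graph capture path that asserts above is never entered.
# (Assumed workaround - not confirmed in the original thread.)
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-0.6B", enforce_eager=True)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Why did the engine fail to start?"], params)
print(outputs[0].outputs[0].text)
```

The same switch exists on the serving command line (`vllm serve Qwen/Qwen3-0.6B --enforce-eager`). Eager mode trades some decode throughput for skipping capture; if the assertion still fires with capture disabled, the driver/firmware mismatch warning in the log is the next thing to chase.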
---

## Technical summary

Based on the content above, the key technical points are:

1. **Problem type**: engine startup failure. vLLM Ascend crashes while warming up the model: ACL graph capture (`torch.npu.graph` → `capture_begin`) hits the internal assertion `status == aclmdlRICaptureStatus::ACL_MODEL_RI_CAPTURE_STATUS_ACTIVE`, so the vLLM engine core never finishes initializing.
2. **Stack involved**: vLLM 0.9.2 / vLLM Ascend 0.9.2rc1, PyTorch 2.5.1 + torch-npu, CANN 8.1.RC1, Atlas 310P3 NPUs on a Kunpeng-920 (aarch64) host.
3. **Root-cause indication**: immediately before the assertion, the log warns "Feature is not supportted and the possible cause is that driver and firmware packages do not match", which points at the graph-capture feature being unavailable on this driver/firmware combination rather than at the model itself. The post is a question and records no confirmed fix; the sketch above shows one way to rule the graph-capture path out.

## Related resources

- Ascend community: https://www.hiascend.com/
- Ascend forum: https://www.hiascend.com/forum/

---

*This document was generated automatically and is for reference only. If in doubt, refer to the original post.*
yg9538
August 27, 2025, 11:02