- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

HarmonyOS APP开发：端云协同AI推理架构

Jack20 发表于 2026/06/21 14:35:50 2026/06/21

【摘要】 HarmonyOS APP开发：端云协同AI推理架构核心要点：端云协同推理让AI在设备端和云端之间智能调度，低延迟场景走端侧、复杂任务走云端，实现"鱼和熊掌兼得"的推理体验。本文深入讲解端云协同的决策引擎、动态分流策略以及在HarmonyOS APP中的完整实现。一、背景与动机你有没有遇到过这种情况？用手机上的AI助手识别一张图片，等了3秒结果才出来——因为模型跑在云端，网络一卡就完蛋。...

HarmonyOS APP开发：端云协同AI推理架构

核心要点：端云协同推理让AI在设备端和云端之间智能调度，低延迟场景走端侧、复杂任务走云端，实现"鱼和熊掌兼得"的推理体验。本文深入讲解端云协同的决策引擎、动态分流策略以及在HarmonyOS APP中的完整实现。

一、背景与动机

你有没有遇到过这种情况？用手机上的AI助手识别一张图片，等了3秒结果才出来——因为模型跑在云端，网络一卡就完蛋。又或者，用端侧模型做人脸解锁，虽然快，但光线暗一点就识别不了——因为端侧模型太小，精度不够。

这就是端云协同要解决的问题。

简单说，端云协同推理就是：简单的任务在设备上跑，复杂的任务交给云端，根据实时条件智能决策。就像公司里，简单的审批主管当场拍板，复杂的项目才需要上报CEO。这样既保证了响应速度，又不牺牲模型精度。

HarmonyOS的分布式能力让端云协同更加自然。手机、手表、平板、智慧屏可以组成一个"推理集群"——手表做人脸检测（轻量级），手机做人脸识别（中等），云端做人脸比对（高精度）。设备之间通过分布式软总线高速通信，延迟比传统网络低一个数量级。

二、核心原理

2.1 端云协同推理架构

端云协同推理架构由三个核心层组成：

决策引擎层：根据任务复杂度、网络状况、设备算力、电量等因素，决定推理在端侧还是云端执行
推理执行层：端侧推理引擎（MindSpore Lite / NNAPI）和云端推理服务（ModelArts / 自建服务）
结果融合层：对端云结果进行融合、校验和缓存

flowchart TB
    classDef primary fill:#4F46E5,stroke:#3730A3,color:#FFFFFF
    classDef warning fill:#F59E0B,stroke:#D97706,color:#FFFFFF
    classDef error fill:#EF4444,stroke:#DC2626,color:#FFFFFF
    classDef info fill:#06B6D4,stroke:#0891B2,color:#FFFFFF
    classDef purple fill:#8B5CF6,stroke:#7C3AED,color:#FFFFFF

    A[AI推理请求]:::primary --> B{决策引擎}:::warning
    B -->|简单任务 / 网络差| C[端侧推理]:::info
    B -->|复杂任务 / 网络好| D[云端推理]:::purple
    B -->|中等任务| E[端云并行]:::error
    C --> F[端侧结果]:::info
    D --> G[云端结果]:::purple
    E --> F
    E --> G
    F --> H{结果融合}:::warning
    G --> H
    H --> I[最终输出]:::primary

    J[网络监测]:::info -.-> B
    K[设备状态]:::info -.-> B
    L[任务分析]:::purple -.-> B
    M[缓存查询]:::warning -.-> B

2.2 智能分流决策模型

决策引擎是端云协同的"大脑"，它需要综合考虑多个因素：

决策因子	端侧倾向	云端倾向
任务复杂度	低（分类、检测）	高（生成、分割）
网络延迟	>200ms	<50ms
设备算力	NPU可用	NPU不可用
电量	>50%	<20%
隐私要求	高（人脸、健康）	低（通用识别）
请求频率	高频实时	低频批量

决策函数可以简化为：

$Score_{edge} = \sum_{i=1}^{n} w_i \cdot f_i(x_i)$

当 $Score_{edge} > \theta$ 时走端侧，否则走云端。权重 $w_i$ 可以通过历史数据在线学习调整。

2.3 端云并行推理

对于中等复杂度的任务，最聪明的做法是端云并行：同时发起端侧和云端推理，谁先返回用谁的结果。如果云端先返回，就用高精度的云端结果；如果端侧先返回且置信度够高，就直接用端侧结果，不等云端了。

这种"双保险"策略在弱网环境下特别有效——网络差的时候端侧兜底，网络好的时候云端提供更高精度。

2.4 结果缓存与预取

缓存是端云协同的"加速器"。对于重复性的推理请求（如同一张图片多次识别），直接返回缓存结果，省去推理开销。预取则更进一步——根据用户行为预测下一步可能需要的推理结果，提前在后台发起。

三、代码实战

3.1 端云协同决策引擎

这是整个架构的核心，负责根据实时条件决定推理路径。

// EdgeCloudDecisionEngine.ets
// 端云协同决策引擎 - 智能推理路径选择

import { connection } from '@kit.NetworkKit';
import { batteryInfo } from '@kit.BasicServicesKit';

// 推理路径枚举
export enum InferencePath {
  EDGE_ONLY = 'edge_only',         // 仅端侧
  CLOUD_ONLY = 'cloud_only',       // 仅云端
  EDGE_CLOUD_PARALLEL = 'parallel', // 端云并行
  EDGE_FIRST = 'edge_first',       // 端侧优先，失败回退云端
  CLOUD_FIRST = 'cloud_first'      // 云端优先，失败回退端侧
}

// 任务描述接口
export interface TaskDescriptor {
  taskType: string;           // 任务类型：classification/detection/generation/segmentation
  inputSize: number;          // 输入数据大小（字节）
  modelComplexity: number;    // 模型复杂度 0-1
  privacyLevel: number;       // 隐私等级 0-1（1=最高）
  latencyRequirement: number; // 延迟要求（毫秒）
  accuracyRequirement: number; // 精度要求 0-1
}

// 设备状态接口
interface DeviceStatus {
  cpuUsage: number;           // CPU使用率 0-1
  memoryAvailable: number;    // 可用内存（MB）
  batteryLevel: number;       // 电量 0-100
  isCharging: boolean;        // 是否充电中
  isNPUAvailable: boolean;    // NPU是否可用
  thermalStatus: number;      // 温度状态 0-3
}

// 网络状态接口
interface NetworkStatus {
  type: string;               // wifi/cellular/none
  latency: number;            // 网络延迟（毫秒）
  bandwidth: number;          // 带宽（Mbps）
  isStable: boolean;          // 网络是否稳定
}

// 决策结果接口
export interface DecisionResult {
  path: InferencePath;
  confidence: number;         // 决策置信度 0-1
  reason: string;             // 决策理由
  estimatedLatency: number;   // 预估延迟（毫秒）
  fallbackPath: InferencePath; // 回退路径
}

export class EdgeCloudDecisionEngine {
  // 决策权重配置
  private weights = {
    taskComplexity: 0.25,
    networkQuality: 0.20,
    deviceCapability: 0.20,
    privacyRequirement: 0.15,
    latencyRequirement: 0.10,
    batteryStatus: 0.10
  };

  // 端侧阈值：得分高于此值倾向端侧
  private edgeThreshold: number = 0.55;

  // 缓存最近一次的设备/网络状态
  private lastDeviceStatus: DeviceStatus | null = null;
  private lastNetworkStatus: NetworkStatus | null = null;

  // 历史决策记录（用于在线学习）
  private decisionHistory: Array<{
    task: TaskDescriptor;
    decision: DecisionResult;
    actualLatency: number;
    success: boolean;
  }> = [];

  // 执行推理路径决策
  decide(task: TaskDescriptor): DecisionResult {
    // 获取当前设备状态
    const deviceStatus = this.getDeviceStatus();
    const networkStatus = this.getNetworkStatus();

    // 计算各因子得分
    const complexityScore = this.calcComplexityScore(task);
    const networkScore = this.calcNetworkScore(networkStatus);
    const deviceScore = this.calcDeviceScore(deviceStatus);
    const privacyScore = this.calcPrivacyScore(task);
    const latencyScore = this.calcLatencyScore(task, networkStatus);
    const batteryScore = this.calcBatteryScore(deviceStatus);

    // 加权求和
    const edgeScore =
      this.weights.taskComplexity * (1 - complexityScore) +  // 复杂度低→端侧
      this.weights.networkQuality * (1 - networkScore) +     // 网络差→端侧
      this.weights.deviceCapability * deviceScore +           // 设备强→端侧
      this.weights.privacyRequirement * privacyScore +        // 隐私高→端侧
      this.weights.latencyRequirement * (1 - latencyScore) + // 延迟要求高→端侧
      this.weights.batteryStatus * batteryScore;              // 电量足→端侧

    // 根据综合得分选择推理路径
    let path: InferencePath;
    let reason: string;
    let estimatedLatency: number;
    let fallbackPath: InferencePath;

    if (edgeScore > this.edgeThreshold + 0.15) {
      // 明确倾向端侧
      path = InferencePath.EDGE_ONLY;
      reason = `端侧优势明显(得分${edgeScore.toFixed(2)})`;
      estimatedLatency = this.estimateEdgeLatency(task, deviceStatus);
      fallbackPath = InferencePath.CLOUD_ONLY;
    } else if (edgeScore < this.edgeThreshold - 0.15) {
      // 明确倾向云端
      path = InferencePath.CLOUD_ONLY;
      reason = `云端优势明显(得分${edgeScore.toFixed(2)})`;
      estimatedLatency = this.estimateCloudLatency(task, networkStatus);
      fallbackPath = InferencePath.EDGE_ONLY;
    } else {
      // 得分接近，走端云并行
      path = InferencePath.EDGE_CLOUD_PARALLEL;
      reason = `端云势均力敌(得分${edgeScore.toFixed(2)})，并行取优`;
      estimatedLatency = Math.min(
        this.estimateEdgeLatency(task, deviceStatus),
        this.estimateCloudLatency(task, networkStatus)
      );
      fallbackPath = InferencePath.EDGE_ONLY;
    }

    // 特殊情况覆盖
    if (networkStatus.type === 'none') {
      path = InferencePath.EDGE_ONLY;
      reason = '无网络连接，强制端侧推理';
      fallbackPath = InferencePath.EDGE_ONLY;
    } else if (task.privacyLevel > 0.8 && path === InferencePath.CLOUD_ONLY) {
      path = InferencePath.EDGE_FIRST;
      reason = '高隐私任务，端侧优先';
      fallbackPath = InferencePath.EDGE_ONLY;
    } else if (task.latencyRequirement < 50 && networkStatus.latency > 100) {
      path = InferencePath.EDGE_ONLY;
      reason = '低延迟要求+高网络延迟，强制端侧';
      fallbackPath = InferencePath.EDGE_ONLY;
    }

    const confidence = Math.abs(edgeScore - this.edgeThreshold) / 0.5;

    const result: DecisionResult = {
      path,
      confidence: Math.min(1, confidence),
      reason,
      estimatedLatency,
      fallbackPath
    };

    console.info(`[DecisionEngine] 决策: ${path}, 理由: ${reason}, 置信度: ${confidence.toFixed(2)}`);
    return result;
  }

  // 计算任务复杂度得分（越高越复杂→越倾向云端）
  private calcComplexityScore(task: TaskDescriptor): number {
    const complexityMap: Record<string, number> = {
      'classification': 0.2,
      'detection': 0.4,
      'segmentation': 0.7,
      'generation': 0.9
    };
    return complexityMap[task.taskType] || task.modelComplexity;
  }

  // 计算网络质量得分（越高→网络越好→倾向云端）
  private calcNetworkScore(network: NetworkStatus): number {
    if (network.type === 'none') return 0;
    let score = 0;
    // 延迟评分
    if (network.latency < 30) score += 0.4;
    else if (network.latency < 100) score += 0.25;
    else if (network.latency < 300) score += 0.1;
    // 带宽评分
    if (network.bandwidth > 50) score += 0.3;
    else if (network.bandwidth > 10) score += 0.2;
    else score += 0.05;
    // 稳定性评分
    score += network.isStable ? 0.3 : 0.1;
    return Math.min(1, score);
  }

  // 计算设备能力得分（越高→设备越强→倾向端侧）
  private calcDeviceScore(device: DeviceStatus): number {
    let score = 0;
    score += device.isNPUAvailable ? 0.4 : 0;
    score += device.cpuUsage < 0.5 ? 0.2 : 0.05;
    score += device.memoryAvailable > 2048 ? 0.2 : 0.05;
    score += device.thermalStatus < 2 ? 0.2 : 0;
    return Math.min(1, score);
  }

  // 计算隐私得分（越高→隐私要求越高→倾向端侧）
  private calcPrivacyScore(task: TaskDescriptor): number {
    return task.privacyLevel;
  }

  // 计算延迟得分（越高→延迟要求越宽松→倾向云端）
  private calcLatencyScore(task: TaskDescriptor, network: NetworkStatus): number {
    if (task.latencyRequirement < 50) return 0;   // 极低延迟→端侧
    if (task.latencyRequirement < 200) return 0.3; // 低延迟→倾向端侧
    if (task.latencyRequirement < 1000) return 0.6; // 中等延迟→均可
    return 1.0;                                     // 无延迟要求→倾向云端
  }

  // 计算电量得分（越高→电量越充足→倾向端侧）
  private calcBatteryScore(device: DeviceStatus): number {
    if (device.isCharging) return 1.0;
    if (device.batteryLevel > 60) return 0.8;
    if (device.batteryLevel > 30) return 0.4;
    return 0.1; // 低电量→倾向云端（省电）
  }

  // 预估端侧推理延迟
  private estimateEdgeLatency(task: TaskDescriptor, device: DeviceStatus): number {
    const baseLatency = task.modelComplexity * 200; // 基础延迟
    const npuFactor = device.isNPUAvailable ? 0.3 : 1.0; // NPU加速3倍
    const thermalFactor = 1 + device.thermalStatus * 0.2; // 降频惩罚
    return Math.round(baseLatency * npuFactor * thermalFactor);
  }

  // 预估云端推理延迟
  private estimateCloudLatency(task: TaskDescriptor, network: NetworkStatus): number {
    const computeLatency = task.modelComplexity * 50; // 云端计算快
    const transferLatency = (task.inputSize / 1024 / 1024) / network.bandwidth * 1000; // 传输延迟
    const networkLatency = network.latency * 2; // 往返延迟
    return Math.round(computeLatency + transferLatency + networkLatency);
  }

  // 获取设备状态
  private getDeviceStatus(): DeviceStatus {
    try {
      const battery = batteryInfo.getBatteryInfo();
      return {
        cpuUsage: 0.3, // 简化：实际应通过系统API获取
        memoryAvailable: 4096,
        batteryLevel: battery.level,
        isCharging: battery.chargingStatus === batteryInfo.BatteryChargeState.ENABLE,
        isNPUAvailable: true, // 简化：实际应检测NPU
        thermalStatus: 0
      };
    } catch (error) {
      return {
        cpuUsage: 0.5,
        memoryAvailable: 2048,
        batteryLevel: 50,
        isCharging: false,
        isNPUAvailable: false,
        thermalStatus: 0
      };
    }
  }

  // 获取网络状态
  private getNetworkStatus(): NetworkStatus {
    try {
      const netConnection = connection.createNetConnection();
      // 简化：实际应异步获取网络状态
      return {
        type: 'wifi',
        latency: 30,
        bandwidth: 50,
        isStable: true
      };
    } catch (error) {
      return {
        type: 'none',
        latency: 999,
        bandwidth: 0,
        isStable: false
      };
    }
  }

  // 记录决策结果（用于在线学习）
  recordDecision(task: TaskDescriptor, decision: DecisionResult,
    actualLatency: number, success: boolean): void {
    this.decisionHistory.push({ task, decision, actualLatency, success });
    // 保留最近100条记录
    if (this.decisionHistory.length > 100) {
      this.decisionHistory.shift();
    }
    // 简单的在线学习：调整阈值
    this.adaptThreshold();
  }

  // 自适应阈值调整
  private adaptThreshold(): void {
    if (this.decisionHistory.length < 10) return;

    const recentDecisions = this.decisionHistory.slice(-20);
    const edgeSuccessRate = recentDecisions
      .filter(d => d.decision.path === InferencePath.EDGE_ONLY)
      .filter(d => d.success)
      .length / Math.max(1, recentDecisions
        .filter(d => d.decision.path === InferencePath.EDGE_ONLY).length);

    // 如果端侧成功率低，提高阈值（更倾向云端）
    if (edgeSuccessRate < 0.7) {
      this.edgeThreshold = Math.min(0.7, this.edgeThreshold + 0.02);
    } else if (edgeSuccessRate > 0.9) {
      // 如果端侧成功率高，降低阈值（更倾向端侧）
      this.edgeThreshold = Math.max(0.3, this.edgeThreshold - 0.01);
    }
  }
}

3.2 端云协同推理执行器

负责实际的推理执行，支持端侧、云端和并行三种模式。

// EdgeCloudInferenceExecutor.ets
// 端云协同推理执行器 - 统一的推理执行接口

import { http } from '@kit.NetworkKit';
import {
  EdgeCloudDecisionEngine,
  InferencePath,
  TaskDescriptor,
  DecisionResult
} from './EdgeCloudDecisionEngine';

// 推理结果接口
export interface InferenceResult {
  data: Record<string, Object>;    // 推理输出
  confidence: number;               // 置信度
  latency: number;                  // 实际延迟（毫秒）
  path: InferencePath;              // 实际推理路径
  fromCache: boolean;               // 是否来自缓存
  timestamp: number;                // 时间戳
}

// 端侧推理引擎接口
interface EdgeInferenceEngine {
  predict(input: Float32Array): Promise<Record<string, Object>>;
  isReady(): boolean;
  getModelInfo(): { name: string; version: string; inputSize: number };
}

// 云端推理服务接口
interface CloudInferenceConfig {
  endpoint: string;
  apiKey: string;
  timeout: number;
  retryCount: number;
}

export class EdgeCloudInferenceExecutor {
  private decisionEngine: EdgeCloudDecisionEngine;
  private edgeEngine: EdgeInferenceEngine | null = null;
  private cloudConfig: CloudInferenceConfig;
  private resultCache: Map<string, InferenceResult> = new Map();
  private maxCacheSize: number = 100;

  constructor(cloudConfig: CloudInferenceConfig) {
    this.decisionEngine = new EdgeCloudDecisionEngine();
    this.cloudConfig = cloudConfig;
  }

  // 注册端侧推理引擎
  registerEdgeEngine(engine: EdgeInferenceEngine): void {
    this.edgeEngine = engine;
    console.info(`[InferenceExecutor] 端侧引擎已注册: ${engine.getModelInfo().name}`);
  }

  // 统一推理入口
  async infer(input: Float32Array, task: TaskDescriptor): Promise<InferenceResult> {
    const startTime = Date.now();

    // 第一步：检查缓存
    const cacheKey = this.generateCacheKey(input, task);
    const cachedResult = this.resultCache.get(cacheKey);
    if (cachedResult && Date.now() - cachedResult.timestamp < 300000) { // 5分钟缓存有效期
      console.info('[InferenceExecutor] 命中缓存');
      return { ...cachedResult, fromCache: true };
    }

    // 第二步：决策引擎选择推理路径
    const decision = this.decisionEngine.decide(task);
    console.info(`[InferenceExecutor] 推理路径: ${decision.path}, 理由: ${decision.reason}`);

    // 第三步：执行推理
    let result: InferenceResult;
    try {
      switch (decision.path) {
        case InferencePath.EDGE_ONLY:
          result = await this.inferOnEdge(input, task, decision);
          break;
        case InferencePath.CLOUD_ONLY:
          result = await this.inferOnCloud(input, task, decision);
          break;
        case InferencePath.EDGE_CLOUD_PARALLEL:
          result = await this.inferParallel(input, task, decision);
          break;
        case InferencePath.EDGE_FIRST:
          result = await this.inferEdgeFirst(input, task, decision);
          break;
        case InferencePath.CLOUD_FIRST:
          result = await this.inferCloudFirst(input, task, decision);
          break;
        default:
          result = await this.inferOnEdge(input, task, decision);
      }
    } catch (error) {
      // 推理失败，尝试回退路径
      console.error(`[InferenceExecutor] 推理失败: ${error}, 尝试回退路径: ${decision.fallbackPath}`);
      result = await this.fallbackInfer(input, task, decision.fallbackPath);
    }

    // 第四步：记录决策结果（用于在线学习）
    const actualLatency = Date.now() - startTime;
    this.decisionEngine.recordDecision(task, decision, actualLatency, true);

    // 第五步：缓存结果
    this.cacheResult(cacheKey, result);

    return result;
  }

  // 端侧推理
  private async inferOnEdge(input: Float32Array, task: TaskDescriptor,
    decision: DecisionResult): Promise<InferenceResult> {
    if (!this.edgeEngine || !this.edgeEngine.isReady()) {
      throw new Error('端侧推理引擎不可用');
    }

    const startTime = Date.now();
    const output = await this.edgeEngine.predict(input);
    const latency = Date.now() - startTime;

    return {
      data: output,
      confidence: 0.85, // 端侧模型通常精度略低
      latency,
      path: InferencePath.EDGE_ONLY,
      fromCache: false,
      timestamp: Date.now()
    };
  }

  // 云端推理
  private async inferOnCloud(input: Float32Array, task: TaskDescriptor,
    decision: DecisionResult): Promise<InferenceResult> {
    const startTime = Date.now();

    // 将输入数据编码为Base64
    const inputBase64 = this.float32ToBase64(input);

    // 发送推理请求
    const request = http.createHttp();
    const response = await request.request(this.cloudConfig.endpoint, {
      method: http.RequestMethod.POST,
      header: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.cloudConfig.apiKey}`
      },
      extraData: JSON.stringify({
        task_type: task.taskType,
        input_data: inputBase64,
        model_version: 'latest'
      }),
      expectDataType: http.HttpDataType.STRING,
      connectTimeout: this.cloudConfig.timeout,
      readTimeout: this.cloudConfig.timeout
    });

    const latency = Date.now() - startTime;

    if (response.responseCode !== 200) {
      throw new Error(`云端推理失败，状态码: ${response.responseCode}`);
    }

    const result = JSON.parse(response.result as string);

    return {
      data: result.output,
      confidence: result.confidence || 0.95,
      latency,
      path: InferencePath.CLOUD_ONLY,
      fromCache: false,
      timestamp: Date.now()
    };
  }

  // 端云并行推理 - 谁先返回用谁
  private async inferParallel(input: Float32Array, task: TaskDescriptor,
    decision: DecisionResult): Promise<InferenceResult> {
    const startTime = Date.now();

    // 同时发起端侧和云端推理
    const edgePromise = this.inferOnEdge(input, task, decision)
      .catch(_ => null as InferenceResult | null);
    const cloudPromise = this.inferOnCloud(input, task, decision)
      .catch(_ => null as InferenceResult | null);

    // 使用Promise.race获取最快结果
    // 但我们不用race，因为想等两边都完成来选更好的结果
    const edgeResult = await edgePromise;
    const cloudResult = await cloudPromise;

    // 选择策略：如果端侧结果置信度够高且延迟低，优先用端侧
    // 否则用云端结果（精度更高）
    if (edgeResult && cloudResult) {
      if (edgeResult.confidence > 0.9 && edgeResult.latency < cloudResult.latency) {
        console.info('[InferenceExecutor] 并行推理：选择端侧结果（高置信度+低延迟）');
        return { ...edgeResult, path: InferencePath.EDGE_CLOUD_PARALLEL };
      } else {
        console.info('[InferenceExecutor] 并行推理：选择云端结果（更高精度）');
        return { ...cloudResult, path: InferencePath.EDGE_CLOUD_PARALLEL };
      }
    }

    // 只有一边成功
    const finalResult = edgeResult || cloudResult;
    if (!finalResult) {
      throw new Error('端云并行推理均失败');
    }
    return { ...finalResult, path: InferencePath.EDGE_CLOUD_PARALLEL };
  }

  // 端侧优先推理
  private async inferEdgeFirst(input: Float32Array, task: TaskDescriptor,
    decision: DecisionResult): Promise<InferenceResult> {
    try {
      const edgeResult = await this.inferOnEdge(input, task, decision);
      if (edgeResult.confidence > 0.7) {
        return edgeResult;
      }
      // 端侧置信度不够，回退云端
      console.info('[InferenceExecutor] 端侧置信度不足，回退云端');
      return await this.inferOnCloud(input, task, decision);
    } catch (error) {
      return await this.inferOnCloud(input, task, decision);
    }
  }

  // 云端优先推理
  private async inferCloudFirst(input: Float32Array, task: TaskDescriptor,
    decision: DecisionResult): Promise<InferenceResult> {
    try {
      return await this.inferOnCloud(input, task, decision);
    } catch (error) {
      console.info('[InferenceExecutor] 云端推理失败，回退端侧');
      return await this.inferOnEdge(input, task, decision);
    }
  }

  // 回退推理
  private async fallbackInfer(input: Float32Array, task: TaskDescriptor,
    fallbackPath: InferencePath): Promise<InferenceResult> {
    const fallbackDecision: DecisionResult = {
      path: fallbackPath,
      confidence: 0.5,
      reason: '回退推理',
      estimatedLatency: 0,
      fallbackPath: InferencePath.EDGE_ONLY
    };

    if (fallbackPath === InferencePath.EDGE_ONLY) {
      return await this.inferOnEdge(input, task, fallbackDecision);
    } else {
      return await this.inferOnCloud(input, task, fallbackDecision);
    }
  }

  // 生成缓存键
  private generateCacheKey(input: Float32Array, task: TaskDescriptor): string {
    // 简化：使用输入数据的哈希和任务类型作为缓存键
    let hash = 0;
    for (let i = 0; i < Math.min(input.length, 100); i++) {
      hash = ((hash << 5) - hash + Math.round(input[i] * 1000)) | 0;
    }
    return `${task.taskType}_${hash}`;
  }

  // 缓存推理结果
  private cacheResult(key: string, result: InferenceResult): void {
    if (this.resultCache.size >= this.maxCacheSize) {
      // 删除最早的缓存
      const firstKey = this.resultCache.keys().next().value;
      if (firstKey) this.resultCache.delete(firstKey);
    }
    this.resultCache.set(key, result);
  }

  // Float32Array转Base64（简化实现）
  private float32ToBase64(array: Float32Array): string {
    const bytes = new Uint8Array(array.buffer);
    let binary = '';
    for (let i = 0; i < bytes.length; i++) {
      binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
  }

  // 清除缓存
  clearCache(): void {
    this.resultCache.clear();
    console.info('[InferenceExecutor] 缓存已清除');
  }
}

3.3 端云协同推理完整UI

将决策引擎和推理执行器集成到HarmonyOS APP中，提供直观的推理路径可视化和状态监控。

// EdgeCloudInferencePage.ets
// 端云协同推理页面 - 可视化推理路径与状态监控

import {
  EdgeCloudInferenceExecutor,
  InferenceResult,
  CloudInferenceConfig
} from './EdgeCloudInferenceExecutor';
import {
  EdgeCloudDecisionEngine,
  InferencePath,
  TaskDescriptor
} from './EdgeCloudDecisionEngine';

@Entry
@Component
struct EdgeCloudInferencePage {
  @State currentPath: string = '待决策';
  @State inferenceStatus: string = '空闲';
  @State lastLatency: number = 0;
  @State lastConfidence: number = 0;
  @State edgeCallCount: number = 0;
  @State cloudCallCount: number = 0;
  @State parallelCallCount: number = 0;
  @State cacheHitCount: number = 0;
  @State avgLatency: number = 0;
  @State networkType: string = 'WiFi';
  @State batteryLevel: number = 85;
  @State isNPUAvailable: boolean = true;
  @State selectedTaskType: number = 0;

  // 推理历史
  @State inferenceHistory: Array<{
    time: string;
    path: string;
    latency: number;
    confidence: number;
  }> = [];

  private taskTypes: string[] = ['图像分类', '目标检测', '语义分割', '文本生成'];
  private executor: EdgeCloudInferenceExecutor | null = null;
  private latencySum: number = 0;
  private totalCalls: number = 0;

  aboutToAppear() {
    const cloudConfig: CloudInferenceConfig = {
      endpoint: 'https://ai-api.example.com/v1/inference',
      apiKey: 'your-api-key',
      timeout: 5000,
      retryCount: 2
    };
    this.executor = new EdgeCloudInferenceExecutor(cloudConfig);
  }

  build() {
    Navigation() {
      Scroll() {
        Column({ space: 16 }) {
          // 顶部状态栏
          this.StatusBar()

          // 推理路径可视化
          this.PathVisualization()

          // 任务选择
          this.TaskSelector()

          // 操作按钮
          this.ActionButtons()

          // 性能统计
          this.PerformanceStats()

          // 推理历史
          this.InferenceHistory()
        }
        .width('100%')
        .padding(16)
      }
      .width('100%')
      .height('100%')
    }
    .title('端云协同推理')
    .titleMode(NavigationTitleMode.Mini)
  }

  // 顶部状态栏
  @Builder StatusBar() {
    Row({ space: 12 }) {
      // 网络状态
      Column({ space: 2 }) {
        Text(this.networkType === 'WiFi' ? '📶' : '📡')
          .fontSize(20)
        Text(this.networkType)
          .fontSize(10)
          .fontColor('#64748B')
      }
      .padding(8)
      .borderRadius(8)
      .backgroundColor('#F0F9FF')

      // 电量
      Column({ space: 2 }) {
        Text('🔋')
          .fontSize(20)
        Text(`${this.batteryLevel}%`)
          .fontSize(10)
          .fontColor('#64748B')
      }
      .padding(8)
      .borderRadius(8)
      .backgroundColor('#F0FDF4')

      // NPU
      Column({ space: 2 }) {
        Text(this.isNPUAvailable ? '⚡' : '💤')
          .fontSize(20)
        Text(this.isNPUAvailable ? 'NPU就绪' : 'NPU休眠')
          .fontSize(10)
          .fontColor(this.isNPUAvailable ? '#10B981' : '#94A3B8')
      }
      .padding(8)
      .borderRadius(8)
      .backgroundColor(this.isNPUAvailable ? '#ECFDF5' : '#F8FAFC')

      // 当前路径
      Column({ space: 2 }) {
        Text('🧭')
          .fontSize(20)
        Text(this.currentPath)
          .fontSize(10)
          .fontColor('#4F46E5')
          .fontWeight(FontWeight.Bold)
      }
      .padding(8)
      .borderRadius(8)
      .backgroundColor('#EEF2FF')
    }
    .width('100%')
    .justifyContent(FlexAlign.SpaceAround)
  }

  // 推理路径可视化
  @Builder PathVisualization() {
    Column({ space: 12 }) {
      Text('🔀 推理路径')
        .fontSize(16)
        .fontWeight(FontWeight.Bold)
        .fontColor('#1E293B')
        .width('100%')

      // 端云协同流程图
      Row({ space: 8 }) {
        // 端侧
        Column({ space: 4 }) {
          Text('📱')
            .fontSize(28)
          Text('端侧推理')
            .fontSize(12)
            .fontColor('#475569')
          Text('MindSpore Lite')
            .fontSize(10)
            .fontColor('#94A3B8')
        }
        .padding(12)
        .borderRadius(12)
        .backgroundColor(this.currentPath.includes('端侧') ? '#EEF2FF' : '#F8FAFC')
        .border({
          width: this.currentPath.includes('端侧') ? 2 : 1,
          color: this.currentPath.includes('端侧') ? '#4F46E5' : '#E2E8F0'
        })
        .layoutWeight(1)
        .alignItems(HorizontalAlign.Center)

        // 箭头
        Column() {
          Text('⇌')
            .fontSize(24)
            .fontColor('#94A3B8')
        }
        .justifyContent(FlexAlign.Center)

        // 云端
        Column({ space: 4 }) {
          Text('☁️')
            .fontSize(28)
          Text('云端推理')
            .fontSize(12)
            .fontColor('#475569')
          Text('ModelArts')
            .fontSize(10)
            .fontColor('#94A3B8')
        }
        .padding(12)
        .borderRadius(12)
        .backgroundColor(this.currentPath.includes('云端') ? '#F5F3FF' : '#F8FAFC')
        .border({
          width: this.currentPath.includes('云端') ? 2 : 1,
          color: this.currentPath.includes('云端') ? '#8B5CF6' : '#E2E8F0'
        })
        .layoutWeight(1)
        .alignItems(HorizontalAlign.Center)
      }
      .width('100%')

      // 当前推理状态
      Row() {
        Text('状态:')
          .fontSize(13)
          .fontColor('#64748B')
        Text(this.inferenceStatus)
          .fontSize(13)
          .fontWeight(FontWeight.Bold)
          .fontColor(this.inferenceStatus === '推理完成' ? '#10B981' : '#F59E0B')
      }
      .width('100%')
    }
    .width('100%')
    .padding(16)
    .borderRadius(16)
    .backgroundColor('#FFFFFF')
    .shadow({ radius: 8, color: 'rgba(0,0,0,0.06)', offsetX: 0, offsetY: 2 })
  }

  // 任务选择器
  @Builder TaskSelector() {
    Column({ space: 12 }) {
      Text('🎯 任务类型')
        .fontSize(16)
        .fontWeight(FontWeight.Bold)
        .fontColor('#1E293B')
        .width('100%')

      Row({ space: 8 }) {
        ForEach(this.taskTypes, (type: string, index: number) => {
          Button(type)
            .fontSize(13)
            .fontColor(this.selectedTaskType === index ? '#FFFFFF' : '#475569')
            .backgroundColor(this.selectedTaskType === index ? '#4F46E5' : '#F1F5F9')
            .borderRadius(8)
            .layoutWeight(1)
            .height(36)
            .onClick(() => {
              this.selectedTaskType = index;
            })
        }, (type: string, index: number) => `${index}`)
      }
      .width('100%')
    }
    .width('100%')
    .padding(16)
    .borderRadius(16)
    .backgroundColor('#FFFFFF')
    .shadow({ radius: 8, color: 'rgba(0,0,0,0.06)', offsetX: 0, offsetY: 2 })
  }

  // 操作按钮
  @Builder ActionButtons() {
    Row({ space: 12 }) {
      Button('🚀 执行推理')
        .fontSize(16)
        .fontWeight(FontWeight.Bold)
        .fontColor('#FFFFFF')
        .backgroundColor('#4F46E5')
        .borderRadius(12)
        .layoutWeight(1)
        .height(48)
        .onClick(() => this.runInference())

      Button('🗑️ 清除缓存')
        .fontSize(14)
        .fontColor('#64748B')
        .backgroundColor('#F1F5F9')
        .borderRadius(12)
        .width(100)
        .height(48)
        .onClick(() => {
          this.executor?.clearCache();
          this.cacheHitCount = 0;
        })
    }
    .width('100%')
  }

  // 性能统计
  @Builder PerformanceStats() {
    Column({ space: 12 }) {
      Text('📊 性能统计')
        .fontSize(16)
        .fontWeight(FontWeight.Bold)
        .fontColor('#1E293B')
        .width('100%')

      Row({ space: 8 }) {
        this.StatCard('端侧调用', `${this.edgeCallCount}`, '#4F46E5')
        this.StatCard('云端调用', `${this.cloudCallCount}`, '#8B5CF6')
        this.StatCard('并行调用', `${this.parallelCallCount}`, '#06B6D4')
      }
      .width('100%')

      Row({ space: 8 }) {
        this.StatCard('缓存命中', `${this.cacheHitCount}`, '#10B981')
        this.StatCard('平均延迟', `${this.avgLatency}ms`, '#F59E0B')
        this.StatCard('最近置信度', `${(this.lastConfidence * 100).toFixed(0)}%`, '#EF4444')
      }
      .width('100%')
    }
    .width('100%')
    .padding(16)
    .borderRadius(16)
    .backgroundColor('#FFFFFF')
    .shadow({ radius: 8, color: 'rgba(0,0,0,0.06)', offsetX: 0, offsetY: 2 })
  }

  // 统计卡片
  @Builder StatCard(label: string, value: string, color: string) {
    Column({ space: 4 }) {
      Text(value)
        .fontSize(18)
        .fontWeight(FontWeight.Bold)
        .fontColor(color)
      Text(label)
        .fontSize(11)
        .fontColor('#94A3B8')
    }
    .padding(8)
    .borderRadius(8)
    .backgroundColor('#F8FAFC')
    .layoutWeight(1)
    .alignItems(HorizontalAlign.Center)
  }

  // 推理历史
  @Builder InferenceHistory() {
    Column({ space: 8 }) {
      Text('📋 推理历史')
        .fontSize(16)
        .fontWeight(FontWeight.Bold)
        .fontColor('#1E293B')
        .width('100%')

      if (this.inferenceHistory.length === 0) {
        Text('暂无推理记录')
          .fontSize(13)
          .fontColor('#94A3B8')
          .width('100%')
          .textAlign(TextAlign.Center)
          .padding(20)
      } else {
        List({ space: 4 }) {
          ForEach(this.inferenceHistory, (item: {
            time: string; path: string; latency: number; confidence: number;
          }, index: number) => {
            ListItem() {
              Row() {
                Text(item.time)
                  .fontSize(11)
                  .fontColor('#94A3B8')
                  .width(60)
                Text(item.path)
                  .fontSize(12)
                  .fontColor('#475569')
                  .layoutWeight(1)
                Text(`${item.latency}ms`)
                  .fontSize(11)
                  .fontColor('#F59E0B')
                  .width(50)
                  .textAlign(TextAlign.End)
                Text(`${(item.confidence * 100).toFixed(0)}%`)
                  .fontSize(11)
                  .fontColor('#10B981')
                  .width(35)
                  .textAlign(TextAlign.End)
              }
              .width('100%')
              .padding({ left: 8, right: 8, top: 4, bottom: 4 })
            }
          }, (item: Object, index: number) => `${index}`)
        }
        .width('100%')
        .height(160)
        .borderRadius(8)
        .backgroundColor('#F8FAFC')
        .padding(4)
      }
    }
    .width('100%')
    .padding(16)
    .borderRadius(16)
    .backgroundColor('#FFFFFF')
    .shadow({ radius: 8, color: 'rgba(0,0,0,0.06)', offsetX: 0, offsetY: 2 })
  }

  // 执行推理
  private async runInference() {
    if (!this.executor) return;

    this.inferenceStatus = '推理中...';

    const taskTypeMap: Record<string, string> = {
      '图像分类': 'classification',
      '目标检测': 'detection',
      '语义分割': 'segmentation',
      '文本生成': 'generation'
    };

    const complexityMap: Record<string, number> = {
      'classification': 0.2,
      'detection': 0.4,
      'segmentation': 0.7,
      'generation': 0.9
    };

    const taskType = taskTypeMap[this.taskTypes[this.selectedTaskType]] || 'classification';

    const task: TaskDescriptor = {
      taskType,
      inputSize: 1024 * 100,
      modelComplexity: complexityMap[taskType] || 0.5,
      privacyLevel: 0.3,
      latencyRequirement: 200,
      accuracyRequirement: 0.8
    };

    // 模拟输入数据
    const input = new Float32Array(100);
    for (let i = 0; i < 100; i++) {
      input[i] = Math.random();
    }

    try {
      const result = await this.executor.infer(input, task);

      // 更新UI状态
      this.currentPath = this.getPathDisplayName(result.path);
      this.lastLatency = result.latency;
      this.lastConfidence = result.confidence;
      this.inferenceStatus = '推理完成';

      // 更新统计
      if (result.path === InferencePath.EDGE_ONLY) this.edgeCallCount++;
      else if (result.path === InferencePath.CLOUD_ONLY) this.cloudCallCount++;
      else this.parallelCallCount++;
      if (result.fromCache) this.cacheHitCount++;

      this.totalCalls++;
      this.latencySum += result.latency;
      this.avgLatency = Math.round(this.latencySum / this.totalCalls);

      // 添加历史记录
      this.inferenceHistory.unshift({
        time: new Date().toLocaleTimeString(),
        path: this.getPathDisplayName(result.path),
        latency: result.latency,
        confidence: result.confidence
      });
      if (this.inferenceHistory.length > 20) {
        this.inferenceHistory.pop();
      }
    } catch (error) {
      this.inferenceStatus = '推理失败';
      console.error(`推理失败: ${error}`);
    }
  }

  // 获取路径显示名称
  private getPathDisplayName(path: InferencePath): string {
    const nameMap: Record<string, string> = {
      'edge_only': '端侧推理',
      'cloud_only': '云端推理',
      'parallel': '端云并行',
      'edge_first': '端侧优先',
      'cloud_first': '云端优先'
    };
    return nameMap[path] || path;
  }
}

四、踩坑与注意事项

坑1：云端推理超时导致UI卡死

问题：云端推理是网络请求，如果直接在主线程调用，一旦网络超时（默认60秒），UI就冻住了。用户等不了60秒，3秒没反应就会以为APP挂了。

解决方案：所有推理调用必须异步执行，并设置合理的超时时间。同时提供"取消推理"功能，让用户可以主动中断。

// 带超时和取消的推理调用
async inferWithTimeout(input: Float32Array, task: TaskDescriptor,
  timeoutMs: number = 5000): Promise<InferenceResult> {
  // 创建AbortController用于取消
  const abortController = new AbortController();

  const timeoutPromise = new Promise<never>((_, reject) => {
    setTimeout(() => {
      abortController.abort();
      reject(new Error('推理超时'));
    }, timeoutMs);
  });

  const inferPromise = this.executor!.infer(input, task);

  // 超时和推理竞争
  return Promise.race([inferPromise, timeoutPromise]);
}

坑2：端云模型版本不一致

问题：端侧模型是v1.2，云端模型已经更新到v2.0，两边输出的特征维度或语义不同，融合结果完全错误。

解决方案：在推理请求中携带模型版本号，云端根据版本号选择对应模型。或者约定版本兼容策略——v2.0必须向后兼容v1.2的输出格式。

坑3：缓存导致隐私风险

问题：推理结果缓存在设备上，如果缓存了人脸识别结果，手机被root后缓存文件可能泄露用户身份信息。

解决方案：敏感推理结果不缓存，或使用设备安全存储（HarmonyOS的HUKS密钥管理服务）加密缓存。

坑4：网络切换导致推理中断

问题：WiFi切蜂窝时，正在进行的云端推理TCP连接断开，请求失败。

解决方案：监听网络变化事件，网络切换时自动重试推理请求，或临时切换到端侧推理。

坑5：端云并行时的资源竞争

问题：端云并行推理时，端侧推理占用NPU和内存，同时云端推理占用网络带宽和CPU（序列化数据），可能导致设备发热和卡顿。

解决方案：限制并行推理的并发度，设置资源使用上限。在低电量或高温时自动降级为串行推理。

五、HarmonyOS 6适配

5.1 API差异

功能	HarmonyOS 5.0	HarmonyOS 6 Beta
AI推理	`@ohos.ai.mindSpore`	`@ohos.ai.inferenceEngine`（重构）
网络监测	`@ohos.net.connection`	新增网络质量实时评估API
设备状态	分散在各模块	统一`DeviceStatusManager`
安全存储	`@ohos.security.huks`	新增AI推理结果专用安全缓存
分布式通信	`@ohos.data.distributedData`	新增推理数据专用高速通道

5.2 迁移指南

// HarmonyOS 5.0 - MindSpore推理
import mindSpore from '@ohos.ai.mindSpore';
const model = await mindSpore.createModel(modelPath);

// HarmonyOS 6 - 统一推理引擎
import { inferenceEngine } from '@kit.AiKit';
const model = await inferenceEngine.loadModel({
  uri: modelPath,
  accelerator: inferenceEngine.Accelerator.NPU, // 指定NPU加速
  precision: inferenceEngine.Precision.FP16     // FP16精度
});

// 新增：网络质量实时评估
import { networkQuality } from '@kit.NetworkKit';
const quality = await networkQuality.assess();
// quality.score: 0-100 网络质量评分
// quality.recommendPath: 'edge' | 'cloud' | 'auto' 推荐推理路径

5.3 HarmonyOS 6新增特性

智能推理调度器：系统级端云协同调度，自动选择最优推理路径
推理结果安全缓存：基于TEE的加密缓存，防泄露
分布式推理集群：多设备组成推理集群，手机NPU+平板NPU协同计算
网络质量预测：基于历史数据预测未来30秒网络质量，提前决策推理路径
自适应模型切换：根据设备状态动态切换不同大小的端侧模型

六、总结

知识点	核心内容
端云协同架构	决策引擎+推理执行+结果融合三层架构
智能分流决策	多因子加权评分，综合考虑复杂度/网络/算力/隐私/延迟/电量
端云并行推理	同时发起端侧和云端推理，取更优结果
端侧优先/云端优先	带回退的推理策略，兼顾速度和精度
结果缓存	LRU缓存+TTL过期，减少重复推理开销
在线学习	根据历史决策结果自适应调整分流阈值
隐私保护	敏感数据不缓存，安全存储加密
HarmonyOS 6	系统级推理调度器、分布式推理集群、网络质量预测

端云协同推理的本质是一种"智能路由"——不是简单地二选一，而是根据实时条件动态决策。这就像导航软件不是永远推荐同一条路线，而是根据实时路况帮你选最快的路。好的端云协同架构，应该让用户完全感知不到"推理在哪里发生"，只感受到"结果又快又准"。

在HarmonyOS 6中，系统级推理调度器的引入将大大简化开发者的工作。但理解底层决策逻辑依然重要——因为只有理解了"为什么选择这条路"，才能在出问题时快速定位"哪里走错了"。端云协同不是银弹，但它是当前AI落地最务实的架构选择。

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

HarmonyOS APP开发：端云协同AI推理架构

HarmonyOS APP开发：端云协同AI推理架构

一、背景与动机

二、核心原理

2.1 端云协同推理架构

2.2 智能分流决策模型

2.3 端云并行推理

2.4 结果缓存与预取

三、代码实战

3.1 端云协同决策引擎

3.2 端云协同推理执行器

3.3 端云协同推理完整UI

四、踩坑与注意事项

坑1：云端推理超时导致UI卡死

坑2：端云模型版本不一致

坑3：缓存导致隐私风险

坑4：网络切换导致推理中断

坑5：端云并行时的资源竞争

五、HarmonyOS 6适配

5.1 API差异

5.2 迁移指南

5.3 HarmonyOS 6新增特性

六、总结

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

HarmonyOS APP开发：端云协同AI推理架构

HarmonyOS APP开发：端云协同AI推理架构

一、背景与动机

二、核心原理

2.1 端云协同推理架构

2.2 智能分流决策模型

2.3 端云并行推理

2.4 结果缓存与预取

三、代码实战

3.1 端云协同决策引擎

3.2 端云协同推理执行器

3.3 端云协同推理完整UI

四、踩坑与注意事项

坑1：云端推理超时导致UI卡死

坑2：端云模型版本不一致

坑3：缓存导致隐私风险

坑4：网络切换导致推理中断

坑5：端云并行时的资源竞争

五、HarmonyOS 6适配

5.1 API差异

5.2 迁移指南

5.3 HarmonyOS 6新增特性

六、总结

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品