KubeEdge-Sedna v0.7.0 Released: Joint Inference Engine Natively Integrates K8S HPA, with Across-the-Board System Stability Improvements

Published by 云容器大未来 on 2025/05/14 14:22:57

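[Abstract] Using a smart industrial park as an example, this article examines three problems uncovered while putting cloud-edge collaborative AI into practice: resource scheduling at peak hours, "ghost instances" left behind during system maintenance, and an outdated base platform. It focuses on KubeEdge-Sedna v0.7.0, released on April 20, 2025 (Beijing time). This release adds native HPA (Horizontal Pod Autoscaling) support for joint inference, improves the joint inference and federated learning controllers, and upgrades the base architecture.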

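This article covers the main changes in KubeEdge-Sedna v0.7.0:

Joint inference supports HPA: As AI develops rapidly, the compute demand of deep learning models fluctuates significantly, especially at peak hours. This release therefore introduces HPA (Horizontal Pod Autoscaling), which monitors system load in real time and dynamically adjusts the number of inference instances to keep the system stable under high concurrency. The HPA support fits the cloud-edge collaborative architecture and stays compatible with earlier Sedna versions, improving scalability and resource utilization.
Controller optimization and enhancement: This release also brings important improvements to the joint inference and federated learning controllers. Earlier versions suffered from incomplete task deletion and instances that were not rebuilt promptly. This release introduces Deployment-based management, which enables dynamic scaling and lifecycle management of inference tasks. The federated learning controller has also been improved and is considerably more stable.
Base architecture upgrade: This release upgrades Kubernetes to v1.30.7 and Golang to v1.22.9, and includes a series of bug fixes that improve Sedna's overall performance and reliability.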

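I. Background: A Smart Industrial Park Example

As of 2025, KubeEdge Sedna algorithms and systems have been applied in more than 20 cases. In several large smart industrial park deployments in China and India, an edge AI safety helmet detection system enabled by KubeEdge Sedna continuously safeguards safe production on construction sites. As the number of connected sites and cameras keeps growing, the system faces three problems: resource scheduling at peak hours, "ghost instances" during system maintenance, and an outdated base platform.

Problem 1: Resource scheduling at peak hours. During the daily morning and evening shift changes, foot traffic at site entrances surges. Safety helmet detection requests at edge nodes rise 300% above normal, increasing system response latency by 50% and impairing real-time recognition and alerting for workers not wearing helmets. With HPA, the system automatically scales out detection instances based on CPU utilization, keeping response time within an acceptable range; during off-peak hours it automatically scales instances back in, saving 40% of compute resource costs overall.
Problem 2: "Ghost instances" during system maintenance. During upgrades or maintenance, detection instances often linger after an old safety helmet detection task is deleted. These "ghost instances" prevent new detection tasks from being created properly, and the operations team has to spend considerable time cleaning up leftover resources by hand. Worse, if a running detection instance is deleted by mistake, the system cannot rebuild it automatically, interrupting helmet detection in some areas for hours and creating a safety hazard.
Problem 3: Outdated base platform. Since the original deployment, Kubernetes and Golang have both gained new features, but KubeEdge Sedna was still using versions several years old, posing performance and reliability risks.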

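II. Solution: The AI Upgrade Path for Smart Industrial Parks

To address these challenges, Sedna v0.7.0 brings three key upgrades:

Solution 1: Introduces HPA (Horizontal Pod Autoscaling), which automatically adjusts the number of detection instances according to actual load.
Solution 2: Optimizes the controllers so that detection instances are cleaned up and rebuilt automatically.
Solution 3: Upgrades Kubernetes to v1.30.7 and Golang to v1.22.9, and fixes a series of issues.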

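The new version has delivered clear results in industrial park deployments:

During peak foot traffic, the system detects the load change and adds detection instances in time, keeping safety helmet detection real-time and accurate; during off-peak hours it releases surplus resources automatically, greatly reducing operating costs.
During system maintenance, leftover instances no longer need to be cleaned up by hand, and new detection tasks are created and run without trouble. Even if an instance is deleted by mistake, the system rebuilds it automatically, keeping site safety management continuous and reliable.
The base platform is upgraded seamlessly, improving Sedna's overall performance and reliability.

👇🏻 The implementation is described in detail below: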

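▍1. Native HPA Support for Joint Inference

Figure 1. HPA architecture, using joint inference as an example

HPA is the dynamic pod scaling capability natively provided by k8s (Kubernetes); see horizontal-pod-autoscale^[1] for details. It can be applied directly to native k8s resources such as Deployment^[2] or StatefulSet^[3]. To reuse HPA directly while staying compatible with older Sedna versions, this release manages joint inference pod instances with a Deployment (described in detail later), which lets us introduce HPA configuration directly into the joint inference API.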
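To make the scaling behavior concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of observed to target metric, rounded up, then limited to the configured minimum and maximum. The helper `desiredReplicas` and the hard-coded 10% tolerance are assumptions made for illustration; they are not Sedna or Kubernetes code, and the real autoscaler also accounts for pod readiness, missing metrics, and stabilization windows.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the default 10% dead band described in the Kubernetes
// HPA documentation; it is hard-coded here for illustration only.
const tolerance = 0.1

// clamp limits n to the inclusive range [lo, hi].
func clamp(n, lo, hi int32) int32 {
	if n < lo {
		return lo
	}
	if n > hi {
		return hi
	}
	return n
}

// desiredReplicas is a hypothetical helper (not part of Sedna or Kubernetes).
// It applies the documented rule
//
//	desired = ceil(current * currentUtilization / targetUtilization)
//
// and then clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int32, currentUtil, targetUtil float64, minReplicas, maxReplicas int32) int32 {
	ratio := currentUtil / targetUtil
	if math.Abs(ratio-1.0) <= tolerance {
		// Close enough to the target: keep the current replica count.
		return clamp(current, minReplicas, maxReplicas)
	}
	desired := int32(math.Ceil(float64(current) * ratio))
	return clamp(desired, minReplicas, maxReplicas)
}

func main() {
	// Edge worker values taken from the Appendix 3 output: 348% observed
	// against a 50% target, 2 replicas, limits 1..2.
	fmt.Println(desiredReplicas(2, 348, 50, 1, 2))
	// A single replica at twice its target utilization, limits 1..5.
	fmt.Println(desiredReplicas(1, 100, 50, 1, 5))
}
```

The first call shows why the edge Deployment in Appendix 3 stays at 2 replicas despite very high utilization: the raw result is far above `maxReplicas`, so the clamp decides the outcome.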

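See Appendix 1: HPA API Design for the joint inference API design. In this design the HPA configuration matches the native k8s configuration, so users can configure it by following the official HPA documentation^[1]. Four modes are supported: configure HPA on both cloud and edge, on the cloud only, on the edge only, or not at all. When HPA is not configured, behavior is identical to earlier Sedna versions.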
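Because the `HPA` field on each worker is an optional pointer in the API, the four modes fall out of which pointers are set. The sketch below illustrates that idea with trimmed stand-in types: `worker` and `hpaMode` are hypothetical names, and the `HPA` struct here omits the `Metrics` and `Behavior` fields so that no Kubernetes packages are needed. It is not the controller's actual logic.

```go
package main

import "fmt"

// HPA is a trimmed stand-in for the struct in Appendix 1; Metrics and
// Behavior are omitted so the sketch needs no Kubernetes imports.
type HPA struct {
	MinReplicas *int32
	MaxReplicas int32
}

// worker is a hypothetical reduction of EdgeWorker/CloudWorker to the one
// field that matters here.
type worker struct {
	HPA *HPA
}

// hpaMode reports which of the four configuration modes a service uses,
// based only on whether each worker carries an HPA block.
func hpaMode(edge, cloud worker) string {
	switch {
	case edge.HPA != nil && cloud.HPA != nil:
		return "cloud and edge"
	case cloud.HPA != nil:
		return "cloud only"
	case edge.HPA != nil:
		return "edge only"
	default:
		return "not configured"
	}
}

func main() {
	one := int32(1)
	edgeHPA := &HPA{MinReplicas: &one, MaxReplicas: 2}
	cloudHPA := &HPA{MinReplicas: &one, MaxReplicas: 5}

	fmt.Println(hpaMode(worker{HPA: edgeHPA}, worker{HPA: cloudHPA}))
	fmt.Println(hpaMode(worker{}, worker{HPA: cloudHPA}))
	fmt.Println(hpaMode(worker{HPA: edgeHPA}, worker{}))
	fmt.Println(hpaMode(worker{}, worker{}))
}
```

Treating a nil pointer as "not configured" is what keeps a manifest written for an earlier Sedna version valid without changes.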

🔗 For the design and implementation, see: proposal^[4], implementation^[5]

🔗 For a usage example, see Appendix 2: HPA Configuration Example

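▍2. Joint Inference & Federated Learning Controller Optimization

Figure 2. Joint inference controller optimization

Joint inference design: An inference task can be regarded as a stateless workload, so the native Kubernetes Deployment component can manage the pod lifecycle. The controller uses the k8s Informer mechanism to watch for change events on inference tasks, then calls functions such as addDeployment/deleteDeployment/updateDeployment to perform the corresponding operations on Deployment resources. This change also connects directly to the HPA capability described in the previous section, enabling dynamic scaling of inference instances.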
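The event-to-operation mapping described above can be sketched without a cluster. In the example below an in-memory map stands in for the API server, and each service event is translated into operations on a cloud and an edge Deployment. The function names addDeployment, updateDeployment, and deleteDeployment come from the article, but their signatures and bodies here are assumptions, as are the `cluster`, `deployment`, and `handle` names and the placeholder image strings. The real controller works through Informer callbacks and the Kubernetes client, which this sketch does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"sort"
)

type eventType int

const (
	eventAdd eventType = iota
	eventUpdate
	eventDelete
)

// deployment is a minimal stand-in for a Kubernetes Deployment.
type deployment struct {
	Name  string
	Image string
}

// cluster is an in-memory substitute for the API server, keyed by name.
type cluster map[string]deployment

// deploymentName follows the "<service>-deployment-<role>" pattern visible
// in the Appendix 3 output.
func deploymentName(service, role string) string {
	return service + "-deployment-" + role
}

// The three helpers borrow their names from the article; their signatures
// and bodies are assumptions made for this sketch.
func (c cluster) addDeployment(d deployment)    { c[d.Name] = d }
func (c cluster) updateDeployment(d deployment) { c[d.Name] = d }
func (c cluster) deleteDeployment(name string)  { delete(c, name) }

// handle maps one service event onto the cloud and edge Deployments.
// images maps a role ("cloud" or "edge") to its container image.
func (c cluster) handle(ev eventType, service string, images map[string]string) {
	for _, role := range []string{"cloud", "edge"} {
		d := deployment{Name: deploymentName(service, role), Image: images[role]}
		switch ev {
		case eventAdd:
			c.addDeployment(d)
		case eventUpdate:
			c.updateDeployment(d)
		case eventDelete:
			// Only the map entry is removed here; in a real cluster the
			// Deployment's pods are cleaned up along with it.
			c.deleteDeployment(d.Name)
		}
	}
}

// names returns the Deployment names in sorted order for stable output.
func (c cluster) names() []string {
	out := make([]string, 0, len(c))
	for name := range c {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := cluster{}
	images := map[string]string{"cloud": "big:v1", "edge": "little:v1"}

	c.handle(eventAdd, "helmet-detection-inference-example", images)
	fmt.Println(c.names())

	c.handle(eventDelete, "helmet-detection-inference-example", images)
	fmt.Println(len(c))
}
```

Delegating pod management to a Deployment is the design choice that matters: once the service owns only Deployments, deleting the service leaves no stray pods to track, and a pod removed by mistake is recreated by Kubernetes itself.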

🔗 For more details, see: proposal^[6], implementation^[7]

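Figure 3. Federated learning controller optimization

Federated learning design: In Sedna, federated learning is a training task. It saves intermediate parameters, and its pod instances have an ordering relationship (a restarted pod instance may need to access the intermediate parameters of the previously failed pod), so it is not suited to management by a Deployment. The federated learning controller therefore keeps the original approach, with improvements and optimizations.

🔗 For more details, see: proposal^[8], implementation^[9]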

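▍3. Base Architecture Upgrade

This release upgrades Sedna's k8s and Go versions to match those of KubeEdge and removes a large number of functions and packages that have been deprecated in k8s.

🔗 For more details, see: Upgrade K8s and Go versions^[10]

This release also fixes a series of issues to improve system stability:

Fixed cascading deletion in the object search paradigm: PR #443^[11]
Fixed federated learning tasks that could not be deleted: PR #467^[12]
Fixed CRDs not being updated in the helm chart package: PR #472^[13]
Fixed workflow version issues in GitHub CI: PR #475^[14]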

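III. Release Note

Readers interested in more details of this release are welcome to consult the Sedna v0.7.0 Release Note^[15].

KubeEdge SIG AI will publish a series of follow-up articles covering the new features in detail. Stay tuned to the community.

Related links: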

[1] Horizontal Pod Autoscaling: https://kubernetes.io/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/
[2] Deployments: https://kubernetes.io/zh-cn/docs/concepts/workloads/controllers/deployment/
[3] StatefulSet: https://kubernetes.io/zh-cn/docs/concepts/workloads/controllers/statefulset/
[4] joint-inference-hpa.md: https://github.com/kubeedge/sedna/blob/main/docs/proposals/joint-inference-hpa.md
[5] feature: hpa for jointinference : https://github.com/kubeedge/sedna/pull/465
[6] sedna-controller-enhancement.md: https://github.com/kubeedge/sedna/blob/main/docs/proposals/sedna-controller-enhancement.md
[7] JointInferenceService controller enhancement: https://github.com/kubeedge/sedna/pull/445
[8] sedna-controller-enhancement.md: https://github.com/kubeedge/sedna/blob/main/docs/proposals/sedna-controller-enhancement.md
[9] Sedna FederatedLearning controller enhancement: https://github.com/kubeedge/sedna/pull/446
[10] Upgrade K8s and Go versions: https://github.com/kubeedge/sedna/pull/462
[11] fix objectsearch bug of joint delete: https://github.com/kubeedge/sedna/pull/443
[12] fix FederatedLearningJob delete error: https://github.com/kubeedge/sedna/pull/467
[13] fix helm crd can not generete error: https://github.com/kubeedge/sedna/pull/472
[14] update workfow actions from v2 to v4: https://github.com/kubeedge/sedna/pull/475
[15] Sedna v0.7.0 release: https://github.com/kubeedge/sedna/releases/tag/v0.7.0

Appendix 1: HPA API Design

// HPA describes the desired functionality of the HorizontalPodAutoscaler.
type HPA struct {
    // +optional
    MinReplicas *int32 `json:"minReplicas,omitempty"`

    MaxReplicas int32 `json:"maxReplicas"`

    // +optional
    Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`

    // +optional
    Behavior *autoscalingv2.HorizontalPodAutoscalerBehavior `json:"behavior,omitempty"`
}

// EdgeWorker describes the data an edge worker should have
type EdgeWorker struct {
    Model             SmallModel         `json:"model"`
    HardExampleMining HardExampleMining  `json:"hardExampleMining"`
    Template          v1.PodTemplateSpec `json:"template"`

    // HPA describes the desired functionality of the HorizontalPodAutoscaler.
    // +optional
    HPA *HPA `json:"hpa"`
}

// CloudWorker describes the data a cloud worker should have
type CloudWorker struct {
    Model    BigModel           `json:"model"`
    Template v1.PodTemplateSpec `json:"template"`

    // HPA describes the desired functionality of the HorizontalPodAutoscaler.
    // +optional
    HPA *HPA `json:"hpa"`
}

Appendix 2: HPA Configuration Example

apiVersion: sedna.io/v1alpha1
kind: JointInferenceService
metadata:
  name: helmet-detection-inference-example
  namespace: default
spec:
  edgeWorker:
    hpa:
      maxReplicas: 2
      metrics:
        - resource:
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: Resource
      minReplicas: 1
    model:
      name: "helmet-detection-inference-little-model"
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $Edge-NodeName
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - image: kubeedge/sedna-example-joint-inference-helmet-detection-little:v0.5.0
          imagePullPolicy: IfNotPresent
          name:  little-model
          env:  # user defined environments
          - name: input_shape
            value: "416,736"
          - name: "video_url"
            value: "rtsp://localhost/video"
          - name: "all_examples_inference_output"
            value: "/data/output"
          - name: "hard_example_cloud_inference_output"
            value: "/data/hard_example_cloud_inference_output"
          - name: "hard_example_edge_inference_output"
            value: "/data/hard_example_edge_inference_output"
          resources:  # user defined resources
            requests:
              memory: 64M
              cpu: 50m
            limits:
              memory: 2Gi
              cpu: 500m
          volumeMounts:
            - name: outputdir
              mountPath: /data/
        volumes:   # user defined volumes
          - name: outputdir
            hostPath:
              # user must create the directory in host
              path: /joint_inference/output
              type: Directory

  cloudWorker:
    hpa:
      maxReplicas: 5
      metrics:
        - resource:
            name: cpu
            target:
              averageUtilization: 20
              type: Utilization
          type: Resource
      minReplicas: 1
    model:
      name: "helmet-detection-inference-big-model"
    template:
      spec:
        nodeName: $Cloud-NodeName
        dnsPolicy: ClusterFirstWithHostNet
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.5.0
            name:  big-model
            imagePullPolicy: IfNotPresent
            env:  # user defined environments
              - name: "input_shape"
                value: "544,544"
            resources:  # user defined resources
              requests:
                cpu: 1024m
                memory: 2Gi
              limits:
                cpu: 1024m
                memory: 2Gi

Appendix 3: HPA Deployment Results

[root@master-01 ~]# kubectl get hpa -w
NAME                                                      REFERENCE                                                        TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-helmet-detection-inference-example-deployment-cloud   Deployment/helmet-detection-inference-example-deployment-cloud   37%/20%         1         5         3          92s
hpa-helmet-detection-inference-example-deployment-edge    Deployment/helmet-detection-inference-example-deployment-edge    348%/50%        1         2         2          92s
hpa-helmet-detection-inference-example-deployment-cloud   Deployment/helmet-detection-inference-example-deployment-cloud   37%/20%         1         5         4          106s
hpa-helmet-detection-inference-example-deployment-edge    Deployment/helmet-detection-inference-example-deployment-edge    535%/50%        1         2         2          106s
hpa-helmet-detection-inference-example-deployment-cloud   Deployment/helmet-detection-inference-example-deployment-cloud   18%/20%         1         5         4          2m1s
hpa-helmet-detection-inference-example-deployment-edge    Deployment/helmet-detection-inference-example-deployment-edge    769%/50%        1         2         2          2m1s
hpa-helmet-detection-inference-example-deployment-cloud   Deployment/helmet-detection-inference-example-deployment-cloud   12%/20%         1         5         4          2m16s
[root@master-01 jointinference]# kubectl get po
NAME                                                              READY   STATUS    RESTARTS      AGE
helmet-detection-inference-example-deployment-cloud-7dffd47c6fl   1/1     Running   0             4m34s
helmet-detection-inference-example-deployment-cloud-7dffd4dpnnh   1/1     Running   0             2m49s
helmet-detection-inference-example-deployment-cloud-7dffd4f4dtw   1/1     Running   0             4m19s
helmet-detection-inference-example-deployment-cloud-7dffd4kcvwd   1/1     Running   0             5m20s
helmet-detection-inference-example-deployment-cloud-7dffd4shk86   1/1     Running   0             5m50s
helmet-detection-inference-example-deployment-edge-7b6575c52s7k   1/1     Running   0             5m50s
helmet-detection-inference-example-deployment-edge-7b6575c59g48   1/1     Running   0             5m20s

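[More KubeEdge Resources] Mastering KubeEdge Step by Step: Environment Setup

The course "Mastering KubeEdge Step by Step: Environment Setup" shows how to use Huawei Cloud services to quickly set up a KubeEdge edge computing development platform and deploy KubeEdge ecosystem components such as Sedna and EdgeMesh.

Free course link: https://connect.huaweicloud.com/courses/learn/course-v1:HuaweiX+CBUCNXNX022+Self-paced/about

About the KubeEdge community: KubeEdge is the industry's first cloud native edge computing framework and the only graduated edge computing open source project in the Cloud Native Computing Foundation (CNCF). The community has completed the industry's largest cloud native edge-cloud collaborative highway project (unified management of 100,000 edge nodes and 500,000 edge applications), the industry's first cloud native satellite-ground collaborative satellite, the industry's first cloud native vehicle-cloud collaborative car, and the industry's first cloud native oilfield project. It has open-sourced Sedna, the industry's first distributed collaborative AI framework, along with the industry's first edge-cloud collaborative lifelong learning paradigm, and continues to innovate.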

KubeEdge website: https://kubeedge.io

GitHub: https://github.com/kubeedge/kubeedge

Slack: https://kubeedge.slack.com

Mailing list: https://groups.google.com/forum/#!forum/kubeedge

Weekly community meeting: https://zoom.us/j/4167237304

Twitter: https://twitter.com/KubeEdge

Documentation: https://docs.kubeedge.io/en/latest/

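[Disclaimer] This content comes from a blogger in the Huawei Cloud developer community and does not represent the views or positions of Huawei Cloud or the Huawei Cloud developer community. Reposts must cite the source (Huawei Cloud community), the article link, and the article author; otherwise the author and the community reserve the right to pursue liability. If you find content in this community suspected of plagiarism, please report it by email with supporting evidence; once verified, the community will immediately remove the infringing content. Report email: cloudbbs@huaweicloud.com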
