- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

基于openEuler 进程管理与调度实验

pluto1447 发表于 2025/12/23 20:23:18 2025/12/23

【摘要】本实验深入探究了openEuler系统的进程管理与调度机制。通过编程实践了进程创建、阻塞与唤醒等基础操作，并对比实现了FCFS、SJF和RR等经典调度算法。重点通过内核模块模拟了CFS及其openEuler特有的QoS优化策略，验证了在线任务对离线任务的“绝对压制”特性，确保了关键业务的低时延响应，并加深了对Linux内核调度逻辑、红黑树应用及国产系统高性能优化特性的理解。

openEuler 进程管理与调度

一、实验目的

掌握基础的进程管理：进程创建、进程撤消、进程阻塞、进程唤醒等。
openEuler的进程管理和调度：学会openEuler的进程管理和调度。
掌握基础的进程掉调度算法：先来先服务算法（FCFS）、短作业优先（SJF）调度算法、时间片轮转算法、优先级调度算法和完全公平调度器（CFS）。

二、实验环境

宿主机：Windows 10/11 或 macOS
虚拟机软件：VMware Workstation
目标系统：openEuler 22.03 LTS 或 24.03 LTS (x86_64 架构)
开发工具：VS Code

三、实验内容与步骤

1. 基础的进程管理

1.1 进程概述

进程是执行的程序，包括当前活动，如程序计数器的值和处理器的内容等。包括：

文本段：可执行代码。
数据段：全局变量。
堆段：在程序运行时动态分配的内存。
栈段: 调用函数是的临时动态分配的内存（如函数参数/返回地址、局部变量）。

1.2 进程状态

新建：进程正在创建。
运行：指令正在执行。
等待：进程等待发生某个事件。
就绪: 进程等待分配处理器。
终止：进程已经完成执行。
进程状态图:

1.3 进程控制块

操作系统内的每个进程表示，都采用进程控制块（Process Control Block, PCB），也称为任务控制块（task control block）。

进程状态：状态可以包括新建、就绪、运行、等待、停止等。
程序计数器：计数器表示进程将要执行的下个指令的地址。
CPU 寄存器：根据计算机体系结构的不同，寄存器的类型和数量也会不同。它们包括累加器、索引寄存器、堆栈指针、通用寄存器和其他条件码信息寄存器。在发生中断时，这些状态信息与程序计数器需要一起保存，以便进程在被重新调度后可以继续正确执行。
CPU 调度信息: 根据操作系统使用的内存系统，这类信息可以包括基址和界限寄存器的值、页表或段表。
内存管理信息：进程已经完成执行。
记账信息：这类信息包括 CPU 时间、实际使用时间、时间期限、记账数据、作业或进程数量等。
I/O 状态信息：这类信息包括分配给进程的 I/O 设备列表和打开文件列表等。

1.4 进程状态

僵尸进程 (Zombie Process)

定义：
当一个子进程结束运行（调用了 exit），但其父进程还没有调用 wait() 或 waitpid() 来读取它的退出状态（Exit Status）时，这个子进程就会变成“僵尸进程”。
状态：
进程已经死亡（代码执行完毕，内存大部分已释放），但在系统的进程表（Process Table）中仍然保留了一个位置（PCB，进程控制块），记录了 PID、终止状态和资源使用信息。

守护进程 (Daemon Process)

定义：
守护进程是在后台运行的一种特殊进程，它独立于控制终端（Terminal），并且周期性地执行某种任务或等待处理某些发生的事件。
特征：
生命周期长：通常在系统启动时启动，在系统关闭时停止。
脱离终端：它不属于任何用户终端（TTY），你在终端关闭窗口也不会影响它，因为它根本不受终端信号（如 SIGHUP）控制。
命名习惯：通常以 d 结尾，例如 sshd (SSH服务)、mysqld (MySQL服务)、httpd (Web服务)、crond (定时任务)。

1.5 进程管理

调用linux函数实现进程的进程创建、进程撤消、
进程阻塞、进程唤醒

process_control.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     //fork, sleep, pause, getpid
#include <sys/types.h>  //pid_t
#include <sys/wait.h>   //wait
#include <signal.h>     //signal, kill

//信号处理函数：用于处理唤醒信号
void wake_up_handler(int sig) {
    printf("[子进程] 收到信号 %d，我被唤醒了！\n", sig);
}

int main() {
    pid_t pid;

    printf("[父进程] 准备创建子进程...\n");

    //进程创建 (Process Creation)
    pid = fork();

    if (pid < 0) {
        perror("Fork failed");
        exit(1);
    }

    //子进程逻辑
    if (pid == 0) {
        printf("[子进程] 创建成功！我是子进程，PID = %d\n", getpid());
        
        //注册信号处理函数，准备被唤醒
        signal(SIGUSR1, wake_up_handler);

        printf("[子进程] 我要阻塞自己 (Blocking)，等待父进程叫醒我...\n");
        
        //进程阻塞
        //pause() 会让进程挂起，不再占用 CPU，直到收到一个信号
        pause(); 

        printf("[子进程] 恢复运行，准备退出...\n");
        
        //进程撤消/退出
        printf("[子进程] 再见！(调用 exit)\n");
        exit(0); 
    }
    //父进程逻辑
    else {
        printf("[父进程] 子进程已创建 (PID: %d)。我先睡 2 秒，模拟做其他事。\n", pid);
        sleep(2); //此时父进程在 sleep，子进程在 pause，都在阻塞状态

        printf("[父进程] 时间到！正在发送信号唤醒子进程 (Waking up)...\n");

        //进程唤醒
        //向子进程发送 SIGUSR1 信号，打破子进程的 pause()
        kill(pid, SIGUSR1);

        printf("[父进程] 正在等待子进程彻底结束 (Wait)...\n");
        
        // 父进程阻塞，直到子进程退出，回收僵尸进程资源
        wait(NULL); 
        
        printf("[父进程] 子进程已回收。父进程退出。\n");
    }

    return 0;
}

编译结果：

2 基础的进程调度算法

2.1 进程调度的评价指标

完成时间：进程执行完成的时间
等待时间：进程等待调用的时间
周转时间：进程完成时间 - 进程到达时间
平均周转时间：总周转时间 / 进程数
带权周转时间：周转时间 / 服务时间（执行时间）

2.2 先来先服务（FCFS）

“先来先服务”（First-Come, First-Served，简称 FCFS）是操作系统中最基础、最简单的调度算法。用一句话概括就是：谁先到，谁先用 CPU，不管是长任务还是短任务，都要排队。核心逻辑是按照进程进入“就绪队列”的先后顺序进行调度。

task_struct.h (简单进程的实现)

#ifndef TASK_STRUCT_H
#define TASK_STRUCT_H

//进程
struct task_struct {
    int pid;                //进程号
    int arrive_time;        //到达时间
    int service_time;       //服务时间
    int start_time;         //开始时间
    int finish_time;        //完成时间
    int turnaround_time;    //周转时间
    int waiting_time;       //等待时间
};

#endif

fcfs.c (简单fcfs的实现)

#include <stdio.h>
#include <stdlib.h>
#include "task_struct.h"

int compare(const void *a, const void *b) {
    struct task_struct *p1 = (struct task_struct *)a;
    struct task_struct *p2 = (struct task_struct *)b;
    return p1->arrive_time - p2->arrive_time;
}

void fcfs(struct task_struct process[], int len) {
    int current_time = 0;               
    float total_turnaround_time = 0;    
    float total_waiting_time = 0;

    //按照到达时间排序
    qsort(process, len, sizeof(struct task_struct), compare);

    for (int i = 0; i < len; i++) {
        if (current_time < process[i].arrive_time) {
            current_time = process[i].arrive_time;
        }

        process[i].start_time = current_time;
        process[i].finish_time = current_time + process[i].service_time;
        process[i].turnaround_time = process[i].finish_time - process[i].arrive_time;
        process[i].waiting_time = process[i].turnaround_time - process[i].service_time;

        current_time += process[i].service_time;
        total_turnaround_time += process[i].turnaround_time;
        total_waiting_time += process[i].waiting_time;
    }

    //打印进程
    printf("进程号\t\t到达时间\t服务时间\t开始时间\t完成时间\t周转时间\t等待时间\n");
    for (int i = 0; i < len; i++) {
        printf("%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\n", process[i].pid, process[i].arrive_time, 
            process[i].service_time, process[i].start_time, process[i].finish_time,
            process[i].turnaround_time, process[i].waiting_time);
    }

    //打印平均时间
    printf("平均周转时间: %f\n", total_turnaround_time / len);
    printf("平均等待时间: %f\n", total_waiting_time / len);
}

int main () {
    struct task_struct *process;
    int len;

    printf("请输入进程数: ");
    scanf("%d", &len);

    //动态分配进程数
    process = (struct task_struct*)malloc(len * sizeof(struct task_struct));

    for (int i = 0; i < len; i++) {
        process[i].pid = i + 1;     //pid从1开始
        printf("请输入进程%d的到达时间和服务时间(空格隔开): ", i + 1);
        scanf("%d %d", &process[i].arrive_time, &process[i].service_time);
    }

    fcfs(process, len);
    free(process);

    return 0;
}

测试结果

2.3 短作业优先（SJF）

短作业优先（Shortest Job First）是一种非抢占式或抢占式的调度算法，总是选择估计运行时间最短的作业/进程优先执行。模拟的是非抢占式的SJF调度算法。

SJF.c (非抢占式的SJF)

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "task_struct.h"

#define INF 99999999

// 辅助排序：先按到达时间排，方便处理时间线
int compare_arrive(const void *a, const void *b) {
    struct task_struct *p1 = (struct task_struct *)a;
    struct task_struct *p2 = (struct task_struct *)b;
    return p1->arrive_time - p2->arrive_time;
}

void sjf(struct task_struct process[], int len) {
    int current_time = 0;
    int completed = 0;
    float total_turnaround_time = 0;
    float total_waiting_time = 0;
    
    // 标记进程是否已完成，0为未完成，1为完成
    int *is_completed = (int *)calloc(len, sizeof(int));

    // 预排序：虽然SJF是看服务时间，但首先得看谁先到
    qsort(process, len, sizeof(struct task_struct), compare_arrive);

    while (completed < len) {
        int idx = -1;
        int min_service = INF;

        // 寻找当前时刻已到达且服务时间最短的进程
        for (int i = 0; i < len; i++) {
            if (process[i].arrive_time <= current_time && is_completed[i] == 0) {
                if (process[i].service_time < min_service) {
                    min_service = process[i].service_time;
                    idx = i;
                }
            }
        }

        // 如果找到了可运行的进程
        if (idx != -1) {
            process[idx].start_time = current_time;
            process[idx].finish_time = current_time + process[idx].service_time;
            process[idx].turnaround_time = process[idx].finish_time - process[idx].arrive_time;
            process[idx].waiting_time = process[idx].turnaround_time - process[idx].service_time;

            total_turnaround_time += process[idx].turnaround_time;
            total_waiting_time += process[idx].waiting_time;
            
            current_time += process[idx].service_time;
            is_completed[idx] = 1;
            completed++;
        } else {
            // 当前没有进程到达，CPU空闲，时间跳到下一个最近到达的进程
            // 因为已经按到达时间排序了，找到第一个未完成的即可
            for (int i = 0; i < len; i++) {
                 if (is_completed[i] == 0) {
                     current_time = process[i].arrive_time;
                     break;
                 }
            }
        }
    }

    // 打印进程
    printf("\n--- 短作业优先 (SJF) 结果 ---\n");
    printf("进程号\t\t到达时间\t服务时间\t开始时间\t完成时间\t周转时间\t等待时间\n");
    for (int i = 0; i < len; i++) {
        printf("%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\n", process[i].pid, process[i].arrive_time, 
            process[i].service_time, process[i].start_time, process[i].finish_time,
            process[i].turnaround_time, process[i].waiting_time);
    }

    printf("平均周转时间: %.2f\n", total_turnaround_time / len);
    printf("平均等待时间: %.2f\n", total_waiting_time / len);

    free(is_completed);
}

int main () {
    struct task_struct *process;
    int len;

    printf("请输入进程数: ");
    scanf("%d", &len);

    process = (struct task_struct*)malloc(len * sizeof(struct task_struct));

    for (int i = 0; i < len; i++) {
        process[i].pid = i + 1;
        printf("请输入进程%d的到达时间和服务时间(空格隔开): ", i + 1);
        scanf("%d %d", &process[i].arrive_time, &process[i].service_time);
    }

    sjf(process, len);
    free(process);

    return 0;
}

测试结果

2.4 时间片轮转（RR）

时间片轮转（Round Robin）是一种抢占式调度算法，每个进程被分配一个固定的时间片（Time Quantum），CPU在进程间循环轮转。

RR.c

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "task_struct.h"

int compare_arrive(const void *a, const void *b) {
    struct task_struct *p1 = (struct task_struct *)a;
    struct task_struct *p2 = (struct task_struct *)b;
    return p1->arrive_time - p2->arrive_time;
}

void rr(struct task_struct process[], int len, int quantum) {
    int current_time = 0;
    int completed = 0;
    float total_turnaround_time = 0;
    float total_waiting_time = 0;

    // 记录剩余服务时间
    int *rem_time = (int *)malloc(len * sizeof(int));
    // 标记是否第一次获得CPU（用于记录start_time）
    int *is_started = (int *)calloc(len, sizeof(int));
    
    // 简单队列实现
    // 队列大小最大可能是进程数的几倍（考虑到入队出队），为了安全设大一点
    int *queue = (int *)malloc(len * 100 * sizeof(int)); 
    int front = 0, rear = 0;
    
    // 标记进程是否已经进入过队列（避免重复初始入队）
    int *visited = (int *)calloc(len, sizeof(int));

    // 先按到达时间排序
    qsort(process, len, sizeof(struct task_struct), compare_arrive);

    // 初始化剩余时间
    for (int i = 0; i < len; i++) {
        rem_time[i] = process[i].service_time;
    }

    // 将最先到达的进程加入队列（可能有多个同时到达）
    if (len > 0) {
        current_time = process[0].arrive_time; // 还没开始时，时间跳到第一个进程到达时刻
    }
    
    // 初始入队逻辑
    for (int i = 0; i < len; i++) {
        if (process[i].arrive_time <= current_time) {
            queue[rear++] = i;
            visited[i] = 1;
        }
    }

    while (completed < len) {
        // 如果队列为空但还有进程没完成（说明中间有CPU空闲期）
        if (front == rear) {
            for (int i = 0; i < len; i++) {
                if (visited[i] == 0) {
                    current_time = process[i].arrive_time;
                    queue[rear++] = i;
                    visited[i] = 1;
                    break; 
                }
            }
        }

        int idx = queue[front++]; // 出队
        
        // 如果是第一次运行，记录开始时间
        if (is_started[idx] == 0) {
            process[idx].start_time = current_time;
            is_started[idx] = 1;
        }

        // 计算本次运行时间
        int exec_time = (rem_time[idx] > quantum) ? quantum : rem_time[idx];
        
        // 更新当前时间
        rem_time[idx] -= exec_time;
        current_time += exec_time;

        // 在当前进程执行期间或结束时刻，检查是否有新进程到达
        // 必须先把新到达的加入队列，然后再把当前没做完的放回队尾
        for (int i = 0; i < len; i++) {
            if (visited[i] == 0 && process[i].arrive_time <= current_time) {
                queue[rear++] = i;
                visited[i] = 1;
            }
        }

        // 如果进程完成
        if (rem_time[idx] == 0) {
            completed++;
            process[idx].finish_time = current_time;
            process[idx].turnaround_time = process[idx].finish_time - process[idx].arrive_time;
            process[idx].waiting_time = process[idx].turnaround_time - process[idx].service_time;
            
            total_turnaround_time += process[idx].turnaround_time;
            total_waiting_time += process[idx].waiting_time;
        } else {
            // 没完成，放回队尾
            queue[rear++] = idx;
        }
    }

    // 打印进程
    printf("\n--- 时间片轮转 (RR, q=%d) 结果 ---\n", quantum);
    printf("进程号\t\t到达时间\t服务时间\t开始时间\t完成时间\t周转时间\t等待时间\n");
    for (int i = 0; i < len; i++) {
        // 注意：RR中的开始时间是指"第一次获得CPU的时间"
        printf("%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\t\t%d\n", process[i].pid, process[i].arrive_time, 
            process[i].service_time, process[i].start_time, process[i].finish_time,
            process[i].turnaround_time, process[i].waiting_time);
    }

    printf("平均周转时间: %.2f\n", total_turnaround_time / len);
    printf("平均等待时间: %.2f\n", total_waiting_time / len);

    free(rem_time);
    free(is_started);
    free(queue);
    free(visited);
}

int main () {
    struct task_struct *process;
    int len;
    int quantum;

    printf("请输入进程数: ");
    scanf("%d", &len);

    printf("请输入时间片大小: ");
    scanf("%d", &quantum);

    process = (struct task_struct*)malloc(len * sizeof(struct task_struct));

    for (int i = 0; i < len; i++) {
        process[i].pid = i + 1;
        printf("请输入进程%d的到达时间和服务时间(空格隔开): ", i + 1);
        scanf("%d %d", &process[i].arrive_time, &process[i].service_time);
    }

    rr(process, len, quantum);
    free(process);

    return 0;
}

测试结果

3 openEuler的进程调度

3.1 完全公平调度器（CFS）

完全公平调度器（Completely Fair Scheduler）是Linux内核自2.6.23版本引入的默认调度器，它完全摒弃了传统的时间片概念，取而代之的是公平分配CPU时间的理念。

设计目标：

绝对公平：所有可运行任务应获得相等的CPU时间
低延迟：交互式任务响应快
高吞吐量：最大化CPU利用率
可扩展性：高效支持大量进程

关键特性：

使用虚拟运行时间（vruntime）代替传统时间片： vruntime = 实际运行时间 × (基准权重/任务权重)
使用红黑树组织任务：按vruntime排序，vruntime最小的任务在最左侧
调度实体（sched_entity）：包含权重、红黑树节点、vruntime等信息
CFS运行队列（cfs_rq）：包含红黑树根节点、任务数统计、最小vruntime等
调度周期和最小粒度：调度延迟保证每个任务至少运行一次，最小粒度防止过度上下文切换
组调度：允许任务分组，在组间和组内都保持公平
nice值与权重：nice值从-20到19，对应不同权重，影响vruntime增长速度

3.2 openEuler的优化

基础性能优化：

对进程创建（fork/exec）路径进行了优化，减少了开销，显著提升了高并发服务（如 Web 服务器）的性能。
增强了 NUMA 拓扑感知调度，调度器会尽可能将进程及其内存分配在同一个 NUMA 节点内，大幅减少跨节点访问的高延迟。
改进了对 ARM big.LITTLE 等异构 CPU 的调度支持，能更智能地将计算密集型任务分配到大核，将后台任务调度到小核，兼顾性能与能效。

混合部署与优先级调度优化：

引入了 QAS（Quality aware scheduler）调度器。这是针对混部场景的核心优化，它让在线任务对 CPU 实现快速、确定的抢占，同时有效压制离线任务可能带来的干扰。
实现了优先级负载均衡。当 CPU 负载不均时，系统会优先迁移高优先级的在线任务，以更快地缓解热点 CPU 的压力，保障关键业务的响应速度。
提供了 CPU 干扰控制机制（如 SMT_EXPELLER）。该机制可以彻底解决高低优先级任务在超线程（SMT）等共享硬件资源上的竞争，实现高优先级任务对低优先级任务的绝对压制，确保性能隔离。

资源控制与可观测性优化：

全面增强了对 cgroup v2 的支持。系统管理员可以通过 cgroup v2 的接口（如 cpu.max）对进程组进行非常精细的 CPU 资源限额设置，这是实现资源隔离的技术基础。
深度融合了 eBPF 生态。提供了丰富的 eBPF 工具（如 bcc-tools），允许开发者以极低的开销动态追踪内核调度事件，为深度性能调优和问题诊断提供了强大手段。

3.3 openEuler的单核CFS模拟

本次实验简化了openEuler的cfs，只是模拟了单核情况，实现了oe_task_struct(进程控制块)、oe_sched_entity(调度实体)、oe_cfs_rq(运行队列)的基础数据结构。且增加了openEuler对于CPU抢占的策略：当一台服务上同时存在在线服务、离线服务，如果不对离线服务加以限制，离线服务将会尽可能多的占用资源，从而增加在线任务的相应时延。同时使用linux内核的函数和数据结构，比如rbtree(红黑树)、pr_info(内核的printf，打印到内核日志)、container_of(获得结构体成员所在的结构体)等。模拟时钟为10ms的调度周期来调度进程，感兴趣的话可以深入学习内核的实现和原理。

一、使用的函数

1. 链表操作 (Linked List)

Linux 内核使用双向循环链表，这些函数定义在 <linux/list.h> 中。

LIST_HEAD(name)
- 原型: #define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)
- 解释: 静态定义并初始化一个名为 name 的链表头。它创建了一个 struct list_head 变量，并将 next 和 prev 指针都指向自身，表示链表为空。
INIT_LIST_HEAD(ptr)
- 原型: static inline void INIT_LIST_HEAD(struct list_head *list);
- 解释: 运行时动态初始化链表头。将传入的链表节点的 next 和 prev 指针都指向其自身。常用于结构体内的链表成员初始化。
list_add_tail(new, head)
- 原型: static inline void list_add_tail(struct list_head *new, struct list_head *head);
- 解释: 将新节点 new 插入到链表头 head 的尾部（即 head->prev 之后）。这常用于实现先进先出（FIFO）的队列。
list_del_init(entry)
- 原型: static inline void list_del_init(struct list_head *entry);
- 解释: 从链表中移除指定的节点 entry，并立即对其调用 INIT_LIST_HEAD。这样做的好处是节点被移除后处于一种已知的、安全的状态（空节点），再次调用 list_del 或 list_empty 也是安全的。
list_empty(head)
- 原型: static inline int list_empty(const struct list_head *head);
- 解释: 检查链表是否为空。如果头节点的 next 指针指向头节点本身，则返回真（1），否则返回假（0）。
list_for_each_entry_safe(pos, n, head, member)
- 原型: #define list_for_each_entry_safe(pos, n, head, member) ...
- 解释: 遍历链表的宏，专门用于在遍历过程中可能删除当前节点的场景。它使用临时变量 n 预先保存下一个节点的指针，防止因删除 pos 导致断链而崩溃。

2. 红黑树操作 (Red-Black Tree - Cached)

内核提供了带缓存的红黑树（rb_root_cached），它在标准红黑树的基础上额外缓存了最左侧节点（最小值），将查找最小值的复杂度从 O(log N) 降低到 O(1)。定义在 <linux/rbtree.h> 中。

rb_entry(ptr, type, member)
- 原型: #define rb_entry(ptr, type, member) container_of(ptr, type, member)
- 解释: container_of 的宏封装，专门用于红黑树节点。根据指向 struct rb_node 的指针 ptr，获取包含它的宿主结构体 type 的指针。
rb_link_node(node, parent, rb_link)
- 原型: static inline void rb_link_node(struct rb_node *node, struct rb_node *parent, struct rb_node **rb_link);
- 解释: 在插入新节点之前的准备工作。它设置新节点 node 的父指针，并将其放入父节点的左或右子位置（由 rb_link 指定）。此函数不执行平衡操作。
rb_insert_color_cached(node, root, newleft)
- 原型: void rb_insert_color_cached(struct rb_node *node, struct rb_root_cached *root, bool newleft);
- 解释: 在调用 rb_link_node 将节点放入树中后，调用此函数进行颜色调整和树的旋转以保持红黑树平衡。newleft 是一个布尔值，如果插入的节点是新的最左侧节点（最小值），则应传入 true，以便更新缓存。
rb_erase_cached(node, root)
- 原型: void rb_erase_cached(struct rb_node *node, struct rb_root_cached *root);
- 解释: 从带缓存的红黑树中移除一个节点，并重新平衡树。如果被移除的是缓存的最左侧节点，该函数会自动更新缓存指向次小节点。
rb_first_cached(root)
- 原型: struct rb_node *rb_first_cached(struct rb_root_cached *root);
- 解释: 以 O(1) 的时间复杂度返回树中的第一个节点（中序遍历的第一个，即最小节点）。直接返回 root->rb_leftmost。

3. 内存管理 (Memory Management)

定义在 <linux/slab.h> 中。

kmalloc(size, flags)
- 原型: void *kmalloc(size_t size, gfp_t flags);
- 解释: 内核中最常用的内存分配函数。它分配物理上连续的内存块。flags 指定分配行为（如 GFP_KERNEL 表示可能睡眠，GFP_ATOMIC 表示原子上下文中分配）。
kfree(objp)
- 原型: void kfree(const void *objp);
- 解释: 释放由 kmalloc 分配的内存。如果传入 NULL，函数会安全地直接返回，不执行任何操作。

4. 通用宏与辅助工具 (Utilities)

container_of(ptr, type, member)
- 原型: #define container_of(ptr, type, member) ...
- 解释: 内核核心宏。根据结构体中某个成员 member 的地址 ptr，反推算出该结构体 type 的起始地址。它利用 offsetof 机制计算偏移量，是链表和红黑树等数据结构实现的基础。
strscpy(dest, src, count)
- 原型: ssize_t strscpy(char *dest, const char *src, size_t count);
- 解释: 安全的字符串复制函数。比 strncpy 和 strlcpy 更优：它保证目标缓冲区以 NUL 结尾，不会读取源字符串越界，并且在截断时返回错误码 -E2BIG 而不是截断后的长度，便于调用者处理错误。
pr_info(fmt, ...)
- 原型: #define pr_info(fmt, ...) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
- 解释: 日志打印宏。它是 printk 的封装，指定了日志级别为 KERN_INFO（信息性消息）。通常用于打印非错误的运行时信息。

二、实验内容

oe_sched.h（进程的定义）

#ifndef OE_SCHED_H
#define OE_SCHED_H

#include <linux/types.h>
#include <linux/rbtree.h>
#include <linux/list.h>

#define NICE_0_LOAD 1024
#define TASK_COMM_LEN 16

#define ENQUEUE_WAKEUP   0x01
#define ENQUEUE_INITIAL  0x80

//openEuler QoS
enum qos_level {
    QOS_OFFLINE = 0,  //离线任务(低优)
    QOS_ONLINE  = 1   //在线任务(高优，绝对优先)
};


//调度实体
struct oe_sched_entity {
    struct rb_node node;    //红黑树的节点
    u64 vruntime;           //虚拟运行时间
    int weight;             //权重
};

//PCB
struct oe_task_struct {
    int pid;                
    char comm[TASK_COMM_LEN];   //进程名
    struct oe_sched_entity se;
    enum qos_level qos_level;   

    //运行时间统计
    u64 exec_start;
    u64 sum_exec_runtime;
    u64 needed_time;

    struct list_head list;          //双向链表，方便释放内存
    struct list_head throttle_list; //压制链表
};

//运行队列
struct oe_cfs_rq {
    struct rb_root_cached root;             //带缓存的红黑树
    int nr_running;                         //当前运行队列中可运行任务的总数
    int nr_running_online;                  //Online任务的数量
    struct list_head qos_throttled_list;    //存放被临时压制的Offline任务
    u64 min_vruntime;                       //防止饥饿（当新加入的vruntime一直小于旧的vruntime,会导致饥饿）
    struct oe_sched_entity *curr;           //当前调用的实体
};

//获得struct oe_task_struct的成员struct oe_sched_entity node的结构体本身
#define task_of(ptr) container_of(ptr, struct oe_task_struct, se)

//接口
void enqueue_task(struct oe_cfs_rq *cfs_rq, struct oe_task_struct *task, int flags);
void update_curr(struct oe_cfs_rq *cfs_rq, u64 delta_exec);
struct oe_task_struct *pick_next_task(struct oe_cfs_rq *cfs_rq);

#endif

oe_fair.c（CFS的具体实现）

#include <linux/kernel.h>
#include "oe_sched.h"

static inline u64 max_vruntime(u64 min_vruntime, u64 vruntime)
{
    //s64防止溢出
    s64 delta = (s64)(vruntime - min_vruntime);
    if (delta > 0)
        min_vruntime = vruntime;

    return min_vruntime;
}

//更新队列的 min_vruntime
static void update_min_vruntime(struct oe_cfs_rq *cfs_rq)
{
    struct oe_sched_entity *curr = cfs_rq->curr;
    struct oe_sched_entity *most_left = NULL;
    struct rb_node *node = rb_first_cached(&cfs_rq->root);
    
    u64 vruntime = cfs_rq->min_vruntime;

    if (node) {
        most_left = rb_entry(node, struct oe_sched_entity, node);
    }

    if (curr) {
        vruntime = curr->vruntime;
    }

    if (most_left) {
        if (!curr) {
            vruntime = most_left->vruntime;
        } else {
            vruntime = min(vruntime, most_left->vruntime);
        }
    }

    //保证单调递增
    cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
}

//计算vruntime
static inline u64 calc_delta(u64 delta_exec, int weight) {
    return delta_exec * NICE_0_LOAD / weight;
}

//更新当前信息
void update_curr(struct oe_cfs_rq *cfs_rq, u64 delta_exec)
{
    struct oe_sched_entity *curr_se = cfs_rq->curr;
    struct oe_task_struct *curr_task;
    u64 delta_vruntime;

    if (!curr_se) 
        return;

    curr_task = task_of(curr_se);
    curr_task->sum_exec_runtime += delta_exec;
    
    delta_vruntime = calc_delta(delta_exec, curr_se->weight);
    curr_se->vruntime += delta_vruntime;

    update_min_vruntime(cfs_rq);
}

//防止饥饿
static void place_entity(struct oe_cfs_rq *cfs_rq, struct oe_sched_entity *se, int initial)
{
    u64 vruntime = cfs_rq->min_vruntime;

    if (initial) {
        se->vruntime = vruntime;
    } 
    else {
        se->vruntime = max_vruntime(se->vruntime, vruntime);
    }
}

//插入红黑树
void enqueue_task(struct oe_cfs_rq *cfs_rq, struct oe_task_struct *task, int flags)
{
    struct oe_sched_entity *se = &task->se;
    struct rb_node **new = &cfs_rq->root.rb_root.rb_node;
    struct rb_node *parent = NULL;
    struct oe_sched_entity *entry;
    bool most_left = true;

    place_entity(cfs_rq, se, !!(flags & ENQUEUE_INITIAL));

    // 1. 查找插入位置
    while (*new) {
        parent = *new;
        entry = rb_entry(parent, struct oe_sched_entity, node);

        // 按 vruntime 排序
        if (se->vruntime < entry->vruntime) {
            new = &parent->rb_left;
        } else {
            new = &parent->rb_right;
            most_left = false;
        }
    }

    rb_link_node(&se->node, parent, new);
    rb_insert_color_cached(&se->node, &cfs_rq->root, most_left);

    if (task->qos_level == QOS_ONLINE) {
        cfs_rq->nr_running_online++;
    }
    cfs_rq->nr_running++;
}

//压制offline
static void oe_throttle_offline_task(struct oe_cfs_rq *cfs_rq, struct oe_task_struct *task)
{
    //从红黑树移除
    rb_erase_cached(&task->se.node, &cfs_rq->root);
    cfs_rq->nr_running--; 
    
    //加入压制链表  
    list_add_tail(&task->throttle_list, &cfs_rq->qos_throttled_list);
    
    pr_info("OE_SCHED: [QoS] Throttling Offline task %s (vruntime: %llu)\n", 
            task->comm, task->se.vruntime);
}

//解压offline
static void oe_unthrottle_tasks(struct oe_cfs_rq *cfs_rq)
{
    struct oe_task_struct *task, *next;

    if (list_empty(&cfs_rq->qos_throttled_list)) 
        return;

    //如果当前还有Online任务在排队，就不解压
    if (cfs_rq->nr_running_online > 0) 
        return;

    //遍历压制链表，把任务放回红黑树
    list_for_each_entry_safe(task, next, &cfs_rq->qos_throttled_list, throttle_list) {
        list_del_init(&task->throttle_list);    
        enqueue_task(cfs_rq, task, 0); //放回红黑树
        pr_info("OE_SCHED: [QoS] Unthrottling task %s\n", task->comm);
    }
}

//选择调用任务
struct oe_task_struct *pick_next_task(struct oe_cfs_rq *cfs_rq)
{
    struct oe_task_struct *task;
    struct rb_node *most_left;

    if (cfs_rq->nr_running_online == 0) {
        oe_unthrottle_tasks(cfs_rq);
    }

    if (!cfs_rq->nr_running) 
        return NULL;

    while (1) {
        most_left = rb_first_cached(&cfs_rq->root);
        if (!most_left) 
            return NULL;
        
        task = task_of(rb_entry(most_left, struct oe_sched_entity, node));

        //使最左侧为Online，压制offline
        if (cfs_rq->nr_running_online > 0 && task->qos_level == QOS_OFFLINE) {
            oe_throttle_offline_task(cfs_rq, task);
            continue; 
        }

        //要么它是Online，要么全队列都没Online了
        break; 
    }

    if (task) {
        rb_erase_cached(&task->se.node, &cfs_rq->root);
        cfs_rq->nr_running--;
        if (task->qos_level == QOS_ONLINE) {
            cfs_rq->nr_running_online--;
        }
        cfs_rq->curr = &task->se;
    }
    
    return task;
}

oe_core.c（调度器的实现）

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include "oe_sched.h"

MODULE_LICENSE("GPL");
MODULE_AUTHOR("User");
MODULE_DESCRIPTION("Kernel module simulation for openEuler CFS QoS");

struct oe_cfs_rq cfs_rq;
LIST_HEAD(all_tasks_list); //用于最后释放内存

//创建任务
static struct oe_task_struct *create_task(int pid, const char *name, u64 needed, int weight, 
    enum qos_level qos)
{
    struct oe_task_struct *task;
    
    task = kmalloc(sizeof(struct oe_task_struct), GFP_KERNEL);
    if (!task) 
        return NULL;

    //初始化调度实体
    task->pid = pid;
    strscpy(task->comm, name, sizeof(task->comm));
    task->qos_level = qos;
    task->needed_time = needed;
    task->exec_start = 0;
    task->sum_exec_runtime = 0;
    task->se.vruntime = 0;
    task->se.weight = weight;
    
    list_add(&task->list, &all_tasks_list);
    return task;
}

// 模块初始化
static int __init sched_sim_init(void)
{
    struct oe_task_struct *t1, *t2, *t3;
    struct oe_task_struct *curr;
    int tick = 0;
    const u64 time_slice = 10; //10ms时间片

    pr_info("OE_SCHED: Module Loaded. Starting Simulation...\n");

    //初始化运行队列
    cfs_rq.root = RB_ROOT_CACHED;
    cfs_rq.nr_running = 0;
    cfs_rq.nr_running_online = 0;
    cfs_rq.min_vruntime = 0;
    cfs_rq.curr = NULL;
    INIT_LIST_HEAD(&cfs_rq.qos_throttled_list);

    //任务A (Offline): 权重低(100)
    t1 = create_task(100, "Video_Off", 100, 100, QOS_OFFLINE);
    //任务B (Online): 权重高(1024)
    t2 = create_task(200, "Nginx_On", 30, 1024, QOS_ONLINE);
    //任务C (Online): 权重高(1024)
    t3 = create_task(201, "SSH_On", 20, 1024, QOS_ONLINE);

    //入队
    pr_info("OE_SCHED: Enqueuing tasks...\n");
    enqueue_task(&cfs_rq, t1, ENQUEUE_INITIAL);
    enqueue_task(&cfs_rq, t2, ENQUEUE_INITIAL);
    enqueue_task(&cfs_rq, t3, ENQUEUE_INITIAL);

    //模拟调度循环
    pr_info("OE_SCHED: --- Start Scheduling Loop ---\n");
    
    while (cfs_rq.nr_running > 0 || cfs_rq.curr != NULL) {
        tick++;
        
        //选核 (Pick Next)
        curr = pick_next_task(&cfs_rq);
        
        if (!curr) 
            break;

        pr_info("OE_SCHED: [Tick %d] Run: %s (QoS:%d, vruntime:%llu)\n", 
                tick, curr->comm, curr->qos_level, curr->se.vruntime);

        // C. 模拟运行 (Update Curr)
        u64 slice = (curr->sum_exec_runtime + time_slice > curr->needed_time) ? 
                    (curr->needed_time - curr->sum_exec_runtime) : time_slice;
        
        update_curr(&cfs_rq, slice);

        //判断结束或放回
        if (curr->sum_exec_runtime >= curr->needed_time) {
            pr_info("OE_SCHED: Task %s finished.\n", curr->comm);
        } else {
            // 时间片用完，放回红黑树
            enqueue_task(&cfs_rq, curr, 0);
        }
        
        // 防止死循环
        if (tick > 50) 
            break;
    }

    return 0;
}

// 模块卸载
static void __exit sched_sim_exit(void)
{
    struct oe_task_struct *task, *next;
    
    // 释放所有任务内存
    list_for_each_entry_safe(task, next, &all_tasks_list, list) {
        kfree(task);
    }
    pr_info("OE_SCHED: Module Unloaded.\n");
}

module_init(sched_sim_init);
module_exit(sched_sim_exit);

Makefile（生成内核模块）

obj-m += oe_sched_sim.o
oe_sched_sim-objs := oe_core.o oe_fair.o

KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

编译代码

make                        //编译模块
sudo insmod oe_sched_sim.ko //载入内核
sudo dmesg -w               //查看内核日志中的运行模块
sudo rmmod oe_sched_sim     //卸载模块
sudo dmesg -w               //查看卸载模块的信息

结果图片

结果分析：可以看到Video_Off虽然最先创建，但是因为是Offline，只有在Online任务运行完后才能运行，符合openEuler在线任务绝对压制离线任务的策略。

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

基于openEuler 进程管理与调度实验

openEuler 进程管理与调度

一、实验目的

二、实验环境

三、实验内容与步骤

1. 基础的进程管理

1.1 进程概述

1.2 进程状态

1.3 进程控制块

1.4 进程状态

1.5 进程管理

2 基础的进程调度算法

2.1 进程调度的评价指标

2.2 先来先服务（FCFS）

2.3 短作业优先（SJF）

2.4 时间片轮转（RR）

3 openEuler的进程调度

3.1 完全公平调度器（CFS）

设计目标：

关键特性：

3.2 openEuler的优化

3.3 openEuler的单核CFS模拟

1. 链表操作 (Linked List)

2. 红黑树操作 (Red-Black Tree - Cached)

3. 内存管理 (Memory Management)

4. 通用宏与辅助工具 (Utilities)

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

基于openEuler 进程管理与调度实验

openEuler 进程管理与调度

一、 实验目的

二、 实验环境

三、 实验内容与步骤

1. 基础的进程管理

1.1 进程概述

1.2 进程状态

1.3 进程控制块

1.4 进程状态

1.5 进程管理

2 基础的进程调度算法

2.1 进程调度的评价指标

2.2 先来先服务（FCFS）

2.3 短作业优先（SJF）

2.4 时间片轮转（RR）

3 openEuler的进程调度

3.1 完全公平调度器（CFS）

设计目标：

关键特性：

3.2 openEuler的优化

3.3 openEuler的单核CFS模拟

1. 链表操作 (Linked List)

2. 红黑树操作 (Red-Black Tree - Cached)

3. 内存管理 (Memory Management)

4. 通用宏与辅助工具 (Utilities)

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品

一、实验目的

二、实验环境

三、实验内容与步骤