Deploying Rook Ceph, a Distributed Storage Plugin for Kubernetes Clusters

zuozewei, published 2021/10/03 10:57:51

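[Summary] Rook is a Kubernetes storage plugin based on Ceph (support for more storage backends has been added over time). Rather than simply wrapping Ceph, Rook adds a large set of enterprise-grade features in its own implementation, such as horizontal scaling, migration, disaster backup, and monitoring, which makes the project a highly scalable distributed storage solution providing object, file, and block storage.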

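Preface

We often say that containers and Pods are ephemeral: their lifecycles may be short, and they are destroyed and recreated frequently. When a container is destroyed, the data stored in its internal filesystem is wiped. To persist a container's data, a storage plugin can mount a remote volume, backed by the network or some other mechanism, into the container, so that files created in the container are actually stored on a remote storage server, or distributed across multiple nodes, with no binding to the current host. That way, a new container started on any node can request to mount the specified persistent volume.

Thanks to Kubernetes' loosely coupled design, most storage projects, such as Ceph, GlusterFS, and NFS, can provide persistent storage for Kubernetes. This deployment walkthrough uses an important production-grade storage plugin project: Rook.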

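Introduction to Rook

Overview

Rook is a Kubernetes storage plugin based on Ceph (support for more storage backends has been added over time). Rather than simply wrapping Ceph, Rook adds a large set of enterprise-grade features in its own implementation, such as horizontal scaling, migration, disaster backup, and monitoring, which makes the project a highly scalable distributed storage solution providing object, file, and block storage.

Rook currently supports deploying Ceph, NFS, Minio Object Store, EdgeFS, Cassandra, and CockroachDB storage.

How Rook works:

Rook provides a volume plugin that extends the Kubernetes storage system, so that Pods, through the kubelet, can mount block devices and filesystems managed by Rook.
The Rook Operator starts and monitors the entire underlying storage system, such as the Ceph Pods and Ceph OSDs, and also manages CRDs, object stores, and filesystems.
The Rook Agent runs as a Pod on every Kubernetes node. Each agent Pod configures a Flexvolume driver that integrates with the Kubernetes volume control framework; node-level operations such as adding storage devices, mounting, formatting, and removing storage are all performed by this agent.

For more information, see the official sites: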

https://rook.io
https://ceph.com/

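Rook Architecture

Deploying Rook

Planning

Prerequisites

To configure a Ceph storage cluster, at least one of the following local storage options is required:

Raw devices (no partitions or formatted filesystems)
Raw partitions (no formatted filesystem)
PVs available from a storage class in block mode

Use the following command to check whether a partition or device already has a formatted filesystem: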

$ lsblk -f
NAME            FSTYPE      LABEL UUID                                   MOUNTPOINT
vda
├─vda1          xfs               e16ad84e-8cef-4cb1-b19b-9105c57f97b1   /boot
├─vda2          LVM2_member       Vg3nyB-iW9Q-4xp0-LEIO-gzHc-2eax-D1razB
│ └─centos-root xfs               0bb4bfa4-b315-43ca-a789-2b43e726c10c   /
├─vda3          LVM2_member       VZMibm-DJ8e-apig-YhR3-a1dF-wHYQ-8pjKan
│ └─centos-root xfs               0bb4bfa4-b315-43ca-a789-2b43e726c10c   /
└─vda4

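If the FSTYPE field is not empty, there is a filesystem on top of the corresponding device. In the output above, vda4 has an empty FSTYPE, so it can be used for Ceph.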
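The same check can be scripted. The following is a minimal sketch, not part of the original walkthrough: it assumes the machine-readable output of `lsblk --json -o NAME,FSTYPE` and reports devices that have neither a filesystem nor child devices. The function name `unformatted_leaf_devices` and the sample data (modeled on the listing above) are illustrative.

```python
import json
from typing import List


def unformatted_leaf_devices(lsblk_json: str) -> List[str]:
    """Return names of block devices with no filesystem and no children."""
    def walk(devices):
        for dev in devices:
            children = dev.get("children") or []
            if children:
                # A device with partitions or LVM volumes is not a free raw device.
                yield from walk(children)
            elif not dev.get("fstype"):
                yield dev["name"]

    return list(walk(json.loads(lsblk_json)["blockdevices"]))


# Sample shaped like `lsblk --json -o NAME,FSTYPE` output for the host above.
SAMPLE = json.dumps({"blockdevices": [{"name": "vda", "fstype": None, "children": [
    {"name": "vda1", "fstype": "xfs"},
    {"name": "vda2", "fstype": "LVM2_member",
     "children": [{"name": "centos-root", "fstype": "xfs"}]},
    {"name": "vda3", "fstype": "LVM2_member",
     "children": [{"name": "centos-root", "fstype": "xfs"}]},
    {"name": "vda4", "fstype": None},
]}]})

print(unformatted_leaf_devices(SAMPLE))  # ['vda4']
```

On a real node the input would come from running `lsblk --json -o NAME,FSTYPE` and passing its standard output to the function.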

Getting the YAML

git clone --single-branch --branch master https://github.com/rook/rook.git

Deploying the Rook Operator

This experiment uses three nodes, k8s-node1, k8s-node2, and k8s-node3, so label them as follows:

kubectl label nodes {k8s-node1,k8s-node2,k8s-node3} ceph-osd=enabled
kubectl label nodes {k8s-node1,k8s-node2,k8s-node3} ceph-mon=enabled
kubectl label nodes k8s-node1 ceph-mgr=enabled

Note: in the current Rook version, mgr can run on only one node.

Run the following commands:

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml

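Note: the commands above create the required base resources (such as serviceaccounts); rook-ceph-operator then creates rook-ceph-agent and rook-discover on every node.

Deploying the Cluster

Configuring cluster.yaml

vi cluster.yaml

After editing, the file looks like this: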

#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.

# For example, to create the cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f cluster.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v13 is mimic, v14 is nautilus, and v15 is octopus.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such ceph/ceph:v14.2.5-20190917
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: ceph/ceph:v15.2.3
    # Whether to allow unsupported versions of Ceph. Currently mimic and nautilus are supported, with the recommendation to upgrade to nautilus.
    # Octopus is the version allowed when this is set to true.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # set the amount of mons to be started
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    modules:
    # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
    # are already enabled by other settings in the cluster CR and the "rook" module is always enabled.
    - name: pg_autoscaler
      enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: true
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
    # Recommended:
    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
    rulesNamespace: rook-ceph
  network:
    # enable host networking
    #provider: host
    # EXPERIMENTAL: enable the Multus network provider
    #provider: multus
    #selectors:
      # The selector keys are required to be `public` and `cluster`.
      # Based on the configuration, the operator will do the following:
      #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
      #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
      #
      # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
      #
      #public: public-conf --> NetworkAttachmentDefinition object name in Multus
      #cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
  cleanupPolicy:
    # cleanup should only be added to the cluster when the cluster is about to be deleted.
    # After any field of the cleanup policy is set, Rook will stop configuring the cluster as if the cluster is about
    # to be destroyed in order to prevent these settings from being deployed unintentionally.
    # To signify that automatic deletion is desired, use the value "yes-really-destroy-data". Only this and an empty
    # string are valid values for this field.
    confirmation: ""

  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.

  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mon
              operator: In
              values:
              - enabled
      podAffinity:
      podAntiAffinity:
      topologySpreadConstraints:
      tolerations:
      - key: ceph-mon
        operator: Exists
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-osd
              operator: In
              values:
              - enabled
      podAffinity:
      podAntiAffinity:
      topologySpreadConstraints:
      tolerations:
      - key: ceph-osd
        operator: Exists
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mgr
              operator: In
              values:
              - enabled
      podAffinity:
      podAntiAffinity:
      topologySpreadConstraints:
      tolerations:
      - key: ceph-mgr
        operator: Exists

#    cleanup:
  annotations:
#    all:
#    mon:
#    osd:
#    cleanup:
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
#   mgr:
  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
#    mgr:
#      limits:
#        cpu: "500m"
#        memory: "1024Mi"
#      requests:
#        cpu: "500m"
#        memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
#    mon:
#    osd:
#    prepareosd:
#    crashcollector:
#    cleanup:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
#  priorityClassNames:
#    all: rook-ceph-default-priority-class
#    mon: rook-ceph-mon-priority-class
#    osd: rook-ceph-osd-priority-class
#    mgr: rook-ceph-mgr-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: false      # do not use all nodes
    useAllDevices: false    # do not use all devices
    deviceFilter: vda4
    config:
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    nodes:
    - name: "k8s-node1"        # hostname of the storage node
      devices:
      - name: "vda4"           # device used for the OSD
      config:
        storeType: bluestore
    - name: "k8s-node2"
      devices:
      - name: "vda4"
      config:
        storeType: bluestore
    - name: "k8s-node3"
      devices:
      - name: "vda4"
      config:
        storeType: bluestore
  
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: false
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api
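
The three `nodes` entries in the manifest above differ only in the host name. As a rough sketch (the helper name `storage_nodes` is hypothetical and not part of Rook), the list can be generated from the node names and printed as JSON, to be compared against or adapted into the manifest:

```python
import json
from typing import Dict, List


def storage_nodes(node_names: List[str], device: str) -> List[Dict]:
    """Build one storage entry per node, mirroring the manifest above."""
    return [
        {
            "name": name,
            "devices": [{"name": device}],
            "config": {"storeType": "bluestore"},
        }
        for name in node_names
    ]


nodes = storage_nodes(["k8s-node1", "k8s-node2", "k8s-node3"], "vda4")
print(json.dumps(nodes, indent=2))
```

Each entry's `name` must match the node's `kubernetes.io/hostname` label, as the comment in the manifest notes.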

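For more CephCluster CRD configuration options, see the Rook documentation.

Applying cluster.yaml

kubectl create -f cluster.yaml
# view the deployment logs
$ kubectl logs -f -n rook-ceph rook-ceph-operator-567d7945d6-t9rd4
# wait a while; some pods in intermediate states may come and go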
[7d@k8s-master ceph]$ kubectl get pods -n rook-ceph -o wide
NAME                                                  READY   STATUS      RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
csi-cephfsplugin-dr4dq                                3/3     Running     0          24h   172.16.106.239   k8s-node2   <none>           <none>
csi-cephfsplugin-provisioner-6bcb7cdd75-dtzmn         5/5     
.......
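
Rather than watching the listing by eye, readiness can be checked from the JSON form of the same command. This is a minimal sketch under stated assumptions: the input is the output of `kubectl get pods -n rook-ceph -o json`, a pod counts as done when its phase is `Succeeded` or when it is `Running` with every container ready, and the `example-*` pod names in the sample are hypothetical.

```python
import json
from typing import List


def pods_not_ready(pods_json: str) -> List[str]:
    """Names of pods that are neither Succeeded nor Running with all containers ready."""
    pending = []
    for pod in json.loads(pods_json)["items"]:
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed job pods
        containers = status.get("containerStatuses", [])
        if phase == "Running" and containers and all(c.get("ready") for c in containers):
            continue
        pending.append(pod["metadata"]["name"])
    return pending


SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "csi-cephfsplugin-dr4dq"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": True}] * 3}},
    {"metadata": {"name": "example-osd-prepare"},  # hypothetical pod name
     "status": {"phase": "Succeeded"}},
    {"metadata": {"name": "example-osd-0"},        # hypothetical pod name
     "status": {"phase": "Pending"}},
]})

print(pods_not_ready(SAMPLE))  # ['example-osd-0']
```

An empty result suggests the deployment has settled; a non-empty one lists the pods worth inspecting with `kubectl describe`.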

Note: if the deployment fails

Run on the master node:

kubectl delete -f ./

Run the following cleanup on every node:

rm -rf /var/lib/rook
ls /dev/mapper/ceph-*
dmsetup ls
dmsetup remove_all
dd if=/dev/zero of=/dev/vda4 bs=512k count=1
wipefs -af /dev/vda4

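Deploying the Toolbox

The toolbox is a container with Rook's tool set. Its commands can be used to debug and test Rook, and ad hoc Ceph test operations are usually run inside this container.

# start the rook-ceph-tools pod:
$ kubectl create -f toolbox.yaml
deployment.apps/rook-ceph-tools created

# wait for rook-ceph-tools to pull its container image and reach the Running state: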
$ kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-tools-6d659f5579-knt6x   1/1     Running   0          7s

Testing Rook

$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
# check the Ceph status
[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph status
  cluster:
    id:     550e2978-26a6-4f3b-b101-10369ab63cf4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, since 85s)
    osd: 3 osds: 3 up (since 114s), 3 in (since 114s)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 147 GiB / 150 GiB avail
    pgs:     1 active+clean

[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph osd status
ID  HOST        USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
0  k8s-node2  1026M  48.9G      0        0       0        0   exists,up
1  k8s-node1  1026M  48.9G      0        0       0        0   exists,up
2  k8s-node3  1026M  48.9G      0        0       0        0   exists,up
[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    150 GiB  147 GiB  6.2 MiB   3.0 GiB       2.00
TOTAL  150 GiB  147 GiB  6.2 MiB   3.0 GiB       2.00

--- POOLS ---
POOL                   ID  STORED  OBJECTS  USED  %USED  MAX AVAIL
device_health_metrics   1     0 B        0   0 B      0     46 GiB
[root@rook-ceph-tools-6d659f5579-knt6x /]# rados df
POOL_NAME              USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS   WR  USED COMPR  UNDER COMPR
device_health_metrics   0 B        0       0       0                   0        0         0       0  0 B       0  0 B         0 B          0 B
total_objects    0
total_used       3.0 GiB
total_avail      147 GiB
total_space      150 GiB

# list all Ceph keyrings
[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph auth ls
installed auth entries:
osd.0
key: AQAjofFe1j9pGhAABnjTXAYZeZdwo2FGHIFv+g==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: AQAjofFeY0LaHhAAMVLxrH1lqXqyYsZE9yJ5dg==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.2
key: AQAkofFeBjtoDBAAVYW7FursqpbttekW54u2rA==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQDhoPFeqLE4ORAAvBeGwV7p1YY25owP8nS02Q==
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: AQABofFeQclcCxAAEtA9Y4+yF3I6H9RM0/f1DQ==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
key: AQABofFeaeBcCxAA9SEnt+RV7neC4uy/xQb5qg==
caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
key: AQABofFe0/NcCxAAqCKwJpzPlav8MuajRk8xmw==
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
key: AQABofFesAZdCxAAZyWJg+Pa3F0g5Toy4LamPw==
caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
key: AQABofFejRpdCxAA/9NbTQDJILdSoYJZdol7bQ==
caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
key: AQABofFeAi5dCxAAKu67ZyM8PRRPcluTXR3YRw==
caps: [mon] allow profile bootstrap-rgw
client.crash
key: AQAcofFeLDr3KBAAw9UowFd26JiQSGjCFyhx8w==
caps: [mgr] allow profile crash
caps: [mon] allow profile crash
client.csi-cephfs-node
key: AQAcofFeFRh4DBAA7Z8kgcHGM92vHj6cvGbXXg==
caps: [mds] allow rw
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
key: AQAbofFemLuJMRAA4WlGWBjONb1av48rox1q6g==
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
key: AQAbofFepu7rFhAA+vdit2ipDgVFc/yKUpHHug==
caps: [mon] profile rbd
caps: [osd] profile rbd
client.csi-rbd-provisioner
key: AQAaofFe3Yw9OxAAiJzZ6HQne/e9Zob5G311OA==
caps: [mgr] allow rw
caps: [mon] profile rbd
caps: [osd] profile rbd
mgr.a
key: AQAdofFeh4VZHhAA8VL9gH5jgOxzjTDtEaFWBQ==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph version
ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)
[root@rook-ceph-tools-6d659f5579-knt6x /]# exit
exit

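With that, a Rook-based persistent storage cluster is running as containers, and every Pod created on Kubernetes from now on can mount data volumes provided by Ceph through Persistent Volumes (PV) and Persistent Volume Claims (PVC). Rook takes care of operational work such as lifecycle management and disaster backup for these volumes.

Setting Up the Dashboard

The dashboard is a very useful tool that gives an overview of the Ceph cluster's state, including overall health, mon quorum status, the status of the mgr, OSDs, and other Ceph daemons, pool and PG status, daemon logs, and more. Rook makes enabling the dashboard simple.

Deploying the NodePort Service

Edit dashboard-external-https.yaml: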

$ vi dashboard-external-https.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort

Creating the NodePort Service

$ kubectl create -f dashboard-external-https.yaml
service/rook-ceph-mgr-dashboard-external-https created
$  kubectl get svc -n rook-ceph
NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
csi-cephfsplugin-metrics                 ClusterIP   10.102.32.77     <none>        8080/TCP,8081/TCP   17m
csi-rbdplugin-metrics                    ClusterIP   10.101.121.5     <none>        8080/TCP,8081/TCP   17m
rook-ceph-mgr                            ClusterIP   10.99.155.138    <none>        9283/TCP            16m
rook-ceph-mgr-dashboard                  ClusterIP   10.97.61.135     <none>        8443/TCP            16m
rook-ceph-mgr-dashboard-external-https   NodePort    10.108.210.25    <none>        8443:32364/TCP      6s
rook-ceph-mon-a                          ClusterIP   10.102.116.81    <none>        6789/TCP,3300/TCP   16m
rook-ceph-mon-b                          ClusterIP   10.101.141.241   <none>        6789/TCP,3300/TCP   16m
rook-ceph-mon-c                          ClusterIP   10.101.157.247   <none>        6789/TCP,3300/TCP   16m

The Rook operator enables the ceph-mgr dashboard module and creates a Service object to expose the port inside the Kubernetes cluster. Rook enables port 8443 for HTTPS access.
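
With the NodePort service in place, the dashboard is reachable on any node's IP at the allocated node port (32364 in the listing above). A small sketch of deriving the URL, assuming the input is the output of `kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https -o json`; the node IP `192.0.2.10` is a placeholder from the documentation address range, and the function name is illustrative.

```python
import json


def dashboard_url(service_json: str, node_ip: str) -> str:
    """Build the external HTTPS URL from the Service's 'dashboard' port."""
    service = json.loads(service_json)
    for port in service["spec"]["ports"]:
        if port.get("name") == "dashboard" and "nodePort" in port:
            return f"https://{node_ip}:{port['nodePort']}"
    raise ValueError("service has no 'dashboard' port with a nodePort")


# Trimmed sample of the Service created above; nodePort taken from the listing.
SAMPLE = json.dumps({"spec": {"type": "NodePort", "ports": [
    {"name": "dashboard", "port": 8443, "protocol": "TCP",
     "targetPort": 8443, "nodePort": 32364},
]}})

print(dashboard_url(SAMPLE, "192.0.2.10"))  # https://192.0.2.10:32364
```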

Verification

To retrieve the generated password, run:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
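
The `base64 --decode` step is needed because Kubernetes stores Secret values base64-encoded. The sketch below shows the same decoding in Python, assuming the input is the output of `kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o json`; the sample password is made up, since the real one is generated by Rook.

```python
import base64
import json


def dashboard_password(secret_json: str) -> str:
    """Decode data.password from a Secret in JSON form."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"]["password"]).decode("utf-8")


# Hypothetical secret content used only to exercise the function.
SAMPLE = json.dumps({
    "data": {"password": base64.b64encode(b"example-password").decode("ascii")}
})

print(dashboard_password(SAMPLE))  # example-password
```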

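Using Ceph Block Storage

Creating the StorageClass

Before block storage can be provisioned, a StorageClass and a storage pool must be created. Kubernetes needs both resources to interact with Rook and allocate persistent volumes (PVs).

cd rook/cluster/examples/kubernetes/ceph/csi/rbd

kubectl create -f storageclass.yaml

Explanation: the configuration file below creates a storage pool named replicapool and a StorageClass named rook-ceph-block.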

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph

  # If you want to use erasure coded pool with RBD, you need to create
  # two pools. one erasure coded and one replicated.
  # You need to specify the replicated pool here in the `pool` parameter, it is
  # used for the metadata of the images.
  # The erasure coded pool must be set as the `dataPool` parameter below.
  #dataPool: ec-data-pool
  pool: replicapool

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`.
  csi.storage.k8s.io/fstype: ext4
  # uncomment the following to use rbd-nbd as mounter on supported nodes
  # **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
  # issue that causes the mount to be disconnected. You will need to follow special upgrade steps
  # to restart your application pods. Therefore, this option is not recommended.
  #mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete

$ kubectl get storageclasses.storage.k8s.io
NAME              PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   44m
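
Because the pool uses `replicated.size: 3`, every object is stored three times, so the space usable by volumes is roughly a third of the raw space. A back-of-the-envelope sketch using the 147 GiB of raw available space from the earlier `ceph df` output (the function name is illustrative, and the result is an upper bound rather than what Ceph will report):

```python
def usable_capacity_gib(raw_avail_gib: float, replica_size: int) -> float:
    """Upper bound on usable space for a replicated pool."""
    if replica_size < 1:
        raise ValueError("replica_size must be at least 1")
    return raw_avail_gib / replica_size


print(usable_capacity_gib(147, 3))  # 49.0
```

The `MAX AVAIL` of 46 GiB shown by `ceph df` is somewhat lower than this estimate, presumably because Ceph keeps headroom below its full threshold and accounts for how data is distributed across OSDs.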

Creating the PVC

$ kubectl create -f pvc.yaml

$ kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-b2b7ce1d-7cad-4b7b-afac-dcf6cd597e88   1Gi        RWO            rook-ceph-block   45m

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS      REASON   AGE
pvc-b2b7ce1d-7cad-4b7b-afac-dcf6cd597e88   1Gi        RWO            Delete           Bound    default/rbd-pvc   rook-ceph-block            45m

Explanation: the commands above create the PVC, whose storageClassName is rook-ceph-block, backed by the Rook Ceph cluster.
pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
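
When several claims of different sizes are needed, the manifest can be generated instead of copied. A minimal sketch, assuming the same fields as pvc.yaml above; the helper name `pvc_manifest` is hypothetical. It emits JSON, which `kubectl create -f -` accepts on standard input.

```python
import json


def pvc_manifest(name: str, size: str, storage_class: str) -> dict:
    """Build a ReadWriteOnce PVC with the same shape as pvc.yaml above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": storage_class,
        },
    }


manifest = pvc_manifest("rbd-pvc", "1Gi", "rook-ceph-block")
print(json.dumps(manifest, indent=2))
```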

Consuming the Block Device

$ kubectl create -f rookpod01.yaml
$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
rookpod01                0/1     Completed   0          4m1s

Explanation: the commands above create a Pod that mounts the previously created PVC; wait for it to complete.

rookpod01.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: rookpod01
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-container
    image: busybox
    volumeMounts:
    - name: block-pvc
      mountPath: /var/test
    command: ['sh', '-c', 'echo "Hello World" > /var/test/data; exit 0']
  volumes:
  - name: block-pvc
    persistentVolumeClaim:
      claimName: rbd-pvc
      readOnly: false

Testing Persistence

# delete rookpod01
$ kubectl delete -f rookpod01.yaml
pod "rookpod01" deleted

# create rookpod02
$ kubectl create  -f rookpod02.yaml
pod/rookpod02 created

$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
rookpod02                0/1     Completed   0          59s

$ kubectl logs rookpod02
Hello World

Explanation: rookpod02 is created with the same PVC to test persistence. Its log shows the data written by rookpod01.

rookpod02.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: rookpod02
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-container
    image: busybox
    volumeMounts:
    - name: block-pvc
      mountPath: /var/test
    command: ['sh', '-c', 'cat /var/test/data; exit 0']
  volumes:
  - name: block-pvc
    persistentVolumeClaim:
      claimName: rbd-pvc
      readOnly: false

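Problems Encountered

The dashboard overview page returns `500 Internal Server Error`

Solution:

Create a new copy of the built-in administrator role, remove the iscsi permission from it, and assign this new role to the admin user. Presumably, once the issue is fixed upstream, the new role can be deleted and the built-in administrator role reassigned to the admin user.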

ceph dashboard ac-role-create admin-no-iscsi
for scope in dashboard-settings log rgw prometheus grafana nfs-ganesha manager hosts rbd-image config-opt rbd-mirroring cephfs user osd pool monitor; do
ceph dashboard ac-role-add-scope-perms admin-no-iscsi ${scope} create delete read update;
done
ceph dashboard ac-user-set-roles admin admin-no-iscsi

[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph dashboard ac-role-create admin-no-iscsi
{"name": "admin-no-iscsi", "description": null, "scopes_permissions": {}}
[root@rook-ceph-tools-6d659f5579-knt6x /]# for scope in dashboard-settings log rgw prometheus grafana nfs-ganesha manager hosts rbd-image config-opt rbd-mirroring cephfs user osd pool monitor; do
>     ceph dashboard ac-role-add-scope-perms admin-no-iscsi ${scope} create delete read update;
> done
[root@rook-ceph-tools-6d659f5579-knt6x /]# ceph dashboard ac-user-set-roles admin admin-no-iscsi
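
The workaround boils down to one command per scope, with iscsi left out. As an illustrative sketch (the helper name `role_commands` is hypothetical; the scope list and `ceph dashboard` subcommands are the ones used above), the full command sequence can be generated and reviewed before running it in the toolbox:

```python
from typing import List

# Scopes granted to the new role; iscsi is intentionally absent.
SCOPES = [
    "dashboard-settings", "log", "rgw", "prometheus", "grafana", "nfs-ganesha",
    "manager", "hosts", "rbd-image", "config-opt", "rbd-mirroring", "cephfs",
    "user", "osd", "pool", "monitor",
]


def role_commands(role: str, user: str, scopes: List[str]) -> List[str]:
    """Return the ceph commands that create the role and assign it to the user."""
    commands = [f"ceph dashboard ac-role-create {role}"]
    commands += [
        f"ceph dashboard ac-role-add-scope-perms {role} {scope} create delete read update"
        for scope in scopes
    ]
    commands.append(f"ceph dashboard ac-user-set-roles {user} {role}")
    return commands


for command in role_commands("admin-no-iscsi", "admin", SCOPES):
    print(command)
```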

Source code:

https://github.com/zuozewei/blog-example/tree/master/Kubernetes/k8s-rook-ceph
