客快物流大数据项目(九十二):ClickHouse的MergeTree系列引擎介绍和MergeTree深入了解

举报
Lansonli 发表于 2022/12/22 13:57:46 2022/12/22
【摘要】 ClickHouse的MergeTree系列引擎介绍和MergeTree深入了解一、MergeTree系列引擎介绍MergeTree(合并树)系列引擎是ClickHouse中最强大的表引擎,是官方主推的存储引擎,几乎支持ClickHouse所有的核心功能。该系列引擎主要用于海量数据分析的场景,支持对表数据进行分区、复制、采样、存储有序、主键索引、稀疏索引和数据TTL等特性。MergeTree...

ClickHouse的MergeTree系列引擎介绍和MergeTree深入了解

一、MergeTree系列引擎介绍

MergeTree(合并树)系列引擎是ClickHouse中最强大的表引擎,是官方主推的存储引擎,几乎支持ClickHouse所有的核心功能。

该系列引擎主要用于海量数据分析的场景,支持对表数据进行分区、复制、采样、存储有序、主键索引、稀疏索引和数据TTL等特性。

MergeTree系列引擎的基本理念是当有大量数据要插入到表中时,需要高效地一批一批的写入数据片段,并希望这些数据片段在后台按照一定规则合并,这种方法比插入期间连续重写存储中的数据效率更高

简而言之就是具有批量数据快速插入和后台并发处理的优势。

MergeTree系列引擎支持ClickHouse所有的SQL语法,但还是有一些SQL语法和MySQL并不太一样。

MergeTree系列引擎包括:

  • ReplacingMergeTree
  • SummingMergeTree
  • AggregatingMergeTree
  • CollapsingMergeTree
  • VersionedCollapsingMergeTree

二、​​​​​​​MergeTree深入了解

MergeTree引擎的表的允许插入主键重复的数据,主键主要作用是生成主键索引来提升查询效率,而不是用来保持记录主键唯一

1、创建MergeTree表的说明

创建MergeTree引擎表的语法

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...]
[SETTINGS name=value, ...]


子句说明

使用方式

ENGINE

ENGINE = MergeTree() --说明:该引擎不需要参数。

PARTITION BY 字段名称

PARTITION by to YYYYMM(cdt)

ORDER BY 字段名称(可以是元组)

ORDER BY cdt或ORDER BY (age,gender)

PRIMARY KEY 字段名称

PRIMARY KEY age

SAMPLE BY 字段名称

SAMPLE BY intHash64(userId)

TTL Date字段或DateTime字段

TTL cdt + INTERVAL 1 DAY

SETTINGS param=value...

SETTINGS index_granularity=8192 说明:索引粒度。即索引中相邻”标记”间的数据行数。设为 8192 可以适用大部分场景。

SETTINGS index_granularity_bytes= 说明:设置数据粒度的最大大小(单位/字节),默认10MB。从大行(数十和数百MB)的表中select数据时,此设置可提高ClickHouse的提高select性能。

enable_mixed_granularity_parts 说明:启用或禁用过渡。

use_minimalistic_part_header_in_zookeeper 说明:在ZK中存储数据部分标题,0是关闭,1是存储的少量数据。

min_merge_bytes_to_use_direct_io 说明:使用对存储磁盘的直接I / O访问所需的最小合并操作数据量。合并数据部分时,ClickHouse会计算要合并的所有数据的总存储量。如果卷超过min_merge_bytes_to_use_direct_io字节,ClickHouse将使用直接I/O接口(O_DIRECT选项)读取数据并将数据写入存储磁盘。如果为min_merge_bytes_to_use_direct_io = 0,则直接I / O被禁用。 默认值:10 * 1024 * 1024 * 1024字节。

merge_with_ttl_timeout 说明:与TTL合并之前的最小延迟(单位/秒),默认86400。

write_final_mark 说明:启用或禁用在数据部分末尾写入最终索引标记,默认1。建议不关闭此设置。

storage_policy 说明:存储策略。

2、创建MergeTree引擎的表

创建MergeTree引擎表有两种方式,一种是集群表,一种是本地表

创建使用MergeTree引擎的集群表test.tbl_testmergetree_users_all,集群表一般都携带_all后缀,而且必须所有节点都存在test数据库,这样所有节点的test库中都有tbl_testmergetree_users_all表。

CREATE TABLE test.tbl_test_mergetree_users_all ON cluster 'ch_cluster'(
    id UInt64, 
    email String, 
    username String, 
    gender UInt8, 
    birthday Date, 
    mobile FixedString(13), 
    pwd String, 
    regDT DateTime, 
    lastLoginDT DateTime, 
    lastLoginIP String
) ENGINE=MergeTree() partition by toYYYYMMDD(regDT) order by id settings index_granularity=8192;


创建使用MergeTree引擎的本地表test.tbl_test_mergetree_users

CREATE TABLE tbl_test_mergetree_users(
    id UInt64, 
    email String, 
    username String, 
    gender UInt8, 
    birthday DATE, 
    mobile FixedString(13), 
    pwd String, 
    regDT DateTime, 
    lastLoginDT DateTime, 
    lastLoginIP String
) ENGINE=MergeTree() partition by toYYYYMMDD(regDT) order by id settings index_granularity=8192;
  • 插入数据到MergeTree引擎的表

2.1、测试数据集

/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释 
* values (1,'wcfr817e@yeah.net','督咏',2,'1992-05-31','13306834911','7f930f90eb6604e837db06908cc95149','2008-08-06 11:48:12','2022-05-08 10:51:41','106.83.54.165'),(2,'xuwcbev9y@ask.com','上磊',1,'1983-10-11','15302753472','7f930f90eb6604e837db06908cc95149','2008-08-10 05:37:32','2020-07-28 23:43:04','121.77.119.233'),(3,'mgaqfew@126.com','涂康',1,'1970-11-22','15200570030','96802a851b4a7295fb09122b9aa79c18','2008-08-10 11:37:55','2019-07-22 23:45:47','171.12.206.122'),(4,'b7zthcdg@163.net','金俊振',1,'2002-02-10','15207308903','96802a851b4a7295fb09122b9aa79c18','2008-08-10 14:47:09','2016-12-26 15:55:02','61.235.143.92'),(5,'ezrvy0p@163.net','阴福',1,'1987-09-01','13005861359','96802a851b4a7295fb09122b9aa79c18','2008-08-12 21:58:11','2015-12-26 15:52:33','182.81.200.32'),(6,'juestiua@263.net','岑山裕',1,'1996-02-12','13008315314','96802a851b4a7295fb09122b9aa79c18','2008-08-14 05:48:16','2014-12-26 15:49:12','36.59.3.248'),(7,'oyyrzd@yahoo.com.cn','姚茗咏',2,'1977-02-06','13303203846','96e79218965eb72c92a549dd5a330112','2008-08-15 10:07:31','2014-12-26 15:55:05','106.91.23.177'),(8,'lhrwkwwf@163.com','巫红影',2,'1996-02-21','15107523748','96802a851b4a7295fb09122b9aa79c18','2008-08-15 14:37:41','2013-12-26 15:55:05','123.234.85.27'),(9,'v2zqak8kh@0355.net','空进',1,'1974-01-16','13903178080','96802a851b4a7295fb09122b9aa79c18','2008-08-16 03:24:44','2013-12-26 15:52:42','121.77.192.123'),(10,'mqqqmf@yahoo.com','西香',2,'1980-10-13','13108330898','96802a851b4a7295fb09122b9aa79c18','2008-08-16 04:42:08','2013-12-26 15:55:08','36.57.21.234'),(11,'sf8oubu@yahoo.com.cn','壤晶媛',2,'1976-03-05','15202615557','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:08:51','2013-12-26 15:55:03','182.83.220.201'),(12,'k6k21ce@qq.com','东平',1,'2005-04-25','13603648382','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:18:20','2013-12-26 15:55:05','210.34.111.155'),(13,'zguxgg@qq.com','夹影悦',2,'2002-08-23','15300056290','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:57:45','2013-12-26 15:55:02','61.232.211.180'),(14,'g2jqhbzrf@aol.com','生晓怡',2,'1974-06-22','13507515420','96802a851b4a7295fb09122b9aa79c18','2008-08-16 08:23:43','2013-12-26 15:55:02','182.86.5.162'),(15,'1evn3spn@126.com','咎梦',2,'1998-04-14','15204567060','060ed80051e6384b77ddfaa26191778b','2008-08-16 08:29:57','2013-12-26 15:55:02','210.30.171.70'),(16,'tqndz6l@googlemail.com','司韵',2,'1992-08-28','15202706709','96802a851b4a7295fb09122b9aa79c18','2008-08-16 14:34:25','2013-12-26 15:55:03','171.10.115.59'),(17,'3472gs22x@live.com','李言',1,'1997-09-08','15701526600','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:04:07','2013-12-26 15:52:39','171.14.80.71'),(18,'p385ii@gmail.com','詹芸',2,'2004-11-05','15001974846','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:26:06','2013-12-26 15:55:02','182.89.147.245'),(19,'mfbncfu@yahoo.com','蒙芬霞',2,'1990-09-10','15505788156','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:30:58','2013-12-26 15:55:05','182.91.15.89'),(20,'l24ffbn@ask.com','后冠',1,'2000-09-02','15608241150','96802a851b4a7295fb09122b9aa79c18','2008-08-17 01:15:55','2014-08-29 00:51:12','36.58.7.85'),(21,'m26lggpe@qq.com','宋美月',2,'2003-01-13','15606561425','96802a851b4a7295fb09122b9aa79c18','2008-08-17 01:24:09','2013-12-26 15:52:36','123.235.233.160'),(22,'ndmfm13qf@0355.net','邬玲',2,'2002-08-11','13207844859','96802a851b4a7295fb09122b9aa79c18','2008-08-17 03:31:11','2013-12-26 15:55:03','36.60.8.4'),(23,'5shmvnd@sina.com','乐心有',1,'1998-05-01','13201212693','96802a851b4a7295fb09122b9aa79c18','2008-08-17 03:33:41','2013-12-26 15:55:02','123.234.184.210'),(24,'pwa0hu@3721.net','任学诚',1,'1978-03-19','15802040152','7f930f90eb6604e837db06908cc95149','2008-08-17 07:24:01','2013-12-26 15:52:34','210.43.167.14'),(25,'1ybjhul@googlemail.com','巫纨',2,'1995-01-20','15900530105','96802a851b4a7295fb09122b9aa79c18','2008-08-17 07:48:06','2013-12-26 15:55:02','222.55.139.104'),(26,'b31me2i8b@yeah.net','石娅',2,'2000-02-25','13908735198','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:17:24','2013-12-26 15:52:45','123.235.99.123'),(27,'qgb2w4n7@163.net','柏菁',2,'1975-02-09','15306552661','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:47:39','2013-12-26 15:55:02','121.77.245.202'),(28,'cfb3ck@sohu.com','鲜梦',2,'1974-01-26','13801751668','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:55:47','2013-12-26 15:55:02','210.26.163.24'),(29,'nfrf6mp@msn.com','鄂珍',2,'1974-04-14','13300247433','96802a851b4a7295fb09122b9aa79c18','2008-08-17 09:02:14','2013-12-26 15:55:08','210.31.214.157'),(30,'o1isumh@126.com',' 法姬',2,'1978-06-16','15607848127','96802a851b4a7295fb09122b9aa79c18','2008-08-17 09:09:59','2013-12-26 15:55:08','222.24.34.19'),(31,'y2wrclkq@msn.com','太以',1,'1998-09-07','13608585923','96802a851b4a7295fb09122b9aa79c18','2008-08-17 11:35:39','2013-12-26 15:52:35','182.89.218.177'),(32,'fv9avnuo@263.net','庚姣欣',2,'1982-09-14','13004625187','96802a851b4a7295fb09122b9aa79c18','2008-08-17 12:50:36','2013-12-26 15:55:02','106.82.225.130'),(33,'o1e96z@yahoo.com','微伟',1,'1981-07-30','13707663880','96802a851b4a7295fb09122b9aa79c18','2008-08-17 15:12:05','2013-12-26 15:49:12','171.13.152.247'),(34,'cm3oz64ja@msn.com','那竹娜',2,'1989-01-09','13607294767','96802a851b4a7295fb09122b9aa79c18','2008-08-17 15:51:08','2013-12-26 15:55:02','171.13.110.67'),(35,'g7impl@msn.com','闾和栋',1,'1994-10-12','13907368366','96802a851b4a7295fb09122b9aa79c18','2008-08-17 16:51:02','2013-12-26 15:55:01','210.28.163.83'),(36,'jz2fjtt@163.com','夏佳悦',2,'2001-03-17','15102554998','7af1b63f0d1f37c35c1274339c12b6a8','1970-01-01 08:00:00','1970-01-01 08:00:00','222.91.138.221'),(37,'klwrtomws@yahoo.com','南义梁',1,'1981-03-19','15105745902','96802a851b4a7295fb09122b9aa79c18','2008-08-18 01:49:17','2013-12-26 15:55:01','36.62.155.17'),(38,'yhzs1nnlk@3721.net','牧元',1,'2001-06-07','13501780163','96802a851b4a7295fb09122b9aa79c18','2008-08-18 04:24:52','2013-12-26 15:55:01','171.15.184.130'),(39,'hem76ot33@gmail.com','凌伟文',1,'1988-03-04','13201417535','96802a851b4a7295fb09122b9aa79c18','2008-08-18 05:34:52','2013-12-26 15:55:14','61.237.105.3'),(40,'ndp40j@sohu.com','弘枝',2,'2000-09-05','13001236425','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:23:48','2013-12-26 15:55:01','106.82.172.45'),(41,'zeyacpr@gmail.com','台凡',2,'1998-05-26','15102913418','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:41:24','2013-12-26 15:55:07','123.233.69.218'),(42,'0ts0wiz@aol.com','任晓红',2,'1984-05-02','13502366778','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:55:16','2013-12-26 15:55:01','210.26.44.18'),(43,'zi7dhzo@googlemail.com','蔡艺艳',2,'1990-08-07','15603954810','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:57:30','2013-12-26 15:55:01','171.12.171.179'),(44,'b0yfzilu@hotmail.com','郎诚',1,'1994-05-18','13602127171','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:02:04','2013-12-26 15:55:02','171.8.22.92'),(45,'er9az5e9s@163.com','台翰',1,'1994-06-22','15900953220','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:05:08','2013-12-26 15:55:14','222.31.141.156'),(46,'e34jy2@yeah.net','彭筠',2,'1983-08-12','15106915420','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:09:37','2013-12-26 15:52:34','36.60.51.67'),(47,'1u0zc56h@163.net','包华婉',2,'1998-10-03','13102518450','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:47:24','2013-12-26 15:55:02','121.76.76.105'),(48,'cs8kyk3@ask.com','淳盛',1,'2002-06-19','13203151569','96802a851b4a7295fb09122b9aa79c18','2008-08-18 08:01:58','2013-12-26 15:55:02','36.60.76.111'),(49,'ibcgi5ll@yahoo.com','车珍枫',2,'1975-07-27','15605361319','96802a851b4a7295fb09122b9aa79c18','2008-08-18 08:12:45','2013-12-26 15:55:01','106.83.110.76'),(50,'gzxcx6vz@live.com','应冰红',2,'2004-04-19','15104154370','96802a851b4a7295fb09122b9aa79c18','2008-08-18 09:00:09','2013-12-26 15:55:01','182.88.181.216');
*/

2.2、插入数据到集群表test.tbl_test_mergetree_users_all

使用SQL语句

insert into test.tbl_test_mergetree_users_all(id, email, username, gender, birthday, mobile, pwd, regDT, lastLoginDT, lastLoginIP)

追加上面的values语句进行数据插入。

查询表中全量数据。

50条数据是按照分区字段regDT划分片段,然后每个段中是根据排序字段id默认使用ASC进行排序的。

查询符合email模糊匹配gmail并且id>30并且分区是20080818条件的数据。

这个表的物理存储是在我们设置的数据路径/export/data/clickhouse中,该路径下的/data/data下是ClickHouse节点中维护的数据库,对应下图中的default、system和test三个文件夹。

然后在test文件夹下,有一个tbl_test_mergetree_users_all文件夹(是我们自己创建的表),该文件夹中有很多日期类型的文件夹(我们创建表时指定的分区字段的值),在此文件夹下有很多具体的数据文件。

这些数据文件中,columns.txt表示tbl_test_mergetree_users_all表20080816_29_29_0分区下所有数据的列信息,如图:

这些数据文件中,count.txt表示tbl_test_mergetree_users_all表20080816_29_29_0分区下所有数据的总条数,如图:

这些数据文件中的*.bin和*.mrk2指的是列字段的数据信息,*.idx指的是索引字段信息。

2.3、插入数据到本地表test.tbl_test_mergetree_users

使用SQL语句

insert into tbl_test_mergetree_users(id, email, username, gender, birthday, mobile, pwd, regDT, lastLoginDT, lastLoginIP)

追加上面的values语句进行数据插入。

查询表中全量数据(与_all表数据一样)

这个本地表的物理存储与上面的集群表的存储路径和文件类似。

进入本地路径:cd /var/lib/clickhouse/data/default/tbl_test_mergetree_users

​编辑随便进入一个目录:cd 20080818_9_9_0

*.bin是按列保存数据的文件 *.mrk保存块偏移量 primary.idx保存主键索引

3、删除MergeTree引擎的表

3.1、删除集群中所有节点的tbl_test_mergetree_users_all表

drop table test.tbl_test_mergetree_users_all on cluster 'ch_cluster';

3.2、删除tbl_test_mergetree_users本地表

drop table tbl_test_mergetree_users;
【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。