客快物流大数据项目(九十二):ClickHouse的MergeTree系列引擎介绍和MergeTree深入了解
ClickHouse的MergeTree系列引擎介绍和MergeTree深入了解
一、MergeTree系列引擎介绍
MergeTree(合并树)系列引擎是ClickHouse中最强大的表引擎,是官方主推的存储引擎,几乎支持ClickHouse所有的核心功能。
该系列引擎主要用于海量数据分析的场景,支持对表数据进行分区、复制、采样、存储有序、主键索引、稀疏索引和数据TTL等特性。
MergeTree系列引擎的基本理念是当有大量数据要插入到表中时,需要高效地一批一批的写入数据片段,并希望这些数据片段在后台按照一定规则合并,这种方法比插入期间连续重写存储中的数据效率更高。
简而言之就是具有批量数据快速插入和后台并发处理的优势。
MergeTree系列引擎支持ClickHouse所有的SQL语法,但还是有一些SQL语法和MySQL并不太一样。
MergeTree系列引擎包括:
- ReplacingMergeTree
- SummingMergeTree
- AggregatingMergeTree
- CollapsingMergeTree
- VersionedCollapsingMergeTree
二、MergeTree深入了解
MergeTree引擎的表的允许插入主键重复的数据,主键主要作用是生成主键索引来提升查询效率,而不是用来保持记录主键唯一
1、创建MergeTree表的说明
创建MergeTree引擎表的语法
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...
INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...]
[SETTINGS name=value, ...]
子句说明 |
使用方式 |
---|---|
ENGINE |
ENGINE = MergeTree() --说明:该引擎不需要参数。 |
PARTITION BY 字段名称 |
PARTITION by to YYYYMM(cdt) |
ORDER BY 字段名称(可以是元组) |
ORDER BY cdt或ORDER BY (age,gender) |
PRIMARY KEY 字段名称 |
PRIMARY KEY age |
SAMPLE BY 字段名称 |
SAMPLE BY intHash64(userId) |
TTL Date字段或DateTime字段 |
TTL cdt + INTERVAL 1 DAY |
SETTINGS param=value... |
SETTINGS index_granularity=8192 说明:索引粒度。即索引中相邻”标记”间的数据行数。设为 8192 可以适用大部分场景。 |
SETTINGS index_granularity_bytes= 说明:设置数据粒度的最大大小(单位/字节),默认10MB。从大行(数十和数百MB)的表中select数据时,此设置可提高ClickHouse的提高select性能。 |
|
enable_mixed_granularity_parts 说明:启用或禁用过渡。 |
|
use_minimalistic_part_header_in_zookeeper 说明:在ZK中存储数据部分标题,0是关闭,1是存储的少量数据。 |
|
min_merge_bytes_to_use_direct_io 说明:使用对存储磁盘的直接I / O访问所需的最小合并操作数据量。合并数据部分时,ClickHouse会计算要合并的所有数据的总存储量。如果卷超过min_merge_bytes_to_use_direct_io字节,ClickHouse将使用直接I/O接口(O_DIRECT选项)读取数据并将数据写入存储磁盘。如果为min_merge_bytes_to_use_direct_io = 0,则直接I / O被禁用。 默认值:10 * 1024 * 1024 * 1024字节。 |
|
merge_with_ttl_timeout 说明:与TTL合并之前的最小延迟(单位/秒),默认86400。 |
|
write_final_mark 说明:启用或禁用在数据部分末尾写入最终索引标记,默认1。建议不关闭此设置。 |
|
storage_policy 说明:存储策略。 |
2、创建MergeTree引擎的表
创建MergeTree引擎表有两种方式,一种是集群表,一种是本地表。
创建使用MergeTree引擎的集群表test.tbl_testmergetree_users_all,集群表一般都携带_all后缀,而且必须所有节点都存在test数据库,这样所有节点的test库中都有tbl_testmergetree_users_all表。
CREATE TABLE test.tbl_test_mergetree_users_all ON cluster 'ch_cluster'(
id UInt64,
email String,
username String,
gender UInt8,
birthday Date,
mobile FixedString(13),
pwd String,
regDT DateTime,
lastLoginDT DateTime,
lastLoginIP String
) ENGINE=MergeTree() partition by toYYYYMMDD(regDT) order by id settings index_granularity=8192;
创建使用MergeTree引擎的本地表test.tbl_test_mergetree_users
CREATE TABLE tbl_test_mergetree_users(
id UInt64,
email String,
username String,
gender UInt8,
birthday DATE,
mobile FixedString(13),
pwd String,
regDT DateTime,
lastLoginDT DateTime,
lastLoginIP String
) ENGINE=MergeTree() partition by toYYYYMMDD(regDT) order by id settings index_granularity=8192;
- 插入数据到MergeTree引擎的表
2.1、测试数据集
/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释
* values (1,'wcfr817e@yeah.net','督咏',2,'1992-05-31','13306834911','7f930f90eb6604e837db06908cc95149','2008-08-06 11:48:12','2022-05-08 10:51:41','106.83.54.165'),(2,'xuwcbev9y@ask.com','上磊',1,'1983-10-11','15302753472','7f930f90eb6604e837db06908cc95149','2008-08-10 05:37:32','2020-07-28 23:43:04','121.77.119.233'),(3,'mgaqfew@126.com','涂康',1,'1970-11-22','15200570030','96802a851b4a7295fb09122b9aa79c18','2008-08-10 11:37:55','2019-07-22 23:45:47','171.12.206.122'),(4,'b7zthcdg@163.net','金俊振',1,'2002-02-10','15207308903','96802a851b4a7295fb09122b9aa79c18','2008-08-10 14:47:09','2016-12-26 15:55:02','61.235.143.92'),(5,'ezrvy0p@163.net','阴福',1,'1987-09-01','13005861359','96802a851b4a7295fb09122b9aa79c18','2008-08-12 21:58:11','2015-12-26 15:52:33','182.81.200.32'),(6,'juestiua@263.net','岑山裕',1,'1996-02-12','13008315314','96802a851b4a7295fb09122b9aa79c18','2008-08-14 05:48:16','2014-12-26 15:49:12','36.59.3.248'),(7,'oyyrzd@yahoo.com.cn','姚茗咏',2,'1977-02-06','13303203846','96e79218965eb72c92a549dd5a330112','2008-08-15 10:07:31','2014-12-26 15:55:05','106.91.23.177'),(8,'lhrwkwwf@163.com','巫红影',2,'1996-02-21','15107523748','96802a851b4a7295fb09122b9aa79c18','2008-08-15 14:37:41','2013-12-26 15:55:05','123.234.85.27'),(9,'v2zqak8kh@0355.net','空进',1,'1974-01-16','13903178080','96802a851b4a7295fb09122b9aa79c18','2008-08-16 03:24:44','2013-12-26 15:52:42','121.77.192.123'),(10,'mqqqmf@yahoo.com','西香',2,'1980-10-13','13108330898','96802a851b4a7295fb09122b9aa79c18','2008-08-16 04:42:08','2013-12-26 15:55:08','36.57.21.234'),(11,'sf8oubu@yahoo.com.cn','壤晶媛',2,'1976-03-05','15202615557','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:08:51','2013-12-26 15:55:03','182.83.220.201'),(12,'k6k21ce@qq.com','东平',1,'2005-04-25','13603648382','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:18:20','2013-12-26 15:55:05','210.34.111.155'),(13,'zguxgg@qq.com','夹影悦',2,'2002-08-23','15300056290','96802a851b4a7295fb09122b9aa79c18','2008-08-16 06:57:45','2013-12-26 15:55:02','61.232.211.180'),(14,'g2jqhbzrf@aol.com','生晓怡',2,'1974-06-22','13507515420','96802a851b4a7295fb09122b9aa79c18','2008-08-16 08:23:43','2013-12-26 15:55:02','182.86.5.162'),(15,'1evn3spn@126.com','咎梦',2,'1998-04-14','15204567060','060ed80051e6384b77ddfaa26191778b','2008-08-16 08:29:57','2013-12-26 15:55:02','210.30.171.70'),(16,'tqndz6l@googlemail.com','司韵',2,'1992-08-28','15202706709','96802a851b4a7295fb09122b9aa79c18','2008-08-16 14:34:25','2013-12-26 15:55:03','171.10.115.59'),(17,'3472gs22x@live.com','李言',1,'1997-09-08','15701526600','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:04:07','2013-12-26 15:52:39','171.14.80.71'),(18,'p385ii@gmail.com','詹芸',2,'2004-11-05','15001974846','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:26:06','2013-12-26 15:55:02','182.89.147.245'),(19,'mfbncfu@yahoo.com','蒙芬霞',2,'1990-09-10','15505788156','96802a851b4a7295fb09122b9aa79c18','2008-08-16 15:30:58','2013-12-26 15:55:05','182.91.15.89'),(20,'l24ffbn@ask.com','后冠',1,'2000-09-02','15608241150','96802a851b4a7295fb09122b9aa79c18','2008-08-17 01:15:55','2014-08-29 00:51:12','36.58.7.85'),(21,'m26lggpe@qq.com','宋美月',2,'2003-01-13','15606561425','96802a851b4a7295fb09122b9aa79c18','2008-08-17 01:24:09','2013-12-26 15:52:36','123.235.233.160'),(22,'ndmfm13qf@0355.net','邬玲',2,'2002-08-11','13207844859','96802a851b4a7295fb09122b9aa79c18','2008-08-17 03:31:11','2013-12-26 15:55:03','36.60.8.4'),(23,'5shmvnd@sina.com','乐心有',1,'1998-05-01','13201212693','96802a851b4a7295fb09122b9aa79c18','2008-08-17 03:33:41','2013-12-26 15:55:02','123.234.184.210'),(24,'pwa0hu@3721.net','任学诚',1,'1978-03-19','15802040152','7f930f90eb6604e837db06908cc95149','2008-08-17 07:24:01','2013-12-26 15:52:34','210.43.167.14'),(25,'1ybjhul@googlemail.com','巫纨',2,'1995-01-20','15900530105','96802a851b4a7295fb09122b9aa79c18','2008-08-17 07:48:06','2013-12-26 15:55:02','222.55.139.104'),(26,'b31me2i8b@yeah.net','石娅',2,'2000-02-25','13908735198','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:17:24','2013-12-26 15:52:45','123.235.99.123'),(27,'qgb2w4n7@163.net','柏菁',2,'1975-02-09','15306552661','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:47:39','2013-12-26 15:55:02','121.77.245.202'),(28,'cfb3ck@sohu.com','鲜梦',2,'1974-01-26','13801751668','96802a851b4a7295fb09122b9aa79c18','2008-08-17 08:55:47','2013-12-26 15:55:02','210.26.163.24'),(29,'nfrf6mp@msn.com','鄂珍',2,'1974-04-14','13300247433','96802a851b4a7295fb09122b9aa79c18','2008-08-17 09:02:14','2013-12-26 15:55:08','210.31.214.157'),(30,'o1isumh@126.com',' 法姬',2,'1978-06-16','15607848127','96802a851b4a7295fb09122b9aa79c18','2008-08-17 09:09:59','2013-12-26 15:55:08','222.24.34.19'),(31,'y2wrclkq@msn.com','太以',1,'1998-09-07','13608585923','96802a851b4a7295fb09122b9aa79c18','2008-08-17 11:35:39','2013-12-26 15:52:35','182.89.218.177'),(32,'fv9avnuo@263.net','庚姣欣',2,'1982-09-14','13004625187','96802a851b4a7295fb09122b9aa79c18','2008-08-17 12:50:36','2013-12-26 15:55:02','106.82.225.130'),(33,'o1e96z@yahoo.com','微伟',1,'1981-07-30','13707663880','96802a851b4a7295fb09122b9aa79c18','2008-08-17 15:12:05','2013-12-26 15:49:12','171.13.152.247'),(34,'cm3oz64ja@msn.com','那竹娜',2,'1989-01-09','13607294767','96802a851b4a7295fb09122b9aa79c18','2008-08-17 15:51:08','2013-12-26 15:55:02','171.13.110.67'),(35,'g7impl@msn.com','闾和栋',1,'1994-10-12','13907368366','96802a851b4a7295fb09122b9aa79c18','2008-08-17 16:51:02','2013-12-26 15:55:01','210.28.163.83'),(36,'jz2fjtt@163.com','夏佳悦',2,'2001-03-17','15102554998','7af1b63f0d1f37c35c1274339c12b6a8','1970-01-01 08:00:00','1970-01-01 08:00:00','222.91.138.221'),(37,'klwrtomws@yahoo.com','南义梁',1,'1981-03-19','15105745902','96802a851b4a7295fb09122b9aa79c18','2008-08-18 01:49:17','2013-12-26 15:55:01','36.62.155.17'),(38,'yhzs1nnlk@3721.net','牧元',1,'2001-06-07','13501780163','96802a851b4a7295fb09122b9aa79c18','2008-08-18 04:24:52','2013-12-26 15:55:01','171.15.184.130'),(39,'hem76ot33@gmail.com','凌伟文',1,'1988-03-04','13201417535','96802a851b4a7295fb09122b9aa79c18','2008-08-18 05:34:52','2013-12-26 15:55:14','61.237.105.3'),(40,'ndp40j@sohu.com','弘枝',2,'2000-09-05','13001236425','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:23:48','2013-12-26 15:55:01','106.82.172.45'),(41,'zeyacpr@gmail.com','台凡',2,'1998-05-26','15102913418','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:41:24','2013-12-26 15:55:07','123.233.69.218'),(42,'0ts0wiz@aol.com','任晓红',2,'1984-05-02','13502366778','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:55:16','2013-12-26 15:55:01','210.26.44.18'),(43,'zi7dhzo@googlemail.com','蔡艺艳',2,'1990-08-07','15603954810','96802a851b4a7295fb09122b9aa79c18','2008-08-18 06:57:30','2013-12-26 15:55:01','171.12.171.179'),(44,'b0yfzilu@hotmail.com','郎诚',1,'1994-05-18','13602127171','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:02:04','2013-12-26 15:55:02','171.8.22.92'),(45,'er9az5e9s@163.com','台翰',1,'1994-06-22','15900953220','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:05:08','2013-12-26 15:55:14','222.31.141.156'),(46,'e34jy2@yeah.net','彭筠',2,'1983-08-12','15106915420','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:09:37','2013-12-26 15:52:34','36.60.51.67'),(47,'1u0zc56h@163.net','包华婉',2,'1998-10-03','13102518450','96802a851b4a7295fb09122b9aa79c18','2008-08-18 07:47:24','2013-12-26 15:55:02','121.76.76.105'),(48,'cs8kyk3@ask.com','淳盛',1,'2002-06-19','13203151569','96802a851b4a7295fb09122b9aa79c18','2008-08-18 08:01:58','2013-12-26 15:55:02','36.60.76.111'),(49,'ibcgi5ll@yahoo.com','车珍枫',2,'1975-07-27','15605361319','96802a851b4a7295fb09122b9aa79c18','2008-08-18 08:12:45','2013-12-26 15:55:01','106.83.110.76'),(50,'gzxcx6vz@live.com','应冰红',2,'2004-04-19','15104154370','96802a851b4a7295fb09122b9aa79c18','2008-08-18 09:00:09','2013-12-26 15:55:01','182.88.181.216');
*/
2.2、插入数据到集群表test.tbl_test_mergetree_users_all
使用SQL语句
insert into test.tbl_test_mergetree_users_all(id, email, username, gender, birthday, mobile, pwd, regDT, lastLoginDT, lastLoginIP)
追加上面的values语句进行数据插入。
查询表中全量数据。
50条数据是按照分区字段regDT划分片段,然后每个段中是根据排序字段id默认使用ASC进行排序的。
查询符合email模糊匹配gmail并且id>30并且分区是20080818条件的数据。
这个表的物理存储是在我们设置的数据路径/export/data/clickhouse中,该路径下的/data/data下是ClickHouse节点中维护的数据库,对应下图中的default、system和test三个文件夹。
然后在test文件夹下,有一个tbl_test_mergetree_users_all文件夹(是我们自己创建的表),该文件夹中有很多日期类型的文件夹(我们创建表时指定的分区字段的值),在此文件夹下有很多具体的数据文件。
这些数据文件中,columns.txt表示tbl_test_mergetree_users_all表20080816_29_29_0分区下所有数据的列信息,如图:
这些数据文件中,count.txt表示tbl_test_mergetree_users_all表20080816_29_29_0分区下所有数据的总条数,如图:
这些数据文件中的*.bin和*.mrk2指的是列字段的数据信息,*.idx指的是索引字段信息。
2.3、插入数据到本地表test.tbl_test_mergetree_users
使用SQL语句
insert into tbl_test_mergetree_users(id, email, username, gender, birthday, mobile, pwd, regDT, lastLoginDT, lastLoginIP)
追加上面的values语句进行数据插入。
查询表中全量数据(与_all表数据一样)
这个本地表的物理存储与上面的集群表的存储路径和文件类似。
进入本地路径:cd /var/lib/clickhouse/data/default/tbl_test_mergetree_users
编辑随便进入一个目录:cd 20080818_9_9_0
*.bin是按列保存数据的文件 *.mrk保存块偏移量 primary.idx保存主键索引
3、删除MergeTree引擎的表
3.1、删除集群中所有节点的tbl_test_mergetree_users_all表
drop table test.tbl_test_mergetree_users_all on cluster 'ch_cluster';
3.2、删除tbl_test_mergetree_users本地表
drop table tbl_test_mergetree_users;
- 点赞
- 收藏
- 关注作者
评论(0)