从零开始实施推荐系统的落地部署——25.推荐系统案例(十五)ClickHouse的基础操作
上一篇提供的下载clickhouse的下载路径有问题,现在补上https://mirrors.tuna.tsinghua.edu.cn/clickhouse/rpm/stable/x86_64/,再根据自己的需要下载相应的版本安装。Clickhouse的官网上的示例数据太大了,我的笔记本配置一般,不适合对示例数据简单操作。刚好mas5的docker版mysql里有myemployees的数据,这样可以使用2种方法把mysql的数据导入到clickhouse。启动mysql:sudo systemctl start clickhouse-server
(1) client 连接 server,创建表要注意mysql和clickhouse数据类型的对应
mysql |
clickhouse |
UNSIGNED TINYINT |
UInt8 |
TINYINT |
Int8 |
UNSIGNED SMALLINT |
UInt16 |
SMALLINT |
Int16 |
UNSIGNED INT, UNSIGNED MEDIUMINT |
UInt32 |
INT, MEDIUMINT |
Int32 |
UNSIGNED BIGINT |
UInt64 |
BIGINT |
Int64 |
FLOAT |
Float32 |
DOUBLE |
Float64 |
DATE |
Date |
DATETIME, TIMESTAMP |
DateTime |
BINARY |
FixedString |
mas2.wuyi.com :) insert into myemployees.employees SELECT * FROM mysql('mas5.wuyi.com:3306', 'myemployees', 'employees', 'root', '123456');
INSERT INTO myemployees.employees SELECT *
FROM mysql('mas5.wuyi.com:3306', 'myemployees', 'employees', 'root', '123456')
Received exception from server (version 20.4.5):
Code: 349. DB::Exception: Received from localhost:9000. DB::Exception: Cannot convert NULL value to non-Nullable type: while converting source column commission_pct to destination column commission_pct.
解决的办法是:drop table myemployees.employees;
重新建立表,给带空值的字段添加Nullable(),如下图所示:
因为department_id的数据有null值,不能放在ORDER BY里,否则会出现一个循环错误,给department_id添加Nullable(Int8),能成功创建表,但是插入数据会报错Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: Sorting key cannot contain nullable columns.如果department_id不添加Nullable(),创建表格提示Code: 349. DB::Exception: Received from localhost:9000. DB::Exception: Cannot convert NULL value to non-Nullable type: while converting source column commission_pct to destination column commission_pct。正确的是把ORDER B里的department_id去掉,再给department_id添加Nullable(Int8)即可,如下建表所示:(注意:主键不能设置为Nullable)
CREATE TABLE myemployees.employees (
employee_id Int8,
first_name Nullable(String),
last_name Nullable(String),
email Nullable(String),
phone_number Nullable(String),
job_id Nullable(String),
salary Nullable(Float64),
commission_pct Nullable(Float64),
manager_id Nullable(Int8),
department_id Nullable(Int8),
hiredate DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(hiredate)
ORDER BY (hiredate, intHash32(employee_id));
插入mysql数据到clickhouse
insert into myemployees.employees SELECT * FROM mysql('mas5.wuyi.com:3306', 'myemployees', 'employees', 'root', '123456');
查询表数据是否插入成功:
测试clickhouse是否成功写入,把mas5的mysql关闭。再查询一下clickhouse还有数据,发现数据还在说明mysql数据已经成功写入到clickhouse里。
(2) 用Python来写入:
删除表和重新建立表,进入python环境:
运行from clickhouse_driver import Client
出现:ModuleNotFoundError: No module named 'clickhouse_driver'
退出python,安装pip install clickhouse_driver
重新进入python环境:
from clickhouse_driver import Client
client = Client("mas2.wuyi.com", "9000","myemployees")
client.execute("INSERT INTO employees SELECT * FROM mysql('mas5.wuyi.com:3306', 'myemployees', 'employees', 'root', '123456')")
执行完成后,如图所示:
查询clickhouse数据,检测是否成功写入数据。
建立employees这个表,ENGINE = MergeTree()说明使用的是mergeTree引擎,MergeTree 是以列文件+索引文件+表定义文件组成的,PARTITION BY toYYYYMM(hiredate)说明以聘用时间hiredate做分区。查询出来的数据都是以hiredate分组,order by 设定了分区内的数据按照hiredate和employee_id顺序进行有序保存。ClickHouse 中的主键有一个特点跟其他数据库不一样的是只提供了数据的一级索引,但是却不是唯一约束,这样会导致可以增加主键。通过主键进行二分查找,能够定位到对应的 index granularity,避免了全表扫描。
- 点赞
- 收藏
- 关注作者
评论(0)