clickhouse和ELB的相关问题
mrs clickhouse在公有云上推荐用户使用ELB进行负载均衡,后端连接多台clickhouse。过程中遇到的一些问题逐渐补充在这篇博客中。
- 业务报错Connection reset
报错堆栈如下:
ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002, host: 100.85.220.119, port: 9090; Connection reset
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.getException(ClickHouseExceptionSpecifier.java:91)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:55)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:24)
at ru.yandex.clickhouse.ClickHouseStatementImpl.getInputStream(ClickHouseStatementImpl.java:761)
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:196)
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:166)
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:161)
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:156)
at ru.yandex.clickhouse.ClickHouseStatementImpl.execute(ClickHouseStatementImpl.java:328)
at Main.main(Main.java:88)
ELB配置如下(未开启健康检测):
cap报文如下:
clickhouse存在一些SQL,执行的时间过长,如果超过了ELB的空闲超时时间,ELB就会将连接RST,导致业务上报异常:Connection reset。不过当前ELB可配置的最大空闲超时时间为4000s。如果有客户存在极端的业务场景,需要最大空闲超时超过4000s,则在ELB上无法需要,需要在业务客户端上做简单的修改实现tcp保活。
java修改方式参考:https://stackoverflow.com/questions/53375222/how-to-set-tcp-keep-alive-from-httpclient
C代码修改方式参考:https://feichashao.com/linux_tcp_keepalive/
clickhouse上可以通过修改jdbc代码实现相应的保活功能:
步骤
- 业务上获取连接时要按需设置socket timeout(解决read time out)
ClickHouseProperties properties = new ClickHouseProperties();
properties.setSocketTimeout(3600 * 1000 * 2); # 根据业务实际配置
- Clickhouse-jdbc代码在下面文件中增加一行代码,重新编译后使用:
clickhouse-jdbc\src\main\java\ru\yandex\clickhouse\util\ClickHouseHttpClientBuilder.java
connectionManager.setDefaultSocketConfig(SocketConfig.custom().setSoKeepAlive(true).build());
该行代码配置链接SO_KEEPALIVE, 从客户端实现在空闲连接发生ACK包。
本地TEST:通过长SQL模拟空闲连接
- ELB配置300s超时
- 系统参数未做改变,clickhouse服务端参数未做改变
- 编写测试代码
import ru.yandex.clickhouse.ClickHouseDataSource;
import ru.yandex.clickhouse.ClickHouseStatement;
import ru.yandex.clickhouse.settings.ClickHouseProperties;
import java.io.IOException;
import java.sql.*;
public class Main {
public static void main(String[] args) throws IOException, SQLException {
Connection connection = null;
int testPort=9090;
String url = "jdbc:clickhouse://ip:" + testPort;
ClickHouseProperties properties = new ClickHouseProperties();
properties.setDatabase("default");
properties.setCompress(false);
properties.setExtremes(true);
properties.setSocketTimeout(3600 * 1000 * 2);
ClickHouseDataSource dataSource = new ClickHouseDataSource(url, properties);
connection = dataSource.getConnection();
ClickHouseStatement stmt = (ClickHouseStatement) connection.createStatement();
try {
stmt.execute("CREATE TABLE lineorder_flat\n" +
" ENGINE = MergeTree\n" +
" PARTITION BY toYear(LO_ORDERDATE)\n" +
" ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS\n" +
" SELECT\n" +
" l.LO_ORDERKEY AS LO_ORDERKEY,\n" +
" l.LO_LINENUMBER AS LO_LINENUMBER,\n" +
" l.LO_CUSTKEY AS LO_CUSTKEY,\n" +
" l.LO_PARTKEY AS LO_PARTKEY,\n" +
" l.LO_SUPPKEY AS LO_SUPPKEY,\n" +
" l.LO_ORDERDATE AS LO_ORDERDATE,\n" +
" l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,\n" +
" l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,\n" +
" l.LO_QUANTITY AS LO_QUANTITY,\n" +
" l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,\n" +
" l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,\n" +
" l.LO_DISCOUNT AS LO_DISCOUNT,\n" +
" l.LO_REVENUE AS LO_REVENUE,\n" +
" l.LO_SUPPLYCOST AS LO_SUPPLYCOST,\n" +
" l.LO_TAX AS LO_TAX,\n" +
" l.LO_COMMITDATE AS LO_COMMITDATE,\n" +
" l.LO_SHIPMODE AS LO_SHIPMODE,\n" +
" c.C_NAME AS C_NAME,\n" +
" c.C_ADDRESS AS C_ADDRESS,\n" +
" c.C_CITY AS C_CITY,\n" +
" c.C_NATION AS C_NATION,\n" +
" c.C_REGION AS C_REGION,\n" +
" c.C_PHONE AS C_PHONE,\n" +
" c.C_MKTSEGMENT AS C_MKTSEGMENT,\n" +
" s.S_NAME AS S_NAME,\n" +
" s.S_ADDRESS AS S_ADDRESS,\n" +
" s.S_CITY AS S_CITY,\n" +
" s.S_NATION AS S_NATION,\n" +
" s.S_REGION AS S_REGION,\n" +
" s.S_PHONE AS S_PHONE,\n" +
" p.P_NAME AS P_NAME,\n" +
" p.P_MFGR AS P_MFGR,\n" +
" p.P_CATEGORY AS P_CATEGORY,\n" +
" p.P_BRAND AS P_BRAND,\n" +
" p.P_COLOR AS P_COLOR,\n" +
" p.P_TYPE AS P_TYPE,\n" +
" p.P_SIZE AS P_SIZE,\n" +
" p.P_CONTAINER AS P_CONTAINER\n" +
"FROM lineorder AS l\n" +
"INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY\n" +
"INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY\n" +
"INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;");
System.out.println("CREATE SQL EXECUTE OK!");
} catch (SQLException e) {
e.printStackTrace();
}
connection.close();
}
}
- 在测试代码中引入修改后的jdbc,编译成jar包放到linux集群运行(通过netstat -nop查看连接可以看到keepalive字样说明修改参数成功)
- 测试结果
查询结果正常完成没有connection reset:
查询耗时超过1000s:
客户端抓包:SQL请求发送后带有keepalive ack
Clickhouse服务端抓包:
- 点赞
- 收藏
- 关注作者
评论(0)