clickhouse和ELB的相关问题

举报
陈方业 发表于 2021/10/21 15:56:45 2021/10/21
【摘要】 mrs clickhouse在公有云上推荐用户使用ELB进行负载均衡,后端连接多台clickhouse。过程中遇到的一些问题逐渐补充在这篇博客中。

mrs clickhouse在公有云上推荐用户使用ELB进行负载均衡,后端连接多台clickhouse。过程中遇到的一些问题逐渐补充在这篇博客中。

  1. 业务报错Connection reset

报错堆栈如下:

ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002, host: 100.85.220.119, port: 9090; Connection reset
	at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.getException(ClickHouseExceptionSpecifier.java:91)
	at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:55)
	at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:24)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.getInputStream(ClickHouseStatementImpl.java:761)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:196)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:166)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:161)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:156)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.execute(ClickHouseStatementImpl.java:328)
	at Main.main(Main.java:88)

ELB配置如下(未开启健康检测):

20211021-155253(WeLinkPC).png

cap报文如下:

clickhouse存在一些SQL,执行的时间过长,如果超过了ELB的空闲超时时间,ELB就会将连接RST,导致业务上报异常:Connection reset。不过当前ELB可配置的最大空闲超时时间为4000s。如果有客户存在极端的业务场景,需要最大空闲超时超过4000s,则在ELB上无法需要,需要在业务客户端上做简单的修改实现tcp保活。

java修改方式参考:https://stackoverflow.com/questions/53375222/how-to-set-tcp-keep-alive-from-httpclient

C代码修改方式参考:https://feichashao.com/linux_tcp_keepalive/

clickhouse上可以通过修改jdbc代码实现相应的保活功能:

步骤

  1. 业务上获取连接时要按需设置socket timeout(解决read time out
ClickHouseProperties properties = new ClickHouseProperties();

properties.setSocketTimeout(3600 * 1000 * 2);  # 根据业务实际配置

 

  1. Clickhouse-jdbc代码在下面文件中增加一行代码,重新编译后使用:

clickhouse-jdbc\src\main\java\ru\yandex\clickhouse\util\ClickHouseHttpClientBuilder.java


connectionManager.setDefaultSocketConfig(SocketConfig.custom().setSoKeepAlive(true).build());

该行代码配置链接SO_KEEPALIVE, 从客户端实现在空闲连接发生ACK包。

本地TEST:通过长SQL模拟空闲连接

  • ELB配置300s超时

  • 系统参数未做改变,clickhouse服务端参数未做改变

  • 编写测试代码
import ru.yandex.clickhouse.ClickHouseDataSource;
import ru.yandex.clickhouse.ClickHouseStatement;
import ru.yandex.clickhouse.settings.ClickHouseProperties;

import java.io.IOException;
import java.sql.*;

public class Main {

    public static void main(String[] args) throws IOException, SQLException {

        Connection connection = null;
        int testPort=9090;
        String url = "jdbc:clickhouse://ip:" + testPort;
        ClickHouseProperties properties = new ClickHouseProperties();
        properties.setDatabase("default");
        properties.setCompress(false);
        properties.setExtremes(true);
        properties.setSocketTimeout(3600 * 1000 * 2);
        ClickHouseDataSource dataSource = new ClickHouseDataSource(url, properties);
        connection = dataSource.getConnection();
        ClickHouseStatement stmt = (ClickHouseStatement) connection.createStatement();
        try {
            stmt.execute("CREATE TABLE lineorder_flat\n" +
                    "   ENGINE = MergeTree\n" +
                    "   PARTITION BY toYear(LO_ORDERDATE)\n" +
                    "   ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS\n" +
                    "   SELECT\n" +
                    "    l.LO_ORDERKEY AS LO_ORDERKEY,\n" +
                    "    l.LO_LINENUMBER AS LO_LINENUMBER,\n" +
                    "    l.LO_CUSTKEY AS LO_CUSTKEY,\n" +
                    "    l.LO_PARTKEY AS LO_PARTKEY,\n" +
                    "    l.LO_SUPPKEY AS LO_SUPPKEY,\n" +
                    "    l.LO_ORDERDATE AS LO_ORDERDATE,\n" +
                    "    l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,\n" +
                    "    l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,\n" +
                    "    l.LO_QUANTITY AS LO_QUANTITY,\n" +
                    "    l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,\n" +
                    "    l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,\n" +
                    "    l.LO_DISCOUNT AS LO_DISCOUNT,\n" +
                    "    l.LO_REVENUE AS LO_REVENUE,\n" +
                    "    l.LO_SUPPLYCOST AS LO_SUPPLYCOST,\n" +
                    "    l.LO_TAX AS LO_TAX,\n" +
                    "    l.LO_COMMITDATE AS LO_COMMITDATE,\n" +
                    "    l.LO_SHIPMODE AS LO_SHIPMODE,\n" +
                    "    c.C_NAME AS C_NAME,\n" +
                    "    c.C_ADDRESS AS C_ADDRESS,\n" +
                    "    c.C_CITY AS C_CITY,\n" +
                    "    c.C_NATION AS C_NATION,\n" +
                    "    c.C_REGION AS C_REGION,\n" +
                    "    c.C_PHONE AS C_PHONE,\n" +
                    "    c.C_MKTSEGMENT AS C_MKTSEGMENT,\n" +
                    "    s.S_NAME AS S_NAME,\n" +
                    "    s.S_ADDRESS AS S_ADDRESS,\n" +
                    "    s.S_CITY AS S_CITY,\n" +
                    "    s.S_NATION AS S_NATION,\n" +
                    "    s.S_REGION AS S_REGION,\n" +
                    "    s.S_PHONE AS S_PHONE,\n" +
                    "    p.P_NAME AS P_NAME,\n" +
                    "    p.P_MFGR AS P_MFGR,\n" +
                    "    p.P_CATEGORY AS P_CATEGORY,\n" +
                    "    p.P_BRAND AS P_BRAND,\n" +
                    "    p.P_COLOR AS P_COLOR,\n" +
                    "    p.P_TYPE AS P_TYPE,\n" +
                    "    p.P_SIZE AS P_SIZE,\n" +
                    "    p.P_CONTAINER AS P_CONTAINER\n" +
                    "FROM lineorder AS l\n" +
                    "INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY\n" +
                    "INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY\n" +
                    "INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;");

            System.out.println("CREATE SQL EXECUTE OK!");
        } catch (SQLException e) {
            e.printStackTrace();
        }

        connection.close();
    }
}
  • 在测试代码中引入修改后的jdbc,编译成jar包放到linux集群运行(通过netstat -nop查看连接可以看到keepalive字样说明修改参数成功)
  • 测试结果

查询结果正常完成没有connection reset

查询耗时超过1000s

客户端抓包:SQL请求发送后带有keepalive ack

Clickhouse服务端抓包:

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。