Vertica的这些事(十六)——Vertica如何建表

举报
数据社 发表于 2022/09/25 05:22:50 2022/09/25
【摘要】 大家看到题目可能会想,建表谁不会呀,还用讲嘛?但是如何建表能使你的应用查询更加快呢? ####Anatomy of a Projection The [CREATE PROJECTION...

大家看到题目可能会想,建表谁不会呀,还用讲嘛?但是如何建表能使你的应用查询更加快呢?

####Anatomy of a Projection
The [CREATE PROJECTION]statement defines the individual elements of a projection, as the following graphic shows.
image.pngThe previous example contains the following significant elements:
####Column List and Encoding
Lists every column in the projection and defines the encoding for each column. Unlike traditional database architectures, HP Vertica operates on encoded data representations. Therefore, HP recommends that you use data encoding because it results in less disk I/O.
####Base Query
Identifies all the columns to incorporate in the projection through column name and table name references. The base query for large table projections can contain PK/FK joins to smaller tables.
####Sort Order
The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection’s sort order is (x, y), and the query’s WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection’s sort order to the query’s GROUP BY clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified in the column definition as the projection’s sort order.
The ORDER BY clause specifies a projection’s sort order, which localizes logically grouped values so that a disk read can pick up many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.
####Segmentation
The segmentation clause determines whether a projection is segmented across nodes within the database. Segmentation distributes contiguous pieces of projections, calledsegments, for large and medium tables across database nodes. Segmentation maximizes database performance by distributing the load. Use SEGMENTED BY HASH to segment large table projections.
For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables, rather than segment them. Replication creates and stores identical copies of projections for small tables across all nodes in the cluster. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.
####以上来自官网,理解如下:
#####Projection的解析
#####Sort Order

1、	order by 后表中插入的数据是有序的,所以order by 的列就源自于你在查询语句时使用的where 字句的内容。例如,如果字句查询中有where x=1 and y=2,那么建立projection时order by (x, y)查询的时候就会迅速定位到符合条件的数据
2、	group by 后面的字段,出现在order by 中也可以优化查询。
3、	order by 不要建立在LONG VARBINARY and LONG VARCHAR的列

  
 
  • 1
  • 2
  • 3

#####Segmentation

1、	Segmentation by hash()就是按照某一列,打散数据,把数据均匀的分布在各个节点上,对于大表,要记得使用。所以 hash里的列是主键最好,也就是说该列数据不重复的值越多,越适合做hash.
2、	Segmentation by 的列不要用LONG VARBINARY and LONG VARCHAR columns.

  
 
  • 1
  • 2

根据以上规则建表,可以更大程度优化你的查询性能。

文章来源: dataclub.blog.csdn.net,作者:数据社,版权归原作者所有,如需转载,请联系作者。

原文链接:dataclub.blog.csdn.net/article/details/84312828

【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。