
Big Data Warehouse Operations

Posted by 南安 on 2020/10/28 20:27:36


By default, Hive stores its data on HDFS under /user/hive/warehouse/*.db. To give other users write access to the warehouse directory:

hadoop fs -chmod o+w /user/hive/warehouse

Date and time functions:

1. Convert a datetime to a date

select to_date('2015-04-02 13:34:12');

Output: 2015-04-02

2. Convert a Unix timestamp to a formatted time in the current time zone

select from_unixtime(1323308943,'yyyyMMdd');

Output: 20111208

3. Get the current Unix timestamp

select unix_timestamp();

Output (depends on when the query runs): 1430816254

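4. Format a date

select date_format('2019-07-03','yyyy-MM-dd');

The input must already use hyphens (yyyy-MM-dd). A string such as '2019/07/03' is not parsed as a date and yields NULL, so change the separators first (see item 6).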

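5. date_add: adds a number of days to a date (arguments: a yyyy-MM-dd date and a day count, which may be negative)

date_sub: subtracts a number of days from a date (arguments: a yyyy-MM-dd date and a day count, which may be negative)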
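As a small illustration of date_add and date_sub, the queries below use an arbitrary sample date that is not part of the original post. They are only a sketch of how the day count behaves, including a negative count.

```sql
-- Add 7 days to a date
select date_add('2019-07-03', 7);    -- 2019-07-10

-- Subtract 7 days from a date
select date_sub('2019-07-03', 7);    -- 2019-06-26

-- A negative count reverses the direction, so this matches the date_sub call above
select date_add('2019-07-03', -7);   -- 2019-06-26
```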

6. regexp_replace: change the date separator

select regexp_replace('2019/07/03','/','-');
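The functions above can be combined when a date arrives in a non-standard layout. The following is a sketch under the assumption that the input string always has the form yyyy/MM/dd; the sample values are illustrative.

```sql
-- Normalize the separators, then reformat the result
select date_format(regexp_replace('2019/07/03', '/', '-'), 'yyyyMMdd');   -- 20190703

-- Alternative: parse with an explicit pattern, then print with another one.
-- unix_timestamp(string, pattern) and from_unixtime(bigint, pattern)
-- both use the session time zone, so the round trip returns the same day.
select from_unixtime(unix_timestamp('2019/07/03', 'yyyy/MM/dd'), 'yyyy-MM-dd');   -- 2019-07-03
```

The second form avoids string surgery, which may be preferable when the separators are not the only difference between the input and the target layout.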

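Extracting substrings:

substr (also available as substring): returns y characters starting at position x; positions start at 1

substring(*, x, y) = '   '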
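A few calls on a sample string may make the position and length arguments clearer. The literal used here is only an illustration.

```sql
-- 4 characters starting at position 1
select substr('2019-07-03', 1, 4);   -- 2019

-- 2 characters starting at position 6
select substr('2019-07-03', 6, 2);   -- 07

-- With no length, everything from position 6 to the end
select substr('2019-07-03', 6);      -- 07-03

-- A negative position counts from the end of the string
select substr('2019-07-03', -2);     -- 03
```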

Filtering out null values

where * is not null and ****

Type conversion

cast('X' as int)
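For illustration, the two casts below show what happens with a convertible and a non-convertible string. The literals are sample values only.

```sql
-- A numeric string converts cleanly
select cast('123' as int);   -- 123

-- A string that cannot be parsed as a number yields NULL rather than an error
select cast('abc' as int);   -- NULL
```

Because a failed cast silently returns NULL, it can be worth pairing a cast with an is not null check when the source column is not guaranteed to be clean.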

Converting between managed (internal) and external tables

Managed to external, and external to managed:

alter table * set tblproperties('EXTERNAL'='TRUE');
alter table * set tblproperties('EXTERNAL'='FALSE');
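One way to confirm that the conversion took effect is to inspect the table metadata. This sketch assumes a hypothetical existing table named dept.

```sql
-- Convert the (hypothetical) managed table dept to an external table
alter table dept set tblproperties('EXTERNAL'='TRUE');

-- Inspect the metadata; the "Table Type:" line should now read EXTERNAL_TABLE
describe formatted dept;

-- Convert it back; "Table Type:" should read MANAGED_TABLE again
alter table dept set tblproperties('EXTERNAL'='FALSE');
describe formatted dept;
```

The distinction matters mostly on drop: dropping an external table removes only the metadata and leaves the files on HDFS in place.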

Creating a partitioned table

create table dept_partition(* int, * string, * string) partitioned by (* string);
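A filled-in version of the template might look like the sketch below. The column names deptno, dname, and loc and the tab delimiter are assumptions made for illustration; only the table name, the column types, and the month partition key appear in the original post.

```sql
-- Partitioned table: the partition column (month) is declared separately
-- from the regular columns and becomes a directory level on HDFS
create table dept_partition(
  deptno int,      -- hypothetical column name
  dname  string,   -- hypothetical column name
  loc    string    -- hypothetical column name
)
partitioned by (month string)
row format delimited fields terminated by '\t';
```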

Adding and dropping partitions

alter table dept_partition add partition(*='201706');
alter table dept_partition drop partition (month='201707');
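The statements below sketch a few common variations, assuming dept_partition is partitioned by a string column named month. The partition values are sample data.

```sql
-- Add several partitions at once (separated by spaces)
alter table dept_partition add if not exists
  partition(month='201705') partition(month='201704');

-- Drop several partitions at once (separated by commas)
alter table dept_partition drop if exists
  partition(month='201704'), partition(month='201705');

-- List the partitions that currently exist
show partitions dept_partition;
```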

Altering a table

alter table dept_partition2 rename to dept_partition3;

Inserting data

insert into table student partition(month='201709') values(1,'wangwu');
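To show the insert in context, here is a sketch that assumes student has two regular columns, id and name, and is partitioned by month. The table definition is not given in the original post, so the column names are hypothetical.

```sql
-- Hypothetical definition matching the insert statement
create table if not exists student(id int, name string)
partitioned by (month string);

-- Insert one row into a specific partition
insert into table student partition(month='201709') values(1,'wangwu');

-- Read it back; filtering on the partition column limits the scan
-- to that partition's directory
select id, name, month from student where month='201709';
```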

LIKE and RLIKE

select * from emp where sal LIKE 'number%';    (rows where sal starts with number)
select * from emp where sal LIKE '_number%';   (rows where the second character of sal is number)
select * from emp where sal RLIKE '[number]';  (rows where sal contains number)
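Replacing the number placeholder with a concrete digit gives the sketch below. It assumes an emp table with a sal column, as in the statements above, and uses the digit 2 purely as an example.

```sql
-- % matches any number of characters: sal starts with 2
select * from emp where sal like '2%';

-- _ matches exactly one character: the second character of sal is 2
select * from emp where sal like '_2%';

-- RLIKE takes a Java regular expression; [2] is a character class,
-- so this matches any sal that contains a 2 anywhere
select * from emp where sal rlike '[2]';
```

Note that square brackets match any single character from the set, so a pattern such as '[25]' would match values containing a 2 or a 5, not the sequence 25. To match a sequence, write it without brackets.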

sum(if(today='1',1,0))
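This expression is a conditional count: each row contributes 1 when the condition holds and 0 otherwise. The sketch below assumes a hypothetical table named visits with a string column today; neither name comes from the original post beyond the today column itself.

```sql
select
  count(*)                                      as total_rows,
  -- counts only the rows where today = '1'
  sum(if(today = '1', 1, 0))                    as today_rows,
  -- equivalent form using CASE
  sum(case when today = '1' then 1 else 0 end)  as today_rows_case
from visits;
```

Computing several such sums in one pass is often cheaper than running a separate filtered count for each condition.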
