A Java Developer Spent Over a Year on Big Data: What Did I Actually Do? (Part 2)

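1.4 case when

case when <condition 1> then <result 1> when <condition 2> then <result 2> else <result for the remaining rows> end as column_name

-- Classify users by age: 18 and under, over 18 up to 65, over 65
select
    case
        when age <= 18 then 'minor'
        when age > 18 and age <= 65 then 'adult'
        else 'senior'
    end as age_group,
    name,
    age,
    sex
from user

The case when syntax is essentially Java's if ... else if ... else. The branches are checked in order: the first branch whose condition holds is taken, and if no condition holds, the else branch is taken. In SQL we usually use case when then for grouping and classification. For example, our e-commerce business covers a large number of product types that may need to be classified by certain rules, and this syntax is hard to avoid there.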
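The branch order described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the query can run locally), and the table name user_demo and its rows are hypothetical. Because branches are checked in order, the second branch does not need to repeat age > 18.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_demo (name text, age integer, sex text)")
conn.executemany(
    "insert into user_demo values (?, ?, ?)",
    [("ann", 12, "F"), ("bob", 18, "M"), ("cat", 40, "F"), ("dan", 70, "M")],
)

# The second branch is only reached when age <= 18 was false,
# so "age > 18" does not have to be written again.
rows = conn.execute(
    """
    select
        case
            when age <= 18 then 'minor'
            when age <= 65 then 'adult'
            else 'senior'
        end as age_group,
        name
    from user_demo
    order by age
    """
).fetchall()

for age_group, name in rows:
    print(name, age_group)
```

Each row lands in exactly one group, the one belonging to the first matching branch.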
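1.5 union, union all

select name from A
union
select name from B;

select name from A
union all
select name from B;

Use union to remove duplicates; use union all to keep every row from both queries.
Suppose a column in table A contains duplicate values. When A is unioned with B, A's own duplicates are removed as well, not only the B values that also appear in A. In the example above, the name column of the union result contains no repeated value, as if distinct had been applied to the whole result.
The result columns of union and union all are named after the columns of the first query only. The later queries are matched by position, and their column names are ignored.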
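A minimal sketch of both points, again with sqlite3 standing in for Hive (an assumption; the behavior shown here is the same for this simple case). The column name nickname in table B and the sample rows are hypothetical, chosen so that the positional matching is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table A (name text)")
conn.execute("create table B (nickname text)")  # different column name on purpose
conn.executemany("insert into A values (?)", [("tom",), ("tom",), ("amy",)])
conn.executemany("insert into B values (?)", [("amy",), ("bob",)])

# union: duplicates are removed across the whole result, including A's own.
union_cur = conn.execute("select name from A union select nickname from B")
union_cols = [d[0] for d in union_cur.description]
union_rows = sorted(r[0] for r in union_cur)

# union all: every row from both queries is kept.
union_all_rows = sorted(
    r[0] for r in conn.execute("select name from A union all select nickname from B")
)

print(union_cols)      # column name comes from the first query
print(union_rows)
print(union_all_rows)
```

The two tom rows from A collapse into one under union even though B contains no tom at all.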

1.6 Using partitions

-- Create a Hive partitioned table
create table db_demo.tb_demo (
    filed1 string comment 'field 1',
    filed2 int comment 'field 2'
)
PARTITIONED BY (l_date string);

-- Drop a table partition
alter table db_demo.tb_demo drop if exists partition (l_date = '${v_date}');

-- Write data into a table partition
insert into table db_demo.tb_demo partition (l_date = '${v_date}')
select * from db_demo.tb_demo_v0 where ......

-- Overwrite the data of the specified partition
insert overwrite table db_demo.tb_demo partition (l_date = '${v_date}')
select * from db_demo.tb_demo_v0 where ......

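A partitioned table is a table whose partition columns are declared with PARTITIONED BY when the table is created.
A table can have one or more partitions, and each partition exists as a separate directory under the table's directory.
Partition columns appear as the last columns of the table.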
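To make the directory layout concrete, here is a small sketch of how a partition's directory name is formed: one column=value directory per partition column, nested under the table directory. The helper partition_dir is hypothetical, and the table location is an assumption based on Hive's default warehouse path; a real table may live elsewhere.

```python
from pathlib import PurePosixPath


def partition_dir(table_dir: str, **partition_values: str) -> PurePosixPath:
    """Build <table_dir>/<col>=<value>/... for one partition (hypothetical helper)."""
    path = PurePosixPath(table_dir)
    # Keyword arguments keep their order, so nested partitions
    # follow the order in which they are passed in.
    for column, value in partition_values.items():
        path = path / f"{column}={value}"
    return path


# Assumed location of db_demo.tb_demo under the default warehouse directory.
table_dir = "/user/hive/warehouse/db_demo.db/tb_demo"
print(partition_dir(table_dir, l_date="2022-10-10"))
```

Dropping or overwriting the partition l_date = '2022-10-10' only touches that one directory, which is why partitioning by date is convenient for daily reruns.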

1.7 JSON handling

-- Extract the value of a given key from a JSON string
-- Syntax
get_json_object('{key1:value1,key2:value2}', '$.key')

-- For example, extract the name from a JSON string
select get_json_object('{"age":1089,"name":"tom"}', '$.name')
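For readers more at home in application code, the lookup can be sketched in Python. The function get_json_value is hypothetical and only handles simple dotted paths such as $.name or $.a.b; it is not a reimplementation of get_json_object. One known difference: Hive returns the extracted value as a string, while this sketch returns the parsed Python value.

```python
import json


def get_json_value(json_text, path):
    """Rough analogue of get_json_object for simple '$.a.b' paths only."""
    if not path.startswith("$."):
        return None
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # invalid input: plays the role of NULL
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing key: plays the role of NULL
        node = node[key]
    return node


print(get_json_value('{"age":1089,"name":"tom"}', "$.name"))
```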

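1.8 Date functions

-- to_date: convert a datetime to a date
select to_date(create_time) from demo_db.demo_table;

-- current_date: the current date
select current_date;

-- date_sub: the date n days before the given date
select date_sub(pay_time, 9) from demo_db.demo_table;

-- date_add: the date n days after the given date. Even if a datetime is passed in, the result is a date; the same holds for date_sub. Only the date part is used.
select date_add(pay_time, 9) from demo_db.demo_table;

-- unix_timestamp: convert a datetime string to a Unix timestamp; called with no argument, it returns the current Unix timestamp
select unix_timestamp('2022-10-10 10:22:11');

-- datediff: the first date minus the second date, in days. Only the date part is compared, so this returns 1.
select datediff('2022-10-10 23:22:11', '2022-10-09 00:22:11');

-- Current month, as yyyy-MM
select substr(current_date, 1, 7);

-- Last day of the previous month
select DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), DAY(FROM_UNIXTIME(UNIX_TIMESTAMP())));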
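The arithmetic behind these functions can be checked with a short Python sketch. The function names mirror the Hive ones, but they are hypothetical helpers that only accept 'yyyy-MM-dd HH:mm:ss' strings; they illustrate the date-only behavior and the "subtract the day of month" trick, not Hive's full parsing rules.

```python
from datetime import date, datetime, timedelta


def to_date(ts: str) -> date:
    """Keep only the date part of a 'yyyy-MM-dd HH:mm:ss' string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()


def date_sub(ts: str, n: int) -> date:
    """The date n days before; the time of day is dropped first."""
    return to_date(ts) - timedelta(days=n)


def date_add(ts: str, n: int) -> date:
    """The date n days after; the time of day is dropped first."""
    return to_date(ts) + timedelta(days=n)


def datediff(end_ts: str, start_ts: str) -> int:
    """First argument minus second, in whole days; time of day is ignored."""
    return (to_date(end_ts) - to_date(start_ts)).days


def last_day_of_previous_month(today: date) -> date:
    """Same trick as date_sub(today, day(today))."""
    return today - timedelta(days=today.day)


print(datediff("2022-10-10 23:22:11", "2022-10-09 00:22:11"))
print(last_day_of_previous_month(date(2022, 10, 10)))
```

Subtracting the day of the month always lands on the last day of the previous month, whatever its length, which is why the SQL above needs no month-length table.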

1.9 Explode functions

Hive version:

select
    id,
    type_id_new
from table_one
lateral view explode(split(type_id, ",")) table_one_temp as type_id_new;

MySQL version:

-- mysql.help_topic serves only as a source of consecutive integers (help_topic_id = 0, 1, 2, ...)
SELECT
    a.id,
    substring_index(substring_index(a.type_id, ',', b.help_topic_id + 1), ',', -1) AS type_id
FROM (select id, type_id from table_one) a
JOIN mysql.help_topic b
    ON b.help_topic_id < (length(a.type_id) - length(replace(a.type_id, ',', '')) + 1)

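In short, as the name suggests, an explode function is a one-to-many operation: one row splits into several. The typical case is a column that holds a string of values separated by the same delimiter; we use a comma here. Taking our example, suppose a row has id = 1 and type_id = 1,2,3. After the explode query above, that row becomes three rows in the result: (id = 1, type_id_new = 1), (id = 1, type_id_new = 2), and (id = 1, type_id_new = 3). Only the exploded column is split; all other columns stay unchanged.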
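The same one-to-many step can be sketched in plain Python. The function explode_rows is a hypothetical helper that treats rows as dictionaries; it mirrors the shape of the query result (id plus the new column) rather than how Hive implements lateral view.

```python
def explode_rows(rows, column, new_column, sep=","):
    """Yield one output row per separated value found in `column`."""
    for row in rows:
        for part in row[column].split(sep):
            # Copy every other column unchanged, then add the single split value.
            exploded = {k: v for k, v in row.items() if k != column}
            exploded[new_column] = part
            yield exploded


table_one = [{"id": 1, "type_id": "1,2,3"}]
result = list(explode_rows(table_one, "type_id", "type_id_new"))

for row in result:
    print(row)
```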
1.10 COALESCE(expression, value1, value2, ..., valuen)

select coalesce(demo_id1, demo_id2, demo_id3) from demo_db.demo_table;

select coalesce(
    case
        when demo_name like '%杰伦%' then '杰粉'
        when demo_name like '%许嵩%' then '嵩鼠'
        else '小泷包'
    end,
    demo_name2
)
from demo_db.demo_table;

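The coalesce function returns the first of its arguments that is not NULL; if all of them are NULL, it returns NULL. As the examples above show, each argument can be a column or any other expression: the second query embeds a whole case when then, and that still counts as a single argument.
Note that coalesce only checks for NULL, not for empty strings. If the first non-NULL argument is the empty string '', coalesce returns that empty string as if it were a real value.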
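The empty-string caveat is easy to see in a small runnable sketch. It uses sqlite3 as a local stand-in for Hive (an assumption; coalesce treats NULL and '' the same way in both), and the rows in demo_table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table demo_table (demo_id1 text, demo_id2 text, demo_id3 text)")
conn.executemany(
    "insert into demo_table values (?, ?, ?)",
    [
        (None, "b", "c"),     # first non-NULL argument is 'b'
        (None, "", "c"),      # '' is not NULL, so it wins over 'c'
        (None, None, None),   # everything is NULL, so the result is NULL
    ],
)

rows = [
    r[0]
    for r in conn.execute(
        "select coalesce(demo_id1, demo_id2, demo_id3) from demo_table order by rowid"
    )
]
print(rows)
```

The second row comes back as an empty string rather than 'c', which is the case to watch for when upstream data stores missing values as '' instead of NULL.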