网站开发代理江苏,the7 wordpress theme,windows优化大师怎么下载,网站上传的视频播放卡1.概述 在离线数仓处理通过HQL业务数据时#xff0c;经常会遇到行转列或者列转行之类的操作#xff0c;就像concat_ws之类的函数被广泛使用#xff0c;今天这个也是经常要使用的拓展方法。
2.explode函数
2.1 函数语法
-- explode(a) - separates the elements of array …1.概述 在离线数仓处理通过HQL业务数据时经常会遇到行转列或者列转行之类的操作就像concat_ws之类的函数被广泛使用今天这个也是经常要使用的拓展方法。
2.explode函数
2.1 函数语法
-- explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode
Function type:BUILTIN
-- explode()用于array的语法如下
select explode(arraycol) as newcol from tablename;
-- explode()用于map的语法如下
select explode(mapcol) as (keyname,valuename) from tablename;2.2 函数说明
explode 函数是UDTF函数将hive一列中复杂的array或者map结构拆分成多行。Explode函数是不允许在select再有其他字段, explode(ARRAY) 列表中的每个元素生成一行。explode(MAP) map中每个key-value对生成一行key为一列value为一列。
2.3 使用案例
-- explode (array)
select explode(array(A,B,C));
select explode(array(A,B,C)) as col;
select tf.* from (select 0) t lateral view explode(array(A,B,C)) tf;
select tf.* from (select 0) t lateral view explode(array(A,B,C)) tf as col;
-- 结果
col
A
B
C
-- explode (map)
select explode(map(A,10,B,20,C,30));
select explode(map(A,10,B,20,C,30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map(A,10,B,20,C,30)) tf;
select tf.* from (select 0) t lateral view explode(map(A,10,B,20,C,30)) tf as key,value;
-- 结果
key value
A 10
B 20
C 303.posexplode函数
2.1 函数语法
-- posexplode(a) - behaves like explode for arrays, but includes the position of items in the original array
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFPosExplode
Function type:BUILTIN
select posexplode(ARRAYT a)
-- Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.2.2 函数说明
posexplode 函数将ARRAY数组a展开每个Value一行每行两列分别对应数组从0开始的下标和数组元素。
2.3 使用案例
-- posexplode (array)
select posexplode(array(A,B,C));
select posexplode(array(A,B,C)) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array(A,B,C)) tf;
select tf.* from (select 0) t lateral view posexplode(array(A,B,C)) tf as pos,val;
-- 结果
pos val
0 A
1 B
2 C4.later view
4.1 语法
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (, columnAlias)*
fromClause: FROM baseTable (lateralView)*
-- columnAlias是给udtf(expression)列起的别名。
-- tableAlias 虚拟表的别名。4.2 用法描述
lateral view为侧视图,意义是为了配合UDTF来使用,把某一行数据拆分成多行数据。不加lateral view的UDTF只能提取单个字段拆分,并不能塞回原来数据表中。加上lateral view就可以将拆分的单个字段数据与原始表数据关联上。lateral view函数会将UDTF生成的结果放到一个虚拟表中然后虚拟表中的数据和输入行进行join来达到连接UDTF外的select字段的目的。(本质是笛卡尔积)
4.3 使用案例
4.3.1 准备数据
下表 pageAds. 它有两个字段: pageid (页码) and adid_list (页面上的adid):
Column nameColumn typepageidSTRINGadid_listArray
表中数据如下
pageidadid_listfront_page[1, 2, 3]contact_page[3, 4, 5] 需求 统计各个页面出现的广告的次数 4.3.2 代码实现
第一步 使用 lateral view 和 explore() 函数将 adid_list 列的 list 拆分sql代码如下
select pageid, adid
FROM pageAds lateral view explode(adid_list) ad_view as adid;可的如下结果
pageidadidfront_page1front_page2front_page3contact_page4contact_page5
第二步 使用 count/group by 语句统计出每个adid出现的次数
select adid,count(1) as cnt
FROM pageAds lateral view explode(adid_list) ad_view as adid
group by adid;adidcnt1121324151
4.4 Multiple Lateral Views
FROM子句可以有多个LATERAL VIEW子句。 后面的LATERAL VIEWS子句可以引用出现在LATERAL VIEWS左侧表的任何列。
例如如下查询
SELECT * FROM exampleTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;例如使用以下基表
Array pageid_listArray adid_list[1, 2, 3][“a”, “b”, “c”][3, 4][“c”, “d”]
单个Lateral View查询:
SELECT pageid_list, adid
FROM pageAds_1LATERAL VIEW explode(adid_list) adTable AS adid;
[1,2,3] a
[1,2,3] b
[1,2,3] c
[4,5] c
[4,5] d多个Lateral View查询
select pageid,adid FROM pageAds_1
lateral view explode(pageid_list) adTable as pageid
lateral view explode(adid_list) adTable as adid;
1,a
1,b
1,c
2,a
2,b
2,c
3,a
3,b
3,c
3,c
3,d
4,c
4,d4.5 later view json_tuple()
4.5.1 准备数据
create table lateral_tal_3
(id int,col1 string,col2 string
);insert into lateral_tal_3 values(1234,{part1 : 61, total : 623, part2 : 560, part3 : 1, part4 : 1}, {to_part2 : 0, to_part4 : 0, to_up : 0, to_part3 : 0, to_part34 : 0}),
(4567,{part1 : 451, total : 89928, part2 : 88653, part3 : 789, part4 : 35},{to_part2 : 54, to_part4 : 6, to_up : 65, to_part3 : 2, to_part34 : 3}),
(7890,{part1 : 142, total : 351808, part2 : 346778, part3 : 4321, part4 : 567},{to_part2 : 76, to_part4 : 23, to_up : 65, to_part3 : 14, to_part34 : 53});idcol1col21234{“part1” : “61”, “total” : “623”, “part2” : “560”, “part3” : “1”, “part4” : “1”}{“to_part2” : “0”, “to_part4” : “0”, “to_up” : “0”, “to_part3” : “0”, “to_part34” : “0”}4567{“part1” : “451”, “total” : “89928”, “part2” : “88653”, “part3” : “789”, “part4” : “35”}{“to_part2” : “54”, “to_part4” : “6”, “to_up” : “65”, “to_part3” : “2”, “to_part34” : “3”}7890{“part1” : “142”, “total” : “351808”, “part2” : “346778”, “part3” : “4321”, “part4” : “567”}{“to_part2” : “76”, “to_part4” : “23”, “to_up” : “65”, “to_part3” : “14”, “to_part34” : “53”} 需求 解析非结构化的json数据类型 “json_tuple(jsonStr, p1, p2, …, pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string.” Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple Function type:BUILTIN json_tuple : 第一个参数是json 字符串所在的列名,其它参数是获取 json 字符串中的哪些key值; 4.5.2 代码实现
SELECT id,part1,part3,part4,to_part2,to_part3,to_part4,IF(part3 0, 0.0, to_part3 / part3) as ratio3,IF(part4 0, 0.0, to_part4 / part4) as ratio4
FROM lateral_tal_3lateral VIEW json_tuple(col1, part3, part4, part1) json1 AS part3, part4, part1lateral VIEW json_tuple(col2, to_part2,to_part3, to_part4) json2 AS to_part2, to_part3, to_part4
;1234,61,1,1,0,0,0,0,0
4567,451,789,35,54,2,6,0.0025348542458808617,0.17142857142857143
7890,142,4321,567,76,14,23,0.0032399907428835918,0.04056437389770723
5.使用案例 需求1 如何产生1-100的连续的数字 --方式1 结合space函数与split函数posexplode函数lateral view函数获得
select id_start pos as id
from (select 1 as id_start,100 as id_end) m lateral view posexplode(split(space(id_end - id_start), )) t as pos, val;-- 方式2结合space函数与split函数explode函数lateral view函数窗口函数获得
select row_number() over () as id
from (select split(space(99), ) as x) tlateral viewexplode(x) ex;
-- 方式2结合space函数与split函数posexplode函数lateral view函数获取
from (select split(space(99), ) as x) tlateral viewposexplode(x) ex as pos,val;需求2 获取2024-07-15至2024-07-29间所有的日期 SELECT pos,date_add(start_date, pos) dd
FROM (SELECT 2024-07-15 AS start_date, 2024-07-29 AS end_date) templateral VIEWposexplode(split(space(datediff(end_date, start_date)), )) tAS pos, val;