HQL常用函数

explode

hive explode函数可以将一个array或者map展开，其中explode(array)使得结果中将array列表里的每个元素生成一行；explode(map)使得结果中将map里的每一对元素作为一行，key为一列，value为一列。

样式	描述
explode(ARRAY array)	返回一列。0到多行结果，每行对应输入的array数组中的一个元素。
explode(MAP map)	返回两列，其中一列是map的键，另一列是map的值。0到多行结果，每行对应每个map键值对

以array类型原数据为例：
1
2
> select explode(array(1,2,3));
>

输出：

> +--+
> |a |
> +--+
> |1 |
> |2 |
> |3 | 
> +--+
>

以map类型原数据为例：

> +--------------------------------------------------------+
> |key_values                                              |
> +--------------------------------------------------------+
> |"appVersion":"4.10.1","mId":"1377205","productId":"513" | 
> +--------------------------------------------------------+
>

使用explode函数。explode用于将一行转化为多行，也就是array/map里的每一个元素独立成列；配合上lateral view，生成的每一列相当于对原表做了个join：
1
2
> select explode(key_values);
>

解析后将输出多行：

> +-------------+----------+
> | col1        | col2     |
> +------------------------+
> |"appVersion" |"4.10.1"  |
> |"mId"        |"1377205" |
> |"productId"  |"513"     |
> +------------------------+
>

一般情况下，explode直接使用即可，但是遇到以下情况时需要结合lateral view 使用。Lateral View语法将值展开为一个新的虚拟表，与原表explode_lateral_view笛卡尔积关联。

No other expressions are allowed in SELECT

1	SELECT pageid, explode(adid_list) AS myCol ... is not supported

UDTF’s can’t be nested

1	SELECT explode(explode(adid_list)) AS myCol ... is not supported

GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported

1	SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported

2. 使用方法

a.多列

例如在上述例1中，如果想同时取explode(adid_list)和 pageid可以改写为:

1
2
3

SELECT pageid,dekey,devalue
from table 
LATERAL VIEW explode(adid_list) dedView as dekey,devalue // 注意这里输出两列，必须写别名

b.多重使用

from语句后可以跟多个Lateral View。

A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW.

1
2
3

SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2

c.outer join

由于原表使用lateral view是使用join方法，如果explode的列为空，则关联后输出结果为空。

在Hive0.12里面会支持outer关键字，如果UDTF的结果是空，默认会被忽略输出。

如果加上outer关键字，则会像left outer join 一样，还是会输出select出的列，而UDTF的输出结果是NULL。

1 2	SELECT myCol1, myCol2 FROM baseTable LATERAL VIEW OUTER explode(col1) myTable1 AS myCol1

coalesce

语法

语法：COALESCE ( expression,value1,value2……,valuen)

说明：COALESCE()函数的第一个参数expression为待检测的表达式，而其后的参数个数不定。

COALESCE()函数将会返回包括expression在内的所有参数中的第一个非空表达式。

如果expression不为空值则返回expression；否则判断value1是否是空值，

如果value1不为空值则返回value1；否则判断value2是否是空值，

如果value2不为空值则返回value2；……以此类推，
如果所有的表达式都为空值，则返回NULL。

with cube

cube，意为立方体，是多维模型的一个形象的说法。我们先来看没有with cube的情况：

select 
	col1,
	col2,         --维度字段
	count(col3) as cnt,  --聚合字段
from table 
group by col1,col2

数据为：

userid	col2	cnt
336649606	1	16
336649606	2	10
336649606	3	20
336649606	4	15
336649606	5	12

with cube相当于把所有维度，即col1和col2的所有排列组合方式进行group by，完成所有维度的聚合。

cube(a,b,c)首先会对(a,b,c)进行group by
然后依次是(a,b),(a,c),(a),(b,c),(b),(c)
cube函数等价于按上述维度分别group by后，通过union all来汇总

with cube在必须与group by一起使用，一般语法为：

select 
	col1,
	col2,         --维度字段
	count(col3),  --聚合字段
from table 
group by col1,col2  --维度字段都要出现在group by中,这里不能使用1,2代替
with cube;          --使用cube函数

得到的结果为：

col1	col2	cnt	note
336649606	1	16
336649606	2	10
336649606	3	20
336649606	4	15
336649606	5	12
336649606		73	group by col1
	1	16	group by col2(=1)
	2	10	group by col2(=2)
	3	20	group by col2(=3)
	4	15	group by col2(=4)
	5	12	group by col2(=5)

表中空格部分，其实就是group by的时候没有该字段，因此取值为空。如果给该种情况填充某个值，可以结合coalesce函数使用。比如：

select 
	coalesce(col1,'total') as col1,
	coalesce(col2,'total') as col2,
	count(col3),
from table 
group by col1,col2  
with cube;

最终结果形式会更便于理解：

col1	col2	cnt
336649606	1	16
336649606	2	10
336649606	3	20
336649606	4	15
336649606	5	12
336649606	total	73
total	1	16
total	2	10
total	3	20
total	4	15
total	5	12