故事网站模版,wordpress 优酷视频插件,网站html5自适应屏幕,长沙系统开发公司Hive 开窗函数
Hive窗口函数是一种特殊的函数#xff0c;允许用户在查询中对一组行进行计算#xff0c;而不仅仅是单独的行。窗口函数可以在 SQL 查询中进行聚合、排名、累积计算等。这使得窗口函数在数据分析和报告生成中非常有用。
窗口函数的基本组成部分
函数类型允许用户在查询中对一组行进行计算而不仅仅是单独的行。窗口函数可以在 SQL 查询中进行聚合、排名、累积计算等。这使得窗口函数在数据分析和报告生成中非常有用。
窗口函数的基本组成部分
函数类型如 ROW_NUMBER(), RANK(), DENSE_RANK(), SUM(), AVG() 等。OVER 子句定义窗口的范围和分区用于指定在哪些行上应用窗口函数。
窗口边界标识符 CURRENT ROW: 表示窗口的当前行。通常用于窗口的结束范围。 n PRECEDING: 表示当前行之前的n行。例如ROWS BETWEEN 1 PRECEDING AND CURRENT ROW 表示从当前行向上看一行到当前行。 n FOLLOWING: 表示当前行之后的n行。例如ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING 表示从当前行到当前行的后两行。 UNBOUNDED: 表示没有边界通常用于定义窗口的起点或终点。 UNBOUNDED PRECEDING: 表示从窗口的起点开始不限行数。 UNBOUNDED FOLLOWING: 表示到窗口的终点结束不限行数。
窗口边界函数 LAG(col, n): 这个函数用于获取当前行的前n行中的指定列的值。它可以用于比较当前行与前几行的数据。 SELECT employee_id, salary, LAG(salary, 1) OVER (ORDER BY employee_id) AS previous_salary
FROM employees;这个示例显示了每个员工的当前工资和前一个员工的工资。 LEAD(col, n): 这个函数用于获取当前行的后n行中的指定列的值。与 LAG 类似但它是向下查找。 SELECT employee_id, salary, LEAD(salary, 1) OVER (ORDER BY employee_id) AS next_salary
FROM employees;这个示例显示了每个员工的当前工资和下一个员工的工资。
示例数据集
假设我们有一个名为 business 的表内容如下
nameorderdatecostAlice2017-04-01100Bob2017-04-05150Alice2017-04-10200Charlie2017-05-01300Bob2017-05-10100Alice2017-05-15250Charlie2017-06-01400
SQL 查询运行结果
1. 查询在2017年4月份购买过的顾客及总人数
SELECT name, COUNT(*) OVER()
FROM business
WHERE SUBSTRING(orderdate, 1, 7) 2017-04;结果
nameCOUNT(*)Alice3Bob3Alice3
2. 查询顾客的购买明细及月购买总额
顾客购买明细及购买总额
SELECT name, orderdate, cost, SUM(cost) OVER()
FROM business;结果
nameorderdatecostSUM(cost)Alice2017-04-011001300Bob2017-04-051501300Alice2017-04-102001300Charlie2017-05-013001300Bob2017-05-101001300Alice2017-05-152501300Charlie2017-06-014001300
明细及月购买总额
SELECT name, orderdate, cost, SUM(cost) OVER(PARTITION BY MONTH(orderdate))
FROM business;结果
nameorderdatecostSUM(cost)Alice2017-04-01100300Bob2017-04-05150300Alice2017-04-10200300Charlie2017-05-01300700Bob2017-05-10100700Alice2017-05-15250700Charlie2017-06-01400400
顾客购买明细及顾客购买总额
SELECT name, orderdate, cost, SUM(cost) OVER(PARTITION BY name)
FROM business;结果
nameorderdatecostSUM(cost)Alice2017-04-01100550Bob2017-04-05150250Alice2017-04-10200550Charlie2017-05-01300700Bob2017-05-10100250Alice2017-05-15250550Charlie2017-06-01400400
顾客购买明细及顾客月购买总额
SELECT name, orderdate, cost, SUM(cost) OVER(PARTITION BY name, MONTH(orderdate))
FROM business;结果
nameorderdatecostSUM(cost)Alice2017-04-01100300Bob2017-04-05150150Alice2017-04-10200300Charlie2017-05-01300300Bob2017-05-10100100Alice2017-05-15250250Charlie2017-06-01400400
3. 按照日期进行累加
按照日期逐步累加购买总额
SELECT name, orderdate, cost,
SUM(cost) OVER(PARTITION BY name ORDER BY orderdate)
FROM business;方法2边界从起点到当前行
SELECT name, orderdate, cost,
SUM(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sample4
FROM business;结果
nameorderdatecostSUM(cost)Alice2017-04-01100100Alice2017-04-10200300Alice2017-05-15250550Bob2017-04-05150150Bob2017-05-10100250Charlie2017-05-01300300Charlie2017-06-01400700
当前行和前面一行的聚合
SELECT name, orderdate, cost,
SUM(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS sample5
FROM business;结果
nameorderdatecostsample5Alice2017-04-01100100Alice2017-04-10200300Alice2017-05-15250450Bob2017-04-05150150Bob2017-05-10100250Charlie2017-05-01300300Charlie2017-06-01400400
当前行和前后各一行的聚合
SELECT name, orderdate, cost,
SUM(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sample6
FROM business;结果
nameorderdatecostsample6Alice2017-04-01100300Alice2017-04-10200550Alice2017-05-15250250Bob2017-04-05150250Bob2017-05-10100100Charlie2017-05-01300700Charlie2017-06-01400400
当前行及后面所有行
SELECT name, orderdate, cost,
SUM(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sample7
FROM business;结果
nameorderdatecostsample7Alice2017-04-01100550Alice2017-04-10200250Alice2017-05-15250250Bob2017-04-05150100Bob2017-05-10100100Charlie2017-05-01300400Charlie2017-06-01400400
4. 查询顾客上次的购买时间
使用 LAG 函数
SELECT name, orderdate, cost,
LAG(orderdate, 1) OVER(PARTITION BY name ORDER BY orderdate) AS last_purchase_date
FROM business;结果
nameorderdatecostlast_purchase_dateAlice2017-04-01100NULLAlice2017-04-102002017-04-01Alice2017-05-152502017-04-10Bob2017-04-05150NULLBob2017-05-101002017-04-05Charlie2017-05-01300NULLCharlie2017-06-014002017-05-01