开窗函数
# 概述
窗口函数是MySQL8新加的功能,在MySQL8之前,遇到复杂的分析类查询,比如查询分组排名,需要通过变量来实现,写出的SQL非常复杂,可读性差,不好维护,而开窗函数可以大大简化我们的SQL语句。窗口函数分为两类,专用开窗函数和聚合开窗函数。聚合开窗函数,和普通的聚合函数是不一样的,普通聚合函数分组后每组只能返回一条记录,而聚合开窗函数分组后返回全部数据。
参考: MySQL开窗函数 - 简书 (jianshu.com) (opens new window)
MySQL操作实战(二):窗口函数_陆-CSDN博客_mysql 窗口函数 (opens new window)
MySQL · 最佳实践 · 8.0 CTE和窗口函数的用法 (taobao.org) (opens new window)
# 变量使用
可以使用 @
来声明变量,:=
来赋值变量。 具体可以参考以下案例及博客:
select x.id, x.name, x.course, x.score
from (
-- if(@_course=t.cour se ,@rn:=@rn+1,@rn:=1) 计算排名 ,赋值变量 @_course:=t.course
select t.*, if(@_course = t.course, @rn:=@rn+1, @rn:=1) as rn, @_course:=t.course as _course
from
(select t.* from mytest t order by course, score desc ) t,
-- 定义变量
(select @rn := 0 rn, @_course := '') b
) x
WHERE rn = 1
ORDER BY course;
2
3
4
5
6
7
8
9
10
11
MySQL · 最佳实践 · 8.0 CTE和窗口函数的用法 (taobao.org) (opens new window)
(4条消息) 关于 mysql 中的 select * from table_a,table_b 的问题_weixin_33827731的博客-CSDN博客 (opens new window)
(4条消息) mysql :=和=的区别_求知若渴,虚怀若谷
-CSDN博客 (opens new window)
(4条消息) Mysql@和@@符号的详细使用说明_不甘于平凡的溃败的博客-CSDN博客 (opens new window)
select a,@a:=@a+1 rn from c, (select @a:=0) as a; 的理解 - _吟游诗人 - 博客园 (cnblogs.com) (opens new window)
# 语法结构
<开窗函数> OVER ([PARTITION BY <列清单>]
ORDER BY <排序用列清单>
)
# 注意
:[]中的内容可以不用写
。
2
3
4
5
6
# 窗口函数分类
# 专用开窗函数
专用开窗函数按照功能分为五类,序号函数、分布函数、前后函数、头尾函数、其他函数。
-- 序号函数/排名函数:
-- 排序:1,2,3
ROW_NUMBER
()
-- 排序:1,1,3
RANK()
-- 排序:1,1,2
DENSE_RANK()
-- 分布(范围)函数:
-- (rank-1)/(rows-1)
PERCENT_RANK()
-- <=当前rank值的函数/总函数
CUME_DIST()
-- 前后函数:
-- 返回当前行的前n行的expr值;
LAG(expr,n)
-- 返回当前行的后n行的expr值;
LEAD(expr,n)
-- 头尾函数:
-- 返回第一个expr值;
FIRST_VALUE(expr)
-- 返回最后个expr值;
LAST_VALUE(expr)
-- 其他函数:
-- 返回第n个expr值;
NTH_VALU(expr,n)
-- 将有序数据分为n个桶,记录等级数。返回每一行在第几桶。
NTILE(n)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# 聚合开窗函数
聚合函数:(sum,avg,count,max,min) 聚合开窗函数:聚合函数()+over()
普通的聚合函数,聚合的行集是组,开窗函数聚合的行集是窗口。因此,普通的聚合函数每组(Group by)只返回一个值,而开窗函数则可为窗口中的每行都返回一个值。简单理解,就是对查询的结果多出一列,这一列可以是聚合值,也可以是排序值。
窗口函数和普通聚合函数的区别:
- 聚合函数是将多条记录聚合为一条;窗口函数是每条记录都会执行,有几条记录执行完还是几条。
- 聚合函数也可以用于窗口函数。
注意:聚合开窗函数只能使用PARTITION BY子句,ORDER BY不能与聚合开窗函数一同使用。
# 应用举例
# 1,准备数据:
创建表
student_scores
:
mysql>
create table student_scores
(
->
id
int, ->
studentId
int, ->
language
int, ->
math
int, ->
english
int, ->
classId
varchar
(
30
), -> departmentId varchar(30)
-> );
插入数据
:
mysql> insert into student_scores values
-> (1,111,68,69,90,'class1','department1'),
-> (2,112,73,80,96,'class1','department1'),
-> (3,113,90,74,75,'class1','department1'),
-> (4,114,89,94,93,'class1','department1'),
-> (5,115,99,93,89,'class1','department1'),
-> (6,121,96,74,79,'class2','department1'),
-> (7,122,89,86,85,'class2','department1'),
-> (8,123,70,78,61,'class2','department1'),
-> (9,124,76,70,76,'class2','department1'),
-> (10,211,89,93,60,'class1','department2'),
-> (11,212,76,83,75,'class1','department2'),
-> (12,213,71,94,90,'class1','department2'),
-> (13,214,94,94,66,'class1','department2'),
-> (14,215,84,82,73,'class1','department2'),
-> (15,216,85,74,93,'class1','department2'),
-> (16,221,77,99,61,'class2','department2'),
-> (17,222,80,78,96,'class2','department2'),
-> (18,223,79,74,96,'class2','department2'),
-> (19,224,75,80,78,'class2','department2'),
-> (20,225,82,85,63,'class2','department2');
-- mysql> select *from student_scores;
+------+-----------+----------+------+---------+---------+--------------+
| id | studentId | language | math | english | classId | departmentId |
+------+-----------+----------+------+---------+---------+--------------+
| 1 | 111 | 68 | 69 | 90 | class1 | department1 |
| 2 | 112 | 73 | 80 | 96 | class1 | department1 |
| 3 | 113 | 90 | 74 | 75 | class1 | department1 |
| 4 | 114 | 89 | 94 | 93 | class1 | department1 |
| 5 | 115 | 99 | 93 | 89 | class1 | department1 |
| 6 | 121 | 96 | 74 | 79 | class2 | department1 |
| 7 | 122 | 89 | 86 | 85 | class2 | department1 |
| 8 | 123 | 70 | 78 | 61 | class2 | department1 |
| 9 | 124 | 76 | 70 | 76 | class2 | department1 |
| 10 | 211 | 89 | 93 | 60 | class1 | department2 |
| 11 | 212 | 76 | 83 | 75 | class1 | department2 |
| 12 | 213 | 71 | 94 | 90 | class1 | department2 |
| 13 | 214 | 94 | 94 | 66 | class1 | department2 |
| 14 | 215 | 84 | 82 | 73 | class1 | department2 |
| 15 | 216 | 85 | 74 | 93 | class1 | department2 |
| 16 | 221 | 77 | 99 | 61 | class2 | department2 |
| 17 | 222 | 80 | 78 | 96 | class2 | department2 |
| 18 | 223 | 79 | 74 | 96 | class2 | department2 |
| 19 | 224 | 75 | 80 | 78 | class2 | department2 |
| 20 | 225 | 82 | 85 | 63 | class2 | department2 |
+------+-----------+----------+------+---------+---------+--------------+
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
# 2,聚合类开窗函数
- count() 开窗函数
-- 查询:
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
count(math) over() as count1,
-- 以按classId分组的所有行作为窗口 count(math) over(partition by classId) as count2,
-- 以按classId分组、按math排序的所有行作为窗口 count(math) over(partition by classId order by math) as count3,
-- 以按classId分组、按math排序、按 当前行+往前1行+往后2行的行作为窗口 count(math) over(partition by classId order by math rows between 1 preceding and 2 following) as count4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+--------+--------+--------+--------+
| studentId | math | departmentId | classId | count1 | count2 | count3 | count4 |
+-----------+------+--------------+---------+--------+--------+--------+--------+
| 111 | 69 | department1 | class1 | 9 | 5 | 1 | 3 |
| 113 | 74 | department1 | class1 | 9 | 5 | 2 | 4 |
| 112 | 80 | department1 | class1 | 9 | 5 | 3 | 4 |
| 115 | 93 | department1 | class1 | 9 | 5 | 4 | 3 |
| 114 | 94 | department1 | class1 | 9 | 5 | 5 | 2 |
| 124 | 70 | department1 | class2 | 9 | 4 | 1 | 3 |
| 121 | 74 | department1 | class2 | 9 | 4 | 2 | 4 |
| 123 | 78 | department1 | class2 | 9 | 4 | 3 | 3 |
| 122 | 86 | department1 | class2 | 9 | 4 | 4 | 2 |
+-----------+------+--------------+---------+--------+--------+--------+--------+
/*
结果解释:
studentid=115,count1为所有的行数9,count2为分区class1中的行数5,count3为分区class1中math值<=93的行数4,count4为分区class1中math值向前+1行向后+2行(实际只有1行)的总行数3。
*/
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
- sum() 开窗函数
#
sum开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
sum(math) over() as sum1,
-- 以按classId分组的所有行作为窗口 sum(math) over(partition by classId) as sum2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 sum(math) over(partition by classId order by math) as sum3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 sum(math) over(partition by classId order by math rows between 1 preceding and 2 following) as sum4
from student_scores
where departmentId = 'department1';
#
结果
+-----------+------+--------------+---------+------+------+------+------+
| studentId | math | departmentId | classId | sum1 | sum2 | sum3 | sum4 |
+-----------+------+--------------+---------+------+------+------+------+
| 111 | 69 | department1 | class1 | 718 | 410 | 69 | 223 |
| 113 | 74 | department1 | class1 | 718 | 410 | 143 | 316 |
| 112 | 80 | department1 | class1 | 718 | 410 | 223 | 341 |
| 115 | 93 | department1 | class1 | 718 | 410 | 316 | 267 |
| 114 | 94 | department1 | class1 | 718 | 410 | 410 | 187 |
| 124 | 70 | department1 | class2 | 718 | 308 | 70 | 222 |
| 121 | 74 | department1 | class2 | 718 | 308 | 144 | 308 |
| 123 | 78 | department1 | class2 | 718 | 308 | 222 | 238 |
| 122 | 86 | department1 | class2 | 718 | 308 | 308 | 164 |
+-----------+------+--------------+---------+------+------+------+------+
# 结果解释:同count开窗函数
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
- min() 开窗函数
-- min 开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
min(math) over() as min1,
-- 以按classId分组的所有行作为窗口 min(math) over(partition by classId) as min2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 min(math) over(partition by classId order by math) as min3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 min(math) over(partition by classId order by math rows between 1 preceding and 2 following) as min4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+------+------+------+------+
| studentId | math | departmentId | classId | min1 | min2 | min3 | min4 |
+-----------+------+--------------+---------+------+------+------+------+
| 111 | 69 | department1 | class1 | 69 | 69 | 69 | 69 |
| 113 | 74 | department1 | class1 | 69 | 69 | 69 | 69 |
| 112 | 80 | department1 | class1 | 69 | 69 | 69 | 74 |
| 115 | 93 | department1 | class1 | 69 | 69 | 69 | 80 |
| 114 | 94 | department1 | class1 | 69 | 69 | 69 | 93 |
| 124 | 70 | department1 | class2 | 69 | 70 | 70 | 70 |
| 121 | 74 | department1 | class2 | 69 | 70 | 70 | 70 |
| 123 | 78 | department1 | class2 | 69 | 70 | 70 | 74 |
| 122 | 86 | department1 | class2 | 69 | 70 | 70 | 78 |
+-----------+------+--------------+---------+------+------+------+------+
# 结果解释:同count开窗函数
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
- max() 开窗函数
-- min 开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
min(math) over() as min1,
-- 以按classId分组的所有行作为窗口 min(math) over(partition by classId) as min2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 min(math) over(partition by classId order by math) as min3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 min(math) over(partition by classId order by math rows between 1 preceding and 2 following) as min4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+------+------+------+------+
| studentId | math | departmentId | classId | max1 | max2 | max3 | max4 |
+-----------+------+--------------+---------+------+------+------+------+
| 111 | 69 | department1 | class1 | 94 | 94 | 69 | 80 |
| 113 | 74 | department1 | class1 | 94 | 94 | 74 | 93 |
| 112 | 80 | department1 | class1 | 94 | 94 | 80 | 94 |
| 115 | 93 | department1 | class1 | 94 | 94 | 93 | 94 |
| 114 | 94 | department1 | class1 | 94 | 94 | 94 | 94 |
| 124 | 70 | department1 | class2 | 94 | 86 | 70 | 78 |
| 121 | 74 | department1 | class2 | 94 | 86 | 74 | 86 |
| 123 | 78 | department1 | class2 | 94 | 86 | 78 | 86 |
| 122 | 86 | department1 | class2 | 94 | 86 | 86 | 86 |
+-----------+------+--------------+---------+------+------+------+------+
# 结果解释:同count开窗函数
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
- avg() 开窗函数
-- avg 开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
avg(math) over() as avg1,
-- 以按classId分组的所有行作为窗口 avg(math) over(partition by classId) as avg2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 avg(math) over(partition by classId order by math) as avg3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 avg(math) over(partition by classId order by math rows between 1 preceding and 2 following) as avg4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+---------+---------+---------+---------+
| studentId | math | departmentId | classId | avg1 | avg2 | avg3 | avg4 |
+-----------+------+--------------+---------+---------+---------+---------+---------+
| 111 | 69 | department1 | class1 | 79.7778 | 82.0000 | 69.0000 | 74.3333 |
| 113 | 74 | department1 | class1 | 79.7778 | 82.0000 | 71.5000 | 79.0000 |
| 112 | 80 | department1 | class1 | 79.7778 | 82.0000 | 74.3333 | 85.2500 |
| 115 | 93 | department1 | class1 | 79.7778 | 82.0000 | 79.0000 | 89.0000 |
| 114 | 94 | department1 | class1 | 79.7778 | 82.0000 | 82.0000 | 93.5000 |
| 124 | 70 | department1 | class2 | 79.7778 | 77.0000 | 70.0000 | 74.0000 |
| 121 | 74 | department1 | class2 | 79.7778 | 77.0000 | 72.0000 | 77.0000 |
| 123 | 78 | department1 | class2 | 79.7778 | 77.0000 | 74.0000 | 79.3333 |
| 122 | 86 | department1 | class2 | 79.7778 | 77.0000 | 77.0000 | 82.0000 |
+-----------+------+--------------+---------+---------+---------+---------+---------+
# 结果解释:同count开窗函数
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# 3,专用开窗函数
排序开窗函数
rank() 开窗函数
rank 开窗函数基于 over 子句中的 order by 确定一组值中一个值的排名。如果存在partition by ,则为每个分区组中的每个值排名。排名可能不是连续的。例如,如果两个行的排名为 1,则下一个排名为 3( 即:1、2、2、4)。
-- rank 开窗函数
select *,
-- 对全部学生按数学分数排序
rank() over(order by math) as rank1,
-- 对院系 按数学分数排序 rank() over(partition by departmentId order by math) as rank2,
-- 对每个院系每个班级 按数学分数排序 rank() over(partition by departmentId,classId order by math) as rank3
from student_scores;
-- 结果
+------+-----------+----------+------+---------+---------+--------------+-------+-------+-------+
| id | studentId | language | math | english | classId | departmentId | rank1 | rank2 | rank3 |
+------+-----------+----------+------+---------+---------+--------------+-------+-------+-------+
| 1 | 111 | 68 | 69 | 90 | class1 | department1 | 1 | 1 | 1 |
| 3 | 113 | 90 | 74 | 75 | class1 | department1 | 3 | 3 | 2 |
| 2 | 112 | 73 | 80 | 96 | class1 | department1 | 9 | 6 | 3 |
| 5 | 115 | 99 | 93 | 89 | class1 | department1 | 15 | 8 | 4 |
| 4 | 114 | 89 | 94 | 93 | class1 | department1 | 17 | 9 | 5 |
| 9 | 124 | 76 | 70 | 76 | class2 | department1 | 2 | 2 | 1 |
| 6 | 121 | 96 | 74 | 79 | class2 | department1 | 3 | 3 | 2 |
| 8 | 123 | 70 | 78 | 61 | class2 | department1 | 7 | 5 | 3 |
| 7 | 122 | 89 | 86 | 85 | class2 | department1 | 14 | 7 | 4 |
| 15 | 216 | 85 | 74 | 93 | class1 | department2 | 3 | 1 | 1 |
| 14 | 215 | 84 | 82 | 73 | class1 | department2 | 11 | 5 | 2 |
| 11 | 212 | 76 | 83 | 75 | class1 | department2 | 12 | 6 | 3 |
| 10 | 211 | 89 | 93 | 60 | class1 | department2 | 15 | 8 | 4 |
| 12 | 213 | 71 | 94 | 90 | class1 | department2 | 17 | 9 | 5 |
| 13 | 214 | 94 | 94 | 66 | class1 | department2 | 17 | 9 | 5 |
| 18 | 223 | 79 | 74 | 96 | class2 | department2 | 3 | 1 | 1 |
| 17 | 222 | 80 | 78 | 96 | class2 | department2 | 7 | 3 | 2 |
| 19 | 224 | 75 | 80 | 78 | class2 | department2 | 9 | 4 | 3 |
| 20 | 225 | 82 | 85 | 63 | class2 | department2 | 13 | 7 | 4 |
| 16 | 221 | 77 | 99 | 61 | class2 | department2 | 20 | 11 | 5 |
+------+-----------+----------+------+---------+---------+--------------+-------+-------+-------+
# 结果解释
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
- dense_rank() 开窗函数
dense_rank与rank有一点不同,当排名一样的时候,接下来的行是连续的。如两个行的排名为 1,则下一个排名为 2( 即:1、2、2、3)。
-- dense_rank 开窗函数
select *,
-- 对全部学生按数学分数排序
dense_rank() over(order by math) as den_r1,
-- 对院系 按数学分数排序 dense_rank() over(partition by departmentId order by math) as den_r2,
-- 对每个院系每个班级 按数学分数排序 dense_rank() over(partition by departmentId,classId order by math) as den_r3
from student_scores;
-- 结果
+------+-----------+----------+------+---------+---------+--------------+--------+--------+--------+
| id | studentId | language | math | english | classId | departmentId | den_r1 | den_r2 | den_r3 |
+------+-----------+----------+------+---------+---------+--------------+--------+--------+--------+
| 1 | 111 | 68 | 69 | 90 | class1 | department1 | 1 | 1 | 1 |
| 3 | 113 | 90 | 74 | 75 | class1 | department1 | 3 | 3 | 2 |
| 2 | 112 | 73 | 80 | 96 | class1 | department1 | 5 | 5 | 3 |
| 5 | 115 | 99 | 93 | 89 | class1 | department1 | 10 | 7 | 4 |
| 4 | 114 | 89 | 94 | 93 | class1 | department1 | 11 | 8 | 5 |
| 9 | 124 | 76 | 70 | 76 | class2 | department1 | 2 | 2 | 1 |
| 6 | 121 | 96 | 74 | 79 | class2 | department1 | 3 | 3 | 2 |
| 8 | 123 | 70 | 78 | 61 | class2 | department1 | 4 | 4 | 3 |
| 7 | 122 | 89 | 86 | 85 | class2 | department1 | 9 | 6 | 4 |
| 15 | 216 | 85 | 74 | 93 | class1 | department2 | 3 | 1 | 1 |
| 14 | 215 | 84 | 82 | 73 | class1 | department2 | 6 | 4 | 2 |
| 11 | 212 | 76 | 83 | 75 | class1 | department2 | 7 | 5 | 3 |
| 10 | 211 | 89 | 93 | 60 | class1 | department2 | 10 | 7 | 4 |
| 12 | 213 | 71 | 94 | 90 | class1 | department2 | 11 | 8 | 5 |
| 13 | 214 | 94 | 94 | 66 | class1 | department2 | 11 | 8 | 5 |
| 18 | 223 | 79 | 74 | 96 | class2 | department2 | 3 | 1 | 1 |
| 17 | 222 | 80 | 78 | 96 | class2 | department2 | 4 | 2 | 2 |
| 19 | 224 | 75 | 80 | 78 | class2 | department2 | 5 | 3 | 3 |
| 20 | 225 | 82 | 85 | 63 | class2 | department2 | 8 | 6 | 4 |
| 16 | 221 | 77 | 99 | 61 | class2 | department2 | 12 | 9 | 5 |
+------+-----------+----------+------+---------+---------+--------------+--------+--------+--------+
# 结果解释:同count开窗函数
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
- row_number() 开窗函数
从1开始对分区内的数据排序。
-- row_number 开窗函数
select studentid,
departmentid,
classid,
math,
-- 对分区departmentid,classid内的数据按math排序
row_number() over(partition by departmentid,classid order by math) as row_num
from student_scores;
-- 结果
+-----------+--------------+---------+------+---------+
| studentid | departmentid | classid | math | row_num |
+-----------+--------------+---------+------+---------+
| 111 | department1 | class1 | 69 | 1 |
| 113 | department1 | class1 | 74 | 2 |
| 112 | department1 | class1 | 80 | 3 |
| 115 | department1 | class1 | 93 | 4 |
| 114 | department1 | class1 | 94 | 5 |
| 124 | department1 | class2 | 70 | 1 |
| 121 | department1 | class2 | 74 | 2 |
| 123 | department1 | class2 | 78 | 3 |
| 122 | department1 | class2 | 86 | 4 |
| 216 | department2 | class1 | 74 | 1 |
| 215 | department2 | class1 | 82 | 2 |
| 212 | department2 | class1 | 83 | 3 |
| 211 | department2 | class1 | 93 | 4 |
| 213 | department2 | class1 | 94 | 5 |
| 214 | department2 | class1 | 94 | 6 |
| 223 | department2 | class2 | 74 | 1 |
| 222 | department2 | class2 | 78 | 2 |
| 224 | department2 | class2 | 80 | 3 |
| 225 | department2 | class2 | 85 | 4 |
| 221 | department2 | class2 | 99 | 5 |
+-----------+--------------+---------+------+---------+
# 结果解释:注意参数不要与关键字 row_number()重名
。
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
分布函数:
percent_rank() 开窗函数
计算给定行的百分比排名。可以用来计算超过了百分之多少的人。如360小助手开机速度超过了百分之多少的人。
(当前行的rank值-1)/(分组内的总行数-1)
-- percent_rank 开窗函数
select studentid,
departmentid,
classid,
math,
row_number() over(partition by departmentid,classid order by math) as row_num, percent_rank() over(partition by departmentid,classid order by math) as per_rank
from student_scores;
-- 结果
+-----------+--------------+---------+------+---------+--------------------+
| studentid | departmentid | classid | math | row_num | per_rank |
+-----------+--------------+---------+------+---------+--------------------+
| 111 | department1 | class1 | 69 | 1 | 0 |
| 113 | department1 | class1 | 74 | 2 | 0.25 |
| 112 | department1 | class1 | 80 | 3 | 0.5 |
| 115 | department1 | class1 | 93 | 4 | 0.75 |
| 114 | department1 | class1 | 94 | 5 | 1 |
| 124 | department1 | class2 | 70 | 1 | 0 |
| 121 | department1 | class2 | 74 | 2 | 0.3333333333333333 |
| 123 | department1 | class2 | 78 | 3 | 0.6666666666666666 |
| 122 | department1 | class2 | 86 | 4 | 1 |
| 216 | department2 | class1 | 74 | 1 | 0 |
| 215 | department2 | class1 | 82 | 2 | 0.2 |
| 212 | department2 | class1 | 83 | 3 | 0.4 |
| 211 | department2 | class1 | 93 | 4 | 0.6 |
| 213 | department2 | class1 | 94 | 5 | 0.8 |
| 214 | department2 | class1 | 94 | 6 | 0.8 |
| 223 | department2 | class2 | 74 | 1 | 0 |
| 222 | department2 | class2 | 78 | 2 | 0.25 |
| 224 | department2 | class2 | 80 | 3 | 0.5 |
| 225 | department2 | class2 | 85 | 4 | 0.75 |
| 221 | department2 | class2 | 99 | 5 | 1 |
+-----------+--------------+---------+------+---------+--------------------+
# 结果解释:
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
cume_dist() 开窗函数
计算某个窗口或分区中某个值的累积分布。假定升序排序,则使用以下公式确定累积分布:
小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中,x 等于 order by 子句中指定的列的当前行中的值。
-- cume_dist 开窗函数
select studentId,
math,
departmentId,
classId,
-- 统计小于等于当前分数的人数占总人数的比例
cume_dist() over(order by math) as cume_dist1,
-- 统计大于等于当前分数的人数占总人数的比例 cume_dist() over(order by math desc) as cume_dist2,
-- 统计分区内小于等于当前分数的人数占总人数的比例 cume_dist() over(partition by classId order by math) as cume_dist3
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+--------------------+--------------------+------------+
| studentId | math | departmentId | classId | cume_dist1 | cume_dist2 | cume_dist3 |
+-----------+------+--------------+---------+--------------------+--------------------+------------+
| 111 | 69 | department1 | class1 | 0.1111111111111111 | 1 | 0.2 |
| 113 | 74 | department1 | class1 | 0.4444444444444444 | 0.7777777777777778 | 0.4 |
| 112 | 80 | department1 | class1 | 0.6666666666666666 | 0.4444444444444444 | 0.6 |
| 115 | 93 | department1 | class1 | 0.8888888888888888 | 0.2222222222222222 | 0.8 |
| 114 | 94 | department1 | class1 | 1 | 0.1111111111111111 | 1 |
| 124 | 70 | department1 | class2 | 0.2222222222222222 | 0.8888888888888888 | 0.25 |
| 121 | 74 | department1 | class2 | 0.4444444444444444 | 0.7777777777777778 | 0.5 |
| 123 | 78 | department1 | class2 | 0.5555555555555556 | 0.5555555555555556 | 0.75 |
| 122 | 86 | department1 | class2 | 0.7777777777777778 | 0.3333333333333333 | 1 |
+-----------+------+--------------+---------+--------------------+--------------------+------------+
# 结果解释:
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
前后函数:
lag(expr,n) 开窗函数
lag(col,n,default) 用于统计窗口内往上第n个值。
col:列名
n:往上第n行
default:往上第n行为NULL时候,取默认值,不指定则取NULL
-- lag 开窗函数
select studentId,
math,
departmentId,
classId,
-- 窗口内 往上取第二个 取不到时赋默认值60
lag(math, 2, 60) over(partition by classId order by math) as lag1,
-- 窗口内 往上取第二个 取不到时赋默认值NULL lag(math, 2) over(partition by classId order by math) as lag2
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+------+------+
| studentId | math | departmentId | classId | lag1 | lag2 |
+-----------+------+--------------+---------+------+------+
| 111 | 69 | department1 | class1 | 60 | NULL |
| 113 | 74 | department1 | class1 | 60 | NULL |
| 112 | 80 | department1 | class1 | 69 | 69 |
| 115 | 93 | department1 | class1 | 74 | 74 |
| 114 | 94 | department1 | class1 | 80 | 80 |
| 124 | 70 | department1 | class2 | 60 | NULL |
| 121 | 74 | department1 | class2 | 60 | NULL |
| 123 | 78 | department1 | class2 | 70 | 70 |
| 122 | 86 | department1 | class2 | 74 | 74 |
+-----------+------+--------------+---------+------+------+
/*
结果解释:
第3行(studentId=112),lag1:窗口内(69 74 80) 当前行80 向上取第二个值为69
倒数第3行(studentId=121),lag2:窗口内(70 74) 当前行74 向上取第二个值为NULL
*/
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
lead(expr,n) 开窗函数
lead(col,n,default) 用于统计窗口内往下第n个值。
col:列名
n:往下第n行
default:往下第n行为NULL时候,取默认值,不指定则取NULL
-- lead开窗函数
select studentId,
math,
departmentId,
classId,
-- 窗口内 往下取第二个 取不到时赋默认值60
lead(math, 2, 60) over(partition by classId order by math) as lead1,
-- 窗口内 往下取第二个 取不到时赋默认值NULL lead(math, 2) over(partition by classId order by math) as lead2
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+-------+-------+
| studentId | math | departmentId | classId | lead1 | lead2 |
+-----------+------+--------------+---------+-------+-------+
| 111 | 69 | department1 | class1 | 80 | 80 |
| 113 | 74 | department1 | class1 | 93 | 93 |
| 112 | 80 | department1 | class1 | 94 | 94 |
| 115 | 93 | department1 | class1 | 60 | NULL |
| 114 | 94 | department1 | class1 | 60 | NULL |
| 124 | 70 | department1 | class2 | 78 | 78 |
| 121 | 74 | department1 | class2 | 86 | 86 |
| 123 | 78 | department1 | class2 | 60 | NULL |
| 122 | 86 | department1 | class2 | 60 | NULL |
+-----------+------+--------------+---------+-------+-------+
# 结果解释:第4行lead1 窗口内向下第二个值为空
,赋值60
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
头尾函数:
first_value(expr) 开窗函数
返回分区中的第一个值。
-- first_value 开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
first_value(math) over() as f_v1,
-- 以按classId分组的所有行作为窗口 first_value(math) over(partition by classId) as f_v2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 first_value(math) over(partition by classId order by math) as f_v3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 first_value(math) over(partition by classId order by math rows between 1 preceding and 2 following) as f_v4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+------+------+------+------+
| studentId | math | departmentId | classId | f_v1 | f_v2 | f_v3 | f_v4 |
+-----------+------+--------------+---------+------+------+------+------+
| 111 | 69 | department1 | class1 | 69 | 69 | 69 | 69 |
| 113 | 74 | department1 | class1 | 69 | 69 | 69 | 69 |
| 112 | 80 | department1 | class1 | 69 | 69 | 69 | 74 |
| 115 | 93 | department1 | class1 | 69 | 69 | 69 | 80 |
| 114 | 94 | department1 | class1 | 69 | 69 | 69 | 93 |
| 124 | 70 | department1 | class2 | 69 | 74 | 70 | 70 |
| 121 | 74 | department1 | class2 | 69 | 74 | 70 | 70 |
| 123 | 78 | department1 | class2 | 69 | 74 | 70 | 74 |
| 122 | 86 | department1 | class2 | 69 | 74 | 70 | 78 |
+-----------+------+--------------+---------+------+------+------+------+
/*
结果解释:
1,studentid=124 first_value1:第一个值是69,first_value2:classId=class1分区 math的第一个值是69。
2,studentId=124行,为什么 f_v2=74?
因为:f_v2,只是分组,并没有排序,而 studentId=121 行,在class2中是第一个出现的,所以f_v2就取了studentId=121行,math的值,正是,74。
student_scores 表单数据,参考 数据准备的数据输出。
3,注意:order by之后, rows between num1 preceding and num2 followin 的使用。
// order by math rows between 1 preceding and 2 followin
*/
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
last_value(expr) 开窗函数
返回分区中的最后一个值。
-- last_value 开窗函数
select studentId,
math,
departmentId,
classId,
-- 以符合条件的所有行作为窗口
last_value(math) over() as last_v1,
-- 以按classId分组的所有行作为窗口 last_value(math) over(partition by classId) as last_v2,
-- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口 last_value(math) over(partition by classId order by math) as last_v3,
-- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口 last_value(math) over(partition by classId order by math rows between 1 preceding and 2 following) as last_v4
from student_scores
where departmentId = 'department1';
-- 结果
+-----------+------+--------------+---------+---------+---------+---------+---------+
| studentId | math | departmentId | classId | last_v1 | last_v2 | last_v3 | last_v4 |
+-----------+------+--------------+---------+---------+---------+---------+---------+
| 111 | 69 | department1 | class1 | 70 | 93 | 69 | 80 |
| 113 | 74 | department1 | class1 | 70 | 93 | 74 | 93 |
| 112 | 80 | department1 | class1 | 70 | 93 | 80 | 94 |
| 115 | 93 | department1 | class1 | 70 | 93 | 93 | 94 |
| 114 | 94 | department1 | class1 | 70 | 93 | 94 | 94 |
| 124 | 70 | department1 | class2 | 70 | 70 | 70 | 78 |
| 121 | 74 | department1 | class2 | 70 | 70 | 74 | 86 |
| 123 | 78 | department1 | class2 | 70 | 70 | 78 | 86 |
| 122 | 86 | department1 | class2 | 70 | 70 | 86 | 86 |
+-----------+------+--------------+---------+---------+---------+---------+---------+
/*
结果解释:
studentId=111 行,last_v1=70 参考student_scores表单原始数据。
studentId=111 行,last_v2=93 参考student_scores表单原始数据。
*/
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
其他函数:
ntile(n) 开窗函数
将分区中已排序的行划分为大小尽可能相等的指定数量的排名的组,并返回给定行所在的组的排名。
即:尽可能的平均分租,返回所在的组是第几组。
-- ntile 开窗函数
select studentId,
math,
classId,
departmentId,
-- 对分区内的数据分成两组
ntile(2) over(partition by departmentid order by math) as n_1,
-- 对分区内的数据分成三组 ntile(3) over(partition by departmentid order by math) as n_2
from student_scores;
-- 结果
+-----------+------+---------+--------------+------+------+
| studentId | math | classId | departmentId | n_1 | n_2 |
+-----------+------+---------+--------------+------+------+
| 111 | 69 | class1 | department1 | 1 | 1 |
| 124 | 70 | class2 | department1 | 1 | 1 |
| 113 | 74 | class1 | department1 | 1 | 1 |
| 121 | 74 | class2 | department1 | 1 | 2 |
| 123 | 78 | class2 | department1 | 1 | 2 |
| 112 | 80 | class1 | department1 | 2 | 2 |
| 122 | 86 | class2 | department1 | 2 | 3 |
| 115 | 93 | class1 | department1 | 2 | 3 |
| 114 | 94 | class1 | department1 | 2 | 3 |
| 216 | 74 | class1 | department2 | 1 | 1 |
| 223 | 74 | class2 | department2 | 1 | 1 |
| 222 | 78 | class2 | department2 | 1 | 1 |
| 224 | 80 | class2 | department2 | 1 | 1 |
| 215 | 82 | class1 | department2 | 1 | 2 |
| 212 | 83 | class1 | department2 | 1 | 2 |
| 225 | 85 | class2 | department2 | 2 | 2 |
| 211 | 93 | class1 | department2 | 2 | 2 |
| 213 | 94 | class1 | department2 | 2 | 3 |
| 214 | 94 | class1 | department2 | 2 | 3 |
| 221 | 99 | class2 | department2 | 2 | 3 |
+-----------+------+---------+--------------+------+------+
/*
结果解释:
n_1:department1 被分为2组,第一组5个,第二组:4个。department2 被分为2组,第一组6个,第二组5个。
n_2:department1 被分为3组,第一组3个,第二组:3个,第三组3个。department2 被分为3组,第一组4个,第二组4个,第三组3个。
*/
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
- NTH_VALU(expr,n) 开窗函数
-- NTH_VALU 开窗函数
SELECT studentId,
math,
classId,
departmentId,
-- 按照departmentid分区,math降序,取第二个的成绩。
nth_value(math, 2) OVER (PARTITION BY departmentid ORDER BY math DESC) AS n_1 ,
-- 按照departmentid分区,math升序,取第一个的成绩。 nth_value(math, 1) OVER (PARTITION BY departmentid ORDER BY math asc) AS n_2
FROM student_scores;
-- 结果
+-----------+------+---------+--------------+------+------+
| studentId | math | classId | departmentId | n_1 | n_2 |
+-----------+------+---------+--------------+------+------+
| 111 | 69 | class1 | department1 | 93 | 69 |
| 124 | 70 | class2 | department1 | 93 | 69 |
| 113 | 74 | class1 | department1 | 93 | 69 |
| 121 | 74 | class2 | department1 | 93 | 69 |
| 123 | 78 | class2 | department1 | 93 | 69 |
| 112 | 80 | class1 | department1 | 93 | 69 |
| 122 | 86 | class2 | department1 | 93 | 69 |
| 115 | 93 | class1 | department1 | 93 | 69 |
| 114 | 94 | class1 | department1 | NULL | 69 |
| 216 | 74 | class1 | department2 | 94 | 74 |
| 223 | 74 | class2 | department2 | 94 | 74 |
| 222 | 78 | class2 | department2 | 94 | 74 |
| 224 | 80 | class2 | department2 | 94 | 74 |
| 215 | 82 | class1 | department2 | 94 | 74 |
| 212 | 83 | class1 | department2 | 94 | 74 |
| 225 | 85 | class2 | department2 | 94 | 74 |
| 211 | 93 | class1 | department2 | 94 | 74 |
| 213 | 94 | class1 | department2 | 94 | 74 |
| 214 | 94 | class1 | department2 | 94 | 74 |
| 221 | 99 | class2 | department2 | NULL | 74 |
+-----------+------+---------+--------------+------+------+
# 结果解释:为什么会出现 NULL
?
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# 五,常见报错
1,ERROR 1064
MySQL
ERROR 1064 (42000)
:
-- 1,使用了系统保留的关键字,如果非要使用,对关键字家反斜杠:`keyw0rd`
-- 2,没有权限;
-- 3,在给列设置string类型的时候,如果指定列的类型为varchar则需要指定该类型的最大‘容量值‘,若为char类型则不必指定。
2
3
4
5
6
2,创建表字符串类型:
-- MySQL 声明字符串:
name
varchar(10)
-- hive 声明字符串:
name string
2
3
4
5
3,插入数据:
MySQL
insert into student_scores values
-- 注意:insert into table student_scores values //不能带 table。
2
3
4,关键字重名
-- 报错:
row_number
() over(partition by departmentid,classId order by math) as row_number
-- 正确:
row_number() over(partition by departmentId,classId order by math) as row_num
-- 原因:as row_number 与 row_number() 属于关键字row_number重名;
2
3
4
5
6
7
5,注释不规范
MySQL语句中,做注释,#、-- 和注释内容之间需要加空格。否则报错。
ERROR
1064 (42000): xxxxx;
check the manual that corresponds to your MySQL server version for the right syntax to use near
‘默认60
-- 注释
2
3
4
5