新系统上线, 用户基数 16 万, 各种查询 timeout. 打开砂锅问到底, 直接看 sql 语句吧, 都是泪呀, 一大堆 in\not in\except. 这里总结一下, 怎么替换掉 in\not in\except.
1. in/except->left join
查询目的:
根据
客户表 (Customer, 按照站点, 册本划分, 16 万数据)
水表表 (Meter,16 万数据)
水表抄表数据表 (Meter_Data, 远传表每天更新, 27 万数据)
关联查询, 查询某天某个册本下水表未上传抄表数据的用户.
原查询结构
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No in
- (
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- where cs.Group_No = '册本编号'
- except
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号'
- )
原查询思路
查询出目标册本已上传数据的用户编号
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号'
查询出目标册本全部用户编号
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- where cs.Group_No = '册本编号'
全部用户编号中排除已上传数据的用户编号, 即为未上传数据的用户编号
全部用户编号 except 已抄表的用户编号
查询出在未抄表用户编号集合中的用户信息.
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No in
- (全部用户编号 except 已抄表的用户编号)
思路倒是没有问题, 但是 in+except 查询效率不要太慢了, 本来想测试个时间, 结果执行了几分钟愣是没出结果, 直接终止掉了
优化查询结构
其实 in\not in\except 这些语法在查询中使用, 效率不高是公认的事实, 但是可能是由于语义比较明显吧, 很多人还是喜欢这样用. 我们这里使用 left join 来替代 in+except. 这里就来改掉上面的查询:
- select cs.*
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and md.meter_no is null;
优化查询思路
用 left join 代替 in+except, 通过 left join 获取目标册本下全部用户的信息, 并与当天上传的抄表数据进行连接;
连接中, 右表为空即抄表数据为空的, 即为当前未上传数据的客户信息;
left join on expression where expression 执行时, 首先确保左表数据全部返回, 然后应用 on 后指定的条件. 因此, on 的条件如果是对左表数据的过滤, 是无效的; 对右表数据的过滤是有效的. 对左表数据的过滤条件, 需要放到 where 条件中.
2. not in->left join
上面 in+except 的写法, 可以使用 not in 简化一下, 但是一样效率不高. 这里想要说明的是 not in 也可以很方便的使用 left join 替换.
not in 结构
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No not in
- (
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号'
- )
left join 结构
- select cs.*
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and md.meter_no is null;
- 3. in->inner join
查询目的
还是上面的查询背景, 这里查询某天某个册本已经上传抄表数据的用户信息.
in 结构
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No in
- (
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号'
- )
这里使用 in 不够高效, 但是我们使用 left join 是否可以呢?
left join 结构
- select cs.*
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- left join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and md.meter_no is not null;
left join 结构的话, 这里需要使用 is not null 作为筛选条件. 但是 is not null 同样非常低效. 因此我们使用 inner join
inner join 结构
- select cs.*
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号';
inner join 通过连接操作, 直接获取到已上传抄表数据的用户信息.
4. not in -> in -> inner join
前面的查询场景中, 我们默认的条件是未上传抄表数据的用户, 当天在 meter_data 表是没有记录的. 现在假设我们每天凌晨初始化 meter_data 表, 设置抄表数值默认为零, 抄表数据上传默认为 state=0 未上传. 上传后, 更新抄表数值和抄表状态 state=1.
这时, 我们来优化上面的 not in 查询结构还有另外一种思路.
not in 结构
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No not in
- (
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and meter.state=1
- )
in 结构
通过筛选条件取反, 变换 not in->in
- select *
- from Customer cs
- where
- cs.Group_No = '册本编号' and
- cs.Customer_No in
- (
- select Customer_No
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and meter.state=0
- )
inner join 结构
- select cs.*
- from Customer cs
- left join Meter me on cs.Customer_No = me.Customer_No
- inner join Meter_data md on me.meter_no = md.meter_no and md.date = '2019-04-09'
- where cs.Group_NO='册本编号' and meter.state=0;
5. 总结如下
上面的查询结构拆分出来后, 大家可能觉得这么简单的 sql 怎么可能写成这个沙雕. 其实真实业务系统, 还有关联其他将近 10 张表. 这里想说的是, 在 in\not in\except 这种查询结构时, 如果涉及到的数据量较大, 建议坚决用连接替换.
... in (all except sub)... 查询结构可以转换为 ->left join
... not in ... 查询结构可以转换为 ->left join
... not in ... 查询也可以转换为 in -> inner join, 这里需要确认转换查询条件时, 是否有对应的数据
... in 查询结构可以转换为 ->inner join
来源: https://www.cnblogs.com/zhangdk/p/notintoleftjoin.html