当前位置：

首页
/
IT
/
数据库
/
MySQL
/
MySQL - 巧用 join 来优化 sql

MySQL - 巧用 join 来优化 sql

0. 准备相关表来进行接下来的测试

相关建表语句请看: https://github.com/YangBaohust/my_sql

user1 表, 取经组

+----+-----------+-----------------+---------------------------------+
| id | user_name | comment         | mobile                          |
+----+-----------+-----------------+---------------------------------+

| 1 | 唐僧 | 旃檀功德佛 | 138245623,021-382349 |

| 2 | 孙悟空 | 斗战胜佛 | 159384292,022-483432,+86-392432 |

| 3 | 猪八戒 | 净坛使者 | 183208243,055-8234234 |

| 4 | 沙僧 | 金身罗汉 | 293842295,098-2383429 |

| 5 | NULL | 白龙马 | 993267899 |

+----+-----------+-----------------+---------------------------------+

user2 表, 悟空的朋友圈

+----+--------------+-----------+
| id | user_name    | comment   |
+----+--------------+-----------+

| 1 | 孙悟空 | 美猴王 |

| 2 | 牛魔王 | 牛哥 |

| 3 | 铁扇公主 | 牛夫人 |

| 4 | 菩提老祖 | 葡萄 |

| 5 | NULL | 晶晶 |

+----+--------------+-----------+

user1_kills 表, 取经路上杀的妖怪数量

+----+-----------+---------------------+-------+
| id | user_name | timestr             | kills |
+----+-----------+---------------------+-------+

| 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 |

| 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 |

| 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 |

| 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 |

| 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

| 8 | 沙僧 | 2013-01-10 00:00:00 | 3 |

| 9 | 沙僧 | 2013-01-22 00:00:00 | 9 |

| 10 | 沙僧 | 2013-02-11 00:00:00 | 5 |

+----+-----------+---------------------+-------+

user1_equipment 表, 取经组装备

+----+-----------+--------------+-----------------+-----------------+
| id | user_name | arms         | clothing        | shoe            |
+----+-----------+--------------+-----------------+-----------------+

| 1 | 唐僧 | 九环锡杖 | 锦斓袈裟 | 僧鞋 |

| 2 | 孙悟空 | 金箍棒 | 梭子黄金甲 | 藕丝步云履 |

| 3 | 猪八戒 | 九齿钉耙 | 僧衣 | 僧鞋 |

| 4 | 沙僧 | 降妖宝杖 | 僧衣 | 僧鞋 |

+----+-----------+--------------+-----------------+-----------------+

1. 使用 left join 优化 not in 子句

例子: 找出取经组中不属于悟空朋友圈的人

+----+-----------+-----------------+-----------------------+
| id | user_name | comment         | mobile                |
+----+-----------+-----------------+-----------------------+

| 1 | 唐僧 | 旃檀功德佛 | 138245623,021-382349 |

| 3 | 猪八戒 | 净坛使者 | 183208243,055-8234234 |

| 4 | 沙僧 | 金身罗汉 | 293842295,098-2383429 |

+----+-----------+-----------------+-----------------------+

not in 写法:

select * from user1 a where a.user_name not in (select user_name from user2 where user_name is not null);

left join 写法:

首先看通过 user_name 进行连接的外连接数据集

select a.*, b.* from user1 a left join user2 b on (a.user_name = b.user_name);
+----+-----------+-----------------+---------------------------------+------+-----------+-----------+
| id | user_name | comment         | mobile                          | id   | user_name | comment   |
+----+-----------+-----------------+---------------------------------+------+-----------+-----------+

| 2 | 孙悟空 | 斗战胜佛 | 159384292,022-483432,+86-392432 | 1 | 孙悟空 | 美猴王 |

| 1 | 唐僧 | 旃檀功德佛 | 138245623,021-382349 | NULL | NULL | NULL |

| 3 | 猪八戒 | 净坛使者 | 183208243,055-8234234 | NULL | NULL | NULL |

| 4 | 沙僧 | 金身罗汉 | 293842295,098-2383429 | NULL | NULL | NULL |

| 5 | NULL | 白龙马 | 993267899 | NULL | NULL | NULL |

+----+-----------+-----------------+---------------------------------+------+-----------+-----------+

可以看到 a 表中的所有数据都有显示, b 表中的数据只有 b.user_name 与 a.user_name 相等才显示, 其余都以 null 值填充, 要想找出取经组中不属于悟空朋友圈的人, 只需要在 b.user_name 中加一个过滤条件 b.user_name is null 即可.

select a.* from user1 a left join user2 b on (a.user_name = b.user_name) where b.user_name is null;
+----+-----------+-----------------+-----------------------+
| id | user_name | comment         | mobile                |
+----+-----------+-----------------+-----------------------+

| 1 | 唐僧 | 旃檀功德佛 | 138245623,021-382349 |

| 3 | 猪八戒 | 净坛使者 | 183208243,055-8234234 |

| 4 | 沙僧 | 金身罗汉 | 293842295,098-2383429 |

| 5 | NULL | 白龙马 | 993267899 |

+----+-----------+-----------------+-----------------------+

看到这里发现结果集中还多了一个白龙马, 继续添加过滤条件 a.user_name is not null 即可.

select a.* from user1 a left join user2 b on (a.user_name = b.user_name) where b.user_name is null and a.user_name is not null;

2. 使用 left join 优化标量子查询

例子: 查看取经组中的人在悟空朋友圈的昵称

+-----------+-----------------+-----------+
| user_name | comment         | comment2  |
+-----------+-----------------+-----------+

| 唐僧 | 旃檀功德佛 | NULL |

| 孙悟空 | 斗战胜佛 | 美猴王 |

| 猪八戒 | 净坛使者 | NULL |

| 沙僧 | 金身罗汉 | NULL |

| NULL | 白龙马 | NULL |

+-----------+-----------------+-----------+

子查询写法:

select a.user_name, a.comment, (select comment from user2 b where b.user_name = a.user_name) comment2 from user1 a;

left join 写法:

select a.user_name, a.comment, b.comment comment2 from user1 a left join user2 b on (a.user_name = b.user_name);

3. 使用 join 优化聚合子查询

例子: 查询出取经组中每人打怪最多的日期

+----+-----------+---------------------+-------+
| id | user_name | timestr             | kills |
+----+-----------+---------------------+-------+

| 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

| 9 | 沙僧 | 2013-01-22 00:00:00 | 9 |

+----+-----------+---------------------+-------+

聚合子查询写法:

select * from user1_kills a where a.kills = (select max(b.kills) from user1_kills b where b.user_name = a.user_name);

join 写法:

首先看两表自关联的结果集, 为节省篇幅, 只取猪八戒的打怪数据来看

select a.*, b.* from user1_kills a join user1_kills b on (a.user_name = b.user_name) order by 1;
+----+-----------+---------------------+-------+----+-----------+---------------------+-------+
| id | user_name | timestr             | kills | id | user_name | timestr             | kills |
+----+-----------+---------------------+-------+----+-----------+---------------------+-------+

| 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 | 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 |

| 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 | 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 |

| 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 | 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

| 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 | 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

| 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 | 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 |

| 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 | 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 | 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 | 6 | 猪八戒 | 2013-02-07 00:00:00 | 17 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 | 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

+----+-----------+---------------------+-------+----+-----------+---------------------+-------+

可以看到当两表通过 user_name 进行自关联, 只需要对 a 表的所有字段进行一个 group by, 取 b 表中的 max(kills), 只要 a.kills=max(b.kills)就满足要求了. sql 如下

select a.* from user1_kills a join user1_kills b on (a.user_name = b.user_name) group by a.id, a.user_name, a.timestr, a.kills having a.kills = max(b.kills);

4. 使用 join 进行分组选择

例子: 对第 3 个例子进行升级, 查询出取经组中每人打怪最多的前两个日期

+----+-----------+---------------------+-------+
| id | user_name | timestr             | kills |
+----+-----------+---------------------+-------+

| 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 |

| 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 5 | 猪八戒 | 2013-01-11 00:00:00 | 20 |

| 7 | 猪八戒 | 2013-02-08 00:00:00 | 35 |

| 9 | 沙僧 | 2013-01-22 00:00:00 | 9 |

| 10 | 沙僧 | 2013-02-11 00:00:00 | 5 |

+----+-----------+---------------------+-------+

在 oracle 中, 可以通过分析函数来实现

select b.* from (select a.*, row_number() over(partition by user_name order by kills desc) cnt from user1_kills a) b where b.cnt <= 2;

很遗憾, 上面 sql 在 MySQL 中报错 ERROR 1064 (42000): You have an error in your SQL syntax; 因为 MySQL 并不支持分析函数. 不过可以通过下面的方式去实现.

首先对两表进行自关联, 为了节约篇幅, 只取出孙悟空的数据

select a.*, b.* from user1_kills a join user1_kills b on (a.user_name=b.user_name and a.kills<=b.kills) order by a.user_name, a.kills desc;
+----+-----------+---------------------+-------+----+-----------+---------------------+-------+
| id | user_name | timestr             | kills | id | user_name | timestr             | kills |
+----+-----------+---------------------+-------+----+-----------+---------------------+-------+

| 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 | 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 | 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 |

| 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 | 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 | 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 |

| 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 | 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 |

| 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 | 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 | 1 | 孙悟空 | 2013-01-10 00:00:00 | 10 |

| 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 | 3 | 孙悟空 | 2013-02-05 00:00:00 | 12 |

| 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 | 4 | 孙悟空 | 2013-02-12 00:00:00 | 22 |

| 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 | 2 | 孙悟空 | 2013-02-01 00:00:00 | 2 |

+----+-----------+---------------------+-------+----+-----------+---------------------+-------+

从上面的表中我们知道孙悟空打怪前两名的数量是 22 和 12, 那么只需要对 a 表的所有字段进行一个 group by, 对 b 表的 id 做个 count,count 值小于等于 2 就满足要求, sql 改写如下:

select a.* from user1_kills a join user1_kills b on (a.user_name=b.user_name and a.kills<=b.kills) group by a.id, a.user_name, a.timestr, a.kills having count(b.id) <= 2;

5. 使用笛卡尔积关联实现一列转多行

例子: 将取经组中每个电话号码变成一行

原始数据:

+-----------+---------------------------------+
| user_name | mobile                          |
+-----------+---------------------------------+

| 唐僧 | 138245623,021-382349 |

| 孙悟空 | 159384292,022-483432,+86-392432 |

| 猪八戒 | 183208243,055-8234234 |

| 沙僧 | 293842295,098-2383429 |

| NULL      | 993267899                       |
+-----------+---------------------------------+

想要得到的数据:

+-----------+-------------+
| user_name | mobile      |
+-----------+-------------+

| 唐僧 | 138245623 |

| 唐僧 | 021-382349 |

| 孙悟空 | 159384292 |

| 孙悟空 | 022-483432 |

| 孙悟空 | +86-392432 |

| 猪八戒 | 183208243 |

| 猪八戒 | 055-8234234 |

| 沙僧 | 293842295 |

| 沙僧 | 098-2383429 |

| NULL      | 993267899   |
+-----------+-------------+

可以看到唐僧有两个电话, 因此他就需要两行. 我们可以先求出每人的电话号码数量, 然后与一张序列表进行笛卡儿积关联, 为了节约篇幅, 只取出唐僧的数据

select a.id, b.* from tb_sequence a cross join (select user_name, mobile, length(mobile)-length(replace(mobile, ',', ''))+1 size from user1) b order by 2,1;
+----+-----------+---------------------------------+------+
| id | user_name | mobile                          | size |
+----+-----------+---------------------------------+------+

| 1 | 唐僧 | 138245623,021-382349 | 2 |

| 2 | 唐僧 | 138245623,021-382349 | 2 |

| 3 | 唐僧 | 138245623,021-382349 | 2 |

| 4 | 唐僧 | 138245623,021-382349 | 2 |

| 5 | 唐僧 | 138245623,021-382349 | 2 |

| 6 | 唐僧 | 138245623,021-382349 | 2 |

| 7 | 唐僧 | 138245623,021-382349 | 2 |

| 8 | 唐僧 | 138245623,021-382349 | 2 |

| 9 | 唐僧 | 138245623,021-382349 | 2 |

| 10 | 唐僧 | 138245623,021-382349 | 2 |

+----+-----------+---------------------------------+------+

a.id 对应的就是第几个电话号码, size 就是总的电话号码数量, 因此可以加上关联条件(a.id <= b.size), 将上面的 sql 继续调整

select b.user_name, replace(substring(substring_index(b.mobile, ',', a.id), char_length(substring_index(mobile, ',', a.id-1)) + 1), ',', '') as mobile from tb_sequence a cross join (select user_name, concat(mobile,',') as mobile, length(mobile)-length(replace(mobile,',',''))+1 size from user1) b on (a.id <= b.size);

6. 使用笛卡尔积关联实现多列转多行

例子: 将取经组中每件装备变成一行

原始数据:

+----+-----------+--------------+-----------------+-----------------+
| id | user_name | arms         | clothing        | shoe            |
+----+-----------+--------------+-----------------+-----------------+

| 1 | 唐僧 | 九环锡杖 | 锦斓袈裟 | 僧鞋 |

| 2 | 孙悟空 | 金箍棒 | 梭子黄金甲 | 藕丝步云履 |

| 3 | 猪八戒 | 九齿钉耙 | 僧衣 | 僧鞋 |

| 4 | 沙僧 | 降妖宝杖 | 僧衣 | 僧鞋 |

+----+-----------+--------------+-----------------+-----------------+

想要得到的数据:

+-----------+-----------+-----------------+
| user_name | equipment | equip_mame      |
+-----------+-----------+-----------------+

| 唐僧 | arms | 九环锡杖 |

| 唐僧 | clothing | 锦斓袈裟 |

| 唐僧 | shoe | 僧鞋 |

| 孙悟空 | arms | 金箍棒 |

| 孙悟空 | clothing | 梭子黄金甲 |

| 孙悟空 | shoe | 藕丝步云履 |

| 沙僧 | arms | 降妖宝杖 |

| 沙僧 | clothing | 僧衣 |

| 沙僧 | shoe | 僧鞋 |

| 猪八戒 | arms | 九齿钉耙 |

| 猪八戒 | clothing | 僧衣 |

| 猪八戒 | shoe | 僧鞋 |

+-----------+-----------+-----------------+

union 的写法:

select user_name, 'arms' as equipment, arms equip_mame from user1_equipment
union all
select user_name, 'clothing' as equipment, clothing equip_mame from user1_equipment
union all
select user_name, 'shoe' as equipment, shoe equip_mame from user1_equipment
order by 1, 2;

join 的写法:

首先看笛卡尔数据集的效果, 以唐僧为例

select a.*, b.* from user1_equipment a cross join tb_sequence b where b.id <= 3;
+----+-----------+--------------+-----------------+-----------------+----+
| id | user_name | arms         | clothing        | shoe            | id |
+----+-----------+--------------+-----------------+-----------------+----+

| 1 | 唐僧 | 九环锡杖 | 锦斓袈裟 | 僧鞋 | 1 |

| 1 | 唐僧 | 九环锡杖 | 锦斓袈裟 | 僧鞋 | 2 |

| 1 | 唐僧 | 九环锡杖 | 锦斓袈裟 | 僧鞋 | 3 |

+----+-----------+--------------+-----------------+-----------------+----+

使用 case 对上面的结果进行处理

select user_name,
case when b.id = 1 then 'arms'
when b.id = 2 then 'clothing'
when b.id = 3 then 'shoe' end as equipment,
case when b.id = 1 then arms end arms,
case when b.id = 2 then clothing end clothing,
case when b.id = 3 then shoe end shoe
from user1_equipment a cross join tb_sequence b where b.id <=3;
+-----------+-----------+--------------+-----------------+-----------------+
| user_name | equipment | arms         | clothing        | shoe            |
+-----------+-----------+--------------+-----------------+-----------------+

+-----------+-----------+--------------+-----------------+-----------------+

使用 coalesce 函数将多列数据进行合并

select user_name,
case when b.id = 1 then 'arms'
when b.id = 2 then 'clothing'
when b.id = 3 then 'shoe' end as equipment,
coalesce(case when b.id = 1 then arms end,
case when b.id = 2 then clothing end,
case when b.id = 3 then shoe end) equip_mame
from user1_equipment a cross join tb_sequence b where b.id <=3 order by 1, 2;

7. 使用 join 更新过滤条件中包含自身的表

例子: 把同时存在于取经组和悟空朋友圈中的人, 在取经组中把 comment 字段更新为 "此人在悟空的朋友圈"

我们很自然地想到先查出 user1 和 user2 中 user_name 都存在的人, 然后更新 user1 表, sql 如下

update user1 set comment = '此人在悟空的朋友圈' where user_name in (select a.user_name from user1 a join user2 b on (a.user_name = b.user_name));

很遗憾, 上面 sql 在 MySQL 中报错: ERROR 1093 (HY000): You can't specify target table'user1' for update in FROM clause, 提示不能更新目标表在 from 子句的表.

那有没有其它办法呢? 我们可以将 in 的写法转换成 join 的方式

select c.*, d.* from user1 c join (select a.user_name from user1 a join user2 b on (a.user_name = b.user_name)) d on (c.user_name = d.user_name);
+----+-----------+--------------+---------------------------------+-----------+
| id | user_name | comment      | mobile                          | user_name |
+----+-----------+--------------+---------------------------------+-----------+

| 2 | 孙悟空 | 斗战胜佛 | 159384292,022-483432,+86-392432 | 孙悟空 |

+----+-----------+--------------+---------------------------------+-----------+

然后对 join 之后的视图进行更新即可

update user1 c join (select a.user_name from user1 a join user2 b on (a.user_name = b.user_name)) d on (c.user_name = d.user_name) set c.comment = '此人在悟空的朋友圈';

再查看 user1, 可以看到 user1 已修改成功

select * from user1;
+----+-----------+-----------------------------+---------------------------------+
| id | user_name | comment                     | mobile                          |
+----+-----------+-----------------------------+---------------------------------+

| 1 | 唐僧 | 旃檀功德佛 | 138245623,021-382349 |

| 2 | 孙悟空 | 此人在悟空的朋友圈 | 159384292,022-483432,+86-392432 |

| 3 | 猪八戒 | 净坛使者 | 183208243,055-8234234 |

| 4 | 沙僧 | 金身罗汉 | 293842295,098-2383429 |

| 5 | NULL | 白龙马 | 993267899 |

+----+-----------+-----------------------------+---------------------------------+

8. 使用 join 删除重复数据

首先向 user2 表中插入两条数据

insert into user2(user_name, comment) values ('孙悟空', '美猴王');
insert into user2(user_name, comment) values ('牛魔王', '牛哥');

例子: 将 user2 表中的重复数据删除, 只保留 id 号大的

+----+--------------+-----------+
| id | user_name    | comment   |
+----+--------------+-----------+

| 1 | 孙悟空 | 美猴王 |

| 2 | 牛魔王 | 牛哥 |

| 3 | 铁扇公主 | 牛夫人 |

| 4 | 菩提老祖 | 葡萄 |

| 5 | NULL | 晶晶 |

| 6 | 孙悟空 | 美猴王 |

| 7 | 牛魔王 | 牛哥 |

+----+--------------+-----------+

首先查看重复记录

select a.*, b.* from user2 a join (select user_name, comment, max(id) id from user2 group by user_name, comment having count(*)> 1) b on (a.user_name=b.user_name and a.comment=b.comment) order by 2;
+----+-----------+-----------+-----------+-----------+------+
| id | user_name | comment   | user_name | comment   | id   |
+----+-----------+-----------+-----------+-----------+------+

| 1 | 孙悟空 | 美猴王 | 孙悟空 | 美猴王 | 6 |

| 6 | 孙悟空 | 美猴王 | 孙悟空 | 美猴王 | 6 |

| 2 | 牛魔王 | 牛哥 | 牛魔王 | 牛哥 | 7 |

| 7 | 牛魔王 | 牛哥 | 牛魔王 | 牛哥 | 7 |

+----+-----------+-----------+-----------+-----------+------+

接着只需要删除 (a.id <b.id) 的数据即可

delete a from user2 a join (select user_name, comment, max(id) id from user2 group by user_name, comment having count(*)> 1) b on (a.user_name=b.user_name and a.comment=b.comment) where a.id < b.id;

查看 user2, 可以看到重复数据已经被删掉了

select * from user2;
+----+--------------+-----------+
| id | user_name    | comment   |
+----+--------------+-----------+

| 3 | 铁扇公主 | 牛夫人 |

| 4 | 菩提老祖 | 葡萄 |

| 5 | NULL | 晶晶 |

| 6 | 孙悟空 | 美猴王 |

| 7 | 牛魔王 | 牛哥 |

+----+--------------+-----------+

总结: 给大家就介绍到这里, 大家有兴趣可以多造点数据, 然后比较不同的 sql 写法在执行时间上的区别. 本文例子取自于慕课网《sql 开发技巧》.

来源: https://www.cnblogs.com/ddzj01/p/11346954.html

与本文相关文章

暂无,快来抢沙发吧！