标签
PostgreSQL , 空间聚合 , 空间热力图 , 行政区 , 电子围栏
背景
某个时间区间 (或其他条件), 出现在某些围栏, 行政区(多边形信息) 中的对象 (空间点信息) 有多少, 按颜色深浅渲染这些多边形.
例如
例子
1, 面数据 - 围栏, 行政区(多边形信息)
生成 1 万个面数据
create table t1 (
gid int, -- 多边形 ID(用户定义的围栏, 行政区)
face box -- 空间信息, 实际使用时, 可以使用 PostGIS 的 geometry 类型
- );
- insert into t1 select row_number() over(), box (point(x,y),point(x+1,y+1)) from generate_series(0,99) t1(x),generate_series(0,99) t2(y);
- create index idx_t1_face on t1 using Gist(face);
2, 点数据, 空间对象的位置信息
生成 1000 万点数据
create table t2 (
id int, -- 对象 ID
pos point, -- 位置, 实际使用时, 可以使用 PostGIS 的 geometry 类型
att text -- 其他属性, 可以有更多其他属性
- );
- insert into t2 select id, point(random()*100, random()*100) from generate_series(1,10000000) t(id);
3, 创建函数, 输入点的值, 获得面的值.
支持并行 SQL
- create or replace function get_gid(point) returns int as $$
- select gid from t1 where face @> box($1,$1) limit 1;
- $$ language sql strict immutable parallel safe;
4, 按 GID 空间聚合的 SQL 如下
select gid, count(*) from t1 join t2 on (t1.face @> box(t2.pos,t2.pos)) group by gid;
或
select get_gid(pos) as gid, count(*) from t2 group by 1;
5, 使用 PG 并行聚合
- postgres=# show max_worker_processes ;
- max_worker_processes
- ----------------------
- 128
- (1 row)
- set max_parallel_workers=128;
- set max_parallel_workers_per_gather =28;
- set enable_sort=off;
- set parallel_tuple_cost =0;
- set parallel_setup_cost =0;
- set min_parallel_table_scan_size =0;
- set min_parallel_index_scan_size =0;
- set work_mem ='2GB';
- alter table t1 set (parallel_workers =28);
- alter table t2 set (parallel_workers =28);
- analyze t1;
- analyze t2;
1000 万点, 1 万面, 空间聚合性能
5.6 秒
- postgres=# explain analyze select get_gid(pos) as gid, count(*) from t2 group by 1;
- QUERY PLAN
- -----------------------------------------------------------------------------------------------------------------------------------------
- Finalize HashAggregate (cost=86550.79..86594.18 rows=200 width=12) (actual time=5592.898..5594.204 rows=10000 loops=1)
- Group Key: (get_gid(pos))
- -> Gather (cost=86483.13..86526.52 rows=5600 width=12) (actual time=5485.528..5536.356 rows=290000 loops=1)
- Workers Planned: 28
- Workers Launched: 28
- -> Partial HashAggregate (cost=86483.13..86526.52 rows=200 width=12) (actual time=5443.795..5445.860 rows=10000 loops=29)
- Group Key: get_gid(pos)
- -> Parallel Seq Scan on t2 (cost=0.00..84806.71 rows=386720 width=4) (actual time=1.014..5311.532 rows=344828 loops=29)
- Planning Time: 0.118 ms
- Execution Time: 5595.278 ms
- (10 rows)
100 万点, 1 万面, 空间聚合性能
690 毫秒
- truncate t2;
- insert into t2 select id, point(random()*100, random()*100) from generate_series(1,1000000) t(id);
- postgres=# explain analyze select get_gid(pos) as gid, count(*) from t2 group by 1;
- QUERY PLAN
- -------------------------------------------------------------------------------------------------------------------------------------
- Finalize HashAggregate (cost=8061.46..8104.85 rows=200 width=12) (actual time=687.602..688.897 rows=10000 loops=1)
- Group Key: (get_gid(pos))
- -> Gather (cost=7993.80..8037.18 rows=5600 width=12) (actual time=582.986..632.877 rows=280419 loops=1)
- Workers Planned: 28
- Workers Launched: 28
- -> Partial HashAggregate (cost=7993.80..8037.18 rows=200 width=12) (actual time=541.534..543.355 rows=9670 loops=29)
- Group Key: get_gid(pos)
- -> Parallel Seq Scan on t2 (cost=0.00..7838.98 rows=35714 width=4) (actual time=1.010..527.932 rows=34483 loops=29)
- Planning Time: 0.130 ms
- Execution Time: 689.867 ms
- (10 rows)
如果预先聚合 - 速度更快
比如数据写入时, 就把 GID 算出来, 写入 T2 表的 GID 字段里, 直接按 GID 聚合. 速度会飞起来.
1000 万点, 258 毫秒.
- postgres=# alter table t2 add column gid int;
- ALTER TABLE
-- 假设 GID 已提前算好(比如 insert 时直接设置为 get_gid(pos), 这里只是为了测试, 一次性全部更新掉)
- postgres=# update t2 set gid = get_gid(pos);
- UPDATE 10000000
- postgres=# vacuum full t2;
- VACUUM
- postgres=# explain analyze select gid,count(*) from t2 group by gid;
- QUERY PLAN
- --------------------------------------------------------------------------------------------------------------------------------------
- Finalize HashAggregate (cost=3882.87..3884.76 rows=9983 width=12) (actual time=255.763..257.094 rows=10000 loops=1)
- Group Key: gid
- -> Gather (cost=2669.26..2671.14 rows=279524 width=12) (actual time=135.953..200.398 rows=290000 loops=1)
- Workers Planned: 28
- Workers Launched: 28
- -> Partial HashAggregate (cost=2669.26..2671.14 rows=9983 width=12) (actual time=98.026..99.585 rows=10000 loops=29)
- Group Key: gid
- -> Parallel Seq Scan on t2 (cost=0.00..1121.05 rows=357143 width=4) (actual time=0.005..30.248 rows=344828 loops=29)
- Planning Time: 0.078 ms
- Execution Time: 258.268 ms
- (10 rows)
参考
PostGIS 比 内置 geometry 操作符稍慢.
《PostgreSQL Oracle 兼容性之 - 自定义并行聚合函数 PARALLEL_ENABLE AGGREGATE》
《HTAP 数据库 PostgreSQL 场景与性能测试之 23 - (OLAP) 并行计算》
http://postgis.net/
来源: https://yq.aliyun.com/articles/700357