当前位置：

首页
/
IT
/
程序
/
C/C++
/
PostgreSQL 设计优化 case - 大宽表任意字段组合查询索引如何选择 (btree, gin, rum) - (含单个索引列数超过 32 列的方法)

PostgreSQL 设计优化 case - 大宽表任意字段组合查询索引如何选择 (btree, gin, rum) - (含单个索引列数超过 32 列的方法)

德哥 2019-04-14 10:07:49 浏览 32 评论 0

数据存储与数据库

PostgreSQL
postgres

性能

SQL
random
kernel

索引

analyze
aggregate
index

摘要: 标签 PostgreSQL , adhoc 查询 , 大宽表 , 任意字段组合查询 , 索引 , btree , gin , rum 背景大宽表, 任意字段组合查询, 透视. 是实时分析系统中的常见需求: 1, 实时写入.

标签

PostgreSQL , adhoc 查询 , 大宽表 , 任意字段组合查询 , 索引 , btree , gin , rum

背景

大宽表, 任意字段组合查询, 透视. 是实时分析系统中的常见需求:

1, 实时写入.

2, 实时任意字段组合查询, 透视.

PostgreSQL 中, 有多种方法支持这种场景:

《PostgreSQL 9 种索引的原理和应用场景》

1, 方法 1, 每个字段加一个索引 (普通类型用 btree, 多值类型(数组, JSON, 全文检索, 枚举等类型) 用 gin, 空间, 范围类型用 Gist, 时序类型用 brin).

2, 方法 2, 使用单个复合索引, 把所有字段放到 GIN 倒排索引接口中.

3, 方法 3, 使用单个复合索引, 把所有字段放到 RUM 索引接口中.

那么这三种方法, 应该选哪种更好呢?

或者说选哪种, 可以在写入, 查询性能上获得更好的平衡.

让单个复合索引支持超过 32 个字段

注意, PG 默认情况下, 仅支持一个索引中, 最多放 32 个字段.

通过修改 src/include/pg_config_manual.h , 重新编译, 可以支持更多的字段. 但是注意, 支持的字段越多, 索引的头信息会越大.

vi src/include/pg_config_manual.h
/*
 * Maximum number of columns in an index.  There is little point in making
 * this anything but a multiple of 32, because the main cost is associated
 * with index tuple header size (see access/itup.h).
 *
 * Changing this requires an initdb.
 */
// #define INDEX_MAX_KEYS   32
#define INDEX_MAX_KEYS   128
src/include/access/itup.h
/*
 * Index tuple header structure
 *
 * All index tuples start with IndexTupleData.  If the HasNulls bit is set,
 * this is followed by an IndexAttributeBitMapData.  The index attribute
 * values follow, beginning at a MAXALIGN boundary.
 *
 * Note that the space allocated for the bitmap does not vary with the number
 * of attributes; that is because we don't have room to store the number of
 * attributes in the header.  Given the MAXALIGN constraint there's no space
 * savings to be had anyway, for usual values of INDEX_MAX_KEYS.
 */

如果字段允许 NULL, 则有一个数据结构 IndexAttributeBitMapData 用来表示哪个字段的值是 NULL(与 TUPLE 的头信息类似, 也有 NULL BITMAP 的表示).

因此如果字段有 NULL, 则索引条目的的头信息中, 会多出若干 BIT, 是固定大小的.

typedef struct IndexAttributeBitMapData
{
        bits8           bits[(INDEX_MAX_KEYS + 8 - 1) / 8];
}

INDEX_MAX_KEYS = 32 时, 4 字节.

对比每列独立索引, GIN 单个全字段复合索引, RUM 单个全字段复合索引.

安装测试 PostgreSQL 软件

1, 编译软件

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf PostgreSQL-snapshot.tar.bz2
cd PostgreSQL-12devel

参考上面的章节修改源码 src/include/pg_config_manual.h , 支持单个索引放 128 个字段.

export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql12
LIBS=-lpthread CFLAGS="-O3" make world -j 128
LIBS=-lpthread CFLAGS="-O3" make install-world

安装 rum 插件, 略.

https://github.com/postgrespro/rum

2, 设置环境变量

vi env12.sh
export PS1="$USER@`/bin/hostname -s`->"
export PGPORT=8000
export PGDATA=/data01/pg/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql12
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
. ./env12.sh

3, 初始化数据库集群

initdb -D $PGDATA -U postgres -X /data02/pg/pg_wal_8000 -E SQL_ASCII --locale=C

4, 修改数据库配置

vi $PGDATA/PostgreSQL.conf
listen_addresses = '0.0.0.0'
port = 8000
max_connections = 1000
unix_socket_directories = '/tmp,.'
shared_buffers = 32GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 0
max_worker_processes = 128
wal_level = replica
synchronous_commit = off
wal_buffers = 16MB
wal_writer_delay = 10ms
checkpoint_timeout = 35min
max_wal_size = 64GB
min_wal_size = 16GB
checkpoint_completion_target = 0.1
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1200000000
autovacuum_multixact_freeze_max_age = 1400000000
autovacuum_vacuum_cost_delay = 0ms
vacuum_freeze_table_age = 1150000000
vacuum_multixact_freeze_table_age = 1150000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
jit = off

5, 启动数据库

pg_ctl start

构建测试表

1, 分别创建 a b c 3 张表, 结构一样, 都是 128 个字段. 按要求创建索引

每列一个 BTREE 索引

do language plpgsql $$
declare
  sql text := 'create table a (';
begin
  for i in 1..128 loop
    sql := sql || 'c' || i || 'int default random()*2000000000,';
  end loop;
  sql := rtrim(sql,',');
  sql := sql || ')';
  execute sql;
  for i in 1..128 loop
    execute 'create index idx_a_'||i||'on a (c'||i||')';
  end loop;
end;
$$;

所有列放到一个 GIN 索引中

(注意 GIN 索引有 FASTUPDATE 参数, 开启的时候, 写入性能很好, 但是会延迟合并 PENDING TUPLES, 写入快了, 查询就可能慢(如果在 PENDING TUPLES 合并慢, 并且很大时, 越慢), 后面会有例子)

create extension btree_gin;
do language plpgsql $$
declare
  sql text := 'create table b (';
begin
  for i in 1..128 loop
    sql := sql || 'c' || i || 'int default random()*2000000000,';
  end loop;
  sql := rtrim(sql,',');
  sql := sql || ')';
  execute sql;
  sql := 'create index idx_b_1 on b using gin (';
  for i in 1..128 loop
    sql := sql || 'c' || i || ',';
  end loop;
  sql := rtrim(sql,',');
  sql := sql || ')';
  execute sql;
end;
$$;

所有列放到一个 RUM 索引中

create extension rum;
do language plpgsql $$
declare
  sql text := 'create table c (';
begin
  for i in 1..128 loop
    sql := sql || 'c' || i || 'int default random()*2000000000,';
  end loop;
  sql := rtrim(sql,',');
  sql := sql || ')';
  execute sql;
  sql := 'create index idx_c_1 on c using rum (';
  for i in 1..128 loop
    sql := sql || 'c' || i || ',';
  end loop;
  sql := rtrim(sql,',');
  sql := sql || ')';
  execute sql;
end;
$$;

性能测试

1, 创建三个性能测试脚本, 分别用于测试三张表的写入性能

vi test_btree.sql
\set c1 random(1,2000000000)
insert into a (c1) values (:c1);
vi test_gin.sql
\set c1 random(1,2000000000)
insert into b (c1) values (:c1);
vi test_rum.sql
\set c1 random(1,2000000000)
insert into c (c1) values (:c1);

2, 独立 128 个 btree 索引的写入性能.

pgbench -M prepared -n -r -P 1 -f ./test_btree.sql -c 28 -j 28 -T 120
transaction type: ./test_btree.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 1906387
latency average = 1.762 ms
latency stddev = 0.693 ms
tps = 15886.268820 (including connections establishing)
tps = 15887.521828 (excluding connections establishing)
statement latencies in milliseconds:
         0.002  \set c1 random(1,2000000000)
         1.761  insert into a (c1) values (:c1);

1.59 万行 / s

Samples: 1M of event 'cpu-clock', Event count (approx.): 216077120900
Overhead  Shared Object        Symbol
  12.07%  postgres             [.] _bt_compare
   9.85%  postgres             [.] hash_search_with_hash_value
   6.44%  [kernel]             [k] _raw_spin_unlock_irqrestore
   5.33%  postgres             [.] LWLockAcquire
   4.17%  [kernel]             [k] __do_softirq
   3.90%  [kernel]             [k] run_timer_softirq
   3.50%  postgres             [.] PinBuffer
   3.20%  postgres             [.] _bt_relandgetbuf
   2.24%  libc-2.17.so         [.] __memset_sse2
   2.14%  postgres             [.] _bt_moveright
   2.07%  postgres             [.] LWLockRelease
   1.88%  libc-2.17.so         [.] __memmove_ssse3_back
   1.59%  [kernel]             [k] finish_task_switch
   1.25%  postgres             [.] LWLockReleaseClearVar
   1.06%  postgres             [.] MarkBufferDirty
   0.99%  libc-2.17.so         [.] __memcpy_ssse3_back

3, 一个 GIN 复合索引 (开启 FASTUPDATE) 的写入性能

pgbench -M prepared -n -r -P 1 -f ./test_gin.sql -c 28 -j 28 -T 120
transaction type: ./test_gin.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 4791575
latency average = 0.701 ms
latency stddev = 32.171 ms
tps = 39929.190873 (including connections establishing)
tps = 39932.416884 (excluding connections establishing)
statement latencies in milliseconds:
         0.001  \set c1 random(1,2000000000)
         0.700  insert into b (c1) values (:c1);

3.99 万行 / s

Samples: 698K of event 'cpu-clock', Event count (approx.): 119873063495
Overhead  Shared Object        Symbol
   9.70%  libc-2.17.so         [.] __memset_sse2
   7.79%  [kernel]             [k] _raw_spin_unlock_irqrestore
   3.93%  postgres             [.] ExecInitExprRec
   3.13%  postgres             [.] AllocSetAlloc
   2.69%  postgres             [.] ExecInitFunc
   2.12%  postgres             [.] TupleDescInitEntry
   2.04%  [kernel]             [k] finish_task_switch
   1.64%  postgres             [.] hash_search_with_hash_value
   1.57%  postgres             [.] SearchCatCache1
   1.56%  libc-2.17.so         [.] bsearch
   1.54%  libc-2.17.so         [.] __memcpy_ssse3_back
   1.40%  postgres             [.] expression_tree_walker
   1.33%  postgres             [.] palloc0
   1.29%  [kernel]             [k] run_timer_softirq
   1.27%  postgres             [.] dispatch_compare_ptr
   1.25%  postgres             [.] fmgr_info
   0.96%  postgres             [.] LWLockAcquire

4, 一个 RUM 复合索引的写入性能

pgbench -M prepared -n -r -P 1 -f ./test_rum.sql -c 28 -j 28 -T 120
transaction type: ./test_rum.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 453539
latency average = 7.408 ms
latency stddev = 11.713 ms
tps = 3779.393984 (including connections establishing)
tps = 3779.643084 (excluding connections establishing)
statement latencies in milliseconds:
         0.002  \set c1 random(1,2000000000)
         7.406  insert into c (c1) values (:c1);

0.378 万行 / s

Samples: 260K of event 'cpu-clock', Event count (approx.): 56451610586
Overhead  Shared Object       Symbol
  18.58%  postgres            [.] computeRegionDelta
   5.74%  postgres            [.] GenericXLogRegisterBuffer
   5.68%  [kernel]            [k] _raw_spin_unlock_irqrestore
   5.25%  rum.so              [.] rumtuple_get_attrnum
   4.11%  [kernel]            [k] __do_softirq
   3.51%  postgres            [.] hash_search_with_hash_value
   3.44%  rum.so              [.] rumFindLeafPage
   3.38%  libc-2.17.so        [.] __memcpy_ssse3_back
   3.10%  libc-2.17.so        [.] __memset_sse2
   2.77%  [kernel]            [k] run_timer_softirq
   2.70%  postgres            [.] LWLockAcquire
   2.07%  postgres            [.] PinBuffer
   1.71%  rum.so              [.] entryLocateLeafEntry
   1.49%  postgres            [.] LWLockReleaseClearVar
   1.36%  [kernel]            [k] copy_user_enhanced_fast_string
   1.07%  postgres            [.] LWLockRelease
   0.97%  rum.so              [.] entryLocateEntry
   0.89%  postgres            [.] UnpinBuffer.constprop.6

写入性能小结

很显然, 性能最好的是 GIN(开启 FASTUPDATE)时的性能, 其次是独立索引, 最后是 RUM 索引.

查询性能

1, 都测试了 120 秒, 写入量如下

postgres=# \dt+
                    List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description
--------+------+-------+----------+---------+-------------
 public | a    | table | postgres | 993 MB  |
 public | b    | table | postgres | 2497 MB |
 public | c    | table | postgres | 237 MB  |
(2 rows)
postgres=# select count(*) from a;
  count
---------
 1906387
(1 row)
postgres=# select count(*) from b;
  count
---------
 4791575
(1 row)
postgres=# select count(*) from c;
 count
--------
 453539
(1 row)

2, 索引大小, 很显然 GIN 索引很大, 超过了所有 BTREE 索引加起来的大小, 并且还没有完全合并所有的 pending tuples.

postgres=# \di+
                          List of relations
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description
--------+-----------+-------+----------+-------+-------+-------------
 public | idx_a_1   | index | postgres | a     | 53 MB |
 public | idx_a_10  | index | postgres | a     | 53 MB |
 public | idx_a_100 | index | postgres | a     | 54 MB |
 public | idx_a_101 | index | postgres | a     | 54 MB |
 public | idx_a_102 | index | postgres | a     | 53 MB |
 public | idx_a_103 | index | postgres | a     | 54 MB |
 public | idx_a_104 | index | postgres | a     | 54 MB |
 public | idx_a_105 | index | postgres | a     | 53 MB |
 public | idx_a_106 | index | postgres | a     | 53 MB |
 public | idx_a_107 | index | postgres | a     | 54 MB |
 public | idx_a_108 | index | postgres | a     | 54 MB |
 public | idx_a_109 | index | postgres | a     | 54 MB |
 public | idx_a_11  | index | postgres | a     | 54 MB |
 public | idx_a_110 | index | postgres | a     | 53 MB |
 public | idx_a_111 | index | postgres | a     | 53 MB |
 public | idx_a_112 | index | postgres | a     | 54 MB |
 public | idx_a_113 | index | postgres | a     | 54 MB |
 public | idx_a_114 | index | postgres | a     | 53 MB |
 public | idx_a_115 | index | postgres | a     | 53 MB |
 public | idx_a_116 | index | postgres | a     | 54 MB |
 public | idx_a_117 | index | postgres | a     | 53 MB |
 public | idx_a_118 | index | postgres | a     | 54 MB |
 public | idx_a_119 | index | postgres | a     | 54 MB |
 public | idx_a_12  | index | postgres | a     | 54 MB |
 public | idx_a_120 | index | postgres | a     | 54 MB |
 public | idx_a_121 | index | postgres | a     | 53 MB |
 public | idx_a_122 | index | postgres | a     | 53 MB |
 public | idx_a_123 | index | postgres | a     | 53 MB |
 public | idx_a_124 | index | postgres | a     | 54 MB |
 public | idx_a_125 | index | postgres | a     | 54 MB |
 public | idx_a_126 | index | postgres | a     | 53 MB |
 public | idx_a_127 | index | postgres | a     | 53 MB |
 public | idx_a_128 | index | postgres | a     | 54 MB |
 public | idx_a_13  | index | postgres | a     | 54 MB |
 public | idx_a_14  | index | postgres | a     | 54 MB |
 public | idx_a_15  | index | postgres | a     | 54 MB |
 public | idx_a_16  | index | postgres | a     | 53 MB |
 public | idx_a_17  | index | postgres | a     | 53 MB |
 public | idx_a_18  | index | postgres | a     | 53 MB |
 public | idx_a_19  | index | postgres | a     | 53 MB |
 public | idx_a_2   | index | postgres | a     | 53 MB |
 public | idx_a_20  | index | postgres | a     | 53 MB |
 public | idx_a_21  | index | postgres | a     | 53 MB |
 public | idx_a_22  | index | postgres | a     | 53 MB |
 public | idx_a_23  | index | postgres | a     | 53 MB |
 public | idx_a_24  | index | postgres | a     | 54 MB |
 public | idx_a_25  | index | postgres | a     | 53 MB |
 public | idx_a_26  | index | postgres | a     | 54 MB |
 public | idx_a_27  | index | postgres | a     | 54 MB |
 public | idx_a_28  | index | postgres | a     | 53 MB |
 public | idx_a_29  | index | postgres | a     | 54 MB |
 public | idx_a_3   | index | postgres | a     | 54 MB |
 public | idx_a_30  | index | postgres | a     | 54 MB |
 public | idx_a_31  | index | postgres | a     | 53 MB |
 public | idx_a_32  | index | postgres | a     | 53 MB |
 public | idx_a_33  | index | postgres | a     | 53 MB |
 public | idx_a_34  | index | postgres | a     | 54 MB |
 public | idx_a_35  | index | postgres | a     | 53 MB |
 public | idx_a_36  | index | postgres | a     | 53 MB |
 public | idx_a_37  | index | postgres | a     | 53 MB |
 public | idx_a_38  | index | postgres | a     | 54 MB |
 public | idx_a_39  | index | postgres | a     | 54 MB |
 public | idx_a_4   | index | postgres | a     | 54 MB |
 public | idx_a_40  | index | postgres | a     | 53 MB |
 public | idx_a_41  | index | postgres | a     | 54 MB |
 public | idx_a_42  | index | postgres | a     | 53 MB |
 public | idx_a_43  | index | postgres | a     | 53 MB |
 public | idx_a_44  | index | postgres | a     | 53 MB |
 public | idx_a_45  | index | postgres | a     | 53 MB |
 public | idx_a_46  | index | postgres | a     | 54 MB |
 public | idx_a_47  | index | postgres | a     | 54 MB |
 public | idx_a_48  | index | postgres | a     | 54 MB |
 public | idx_a_49  | index | postgres | a     | 53 MB |
 public | idx_a_5   | index | postgres | a     | 54 MB |
 public | idx_a_50  | index | postgres | a     | 54 MB |
 public | idx_a_51  | index | postgres | a     | 53 MB |
 public | idx_a_52  | index | postgres | a     | 54 MB |
 public | idx_a_53  | index | postgres | a     | 54 MB |
 public | idx_a_54  | index | postgres | a     | 53 MB |
 public | idx_a_55  | index | postgres | a     | 54 MB |
 public | idx_a_56  | index | postgres | a     | 54 MB |
 public | idx_a_57  | index | postgres | a     | 53 MB |
 public | idx_a_58  | index | postgres | a     | 53 MB |
 public | idx_a_59  | index | postgres | a     | 53 MB |
 public | idx_a_6   | index | postgres | a     | 53 MB |
 public | idx_a_60  | index | postgres | a     | 54 MB |
 public | idx_a_61  | index | postgres | a     | 53 MB |
 public | idx_a_62  | index | postgres | a     | 54 MB |
 public | idx_a_63  | index | postgres | a     | 54 MB |
 public | idx_a_64  | index | postgres | a     | 54 MB |
 public | idx_a_65  | index | postgres | a     | 53 MB |
 public | idx_a_66  | index | postgres | a     | 54 MB |
 public | idx_a_67  | index | postgres | a     | 53 MB |
 public | idx_a_68  | index | postgres | a     | 54 MB |
 public | idx_a_69  | index | postgres | a     | 54 MB |
 public | idx_a_7   | index | postgres | a     | 54 MB |
 public | idx_a_70  | index | postgres | a     | 53 MB |
 public | idx_a_71  | index | postgres | a     | 54 MB |
 public | idx_a_72  | index | postgres | a     | 54 MB |
 public | idx_a_73  | index | postgres | a     | 54 MB |
 public | idx_a_74  | index | postgres | a     | 54 MB |
 public | idx_a_75  | index | postgres | a     | 54 MB |
 public | idx_a_76  | index | postgres | a     | 53 MB |
 public | idx_a_77  | index | postgres | a     | 54 MB |
 public | idx_a_78  | index | postgres | a     | 53 MB |
 public | idx_a_79  | index | postgres | a     | 53 MB |
 public | idx_a_8   | index | postgres | a     | 53 MB |
 public | idx_a_80  | index | postgres | a     | 53 MB |
 public | idx_a_81  | index | postgres | a     | 53 MB |
 public | idx_a_82  | index | postgres | a     | 54 MB |
 public | idx_a_83  | index | postgres | a     | 53 MB |
 public | idx_a_84  | index | postgres | a     | 53 MB |
 public | idx_a_85  | index | postgres | a     | 54 MB |
 public | idx_a_86  | index | postgres | a     | 53 MB |
 public | idx_a_87  | index | postgres | a     | 54 MB |
 public | idx_a_88  | index | postgres | a     | 53 MB |
 public | idx_a_89  | index | postgres | a     | 53 MB |
 public | idx_a_9   | index | postgres | a     | 54 MB |
 public | idx_a_90  | index | postgres | a     | 54 MB |
 public | idx_a_91  | index | postgres | a     | 54 MB |
 public | idx_a_92  | index | postgres | a     | 54 MB |
 public | idx_a_93  | index | postgres | a     | 53 MB |
 public | idx_a_94  | index | postgres | a     | 53 MB |
 public | idx_a_95  | index | postgres | a     | 54 MB |
 public | idx_a_96  | index | postgres | a     | 53 MB |
 public | idx_a_97  | index | postgres | a     | 53 MB |
 public | idx_a_98  | index | postgres | a     | 53 MB |
 public | idx_a_99  | index | postgres | a     | 54 MB |
 public | idx_b_1   | index | postgres | b     | 17 GB |
(129 rows)
postgres=# \di+
                          List of relations
 Schema |  Name   | Type  |  Owner   | Table |  Size   | Description
--------+---------+-------+----------+-------+---------+-------------
 public | idx_c_1 | index | postgres | c     | 2250 MB |
(1 row)

3, 多列查询 CASE 1, 一个条件选择性好, 一个条件选择性差

postgres=# explain analyze select count(*) from a where c1<100000 and c2<10000000;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=104.57..104.58 rows=1 width=8) (actual time=0.332..0.332 rows=1 loops=1)
   ->  Index Scan using idx_a_1 on a  (cost=0.43..104.56 rows=1 width=0) (actual time=0.327..0.327 rows=0 loops=1)
         Index Cond: (c1 <100000)
         Filter: (c2 < 10000000)
         Rows Removed by Filter: 92
 Planning Time: 0.977 ms
 Execution Time: 0.399 ms
(7 rows)
postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=351003.54..351003.55 rows=1 width=8) (actual time=752.842..752.842 rows=1 loops=1)
   ->  Gather  (cost=351003.32..351003.53 rows=2 width=8) (actual time=752.829..752.834 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=350003.32..350003.33 rows=1 width=8) (actual time=546.163..546.164 rows=1 loops=3)
               ->  Parallel Seq Scan on b  (cost=0.00..349448.64 rows=221871 width=0) (actual time=499.082..546.152 rows=0 loops=3)
                     Filter: ((c1 <100000) AND (c2 < 10000000))
                     Rows Removed by Filter: 1597191
 Planning Time: 0.669 ms
 Execution Time: 792.562 ms
(10 rows)
postgres=# explain analyze select count(*) from c where c1<100000 and c2<10000000;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18.71..18.72 rows=1 width=8) (actual time=0.776..0.776 rows=1 loops=1)
   ->  Index Scan using idx_c_1 on c  (cost=16.50..18.71 rows=1 width=0) (actual time=0.773..0.773 rows=0 loops=1)
         Index Cond: ((c1 <100000) AND (c2 < 10000000))
 Planning Time: 0.077 ms
 Execution Time: 0.815 ms
(5 rows)

显然, 性能最好的是独立索引, 因为选择性好的一个就好了, 只是少量的 FILTER

为什么 GIN 性能不行呢? 理论上应该很好呀.

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
       230891 |      1842877 |           5600 |         1587579 |          4205010 |             2 |             1 |            0 |         0 |       2
(1 row)

原因, 开启了 FASTUPDATE 的情况下, 写入超快, 但是合并 PENDING TUPLES 只有一个 AUTOVACUUM WORKER, 所以合并没有写入快, 导致了 pending tuples 越来越大. 如上, 有几百万条还没有合并.

强制合并:

vacuum analyze b;

耗时需要很久很久, 所以实际上开启 gin 索引的 fastupdate, 如果写入长时间持续太猛了, 并不好. 因为合并跟不上, 查询性能跟不上.

修改 GIN 索引的 fastupdate, 关闭, 重测

postgres=#   alter index idx_b_1 set (fastupdate =off);
ALTER INDEX

重新测试

transaction type: ./test_gin.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 1638833
latency average = 1.985 ms
latency stddev = 0.572 ms
tps = 13656.742727 (including connections establishing)
tps = 13657.785073 (excluding connections establishing)
statement latencies in milliseconds:
         0.002  \set c1 random(1,2000000000)
         1.983  insert into b (c1) values (:c1);

写入速度从 3.99 万行 / s 下降到了 1.36 万行 / s

Samples: 963K of event 'cpu-clock', Event count (approx.): 191435286728
Overhead  Shared Object       Symbol
   9.26%  postgres            [.] PinBuffer
   8.74%  postgres            [.] LWLockAcquire
   7.85%  postgres            [.] gintuple_get_attrnum
   6.01%  postgres            [.] UnpinBuffer.constprop.6
   5.48%  postgres            [.] hash_search_with_hash_value
   5.46%  postgres            [.] ginFindLeafPage
   4.85%  postgres            [.] LWLockRelease
   3.89%  [kernel]            [k] _raw_spin_unlock_irqrestore
   3.19%  [kernel]            [k] __do_softirq
   3.03%  [kernel]            [k] run_timer_softirq
   2.89%  postgres            [.] ReadBuffer_common
   2.51%  postgres            [.] entryLocateLeafEntry
   1.93%  postgres            [.] entryLocateEntry
   1.67%  libc-2.17.so        [.] __memset_sse2
   1.48%  postgres            [.] gintuple_get_key
   1.29%  postgres            [.] LWLockReleaseClearVar
   1.21%  libc-2.17.so        [.] __memmove_ssse3_back
   1.07%  libc-2.17.so        [.] __memcpy_ssse3_back
   0.89%  postgres            [.] MarkBufferDirty
   0.89%  postgres            [.] AllocSetAlloc
   0.76%  [kernel]            [k] finish_task_switch
   0.72%  postgres            [.] XLogInsertRecord

写入性能下降, 比 128 列独立索引还要慢, 可以看到现在没有 pending tuples 了.

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
   4294967295 |   4294967295 |              0 |               0 |                0 |             2 |             1 |            0 |         0 |       2
(1 row)

多列查询 CASE 1, 一个条件选择性好, 一个条件选择性差.

现在 GIN 索引的查询性能还好, 但是还是不如 128 独立列索引.

postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.13..23.14 rows=1 width=8) (actual time=3.062..3.062 rows=1 loops=1)
   ->  Bitmap Heap Scan on b  (cost=22.01..23.12 rows=1 width=0) (actual time=3.060..3.060 rows=0 loops=1)
         Recheck Cond: ((c1 <100000) AND (c2 < 10000000))
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..22.01 rows=1 width=0) (actual time=3.058..3.058 rows=0 loops=1)
               Index Cond: ((c1 <100000) AND (c2 < 10000000))
 Planning Time: 0.053 ms
 Execution Time: 3.187 ms
(7 rows)

4, 多列查询 CASE 2, 两个条件都是大范围, 合并后结果集较小

postgres=# explain analyze select count(*) from c where c1<100000000 and c2<100000000;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1290.52..1290.53 rows=1 width=8) (actual time=20.199..20.200 rows=1 loops=1)
   ->  Bitmap Heap Scan on c  (cost=29.59..1287.60 rows=1170 width=0) (actual time=18.299..20.091 rows=1078 loops=1)
         Recheck Cond: ((c1 <100000000) AND (c2 < 100000000))
         Heap Blocks: exact=1061
         ->  Bitmap Index Scan on idx_c_1  (cost=0.00..29.30 rows=1170 width=0) (actual time=18.179..18.179 rows=1078 loops=1)
               Index Cond: ((c1 <100000000) AND (c2 < 100000000))
 Planning Time: 0.183 ms
 Execution Time: 20.474 ms
(8 rows)
postgres=# explain analyze select count(*) from b where c1<100000000 and c2<100000000;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4567.42..4567.43 rows=1 width=8) (actual time=105.917..105.918 rows=1 loops=1)
   ->  Bitmap Heap Scan on b  (cost=65.90..4556.98 rows=4176 width=0) (actual time=81.593..105.503 rows=4098 loops=1)
         Recheck Cond: ((c1 <100000000) AND (c2 < 100000000))
         Heap Blocks: exact=4020
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..64.86 rows=4176 width=0) (actual time=81.067..81.067 rows=4098 loops=1)
               Index Cond: ((c1 <100000000) AND (c2 < 100000000))
 Planning Time: 10.918 ms
 Execution Time: 107.486 ms
(8 rows)
postgres=# explain analyze select count(*) from a where c1<100000000 and c2<100000000;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7574.38..7574.39 rows=1 width=8) (actual time=181.375..181.375 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=2231.28..7561.98 rows=4959 width=0) (actual time=62.215..180.837 rows=4860 loops=1)
         Recheck Cond: ((c1 <100000000) AND (c2 < 100000000))
         Rows Removed by Index Recheck: 266961
         Heap Blocks: exact=3166 lossy=17904
         ->  BitmapAnd  (cost=2231.28..2231.28 rows=4959 width=0) (actual time=61.461..61.461 rows=0 loops=1)
               ->  Bitmap Index Scan on idx_a_1  (cost=0.00..1090.57 rows=95192 width=0) (actual time=20.603..20.603 rows=95198 loops=1)
                     Index Cond: (c1 <100000000)
               ->  Bitmap Index Scan on idx_a_2  (cost=0.00..1137.98 rows=99314 width=0) (actual time=37.628..37.628 rows=95665 loops=1)
                     Index Cond: (c2 <100000000)
 Planning Time: 3.797 ms
 Execution Time: 183.723 ms
(12 rows)

现在, 性能最好的是 GIN 索引, 其次是 RUM 和独立列索引(性能差不多).

5, 多列查询 CASE 3, 单个字段大范围, 结果集大

postgres=# explain analyze select count(*) from c where c1<100000000 ;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18671.14..18671.15 rows=1 width=8) (actual time=25.041..25.041 rows=1 loops=1)
   ->  Index Scan using idx_c_1 on c  (cost=8.80..18614.25 rows=22757 width=0) (actual time=10.142..22.987 rows=22238 loops=1)
         Index Cond: (c1 <100000000)
 Planning Time: 0.099 ms
 Execution Time: 25.276 ms
(5 rows)
postgres=# explain analyze select count(*) from b where c1<100000000 ;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=63372.32..63372.33 rows=1 width=8) (actual time=139.236..139.236 rows=1 loops=1)
   ->  Bitmap Heap Scan on b  (cost=684.51..63166.06 rows=82505 width=0) (actual time=57.762..131.941 rows=81741 loops=1)
         Recheck Cond: (c1 <100000000)
         Heap Blocks: exact=58458
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..663.89 rows=82505 width=0) (actual time=48.058..48.058 rows=81741 loops=1)
               Index Cond: (c1 <100000000)
 Planning Time: 0.123 ms
 Execution Time: 141.904 ms
(8 rows)
postgres=# explain analyze select count(*) from a where c1<100000000 ;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=78459.78..78459.79 rows=1 width=8) (actual time=109.022..109.023 rows=1 loops=1)
   ->  Index Only Scan using idx_a_1 on a  (cost=0.43..78221.80 rows=95192 width=0) (actual time=0.028..100.343 rows=95198 loops=1)
         Index Cond: (c1 < 100000000)
         Heap Fetches: 95198
 Planning Time: 0.614 ms
 Execution Time: 109.053 ms
(6 rows)

现在, 性能最好的是 RUM 和独立列索引(性能差不多), 其次是 GIN 索引.

小结

索引模式	写入速度	索引大小	单列查询性能（选择性好）	单列查询性能（选择性不好）	多列查询性能（单列选择性都不好，但是合并后选择性很好）	多列查询性能（有些列选择性很好）
128 列单个 gin(fastupdate=on) 索引	3.99 万行 / s	最大	不好 (pending tuples 很大时)	不好 (选择性不好本身就不适合用索引过滤)	不好 (pending tuples 很大时)	不好 (pending tuples 很大时)
128 列单个 gin(fastupdate=off) 索引	1.36 万行 / s	最大	好	不好 (选择性不好本身就不适合用索引过滤)	好	一般
128 列单个 rum 索引	0.378 万行 / s	较小	很好	不好 (选择性不好本身就不适合用索引过滤)	一般	好
128 列 128 个 btree 索引	1.59 万行 / s	较小	很好	不好 (选择性不好本身就不适合用索引过滤)	一般	好

所以, 建议使用每列独立的索引, 来支撑任意列的组合查询, 可以获得写入, 读取较好的均衡.

选择性不好的列, 建议不要使用索引, 选择性通过 pg_stats 查看

analyze c;
postgres=# select schemaname,tablename,attname,n_distinct from pg_stats where tablename='c';
 schemaname | tablename | attname | n_distinct
------------+-----------+---------+------------
 public     | c         | c1      |         -1
 public     | c         | c2      |  -0.999025
 public     | c         | c3      |         -1
...

《PostgreSQL SQL 自动优化案例 - 极简, 自动推荐索引》

《自动选择正确索引访问接口 (btree,hash,gin,Gist,sp-Gist,brin,bitmap...) 的方法》

为什么写入性能会是这样的结果?

同步模式 BUILD 索引, 索引越多, 写入的影响越大, 本文对单表的测试可以看出, 128 个索引, 写入只能达到 1.59 万行 / s. 如果硬件资源足够的话, 可以通过使用分区表 (多个表) 来解决写入的瓶颈问题.

异步模式 BUILD 索引,(GIN 开启 fastupdate), 写入吞吐可以提高, 但是如果长时间处于高吞吐的写入下, 由于一张表只有一个 VACUUM WORKER 进程合并 pending tuples, 所以可能导致 PENDING TUPLES 会越来越多, 导致查询性能下降. 解决办法是分区表, 这样每个分区可以有 1 个 vacuum worker 参与合并, 降低延迟.

为什么查询性能在几种情况下会是这样的结果?

选择性好的情况下, 三种索引, 性能都很好.

多列查询, 如果所有条件的选择性都不好, 但是最后所有条件的合并选择性如果很好时, GIN 内置的 BITMAP 过滤, 使得性能在 RUM,GIN,BTREE 多列三种情况下达到最好.

多列查询, 如果所有条件的选择性都不好, 并且最后所有条件合并后的选择性也不好时, 所有索引的性能都一般.

来源: https://yq.aliyun.com/articles/698090

与本文相关文章

暂无,快来抢沙发吧！