Reference
https://greatwqs.iteye.com/blog/1741330
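Background
After Pinpoint was hooked up to business monitoring, the data volume grew sharply: HBase now grows by roughly 20 GB per day on average. That is too much to keep indefinitely, so the data has to be purged periodically, otherwise the monitoring service becomes less usable. The environment was deployed with docker-compose. HBase can purge old data through a table's TTL, so for now the change is made by hand inside the pinpoint-hbase container. Setting the TTL when the tables are first created would be better, but that requires understanding how pinpoint-web and pinpoint-hbase are deployed in the docker-compose setup; this will be followed up later.
Procedure
Find the HBase tables that hold the most data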
- root@990fb5560f64:/opt/hbase/hbase-1.2.6# ls
- CHANGES.txt LICENSE.txt README.txt conf hbase-webapps logs
- LEGAL NOTICE.txt bin docs lib
- root@990fb5560f64:/opt/hbase/hbase-1.2.6# cd bin/
- root@990fb5560f64:/opt/hbase/hbase-1.2.6/bin# ls
- draining_servers.rb hbase-jruby rolling-restart.sh
- get-active-master.rb hbase.cmd shutdown_regionserver.rb
- graceful_stop.sh hirb.rb start-hbase.cmd
- hbase local-master-backup.sh start-hbase.sh
- hbase-cleanup.sh local-regionservers.sh stop-hbase.cmd
- hbase-common.sh master-backup.sh stop-hbase.sh
- hbase-config.cmd region_mover.rb test
- hbase-config.sh region_status.rb thread-pool.rb
- hbase-daemon.sh regionservers.sh zookeepers.sh
- hbase-daemons.sh replication
- root@990fb5560f64:/home/pinpoint/hbase/data/default# ls
- AgentEvent AgentStatV2 ApplicationMapStatisticsCallee_Ver2 ApplicationStatAggre SqlMetaData_Ver2
- AgentInfo ApiMetaData ApplicationMapStatisticsCaller_Ver2 ApplicationTraceIndex StringMetaData
- AgentLifeCycle ApplicationIndex ApplicationMapStatisticsSelf_Ver2 HostApplicationMap_Ver2 TraceV2
- root@990fb5560f64:/home/pinpoint/hbase/data/default# du -h |grep G
- 17G ./TraceV2
- 2.2G ./ApplicationTraceIndex
- 19G .
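The `du -h | grep G` filter above also matches any path that merely contains a capital G, and it hides every table below 1 GB. As an alternative, here is a minimal sketch that prints the size of each table directory, largest first. The function name `table_sizes` is made up for illustration; the path in the usage comment is the one from the session above.

```shell
# Print the size (in KiB) of each immediate subdirectory, largest first.
# Usage (path taken from the session above):
#   table_sizes /home/pinpoint/hbase/data/default
table_sizes() {
  dir="${1:-.}"
  # -s: one total per directory; -k: KiB, so sort -n compares plain integers.
  # The subshell keeps the cd from affecting the caller.
  ( cd "$dir" && du -sk -- */ 2>/dev/null | sort -rn )
}
```

Sorting on numeric KiB values avoids comparing human-readable suffixes such as `17G` and `2.2G`.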
This is roughly 20 GB produced in 24 hours. TraceV2 and ApplicationTraceIndex hold most of it, so their TTLs are set to 7 days and 14 days respectively.
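HBase expresses a column family's TTL in seconds, so the day counts have to be converted before they are passed to `alter`. A small sketch of that arithmetic (the helper name `days_to_seconds` is hypothetical):

```shell
# Convert a retention period in whole days to seconds for the TTL attribute.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7    # prints 604800
days_to_seconds 14   # prints 1209600
```

The same conversion explains the default shown by `describe`: 5184000 seconds is 60 days.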
Open the HBase shell and change the table TTL
- root@990fb5560f64:/opt/hbase/hbase-1.2.6/bin# ./hbase shell
- 2019-04-26 12:31:44,071 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- HBase Shell; enter 'help<RETURN>' for list of supported commands.
- Type "exit<RETURN>" to leave the HBase Shell
- Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017
- hbase(main):001:0> list
- TABLE
- AgentEvent
- AgentInfo
- AgentLifeCycle
- AgentStatV2
- ApiMetaData
- ApplicationIndex
- ApplicationMapStatisticsCallee_Ver2
- ApplicationMapStatisticsCaller_Ver2
- ApplicationMapStatisticsSelf_Ver2
- ApplicationStatAggre
- ApplicationTraceIndex
- HostApplicationMap_Ver2
- SqlMetaData_Ver2
- StringMetaData
- TraceV2
- 15 row(s) in 0.1750 seconds
- => ["AgentEvent", "AgentInfo", "AgentLifeCycle", "AgentStatV2", "ApiMetaData", "ApplicationIndex", "ApplicationMapStatisticsCallee_Ver2", "ApplicationMapStatisticsCaller_Ver2", "ApplicationMapStatisticsSelf_Ver2", "ApplicationStatAggre", "ApplicationTraceIndex", "HostApplicationMap_Ver2", "SqlMetaData_Ver2", "StringMetaData", "TraceV2"]
- hbase(main):002:0> describe 'TraceV2'
- Table TraceV2 is ENABLED
- TraceV2
- COLUMN FAMILIES DESCRIPTION
- {
- NAME => 'S', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX', TTL => '5184000 SECONDS (60 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'
- }
- 1 row(s) in 0.1000 seconds
- hbase(main):003:0> disable 'TraceV2'
- 0 row(s) in 8.3610 seconds
- hbase(main):004:0> alter 'TraceV2' , {
- NAME=>'S',TTL=>'604800'
- }
- Updating all regions with the new schema...
- 256/256 regions updated.
- Done.
- 0 row(s) in 1.9750 seconds
- hbase(main):001:0>
- hbase(main):002:0* enable 'TraceV2'
- 0 row(s) in 28.5440 seconds
- hbase(main):003:0> describe 'TraceV2'
- Table TraceV2 is ENABLED
- TraceV2
- COLUMN FAMILIES DESCRIPTION
- {
- NAME => 'S', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX', TTL => '604800 SECONDS (7 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'
- }
- 1 row(s) in 0.2410 seconds
Set the TTL of ApplicationTraceIndex to 14 days
- hbase(main):004:0> describe 'ApplicationTraceIndex'
- Table ApplicationTraceIndex is ENABLED
- ApplicationTraceIndex
- COLUMN FAMILIES DESCRIPTION
- {
- NAME => 'I', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX', TTL => '5184000 SECONDS (60 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'
- }
- 1 row(s) in 0.0240 seconds
- hbase(main):007:0> disable 'ApplicationTraceIndex'
- 0 row(s) in 2.2970 seconds
- hbase(main):008:0> alter 'ApplicationTraceIndex' , {
- NAME=>'I',TTL=>'1209600'
- }
- Updating all regions with the new schema...
- 16/16 regions updated.
- Done.
- 0 row(s) in 1.9250 seconds
- hbase(main):009:0> enable 'ApplicationTraceIndex'
- 0 row(s) in 2.2350 seconds
- hbase(main):010:0> describe 'ApplicationTraceIndex'
- Table ApplicationTraceIndex is ENABLED
- ApplicationTraceIndex
- COLUMN FAMILIES DESCRIPTION
- {
- NAME => 'I', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX', TTL => '1209600 SECONDS (14 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'
- }
- 1 row(s) in 0.0290 seconds
- hbase(main):012:0> major_compact 'ApplicationTraceIndex'
- 0 row(s) in 0.3740 seconds
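The two interactive sessions above follow the same pattern, so the statements can be generated rather than typed. The following is a minimal sketch, not a tested deployment script: the function name `ttl_commands` is hypothetical, and it only prints HBase shell statements without contacting HBase.

```shell
# Emit the HBase shell statements that set a column family's TTL and then
# request a major compaction. Nothing is executed against HBase here.
# Arguments: table name, column family name, retention in days.
ttl_commands() {
  table="$1"
  family="$2"
  ttl=$(( $3 * 86400 ))   # days -> seconds
  cat <<EOF
disable '${table}'
alter '${table}', {NAME => '${family}', TTL => '${ttl}'}
enable '${table}'
major_compact '${table}'
EOF
}

ttl_commands TraceV2 S 7
ttl_commands ApplicationTraceIndex I 14
```

To apply the statements, the output could be piped into the shell inside the container, for example `ttl_commands TraceV2 S 7 | /opt/hbase/hbase-1.2.6/bin/hbase shell -n`. The `-n` (non-interactive) flag is assumed to be available in this HBase version; check `hbase shell --help` before relying on it.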
Notes
What major_compact does:
Merges store files
Purges deleted, expired, and surplus-version data
Improves read and write efficiency
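Because expired data is only physically removed when files are rewritten, one way to confirm the effect is to compare a table directory's size before and after the compaction. The sketch below assumes the data directory is visible on the local filesystem, as in the session above; `dir_kb` and `percent_freed` are hypothetical helper names. Note that `major_compact` appears to return before the compaction has finished (0.3740 seconds in the transcript), so the second measurement may need to wait.

```shell
# Size of a directory in KiB, for example one table directory under data/default.
dir_kb() {
  du -sk -- "$1" | cut -f1
}

# Percentage of space freed between two measurements, rounded down.
# Prints 0 when the first measurement is zero to avoid dividing by zero.
percent_freed() {
  before="$1"
  after="$2"
  if [ "$before" -eq 0 ]; then
    echo 0
  else
    echo $(( (before - after) * 100 / before ))
  fi
}

# Example flow (run by hand around the compaction):
#   before=$(dir_kb /home/pinpoint/hbase/data/default/TraceV2)
#   ... run major_compact 'TraceV2' and wait ...
#   after=$(dir_kb /home/pinpoint/hbase/data/default/TraceV2)
#   percent_freed "$before" "$after"
```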
- 604800 seconds = 7 days
- describe 'TraceV2'
- disable 'TraceV2'
- alter 'TraceV2' , {
- NAME=>'S',TTL=>'604800'
- }
- enable 'TraceV2'
- major_compact 'TraceV2'
- 1209600 seconds = 14 days
- describe 'ApplicationTraceIndex'
- disable 'ApplicationTraceIndex'
- alter 'ApplicationTraceIndex' , {
- NAME=>'I',TTL=>'1209600'
- }
- enable 'ApplicationTraceIndex'
- major_compact 'ApplicationTraceIndex'
Source: http://blog.51cto.com/jerrymin/2386757