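With the earlier work done, the data for the data-map project has been formatted and ingested, and the project is now in its final stage. What remains is an interface that formats the data and outputs it: Elasticsearch retrieves the data, and a PHP interface formats the result for output.
1. Full-text search
Search conditions (time, space)
Output (number of users)
For example: the number of users with activity at each latitude/longitude coordinate, within China, over one hour.
From this requirement we can derive the corresponding Elasticsearch search request. The sample below covers a one-hour window and uses a bounding box from longitude -34.453125 to 34.453125: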
- {
- "size": 0,
- "aggs": {
- "filter_agg": {
- "filter": {
- "geo_bounding_box": {
- "location": {
- "top_left": {
- "lat": 90,
- "lon": -34.453125
- },
- "bottom_right": {
- "lat": -90,
- "lon": 34.453125
- }
- }
- }
- },
- "aggs": {
- "2": {
- "geohash_grid": {
- "field": "location",
- "precision": 2
- },
- "aggs": {
- "3": {
- "geo_centroid": {
- "field": "location"
- }
- }
- }
- }
- }
- }
- },
- "stored_fields": [
- "*"
- ],
- "docvalue_fields": [
- "@timestamp"
- ],
- "query": {
- "bool": {
- "must": [
- {
- "range": {
- "@timestamp": {
- "gte": 1542692193461,
- "lte": 1542695793461,
- "format": "epoch_millis"
- }
- }
- }
- ]
- }
- }
- }
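size = 0 means that no individual hits are returned; only the aggregation results come back.
query is the main body of the search. Its only required condition is the time range, i.e. it matches all data within that period.
The filter inside aggs works like a WHERE clause in SQL. geo_bounding_box is the geographic range: a rectangle defined by its top-left and bottom-right latitude/longitude points.
Nested aggs aggregate further on the results of the level above.
geohash_grid computes the geohash of every point at the precision you specify and groups nearby points into the same cell. field is the geo_point field being aggregated, and precision is the geohash length (1 to 12), not a distance in km; a larger value means smaller cells.
Finally, geo_centroid returns, under the key location, the centroid of the points in each cell.
The result has the following format: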
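To make the meaning of precision concrete, here is a minimal sketch that decodes a geohash key into the latitude/longitude box of its cell, using the standard geohash base-32 alphabet. The function name geohashBounds is hypothetical and is not part of Elasticsearch or its PHP client.

```php
<?php
// Hypothetical helper: decode a geohash into the bounding box of its cell.
function geohashBounds(string $hash): array
{
    if ($hash === '') {
        throw new InvalidArgumentException('geohash must not be empty');
    }
    $alphabet = '0123456789bcdefghjkmnpqrstuvwxyz';
    $ranges = ['lon' => [-180.0, 180.0], 'lat' => [-90.0, 90.0]];
    $axis = 'lon'; // bits alternate between axes, starting with longitude

    foreach (str_split(strtolower($hash)) as $char) {
        $value = strpos($alphabet, $char);
        if ($value === false) {
            throw new InvalidArgumentException("invalid geohash character: $char");
        }
        // Each character carries 5 bits, most significant bit first.
        for ($bit = 4; $bit >= 0; $bit--) {
            $mid = ($ranges[$axis][0] + $ranges[$axis][1]) / 2;
            if (($value >> $bit) & 1) {
                $ranges[$axis][0] = $mid; // keep the upper half
            } else {
                $ranges[$axis][1] = $mid; // keep the lower half
            }
            $axis = ($axis === 'lon') ? 'lat' : 'lon';
        }
    }

    return [
        'min_lat' => $ranges['lat'][0],
        'max_lat' => $ranges['lat'][1],
        'min_lon' => $ranges['lon'][0],
        'max_lon' => $ranges['lon'][1],
    ];
}

$cell = geohashBounds('u0');
printf(
    "lat %s to %s, lon %s to %s\n",
    $cell['min_lat'], $cell['max_lat'], $cell['min_lon'], $cell['max_lon']
);
```

A two-character key such as "u0" therefore spans 5.625 degrees of latitude and 11.25 degrees of longitude. Each extra character adds 5 bits and shrinks the cell accordingly.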
- {
- "took": 428,
- "timed_out": false,
- "_shards": {
- "total": 131,
- "successful": 126,
- "skipped": 121,
- "failed": 5,
- "failures": [
- {
- "shard": 0,
- "index": "elastalert_status_status",
- "node": "w10b9zEBRpuUEQsWvNqEig",
- "reason": {
- "type": "query_shard_exception",
- "reason": "failed to find geo_point field [location]",
- "index_uuid": "Dm4dpUtTTHitYN-TZFC-1g",
- "index": "elastalert_status_status"
- }
- }
- ]
- },
- "hits": {
- "total": 360942,
- "max_score": 0,
- "hits": []
- },
- "aggregations": {
- "filter_agg": {
- "2": {
- "buckets": [
- {
- "3": {
- "location": {
- "lat": 48.58949514372008,
- "lon": 7.584022147181843
- },
- "count": 252
- },
- "key": "u0",
- "doc_count": 252
- },
- {
- "3": {
- "location": {
- "lat": 54.420127907268785,
- "lon": -3.120888938036495
- },
- "count": 181
- },
- "key": "gc",
- "doc_count": 181
- },
- {
- "3": {
- "location": {
- "lat": 42.32862451614172,
- "lon": 3.7518564593602917
- },
- "count": 67
- },
- "key": "sp",
- "doc_count": 67
- },
- {
- "3": {
- "location": {
- "lat": 45.40799999143928,
- "lon": 11.88589995726943
- },
- "count": 21
- },
- "key": "u2",
- "doc_count": 21
- },
- {
- "3": {
- "location": {
- "lat": 46.65579996071756,
- "lon": 32.61779992841184
- },
- "count": 1
- },
- "key": "u8",
- "doc_count": 1
- }
- ]
- },
- "doc_count": 522
- }
- }
- }
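aggregations holds the data we ultimately need.
Here location is the centroid (latitude/longitude) of the points that fall in one geohash cell, and the count right after it is the number of points used to compute that centroid, i.e. the number of matching documents in the cell, which this article treats as the user count. At precision 2 a cell covers roughly 1,250 km x 625 km, not 2 km x 2 km.
With that, we know how to write an Elasticsearch search request that returns what we want. Next, we need to write the PHP interface that formats the data for output.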
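The sample response above also reports 5 failed shards, because the index elastalert_status_status has no geo_point field named location. The following sketch shows one way to check for such partial failures before trusting the buckets. The function name summarizeShards is hypothetical, and the array simply mirrors the relevant part of the sample response.

```php
<?php
// Hypothetical helper: summarize shard failures in a decoded search response.
function summarizeShards(array $response): array
{
    $shards = $response['_shards'] ?? [];
    $reasons = [];
    foreach ($shards['failures'] ?? [] as $failure) {
        $reasons[] = sprintf(
            '%s: %s',
            $failure['index'] ?? '?',
            $failure['reason']['reason'] ?? 'unknown'
        );
    }
    $failed = (int) ($shards['failed'] ?? 0);

    return [
        'failed'  => $failed,
        'partial' => $failed > 0, // true when some shards did not answer
        'reasons' => array_values(array_unique($reasons)),
    ];
}

// The _shards section of the sample response, as a PHP array.
$response = [
    '_shards' => [
        'total' => 131,
        'successful' => 126,
        'skipped' => 121,
        'failed' => 5,
        'failures' => [
            [
                'shard' => 0,
                'index' => 'elastalert_status_status',
                'reason' => [
                    'type' => 'query_shard_exception',
                    'reason' => 'failed to find geo_point field [location]',
                ],
            ],
        ],
    ],
];

$summary = summarizeShards($response);
if ($summary['partial']) {
    echo "partial result, {$summary['failed']} shards failed\n";
    echo implode("\n", $summary['reasons']), "\n";
}
```

Restricting the search to indices that actually contain the geo_point field is one way to avoid these failures in the first place.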
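2. Writing the interface
Use the Elasticsearch PHP API
Decide on the input parameters: time range and spatial range
Decide on the output data structure, and format the data for output
The code, with comments, is as follows: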
- <?PHP
- /**
- * Created by PhpStorm.
- * User: ekisong
- * Date: 2018/11/13
- * Time: 15:55
- */
- require 'vendor/autoload.php';
- ini_set('display_errors','on');
- error_reporting(E_ALL);
- use Elasticsearch\ClientBuilder;
- // Create the Elasticsearch client
- $client = ClientBuilder::create()->setHosts(["localhost:9200"])->build();
- // Name of the field to filter on; defaults to location
- $fieldName = isset($_GET['field']) ? $_GET['field'] : 'location';
- // Latitude of the bounding box's top-left corner; defaults to 90
- $topLeftLat = isset($_GET['top_left_lat']) ? $_GET['top_left_lat'] : 90;
- // Longitude of the bounding box's top-left corner; defaults to -180
- $topLeftLon = isset($_GET['top_left_lon']) ? $_GET['top_left_lon'] : -180;
- // Latitude of the bounding box's bottom-right corner; defaults to -90
- $bottomRightLat = isset($_GET['bottom_right_lat']) ? $_GET['bottom_right_lat'] : -90;
- // Longitude of the bounding box's bottom-right corner; defaults to 180
- $bottomRightLon = isset($_GET['bottom_right_lon']) ? $_GET['bottom_right_lon'] : 180;
- // End of the time range in epoch milliseconds; defaults to the current time
- $endTime = isset($_GET['end_time']) ? $_GET['end_time'] : time()*1000;
- // Start of the time range in epoch milliseconds; defaults to 15 minutes before the end time
- $startTime = isset($_GET['start_time']) ? $_GET['start_time'] : $endTime - 15*60*1000;
- // Build the query body
- $body = [
- 'size' => 0,
- 'query' => [
- 'bool' => [
- 'must' => [
- [
- 'range' => [
- '@timestamp' => [
- 'gte' => $startTime,
- 'lte' => $endTime,
- 'format' => 'epoch_millis'
- ]
- ]
- ]
- ]
- ]
- ],
- 'aggs' => [
- 'filter_agg' => [
- 'filter' => [
- 'geo_bounding_box' => [
- $fieldName => [
- 'top_left' => [
- 'lat' => $topLeftLat,
- 'lon' => $topLeftLon
- ],
- 'bottom_right' => [
- 'lat' => $bottomRightLat,
- 'lon' => $bottomRightLon
- ]
- ]
- ]
- ],
- 'aggs' => [
- '2' => [
- 'geohash_grid' => [
- 'field' => $fieldName,
- 'precision' => 1
- ],
- 'aggs' => [
- '3' => [
- 'geo_centroid' => [
- 'field' => $fieldName
- ]
- ]
- ]
- ]
- ]
- ]
- ],
- 'stored_fields' => [
- '*'
- ],
- 'docvalue_fields' => [
- '@timestamp'
- ]
- ];
- // Search parameters
- $params = [
- 'index' => 'logstash-*',
- 'body' => $body
- ];
- // Raw Elasticsearch search response
- $response = $client->search($params);
- $resultTmp = $response['aggregations']['filter_agg']['2']['buckets'];
- $data = array();
- // Format the data
- foreach ($resultTmp as $doc)
- {
- // geo_centroid always returns its point under the key "location", whatever the field is called
- $lat = $doc['3']['location']['lat'];
- $lon = $doc['3']['location']['lon'];
- $count = $doc['doc_count'];
- $tmp = [
- 'count' => $count,
- 'geometry' => [
- 'type' => 'Point',
- 'coordinates' => [$lon,$lat]
- ]
- ];
- $data[] = $tmp;
- }
- $result = array('data'=>$data,'error_msg'=>'','flag'=>1);
- if (empty($data))
- {
- $result['error_msg'] = 'no data';
- $result['flag'] = 0;
- }
- // Final output
- echo json_encode($result);
- exit();
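The script above passes the request parameters to Elasticsearch as received. As a minimal sketch of one way to validate them first, the helpers below read the bounding-box parameters with the same names and defaults and reject values that are not numbers or are out of range. The function names floatParam and boundingBoxFromInput are hypothetical.

```php
<?php
// Hypothetical helper: read one float parameter, falling back to a default.
function floatParam(array $input, string $name, float $default, float $min, float $max): float
{
    if (!isset($input[$name]) || $input[$name] === '') {
        return $default;
    }
    $value = filter_var($input[$name], FILTER_VALIDATE_FLOAT);
    if ($value === false || $value < $min || $value > $max) {
        throw new InvalidArgumentException("$name must be a number between $min and $max");
    }
    return $value;
}

// Hypothetical helper: build the geo_bounding_box corners from request input.
function boundingBoxFromInput(array $input): array
{
    $box = [
        'top_left' => [
            'lat' => floatParam($input, 'top_left_lat', 90.0, -90.0, 90.0),
            'lon' => floatParam($input, 'top_left_lon', -180.0, -180.0, 180.0),
        ],
        'bottom_right' => [
            'lat' => floatParam($input, 'bottom_right_lat', -90.0, -90.0, 90.0),
            'lon' => floatParam($input, 'bottom_right_lon', 180.0, -180.0, 180.0),
        ],
    ];
    // The top edge of the box must not lie below its bottom edge.
    if ($box['top_left']['lat'] < $box['bottom_right']['lat']) {
        throw new InvalidArgumentException('top_left_lat must be >= bottom_right_lat');
    }
    return $box;
}

// Example input, shaped like $_GET: missing values fall back to the defaults.
$box = boundingBoxFromInput(['top_left_lat' => '53.5', 'bottom_right_lon' => '135']);
echo json_encode($box), "\n";
```

In the script, the resulting array could be used as the value under the field name inside geo_bounding_box, and an InvalidArgumentException could be turned into a response with flag 0 and an error_msg.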
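Because of constraints in the plugin used by the H5 page, the data must be in a specific format. The final output (the contents of the data field) looks like this: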
- [{
- "count": 6,
- "geometry": {
- "type": "Point",
- "coordinates": ["116.395645", "39.929986"]
- }
- }, {
- "count": 6,
- "geometry": {
- "type": "Point",
- "coordinates": ["121.487899", "31.249162"]
- }
- }, {
- "count": 5,
- "geometry": {
- "type": "Point",
- "coordinates": ["117.210813", "39.14393"]
- }
- }, {
- "count": 4,
- "geometry": {
- "type": "Point",
- "coordinates": ["106.530635", "29.544606"]
- }
- }]
With this, the data side of our data-map project is done for now.
References:
Source: https://juejin.im/post/5bf3cafde51d451b3b63f792