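By default, Elasticsearch creates indices and the mappings of their types dynamically. That is acceptable while you are learning, but in production you should always define the mapping by hand. A requirement that comes up often in production is wanting to run statistics (exact-value counts) on a field while also running fuzzy, full-text queries against the same field. The way to handle this is to give the field an "alias": a sub-field declared under "fields" in the mapping, which the Elasticsearch documentation calls a multi-field.
The mapping looks like this: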
{
  "settings" : {
    "index" : {
      "analysis" : {
        "filter" : {
          "english_keywords" : {
            "type" : "keyword_marker",
            "keywords" : [
              "topsec"
            ]
          },
          "english_stemmer" : {
            "type" : "stemmer",
            "language" : "english"
          },
          "english_possessive_stemmer" : {
            "type" : "stemmer",
            "language" : "possessive_english"
          },
          "english_stop" : {
            "type" : "stop",
            "stopwords" : "_english_"
          }
        },
        "analyzer" : {
          "default" : {
            "tokenizer" : "keyword"
          },
          "english" : {
            "type" : "custom",
            "filter" : [
              "lowercase",
              "english_stop"
            ],
            "tokenizer" : "standard"
          },
          "ik" : {
            "filter" : ["lowercase"],
            "type" : "custom",
            "tokenizer" : "ik_max_word"
          },
          "html" : {
            "filter" : [
              "lowercase",
              "english_stop"
            ],
            "char_filter" : [
              "html_strip"
            ],
            "type" : "custom",
            "tokenizer" : "standard"
          },
          "lower" : {
            "filter" : "lowercase",
            "type" : "custom",
            "tokenizer" : "keyword"
          }
        }
      },
      "number_of_shards" : "1",
      "number_of_replicas" : "0"
    }
  },
  "mappings" : {
    "test" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : {
        "name" : {
          "type" : "keyword"
        },
        "age" : {
          "type" : "keyword",
          "fields" : {
            "cn" : {
              "analyzer" : "ik",
              "type" : "text"
            }
          }
        },
        "address" : {
          "type" : "text"
        }
      }
    }
  }
}
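The age field has "type" : "keyword", so its value is not analyzed. It is then given an alias, cn (addressed as age.cn), which is analyzed with the ik analyzer. Next, insert four documents.
When counting documents by the age field, query the unanalyzed age field and use exact matching. The query: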
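
As a rough mental model of what this mapping does, the toy Python sketch below stores one value two ways: whole, like the keyword field age, and as tokens, like the analyzed sub-field age.cn. It is not Elasticsearch code. The VOCAB list and the toy_tokenize and index_value helpers are assumptions made up for illustration; the real ik_max_word tokenizer works from its own dictionary and may emit different, overlapping tokens.

```python
# Toy model of a multi-field: one source value, indexed two ways.
# VOCAB and toy_tokenize are hypothetical stand-ins for the ik analyzer.
VOCAB = ["北京市", "海淀区", "昌平区", "西二旗", "中关村", "西门"]


def toy_tokenize(text):
    """Greedy longest-match split against VOCAB; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary word that starts at position i.
        for word in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            # No vocabulary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def index_value(value):
    """Return the terms stored for the keyword field and for the analyzed sub-field."""
    return {
        "age": [value],                 # keyword: the whole string is a single term
        "age.cn": toy_tokenize(value),  # text: one term per token
    }


indexed = index_value("北京市海淀区西二旗中关村西门")
print(indexed["age"])
print(indexed["age.cn"])
```

The keyword side keeps the address as one term, which is what exact matching and counting need; the analyzed side breaks it into pieces, which is what full-text search needs.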
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age": "北京市海淀区西二旗中关村西门"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "aggs": {}
}
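Result:
Counting against the analyzed alias age.cn is problematic. Because age.cn is analyzed, the value in a term query has to equal one of the tokens that the analyzer produced from the content of age, as the next two queries show: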
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age.cn": "北京市海淀区西二旗中关村西门"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "aggs": {}
}
Result:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age.cn": "北京市"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "aggs": {}
}
Result:
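
The term queries above differ only in which field they target and whether the value is a whole address or a single token. A toy Python model of that lookup is sketched here. The documents, the hand-written token lists, and the term_query helper are all hypothetical; they only illustrate that a term query does not analyze its value and therefore needs an exact term.

```python
# Toy model of a term query: the query value is NOT analyzed, so it must
# equal one stored term exactly. Documents and token lists are hypothetical.
DOCS = {
    1: "北京市海淀区西二旗中关村西门",
    2: "北京市昌平区",
}

# Hand-written tokens standing in for what an analyzer might store in age.cn.
TOKENS = {
    1: ["北京市", "海淀区", "西二旗", "中关村", "西门"],
    2: ["北京市", "昌平区"],
}


def term_query(field, value):
    """Return ids of documents that hold `value` as an exact term in `field`."""
    if field == "age":
        # Keyword field: the only stored term is the whole string.
        return [doc_id for doc_id, text in DOCS.items() if text == value]
    if field == "age.cn":
        # Analyzed sub-field: the stored terms are the individual tokens.
        return [doc_id for doc_id, toks in TOKENS.items() if value in toks]
    raise ValueError(f"unknown field: {field}")


full = "北京市海淀区西二旗中关村西门"
hits_keyword = term_query("age", full)                # whole string is one stored term
hits_analyzed_full = term_query("age.cn", full)       # no token equals the whole string
hits_analyzed_token = term_query("age.cn", "北京市")  # a single token does match
print(hits_keyword, hits_analyzed_full, hits_analyzed_token)
```

In this model the whole address finds its document only through the keyword field, while a single token such as 北京市 finds every document that contains it.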
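Using match for counting is also problematic, because documents that should not be counted are returned as well. A match query analyzes the query text, matches the resulting tokens against the field, and scores each document by how well it matches; higher-scoring documents are ranked first.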
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "age.cn": "北京市海淀区西二旗中关村"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "aggs": {}
}
Result:
The following shows results ranked by relevance score:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "age.cn": "北京市昌平区"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "aggs": {}
}
Result:
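
Why match returns documents that should not be counted can be sketched with a toy Python model. Everything in it is an assumption for illustration: the three documents, their hand-written tokens, the match_query helper, and especially the score, which is a plain count of shared tokens rather than the relevance formula Elasticsearch actually uses.

```python
# Toy model of a match query: the query text IS analyzed, and a document
# matches if it shares at least one token with the query (the default
# "or" behaviour). Documents, tokens and the overlap score are hypothetical
# simplifications, not Elasticsearch's real scoring.
TOKENS = {
    1: ["北京市", "海淀区", "西二旗", "中关村", "西门"],
    2: ["北京市", "昌平区"],
    3: ["上海市", "浦东新区"],
}


def match_query(query_tokens):
    """Return (doc_id, score) pairs, best first; score = number of shared tokens."""
    hits = []
    for doc_id, toks in TOKENS.items():
        score = len(set(query_tokens) & set(toks))
        if score > 0:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Assume the query text "北京市昌平区" is analyzed into these two tokens.
results = match_query(["北京市", "昌平区"])
print(results)
```

Document 2 ranks first because it shares both tokens, but document 1 is still returned on the strength of 北京市 alone, which is the kind of extra hit that makes match unsuitable for counting.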
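Summary: for counting, use term against the unanalyzed field, which requires an exact match; for fuzzy queries, use match against the analyzed field, which does not require an exact match.
If anything here is incorrect, please bear with me and point it out; I would be very grateful. Comments and discussion are welcome!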
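
The counting half of this summary can be illustrated with a short Python sketch that groups the same hypothetical data by whole value and by token, comparable in spirit to bucketing on age versus age.cn. The four values and their token lists are made up for illustration and are not the documents inserted earlier.

```python
from collections import Counter

# Hypothetical contents of the age field in four documents, with hand-written
# tokens standing in for what an analyzer might store in age.cn.
DOCS = [
    ("北京市海淀区西二旗中关村西门", ["北京市", "海淀区", "西二旗", "中关村", "西门"]),
    ("北京市海淀区西二旗中关村西门", ["北京市", "海淀区", "西二旗", "中关村", "西门"]),
    ("北京市昌平区", ["北京市", "昌平区"]),
    ("北京市海淀区", ["北京市", "海淀区"]),
]

# Grouping on the keyword field: one bucket per whole value.
by_value = Counter(value for value, _ in DOCS)

# Grouping on the analyzed sub-field: one bucket per token, so a document
# is counted in the bucket of every token it contains.
by_token = Counter(token for _, tokens in DOCS for token in set(tokens))

print(by_value)
print(by_token)
```

Grouping by whole value gives one count per distinct address, whereas grouping by token puts all four documents into the 北京市 bucket, which says little about any individual address.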
Field aliases when creating an ES mapping
Original: http://www.cnblogs.com/sqy123/p/7920519.html
Source: http://www.bubuko.com/infodetail-2411326.html