Querying data after tokenization - 慕课网

{
  "settings": { "number_of_shards": 3, "number_of_replicas": 2 },
  "mappings": {
    "_doc": {
      "properties": {
        "id": { "type": "keyword" },
        "documentId": { "type": "integer" },
        "name": { "type": "text", "analyzer": "ik_smart" },
        "content": { "type": "text", "analyzer": "ik_smart" },
        "labelName": { "type": "text" },
        "ownerName": { "type": "text" },
        "lastBrowseUser": { "type": "text" },
        "createTime": { "type": "date" },
        "updateTime": { "type": "date" },
        "lastBrowseTime": { "type": "date" },
        "type": { "type": "keyword" }
      }
    }
  }
}

{
  "bool": {
    "must": [
      {
        "multi_match": {
          "query": "新建文",
          "fields": [ "name^1.0" ],
          "type": "best_fields",
          "operator": "OR",
          "slop": 0,
          "prefix_length": 0,
          "max_expansions": 50,
          "zero_terms_query": "NONE",
          "auto_generate_synonyms_phrase_query": true,
          "fuzzy_transpositions": true,
          "boost": 1.0
        }
      }
    ],
    "adjust_pure_negative": true,
    "boost": 1.0
  }
}

1 answer

rockybean (accepted answer) 2020-12-18 11:18:42

Take a look at how 新建文 is tokenized. You can check it with the _analyze API:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": ["新建文"]
}

The result is:

{
  "tokens" : [
    {
      "token" : "新",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "建文",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

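Seen this way, it is no surprise that the search found nothing. ik_smart splits the query into 新 and 建文, so with operator OR a document matches only if its name field was indexed with one of those two exact tokens.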


Asker simons_fan #1

Teacher, in https://coding.imooc.com/learn/questiondetail/199069.html I saw your reply saying that a match query can achieve something similar to MySQL's LIKE. So for the problem above, is there a good solution? Thanks.

2020-12-18 14:32:34

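rockybean replying to asker simons_fan #2

The root cause of this problem is inaccurate tokenization. ik_smart produces relatively few tokens, so you can switch to ik_max_word, which produces more: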

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["新建文"]
}

{
  "tokens" : [
    {
      "token" : "新建",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "建文",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}



GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["新建文件夹"]
}

{
  "tokens" : [
    {
      "token" : "新建",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "建文",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "文件夹",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "文件",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "夹",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    }
  ]
}
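
To make the effect concrete, here is a minimal Python sketch of the matching step. It is a simplified model, not how Elasticsearch is implemented: it assumes a match query with operator OR counts a document as a hit when at least one query token is among the document's indexed tokens, and it ignores scoring. The token lists are copied from the two ik_max_word outputs above; the function name `or_match` is made up for illustration.

```python
# Tokens copied from the ik_max_word _analyze outputs above.
QUERY_TOKENS = ["新建", "建文"]                        # query text: 新建文
DOC_TOKENS = ["新建", "建文", "文件夹", "文件", "夹"]    # document name: 新建文件夹


def or_match(query_tokens, doc_tokens):
    """Return the query tokens that also occur among the document's tokens.

    Simplified model of a match query with operator OR: the document is a
    hit when the returned list is non-empty. Scoring is ignored.
    """
    indexed = set(doc_tokens)
    return [token for token in query_tokens if token in indexed]


shared = or_match(QUERY_TOKENS, DOC_TOKENS)
print(shared)        # ['新建', '建文']
print(bool(shared))  # True
```

Both query tokens appear in the document's token list, so under this model 新建文 would hit 新建文件夹 once the name field is analyzed with ik_max_word. By contrast, the single character 新 that ik_smart produced for the query is not in that list.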

2020-12-19 09:46:16

Asker simons_fan #3

Thank you very much!

2020-12-19 11:15:35
