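Elasticsearch in Plain Language 18 - A Deep Dive into Search Techniques: Proximity Matching with the slop Parameter and How It Works
Overview
Continuing to learn ES from instructor 中华石杉 (Zhonghua Shishan); this is part 18.
Course URL: https://www.roncoo.com/view/55
This follows the previous post, Elasticsearch in Plain Language 17 - match_phrase query: Phrase Matching Search.
Official documentation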
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
What slop means
The official documentation says:
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
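What is slop?
- slop is the number of moves that the terms in the query string (the search text) need in order to match a document.
- A phrase match with slop is a proximity match.
- If slop is specified, the search terms are allowed to move in an attempt to match the doc.
- The search terms may be some distance apart, but the closer together they are in a doc, the earlier that doc appears in the results. That is proximity match.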
Example
Work out how many moves a query string needs before it matches a document, then set slop accordingly.
Suppose we have this doc:
hello world, java is very good, spark is also very good.
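If we search for java spark with a plain match_phrase query, nothing is found, because match_phrase treats java spark as a single unit: the terms must appear next to each other, in that order.
If we specify slop, the terms of java spark are allowed to move in an attempt to match the doc.
Here the required slop is 3: in the phrase java spark, spark has to move 3 positions before the phrase lines up with the doc.
slop is not the exact number of moves the query string terms make to match a doc. It is the maximum number of moves the terms may make while trying to match a doc.
So with slop set to 3, this works: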
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}
This matches the doc above, and it is returned in the results.
With slop set to 2, however, spark may move at most 2 positions, which is not enough to match, so the doc is not returned.
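The move counting above can be sketched in a few lines of Python. This is a simplified model, not Lucene's actual implementation: it assumes a naive tokenizer (lowercase, split on non-word characters) in place of the real analyzer, assumes no query term is repeated, and the helper names `tokenize` and `min_slop` are made up for illustration. The idea is to subtract each term's index in the phrase from its position in the doc; the spread between the largest and smallest result is the number of moves needed.

```python
import re
from itertools import product


def tokenize(text):
    """Naive stand-in for an analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def min_slop(doc_text, phrase):
    """Smallest slop at which `phrase` matches `doc_text` under this simplified model.

    Returns None when the phrase is empty or a query term is missing from the doc.
    """
    doc_tokens = tokenize(doc_text)
    query_terms = tokenize(phrase)

    # For each query term, collect every position where it occurs in the doc.
    positions = [
        [pos for pos, tok in enumerate(doc_tokens) if tok == term]
        for term in query_terms
    ]
    if not query_terms or any(not p for p in positions):
        return None

    best = None
    # Try every combination of occurrences, one per query term.
    for combo in product(*positions):
        # Shift each doc position back by the term's index in the phrase.
        # If all shifted values are equal, the phrase matches exactly.
        shifted = [pos - i for i, pos in enumerate(combo)]
        distance = max(shifted) - min(shifted)
        if best is None or distance < best:
            best = distance
    return best


doc = "hello world, java is very good, spark is also very good."
needed = min_slop(doc, "java spark")
print(needed)       # moves needed under this model
print(needed <= 3)  # is slop 3 enough?
print(needed <= 2)  # is slop 2 enough?
```

Under this model the doc needs 3 moves for java spark, so slop 3 matches and slop 2 does not, which agrees with the reasoning above. It also gives 2 for two adjacent terms in swapped order, in line with the "Transposed terms have a slop of 2" remark in the official docs.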
Example 1
Let's verify this with our test data:
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}
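Analyzing the slop
data has to move 3 positions to match spark data, so a slop of 3 is sufficient. Any value greater than 3 also matches. In other words, 3 is the minimum slop this doc needs; the slop value itself is an upper bound on the number of moves.
Example 2
What if we search for data spark instead? Can it still match? Answer: yes, provided slop is large enough.
Let's analyze. Reversing the order costs extra moves, because the two terms have to pass each other (the official docs note that transposed terms have a slop of 2). Since spark data needs 3 moves here, data spark should need 5.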
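To make the cost of reversed order concrete, here is a small sketch for the two-term case. It is a simplified model rather than Lucene's code: for a query "first second", an exact phrase means the second term sits one position after the first, and every position of deviation counts as one move. The function name `required_slop` and the positions 0 and 4 are hypothetical, chosen only so that the in-order query needs 3 moves.

```python
def required_slop(pos_first, pos_second):
    """Moves needed for a two-term phrase query "first second".

    pos_first and pos_second are the token positions of the two terms in the doc.
    An exact phrase has pos_second - pos_first == 1, which needs 0 moves.
    """
    return abs((pos_second - pos_first) - 1)


# Hypothetical positions: spark at 0, data at 4 (three other tokens in between).
spark_pos, data_pos = 0, 4

in_order = required_slop(spark_pos, data_pos)     # query "spark data"
reversed_order = required_slop(data_pos, spark_pos)  # query "data spark"
print(in_order, reversed_order)

# Two adjacent terms in swapped order: the transposition case.
print(required_slop(1, 0))
```

With these positions the reversed query always needs 2 more moves than the in-order one, so a slop that is just enough for spark data is not enough for data spark.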
Example 3
In a search with slop, the closer together the search terms are in a doc, the higher its relevance score.
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java blog",
        "slop": 5
      }
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.81487787,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.81487787,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.31424814,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.31424814,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner",
          "sub_title": "both of them are good",
          "author_first_name": "Robbin",
          "author_last_name": "Li",
          "new_author_last_name": "Li",
          "new_author_first_name": "Robbin"
        }
      }
    ]
  }
}
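As the results show:
Highest score: document 2, "this is java blog", where java and blog are adjacent (score 0.81487787).
Next: document 1, "this is java and elasticsearch blog", with two terms between java and blog (score 0.31424814).
Last: document 4, "this is java, elasticsearch, hadoop blog", which also has two terms between java and blog and gets the same score as document 1.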
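The ranking can be illustrated with a rough sketch. It assumes that a sloppy phrase match is weighted by 1 / (1 + distance), where distance is the number of moves the match needs; that is how Lucene's sloppy phrase frequency is commonly described, but treat it as an assumption here. The sketch does not reproduce the `_score` values in the response, which also depend on term statistics and field length. The helper names are made up, and the tokenizer is a naive stand-in for the real analyzer.

```python
import re


def tokenize(text):
    """Naive stand-in for an analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def phrase_distance(title, first, second):
    """Fewest moves for the two-term phrase "first second" to match the title.

    Assumes both terms occur in the title at least once.
    """
    tokens = tokenize(title)
    firsts = [i for i, tok in enumerate(tokens) if tok == first]
    seconds = [i for i, tok in enumerate(tokens) if tok == second]
    # An exact phrase has second one position after first, which is 0 moves.
    return min(abs((s - f) - 1) for f in firsts for s in seconds)


def sloppy_weight(distance):
    """Assumed weighting: a match that needs more moves counts for less."""
    return 1.0 / (1.0 + distance)


# Titles taken from the three hits in the response, keyed by _id.
titles = {
    "2": "this is java blog",
    "1": "this is java and elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
}

for doc_id, title in titles.items():
    distance = phrase_distance(title, "java", "blog")
    print(doc_id, distance, round(sloppy_weight(distance), 3))
```

Document 2 needs no moves and gets the full weight, while documents 1 and 4 both need the same number of moves and get the same smaller weight, which matches the ordering and the tie seen in the response.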
Source: artisan.blog.csdn.net. Author: 小小工匠. Copyright belongs to the original author; please contact the author before reposting.
Original link: artisan.blog.csdn.net/article/details/97696444