Plain-Language Elasticsearch 18: A Deep Dive into Search, Proximity Matching with the slop Parameter and How It Works
Overview
Continuing to study ES with instructor 中华石杉, this is part 18.
Course: https://www.roncoo.com/view/55
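This continues the previous post, 白话Elasticsearch17-match_phrase query 短语匹配搜索 (Plain-Language Elasticsearch 17: phrase search with match_phrase query).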
Official documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
What slop means
The official documentation says:
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
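So what is slop?
slop is the number of moves the terms in the query string (the search text) have to make before they match a document.
- A match_phrase with slop is a proximity match (approximate matching).
- With slop specified, the search terms are allowed to move in order to try to match a doc.
- The search terms may sit some distance apart in the doc, but the closer together they are, the higher the doc ranks in the results; this is what proximity match means.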
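The definition above can be made concrete with a small formula. This is a simplified model of how Lucene's sloppy phrase matching measures distance, assuming every query term is distinct; the symbols $p_i$ and $L$ are introduced here only for illustration. If query term $i$ (counting from 0) is found at document position $p_i$, shift each position back by the term's place in the query and take the spread $L$ of the shifted values. An exact phrase gives $L = 0$. As a check against the documentation's "Transposed terms have a slop of 2", take the query java spark against a doc containing spark java, where java is at position 1 ($p_0 = 1$) and spark at position 0 ($p_1 = 0$):

```latex
\begin{aligned}
L &= \max_i (p_i - i) - \min_i (p_i - i), \qquad \text{match} \iff L \le \mathrm{slop} \\
L_{\text{java spark vs. spark java}} &= \max(1 - 0,\ 0 - 1) - \min(1 - 0,\ 0 - 1) = 1 - (-1) = 2
\end{aligned}
```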
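Example
Work out how many moves a query string needs before it matches a document, then set slop to that number.
Suppose we have this doc:
hello world, java is very good, spark is also very good.
If we search for java spark with a plain match_phrase query, it definitely will not match, because match_phrase query looks for java spark as a whole phrase: the two terms must be adjacent and in that order.
If we specify slop, java spark is allowed to move in order to try to match the doc.
Here the slop is 3, because spark in the phrase java spark has to move 3 times before the phrase matches the doc.
slop does not simply mean the number of moves the query string's terms make to match a doc. It is the maximum number of moves the query string's terms are allowed to make while trying to match a doc.
With slop set to 3, the query works: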
GET /forum/article/_search
{
"query": {
"match_phrase": {
"title": {
"query": "java spark",
"slop": 3
}
}
}
}
The doc above is matched and returned as a result.
If slop is set to 2, however, spark in java spark can move at most 2 times, which is not enough to match the doc, so the doc is not returned.
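The java spark example can be checked with a small Python sketch. It is a simplified model, not Elasticsearch's implementation: the regex tokenizer is a rough stand-in for the standard analyzer, each term's doc position is shifted back by its place in the query, and the spread of the shifted positions is taken as the slop needed. `min_slop` and `matches` are hypothetical helper names, and the query terms are assumed to be distinct.

```python
import re
from itertools import product


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def min_slop(query, doc):
    """Hypothetical helper: smallest slop at which `query` matches `doc` as a phrase.

    Returns None if some query term never appears in the doc.
    Simplified model that assumes the query terms are distinct.
    """
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    # Every position at which each query term occurs in the doc.
    positions = [[i for i, t in enumerate(d_terms) if t == term] for term in q_terms]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Shift each doc position back by the term's offset in the query;
        # an exact phrase gives identical shifted values (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


def matches(query, doc, slop):
    # The doc matches when the moves it needs do not exceed the allowed slop.
    needed = min_slop(query, doc)
    return needed is not None and needed <= slop


doc = "hello world, java is very good, spark is also very good."
print(min_slop("java spark", doc))    # 3
print(matches("java spark", doc, 3))  # True
print(matches("java spark", doc, 2))  # False
```

java sits at position 2 and spark at position 6, while the phrase expects spark at position 3, hence 3 moves. Because slop is an upper bound, any slop of 3 or more matches and anything below does not.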
Example 1
Let's verify this with our test data:
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "spark data",
"slop": 3
}
}
}
}
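Analyzing the slop:
data had to move 3 times before spark data matched, so setting slop to 3 is enough. Any value larger than 3 will of course match as well; in other words, slop here must be at least 3.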
Example 2
What if we search for data spark instead? Will it still match? The answer is: yes.
Let's analyze it:
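A reversed query can still match because slop also lets terms move past each other. For a two-term phrase, a minimal sketch (a hypothetical helper using the same shifted-position idea, assuming each term occurs once) shows that swapping the terms costs exactly 2 more moves. The positions below come from the java spark doc in the earlier example, not from the forum test data.

```python
def two_term_slop(first_pos, second_pos):
    """Hypothetical helper: moves needed for a two-term phrase query.

    first_pos / second_pos are the doc positions of the query's first and
    second term. The phrase expects the second term right after the first,
    i.e. at first_pos + 1.
    """
    return abs(second_pos - (first_pos + 1))


# Positions in "hello world, java is very good, spark is also very good."
java_pos, spark_pos = 2, 6

forward = two_term_slop(java_pos, spark_pos)    # query "java spark"
reversed_ = two_term_slop(spark_pos, java_pos)  # query "spark java"
print(forward, reversed_)  # 3 5
```

In general, for two terms that appear g positions apart in the query's order, the in-order query needs g - 1 moves and the reversed query needs g + 1. With g = 1 (adjacent terms) this gives 0 and 2, which is the documentation's "transposed terms have a slop of 2".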
Example 3
With slop, the closer together the keywords are, the higher the relevance score.
GET /forum/article/_search
{
"query": {
"match_phrase": {
"title": {
"query": "java blog",
"slop": 5
}
}
}
}
Result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.81487787,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "2",
"_score": 0.81487787,
"_source": {
"articleID": "KDKE-B-9947-#kL5",
"userID": 1,
"hidden": false,
"postDate": "2017-01-02",
"tag": [
"java"
],
"tag_cnt": 1,
"view_cnt": 50,
"title": "this is java blog",
"content": "i think java is the best programming language",
"sub_title": "learned a lot of course",
"author_first_name": "Smith",
"author_last_name": "Williams",
"new_author_last_name": "Williams",
"new_author_first_name": "Smith"
}
},
{
"_index": "forum",
"_type": "article",
"_id": "1",
"_score": 0.31424814,
"_source": {
"articleID": "XHDK-A-1293-#fJ3",
"userID": 1,
"hidden": false,
"postDate": "2017-01-01",
"tag": [
"java",
"hadoop"
],
"tag_cnt": 2,
"view_cnt": 30,
"title": "this is java and elasticsearch blog",
"content": "i like to write best elasticsearch article",
"sub_title": "learning more courses",
"author_first_name": "Peter",
"author_last_name": "Smith",
"new_author_last_name": "Smith",
"new_author_first_name": "Peter"
}
},
{
"_index": "forum",
"_type": "article",
"_id": "4",
"_score": 0.31424814,
"_source": {
"articleID": "QQPX-R-3956-#aD8",
"userID": 2,
"hidden": true,
"postDate": "2017-01-02",
"tag": [
"java",
"elasticsearch"
],
"tag_cnt": 2,
"view_cnt": 80,
"title": "this is java, elasticsearch, hadoop blog",
"content": "elasticsearch and hadoop are all very good solution, i am a beginner",
"sub_title": "both of them are good",
"author_first_name": "Robbin",
"author_last_name": "Li",
"new_author_last_name": "Li",
"new_author_first_name": "Robbin"
}
}
]
}
}
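As you can see:
Highest score: doc 2, "this is java blog", where java and blog are adjacent (score 0.81487787).
Next: doc 1, "this is java and elasticsearch blog", where two words sit between java and blog (score 0.31424814).
Last: doc 4, "this is java, elasticsearch, hadoop blog", which also has two words between java and blog and gets the same score as doc 1 (0.31424814).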
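The ranking above lines up with how far apart java and blog are in each title. The sketch below measures that distance with a simplified two-term model (`phrase_distance` is a hypothetical helper, and its regex tokenizer only approximates the standard analyzer) and applies 1 / (1 + distance), which is my understanding of the per-match weight Lucene's sloppy phrase scoring gives a match of that length; treat the weight as an assumption. It is not the BM25 score itself, which also folds in term statistics and field length, so only the ordering and the tie between docs 1 and 4 are meant to carry over.

```python
import re


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_distance(text, first, second):
    """Hypothetical helper: fewest moves for the phrase "first second" in text.

    Assumes both terms occur in the text.
    """
    tokens = tokenize(text)
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [i for i, t in enumerate(tokens) if t == second]
    # The phrase expects `second` right after `first`.
    return min(abs(s - (f + 1)) for f in firsts for s in seconds)


def sloppy_weight(distance):
    # Assumed per-match weight: the closer the match, the larger the weight.
    return 1.0 / (1.0 + distance)


titles = {
    "2": "this is java blog",
    "1": "this is java and elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
}
for doc_id, title in titles.items():
    d = phrase_distance(title, "java", "blog")
    print(doc_id, d, round(sloppy_weight(d), 3))
# 2 0 1.0
# 1 2 0.333
# 4 2 0.333
```

Both lower-ranked titles need 2 moves, which is within the query's slop of 5 and explains why they tie.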
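Source: artisan.blog.csdn.net. Author: 小小工匠. Copyright belongs to the original author; please contact the author before reposting.
Original link: artisan.blog.csdn.net/article/details/97696444
[Copyright notice] This article was reposted by a Huawei Cloud community user. If you find suspected plagiarism in this community, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com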