
[Elasticsearch] Text analysis: using analyzers in _search queries (3)

Posted by 小雨青年 on 2022/03/28 22:52:49


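Built-in analyzers

fingerprint

The fingerprint analyzer implements a fingerprinting algorithm that the OpenRefine project uses to assist in clustering.

Internally, it processes the input as follows:

Converts the text to lowercase
Normalizes the text to remove extended characters
Sorts the tokens
Removes duplicate tokens
Concatenates the tokens into a single token
Removes stop words, if a stop word list is configured

For example: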

POST _analyze
{
  "analyzer": "fingerprint",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}

  
 

[ and consistent godel is said sentence this yes ]
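To make the steps concrete, here is a minimal Python sketch that approximates the same pipeline outside Elasticsearch. The function name `fingerprint`, the regex-based tokenization, and the Unicode folding approach are assumptions made for illustration; the real analyzer uses its own tokenizer and ASCII folding filter, so results may differ on other inputs.

```python
import re
import unicodedata


def fingerprint(text, stopwords=()):
    """Approximate the fingerprint analyzer's steps in plain Python."""
    # Lowercase the input.
    lowered = text.lower()
    # Fold extended characters (e.g. "ö" -> "o") by decomposing them and
    # dropping the combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Tokenize on runs of ASCII letters and digits (a simplification).
    tokens = re.findall(r"[a-z0-9]+", folded)
    # Remove stop words, deduplicate, and sort.
    unique = sorted(set(tokens) - set(stopwords))
    # Concatenate everything into a single token.
    return " ".join(unique)


print(fingerprint("Yes yes, Gödel said this sentence is consistent and."))
```

For the sample sentence this prints the same single token as the _analyze request above.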

  
 

keyword

The keyword analyzer does no processing at all: it returns the entire input string unchanged as a single term.

POST _analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]

  
 

Language

Language analyzers are a set of analyzers aimed at analyzing text in a specific language. The following types are supported:

arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai

pattern

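The pattern analyzer uses a regular expression to split text into terms. The default regular expression is \W+ (any run of non-word characters), and terms are lowercased by default.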

POST _analyze
{
  "analyzer": "pattern",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
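The splitting behaviour can be imitated with Python's `re` module. This is only a sketch: the function name `pattern_analyze` is made up for illustration, and Python's `\W` is Unicode-aware for `str` patterns, which may not match the Java regex engine Elasticsearch uses for non-ASCII text.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    """Split text on a regex and optionally lowercase the resulting terms."""
    # re.split can yield empty strings at the edges (e.g. after the final "."),
    # so filter them out.
    terms = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in terms] if lowercase else terms


print(pattern_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Note how the apostrophe and the hyphen both count as non-word characters, which is why "dog's" becomes "dog" and "s".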

  
 

simple

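Splits the text into terms at any non-letter character and discards those characters
Converts all terms to lowercase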

POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
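A rough Python equivalent, assuming that "letter" can be approximated by the regex class `[^\W\d_]` (word characters minus digits and underscore). The function name `simple_analyze` is hypothetical.

```python
import re


def simple_analyze(text):
    """Keep only runs of letters and lowercase them."""
    # [^\W\d_]+ matches one or more word characters that are neither
    # digits nor underscores, i.e. (approximately) letters.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Unlike the pattern analyzer, the digit "2" is dropped here because it is not a letter.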

  
 

standard

The standard analyzer is the default: it is used whenever no analyzer is specified.

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]

  
 

stop

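The stop analyzer is basically the same as the simple analyzer, but adds a stopwords setting for removing stop words. By default it uses the _english_ stop word list.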

POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
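The relationship to the simple analyzer can be sketched in Python as letter-only tokenization followed by a stop word filter. The `STOPWORDS` set below is a small illustrative subset chosen for this example, not the actual _english_ list, and `stop_analyze` is a hypothetical name.

```python
import re

# Illustrative subset of English stop words (an assumption for this sketch,
# not the exact list behind Elasticsearch's _english_ setting).
STOPWORDS = {"a", "an", "and", "is", "of", "the", "this", "to"}


def stop_analyze(text, stopwords=STOPWORDS):
    """Tokenize on letters, lowercase, then drop stop words."""
    # Tokenize the way the simple analyzer does: runs of letters, lowercased.
    terms = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    # Drop every term that appears in the stop word set.
    return [t for t in terms if t not in stopwords]


print(stop_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Both occurrences of "the" disappear, while the rest of the terms are the same as in the simple analyzer's output.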

  
 

whitespace

The whitespace analyzer breaks text into terms whenever it encounters a whitespace character.

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

  
 

[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
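In Python terms this is close to calling `str.split()` with no arguments. The sketch below uses the hypothetical name `whitespace_analyze` and assumes that Python's notion of whitespace is close enough to the analyzer's for ordinary text.

```python
def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    # str.split() with no arguments splits on any run of whitespace
    # and never returns empty strings.
    return text.split()


print(whitespace_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Case, hyphens, apostrophes, and the trailing period all survive, which matches the output above.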

  
 

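Points to note

A built-in analyzer can be used directly in a search with its default configuration.
If you need extra configuration, such as stopwords, you have to define a custom analyzer.
The search uses the analyzer's output as the query terms, which is equivalent to preprocessing the user's input yourself before searching.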
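As an illustration of the second point, the following sketch builds the index settings for a custom analyzer of type stop with its own stop word list. The analyzer name `my_stop_analyzer` and the stop words are placeholders invented for this example; the resulting JSON would be sent as the body of an index creation request.

```python
import json

# Hypothetical index settings: "my_stop_analyzer" and the stop word list
# are illustrative values, not taken from the article.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_stop_analyzer": {
                    "type": "stop",
                    "stopwords": ["a", "the", "over"],
                }
            }
        }
    }
}

# Body for an index creation request (PUT <your-index-name>).
print(json.dumps(index_settings, indent=2))
```

Once the index exists, the custom analyzer can be referenced by name wherever a built-in analyzer name would be accepted for that index.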

Querying with an analyzer

GET _search
{
  "query": {
    "match": {
      "message": {
        "query": "a Pose",
        "analyzer": "stop"
      }
    }
  }
}
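If the query is assembled in application code, the same body can be produced with a small helper. This is a sketch using only the standard library; `match_query` is a hypothetical helper name, and sending the request is left to whichever HTTP or Elasticsearch client you use.

```python
import json


def match_query(field, text, analyzer):
    """Build a match query body that analyzes `text` with `analyzer` at search time."""
    return {"query": {"match": {field: {"query": text, "analyzer": analyzer}}}}


body = match_query("message", "a Pose", "stop")
print(json.dumps(body, indent=2))
```

With the stop analyzer and its default English stop words, "a" is removed and "Pose" is lowercased, so the search effectively looks for the term "pose" in the message field.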

  
 

References

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-fingerprint-analyzer.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html

Source: coderfix.blog.csdn.net. Author: 小雨青年. Copyright belongs to the original author; please contact the author before reprinting.

Original link: coderfix.blog.csdn.net/article/details/114392540
