Python Review: Text Similarity with difflib

livingbody, published 2022/12/29 11:32:34


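Python's standard library provides tools in the difflib module for detecting differences between two sequences. Since text is itself a sequence (of characters), we can apply these functions to measure how similar two strings are.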

1. `difflib.SequenceMatcher`: computing the similarity between strings (from 0 to 1)

s1 = 'Today the weather is nice'
s2 = 'Today the weater is nice'
s3 = 'Yesterday the weather was nice'
s4 = 'Today my dog ate steak'

import difflib
difflib.SequenceMatcher(None, s1, s2, False).ratio()

0.9795918367346939

difflib.SequenceMatcher(None, s1, s3, False).ratio()

0.8

difflib.SequenceMatcher(None, s1, s4, False).ratio()

0.46808510638297873
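To make the numbers above less opaque, here is a minimal sketch of how `ratio()` is derived: it is `2 * M / T`, where `M` is the number of matched characters and `T` is the combined length of both strings. The variable names `matched` and `manual_ratio` are chosen only for this illustration.

```python
import difflib

s1 = 'Today the weather is nice'
s2 = 'Today the weater is nice'

matcher = difflib.SequenceMatcher(None, s1, s2, autojunk=False)

# M: total number of matched characters, i.e. the sum of all block sizes.
matched = sum(block.size for block in matcher.get_matching_blocks())

# T: combined length of both strings.
total = len(s1) + len(s2)

# ratio() is defined as 2 * M / T.
manual_ratio = 2 * matched / total

print(matched, total)                   # 24 49
print(manual_ratio, matcher.ratio())    # both are 48/49
```

Because the ratio depends on the combined length, a single missing character costs less in a long string than in a short one.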

2. `.get_matching_blocks()`: returning the match positions

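`.get_matching_blocks()` returns the positions of the matching blocks. To ignore certain characters entirely, you can strip them first with `str.maketrans` and `str.translate`.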

a = 'aaaaaaaaaaaaaXaaaaaaaaaa'
b = 'X'

difflib.SequenceMatcher(None, a, b, False).get_matching_blocks()

[Match(a=13, b=0, size=1), Match(a=24, b=1, size=0)]
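Each `Match(a=i, b=j, size=n)` in this output means that `a[i:i+n] == b[j:j+n]`, and the last entry is always a zero-size sentinel at `(len(a), len(b))`. As a small sketch, the helper below (the name `matched_segments` is hypothetical, not part of difflib) turns the blocks back into the substrings they refer to.

```python
import difflib

def matched_segments(x, y):
    """Return the substrings of x that SequenceMatcher aligned with y."""
    blocks = difflib.SequenceMatcher(None, x, y, autojunk=False).get_matching_blocks()
    # Skip the trailing sentinel block, whose size is always 0.
    return [x[m.a:m.a + m.size] for m in blocks if m.size > 0]

a = 'aaaaaaaaaaaaaXaaaaaaaaaa'
b = 'X'
print(matched_segments(a, b))
# ['X']

print(matched_segments('Today the weather is nice', 'Today the weater is nice'))
# ['Today the weat', 'er is nice']
```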

difflib.SequenceMatcher(lambda c: c=='a', a, b, False).get_matching_blocks()

[Match(a=13, b=0, size=1), Match(a=24, b=1, size=0)]

discardmap = str.maketrans({"a": None})
difflib.SequenceMatcher(None, a.translate(discardmap), b.translate(discardmap), False).ratio()

1.0
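In the calls above, passing `lambda c: c=='a'` as the junk predicate returned the same blocks as passing `None`. As far as I understand the implementation, the predicate is applied to elements of the second sequence, and `b = 'X'` contains no `'a'`, so there was nothing to filter. The sketch below, adapted from the example in the difflib documentation, shows a case where the predicate does change the result.

```python
import difflib

a = ' abcd'
b = 'abcd abcd'

# No junk: the longest match includes the leading blank.
plain = difflib.SequenceMatcher(None, a, b, autojunk=False)
print(plain.find_longest_match(0, len(a), 0, len(b)))
# Match(a=0, b=4, size=5)

# Blanks are junk: a match may no longer start on the blank in b.
no_blanks = difflib.SequenceMatcher(lambda ch: ch == ' ', a, b, autojunk=False)
print(no_blanks.find_longest_match(0, len(a), 0, len(b)))
# Match(a=1, b=0, size=4)
```

Junk only influences where matches may be anchored; it does not delete characters, which is why the `maketrans` approach is the one to use when certain characters should be ignored completely.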

3. Spelling suggestions

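If a word is misspelled, it is replaced with the closest word from the dictionary.
If a word is spelled correctly, or no close match can be found, it is left unchanged. The function also returns the number of words that were not found in the dictionary; such a word is counted even when no replacement is available.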

# Word list used as the dictionary
dictionary = {'ability', 'able', 'about', 'above', 'accept',    
              'according', 
              'account', 'across', 'act', 'action', 'activity', 
              'actually', 
              'add', 'address', 'administration', 'admit', 'adult', 
              'affect', 
              'after', 'again', 'against', 'age', 'agency', 
              'agent', 'ago', 
              'agree', 'agreement', 'ahead', 'air', 'all', 'allow',  
              'almost', 
              'alone', 'along', 'already', 'also', 'although', 
              'always', 
              'American', 'among', 'amount', 'analysis', 'and', 
              'animal', 
              'another', 'answer', 'any', 'anyone', 'anything', 
              'appear', 
              'apply', 'approach', 'area', 'argue', 
              'arm', 'around', 'arrive', 
              'art', 'article', 'artist', 'as', 'ask', 'assume', 
              'at', 'attack', 
              'attention', 'attorney', 'audience', 'author',  
              'authority', 
              'available', 'avoid', 'away', 'baby', 'back', 'bad', 
              'bag', 
              'ball', 'bank', 'bar', 'base', 'be', 'beat', 
              'beautiful', 
              'because', 'become'}

# Similarity comparison
import difflib

def suggest(phrase):
    changes = 0
    words = phrase.split()
    for idx, w in enumerate(words):
        if w not in dictionary:
            changes += 1
            matches = difflib.get_close_matches(w, dictionary)
            if matches:
                words[idx] = matches[0]
    return changes, ' '.join(words)

suggest('assume anu answer')

(1, 'assume any answer')

suggest('beautiful dog')

(1, 'beautiful dog')
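`difflib.get_close_matches` also accepts two optional parameters: `n` (maximum number of suggestions, default 3) and `cutoff` (minimum similarity, default 0.6). The sketch below uses a small hypothetical word list, not the dictionary above, to show how they affect the result.

```python
import difflib

# Hypothetical mini word list, used only for this illustration.
words = ['any', 'and', 'answer', 'about']

# Defaults: up to 3 suggestions with similarity >= 0.6, best first.
default_matches = difflib.get_close_matches('anu', words)
print(default_matches)   # ['any', 'and']

# Keep only the single best suggestion.
best_match = difflib.get_close_matches('anu', words, n=1)
print(best_match)        # ['any']

# A stricter cutoff rejects both candidates (each scores about 0.67).
strict_matches = difflib.get_close_matches('anu', words, cutoff=0.7)
print(strict_matches)    # []
```

Raising `cutoff` is one way to avoid replacing a word with a poor suggestion, at the cost of leaving more typos uncorrected.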

