How to Get the Pronunciation of English Words? Adding IPA-SAMPA

tsinghuazhuoqing, posted 2022/02/19 22:08:08

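Abstract: To obtain and display the pronunciation of English words, eng_to_ipa is used to get each word's IPA, and a self-written conversion program then converts the IPA to SAMPA for display.

Keywords: sampa, ipa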

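Word pronunciation
Contents
Text-to-speech
wordnet
eng-to-ipa
Displaying IPA
Conversion program
gruut-ipa
SAMPA character mapping table
Modifying the programs
Building blocks
Modifying the small programs
Summary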

 

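§01 Word Pronunciation


  Learning English words works better when it is paired with their pronunciation. In 测试一些利用PYTHON完成中英文翻译的效果 (testing Chinese-English translation done with Python), the Python add-on programs of the TEASOFT software were extended with English lookup through Youdao online translation, but pronunciation hints are still missing. The sections below look at some auxiliary tools to see whether this feature can be filled in.

1.1 Text-to-speech

  The article Python(九)- 音频文字转换 (Python (9): audio and text conversion) lists several speech modules:

  • pyttsx3: a package that speaks text through the system's built-in speech engine; it does not generate an MP3 file.
  • win32com: speaks text through the speech engine built into the Windows operating system.

  The article 更新pip3与pyttsx3文字语音转换 (updating pip3 and pyttsx3 text-to-speech) points out that pypiwin32 must be installed before pyttsx3.

1.1.1 Installing the packages

(1) Install pypiwin32

python -m pip install pypiwin32

(2) Install pyttsx3

python -m pip install pyttsx3

1.1.2 Testing the package

 Ⅰ. Chinese speech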
import pyttsx3
engine = pyttsx3.init()
engine.say('人类真帅')
engine.runAndWait()

  
 

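  The phrase “人类真帅” can be heard as spoken output.

 Ⅱ. English speech

  engine.say('Windows is an OS.')

  The English sentence (“Windows is an OS.”) is read aloud as well.

  However, the output has an obvious “robotic” tone and does not sound very natural.

1.2 wordnet

  The article on the volcabulary software module describes how to obtain word information with volcabulary. However, an error occurred when installing volcabulary.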

python -m pip install nltk

1.2.1 Testing wordnet

# Requires the WordNet corpus data; if it is missing, run nltk.download('wordnet') once.
from nltk.corpus import wordnet

syns = wordnet.synsets('car')   # all synsets (word senses) for 'car'
print(syns)

  
 

  Test result: the test failed with an error and did not work.

1.3 eng-to-ipa

  The article Convert English text into the Phonetics using Python describes using “eng-to-ipa” to obtain the pronunciation of English words.

  This package converts English words to IPA (International Phonetic Alphabet).

  For a detailed introduction, see: English to IPA (eng_to_ipa)

1.3.1 Installing the package

python -m pip install eng-to-ipa

▲ Figure 1.3.1 python -m pip install eng-to-ipa

1.3.2 Testing the package

import eng_to_ipa as p

# Convert a sentence to IPA
ipa = p.convert("Hello Geeks.")

# Write the result as UTF-8 so the IPA characters are preserved
filename = r'd:\temp\1.txt'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(ipa)

print('\a')   # emit the BEL character (audible beep)

  
 
▲ Figure 1.3.2 Display of the corresponding phonetic symbols

ipa = p.convert("Using ipa-list() instead of convert()")

  
 
▲ Figure 1.3.3 Phonetic symbol display

ipa = p.ipa_list("Yes I am geeks, How are you.")

  
 
▲ Figure 1.3.4 Phonetic symbol display

1.3.3 get_rhymes

py = p.get_rhymes('test')   # words that rhyme with 'test'
print(py)

  
 
['abreast', 'acquiesced', 'addressed', 'addwest', 'arrest', 'assessed', 'attest', 'behest', 'bequest', 'best', 'beste', 'blessed', 'blest', 'breast', 'brest', 'bud-test', "c'est", 'caressed', 'celeste', 'charest', 'chest', 'chrest', 'coalesced', 'compressed', 'confessed', 'congest', 'contest', 'crest', "d'allest", 'depressed', 'dest', 'detest', 'digest', 'digressed', 'dispossessed', 'distressed', 'divest', 'dressed', 'eastern-west', 'est', 'expressed', 'farwest', 'fessed', 'fest', 'finessed', 'gest', 'guessed', 'guest', 'impressed', 'indigest', 'infest', 'ingest', 'intrawest', 'invest', 'jest', 'key_west', 'lest', 'messed', 'mest', 'midwest', 'molest', 'natwest', 'nest', 'neste', 'northwest', 'norwest', 'obsessed', 'oppressed', 'penwest', 'pest', 'possessed', 'pressed', 'prest', 'professed', 'progressed', 'protest', 'quest', 'rearrest', 'reassessed', 'recessed', 'reinvest', 'repossessed', 'repressed', 'request', 'rest', 'retest', 'self-professed', 'southwest', 'stateswest', 'stressed', 'suggest', 'suppressed', 'sylvest', 'telequest', 'telewest', 'transgressed', 'trest', 'unaddressed', 'undressed', 'unimpressed', 'unrest', 'vest', 'west', 'wrest', 'yest', 'yoest', 'zest']

  
 

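1.3.4 How to display IPA?

  An awkward problem: the text output by eng_to_ipa contains many characters that cannot be displayed directly. So how can it be converted to ASCII?

  ipapy 0.0.9.0 provides a Python package for working with IPA.

(1) Install IPAPY

python -m pip install ipapy

  I tested this package, but it did not seem able to perform the conversion.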
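
  Before choosing a conversion route, it can help to see exactly which characters in an IPA string fall outside ASCII. The sketch below uses only the standard library. The sample string is an illustrative IPA transcription typed in by hand, not output captured from eng_to_ipa, and non_ascii_report is a hypothetical helper name.

```python
import unicodedata

def non_ascii_report(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) >= 0x80 and ch not in seen:
            seen.append(ch)
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in seen]

sample = "hɛˈloʊ"   # hand-typed illustrative IPA string (assumption)
for ch, code, name in non_ascii_report(sample):
    print(ch, code, name)
```

  Every character listed this way needs either a mapping to an ASCII symbol or a decision to drop it.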

 

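§02 Displaying IPA


  The TEASOFT software currently cannot display IPA characters. How can IPA be converted to standard ASCII?

  The question IPA to plain simple English translator asks about exactly this, namely converting IPA into the notation that the American Heritage Dictionary uses.

2.1 Conversion program

  ipa_converter provides a program that converts IPA to SAMPA - computer readable phonetic alphabet.

2.1.1 Conversion program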

import eng_to_ipa as p
import string

#------------------------------------------------------------
# Load the table: column 1 is the SAMPA symbol, column 2 the IPA symbol
sampafile = r'D:\Temp\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

#------------------------------------------------------------
ipa = p.convert("Yes I am geeks, How are you.")

# Replace each IPA character with its SAMPA symbol; drop unmapped non-ASCII characters
out = []
for c in ipa:
    if c in table: c = table[c]
    elif c not in string.printable: c = ''

    out.append(c)

print(''.join(out))

  
 

2.1.2 Test results

ipa = p.convert("Yes I am geeks, How are you.")

  
 
jEs aI {m giks, haU @r ju.

  
 
ipa = p.convert("instead")

  
 
%In"stEd
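
  The two results above can be reproduced without eng_to_ipa or sampa.cfg by hard-coding a few table rows. In this sketch the TABLE dictionary is a small hand-copied subset of the sampa.cfg rows, and the IPA input strings are typed in by hand as assumed transcriptions (they were not captured from eng_to_ipa). The loop mirrors the conversion program of 2.1.1.

```python
import string

# Hand-copied subset of sampa.cfg rows: IPA symbol -> SAMPA symbol
TABLE = {
    "ɛ": "E", "ɪ": "I", "æ": "{", "ʊ": "U", "ə": "@",
    "ˈ": '"', "ˌ": "%",
}

def ipa2sampa(ipa, table=TABLE):
    """Map each IPA character through the table; drop unmapped non-ASCII characters."""
    out = []
    for c in ipa:
        if c in table:
            c = table[c]
        elif c not in string.printable:
            c = ""
        out.append(c)
    return "".join(out)

print(ipa2sampa("jɛs aɪ æm giks, haʊ ər ju."))  # jEs aI {m giks, haU @r ju.
print(ipa2sampa("ˌɪnˈstɛd"))                     # %In"stEd
```

  The stress marks survive as " and % because they have their own rows in the table; plain ASCII letters pass through unchanged.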

  
 

2.2 gruut-ipa

  Install the package from Gruut IPA.

▲ Figure 2.1.1 SAMPA mapping table

2.2.1 Installing the package

python -m pip install gruut-ipa

IPA eSpeak Sampa Description
i i i close front unrounded vowel
y y y close front rounded vowel
ɨ i 1 close central unrounded vowel
ʉ u } close central rounded vowel
ɯ u- M close back unrounded vowel
u u u close back rounded vowel
ɪ I I near-close near-front unrounded vowel
ʏ I. Y near-close near-front rounded vowel
ʊ U U near-close near-back rounded vowel
e e e close-mid front unrounded vowel
ø Y 2 close-mid front rounded vowel
ɘ @ @ close-mid central unrounded vowel
ɵ @. 8 close-mid central rounded vowel
ɤ o- 7 close-mid back unrounded vowel
o o o close-mid back rounded vowel
ɛ E E open-mid front unrounded vowel
œ W 9 open-mid front rounded vowel
ɜ V 3 open-mid central unrounded vowel
ɞ O 3 open-mid central rounded vowel
ʌ V V open-mid back unrounded vowel
ɔ O O open-mid back rounded vowel
æ a { near-open front unrounded vowel
ɐ V 6 near-open central unrounded vowel
a a a open front unrounded vowel
ɶ W & open front rounded vowel
ɑ A A open back unrounded vowel
ɒ A. Q open back rounded vowel
m m m voiced bilabial nasal
ɱ M F voiced labio-dental nasal
n n n voiced alveolar nasal
ɳ n. n` voiced retroflex nasal
ŋ N N voiced velar nasal
ɴ n N voiced uvular nasal
p p p voiceless bilabial plosive
b b b voiced bilabial plosive
t t t voiceless alveolar plosive
d d d voiced alveolar plosive
ʈ t. t` voiceless retroflex plosive
ɖ d. d` voiced retroflex plosive
c c c voiceless palatal plosive
ɟ J J voiced palatal plosive
k k k voiceless velar plosive
ɡ g g voiced velar plosive
g g g voiced velar plosive
q q q voiceless uvular plosive
ɢ G G voiced uvular plosive
ʡ > voiceless pharyngeal plosive
ʔ ? ? voiceless glottal plosive
p͡f pf pf voiceless labio-dental affricate
b͡v bv bv voiced dental affricate
t̪͡s ts t ds voiceless dental affricate
t͡s ts ts voiceless alveolar affricate
d͡z dz dz voiced alveolar affricate
t͡ʃ tS tS voiceless post-alveolar affricate
d͡ʒ dZ dZ voiced post-alveolar affricate
ʈ͡ʂ tS ts` voiceless retroflex affricate
ɖ͡ʐ dz dz` voiced retroflex affricate
t͡ɕ tS; ts voiceless palatal affricate
d͡ʑ dZ; dz voiced palatal affricate
k͡x k k x voiceless velar affricate
ɸ F p voiceless bilabial fricative
β B B voiced bilabial fricative
f f f voiceless labio-dental fricative
v v v voiced labio-dental fricative
θ T T voiceless dental fricative
ð D D voiced dental fricative
s s s voiceless alveolar fricative
z z z voiced alveolar fricative
ʃ S S voiceless post-alveolar fricative
ʒ Z Z voiced post-alveolar fricative
ʂ s. s` voiceless retroflex fricative
ʐ z. z` voiced palatal fricative
ç C C voiceless palatal fricative
x x x voiceless velar fricative
ɣ Q G voiced velar fricative
χ X X voiceless uvular fricative
ʁ g R voiced uvular fricative
ħ H X voiceless pharyngeal fricative
h h h voiceless glottal fricative
ɦ h<?> h voiced glottal fricative
w w w voiced bilabial approximant
ʋ v# v voiced labio-dental approximant
ɹ r r voiced alveolar approximant
ɻ r. r ` voiced retroflex approximant
j j j voiced palatal approximant
ɰ Q M voiced velar approximant
voiced labio-dental flap
ɾ * 4 voiced alveolar flap
ɽ *. r` voiced retroflex flap
ʙ b B voiced bilabial trill
r r r voiced alveolar trill
ʀ r R voiced uvular trill
l l l voiced alveolar lateral-approximant
ɫ l 5 voiced alveolar lateral-approximant
ɭ l. l` voiced retroflex lateral-approximant
ʎ l^ L voiced palatal lateral-approximant
ʟ L L voiced velar lateral-approximant
ə @ @ schwa
ɚ 3 @` r-coloured schwa
ɝ 3 @` r-coloured schwa
ɹ̩ r- r ̩ voiced alveolar approximant

  This package seems to be usable only from the command line.

2.3 SAMPA character mapping table

  The table below was copied and pasted from sampa.cfg.

A       ɑ       script a        open back unrounded, Cardinal 5, Eng. start
{       æ       ae ligature     near-open front unrounded, Eng. trap
6       ɐ       turned a        open schwa, Ger. besser
Q       ɒ       turned script a open back rounded, Eng. lot
E       ɛ       epsilon         open-mid front unrounded, C3, Fr. même
@       ə       turned e        schwa, Eng. banana
3       ɜ       rev. epsilon    long mid central, Eng. nurse
I       ɪ       small cap I     lax close front unrounded, Eng. kit
O       ɔ       turned c        open-mid back rounded, Eng. thought
2       ø       o-slash         close-mid front rounded, Fr. deux
9       œ       oe ligature     open-mid front rounded, Fr. neuf
&       ɶ       s.c. OE lig.    open front rounded
U       ʊ       upsilon         lax close back rounded, Eng. foot
}       ʉ       barred u        close central rounded, Swedish sju
V       ʌ       turned v        open-mid back unrounded, Eng. strut
Y       ʏ       small cap Y     lax [y], Ger. hübsch
B       β       beta            voiced bilabial fricative, Sp. cabo
C       ç       c-cedilla       voiceless palatal fricative, Ger. ich
D       ð       eth             voiced dental fricative, Eng. then
G       ɣ       gamma           voiced velar fricative, Sp. fuego
L       ʎ       turned y        palatal lateral, It. famiglia
J       ɲ       left-tail n     palatal nasal, Sp. año
N       ŋ       eng             velar nasal, Eng. thing
R       ʁ       inv. s.c. R     vd. uvular fric. or trill, Fr. roi
S       ʃ       esh             voiceless palatoalveolar fricative, Eng. ship
T       θ       theta           voiceless dental fricative, Eng. thin
H       ɥ       turned h        labial-palatal semivowel, Fr. huit
Z       ʒ       ezh (yogh)      vd. palatoalveolar fric., Eng. measure
?       ʔ       dotless ?       glottal stop, Ger. Verein, also Danish stød
:       ː       length mark     length mark
"       ˈ       vertical stroke primary stress *
%       ˌ       low vert. str.  secondary stress
s'      ʂ       Added for Russian support (ш)
s\      ɕ       Added for Russian support (щ)
s'      ʐ       Added for Russian support (ж)
z\      ʑ       Added for Russian support (ж)
1       ɨ       Added for Russian support (и, sometimes ы)
8       ɵ       Added for Russian support (ё)
_j      ʲ       Added for Russian support (ь)

  
 

  Paste the text above into Notepad and save it with 'utf-8' encoding.
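
  Saving as UTF-8 matters because the parser opens the file with an explicit encoding and takes the IPA symbol from the second whitespace-separated column. A minimal sketch of that round trip, using a temporary file in place of the real sampa.cfg; the three rows are copied from the table above and load_table is a hypothetical helper name:

```python
import os
import tempfile

ROWS = (
    "A       ɑ       script a        open back unrounded\n"
    "{       æ       ae ligature     near-open front unrounded\n"
    "@       ə       turned e        schwa\n"
)

def load_table(path):
    """Parse a sampa.cfg-style file into {ipa_symbol: sampa_symbol}."""
    table = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            row = line.split()
            if len(row) < 2:      # skip blank or malformed lines
                continue
            table[row[1]] = row[0]
    return table

# Write the rows as UTF-8 to a temporary stand-in for sampa.cfg, then read them back.
fd, path = tempfile.mkstemp(suffix=".cfg")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(ROWS)
table = load_table(path)
os.remove(path)
print(table)
```

  If the editor writes a byte order mark at the start of the file, opening it with encoding='utf-8-sig' strips the mark so that it does not end up attached to the first SAMPA symbol.
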

#!/usr/bin/python
# Command-line IPA-to-SAMPA converter (Python 3)

from string import printable
import sys

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser(description='Converts IPA input to ASCII output. Does not handle certain cases: nasals, tones, syllabic consonants, and non-SAMPA representable phonetic symbols. Writes to stdout.')
    parser.add_argument('-c','--config',help='Name of the configuration table. Defaults to "sampa.cfg" in the working directory.',default='sampa.cfg')
    parser.add_argument('source',help='IPA symbols to convert.')
    args = parser.parse_args(sys.argv[1:])
    source = args.source   # already a str in Python 3, no decoding needed
    print(source)

    # Build the IPA -> SAMPA lookup table from the config file
    table = {}
    with open(args.config, 'r', encoding='utf-8') as config:
        for line in config:
            line = line.strip()
            if line == '': continue
            row = line.split()
            sampa_symb = row[0]
            ipa_symb = row[1]
            table[ipa_symb] = sampa_symb

    # Map each character; drop unmapped non-ASCII characters
    out = []
    for c in source:
        if c in table: c = table[c]
        elif c not in printable: c = ''
        out.append(c)
    print(''.join(out))

  
 

 

§03 Modifying the Programs


3.1 Building blocks

  • sampa.cfg: the mapping table, a text file in “utf-8” encoding.

  • Code:

import string

sampafile = r'D:\Python\Cmd\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''

        out.append(c)

    return ''.join(out)
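
  ipa2sampa silently drops any non-ASCII character that has no table entry. While building up the table it may be useful to see what is being dropped. A sketch under stated assumptions: unmapped_symbols is a hypothetical helper, and demo_table is a tiny illustrative table rather than the full sampa.cfg.

```python
import string

def unmapped_symbols(ipa, table):
    """List the distinct characters that ipa2sampa would silently drop."""
    dropped = []
    for c in ipa:
        if c not in table and c not in string.printable and c not in dropped:
            dropped.append(c)
    return dropped

demo_table = {"ɛ": "E", "ˈ": '"'}               # tiny illustrative table (assumption)
print(unmapped_symbols("ˈθɛŋk", demo_table))    # ['θ', 'ŋ']
```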

  
 

3.2 Modifying the small programs

  The small programs to be modified are mainly:

  • cal
  • cdtm

3.2.1 Adding IPA-SAMPA

#------------------------------------------------------------
import eng_to_ipa as ipa
import string

sampafile = r'D:\Python\Cmd\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''

        out.append(c)

    return ''.join(out)

#------------------------------------------------------------
import json
import requests

def translate(word):
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
    key = {
        'type': "AUTO",
        'i': word,
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "ue": "UTF-8",
        "action": "FY_BY_CLICKBUTTON",
        "typoResult": "true"
    }
    response = requests.post(url, data=key)
    if response.status_code == 200:
        return response.text
    else:
        return None

def get_reuslt(repsonse):
    result = json.loads(repsonse)

    origin = result['translateResult'][0][0]['src']
    target = result['translateResult'][0][0]['tgt']

    originflag = 1
    targetflag = 1

    for c in origin:
        if ord(c) >= 0x80:
            originflag = 0
            break

    for c in target:
        if ord(c) >= 0x80:
            targetflag = 0
            break

    originsampa = ''
    targetsampa = ''

    if originflag > 0:
        originsampa = "/%s/"%ipa2sampa(ipa.convert(origin))
        for c in originsampa:
            if c not in string.printable:
                originsampa = ''
                break

    if targetflag > 0:
        targetsampa = "/%s/"%ipa2sampa(ipa.convert(target))
        for c in targetsampa:
            if c not in string.printable:
                targetsampa = ''
                break

    print("%s%s --> %s%s" %(result['translateResult'][0][0]['src'], originsampa,
                              result['translateResult'][0][0]['tgt'], targetsampa))
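
  The ASCII checks and the output formatting in get_reuslt can be exercised offline. The sketch below is a stand-in under several assumptions: the JSON string only imitates the translateResult shape that the code above indexes, fake_sampa replaces the ipa2sampa(ipa.convert(...)) call with a hard-coded lookup whose value is copied from the run results in 3.2.2, and format_result is a hypothetical name. str.isascii() (Python 3.7+) does the same job as the ord(c) >= 0x80 loops. Unlike the original, this version omits the slashes when no SAMPA string is available.

```python
import json

# Stand-in for ipa2sampa(ipa.convert(word)); value copied from the run results (assumption)
FAKE_SAMPA = {"command": 'k@"m{nd'}

def fake_sampa(text):
    return FAKE_SAMPA.get(text, "")

def format_result(response_text):
    """Build the 'src/sampa/ --> tgt/sampa/' line from a translateResult-shaped JSON string."""
    item = json.loads(response_text)["translateResult"][0][0]
    parts = []
    for text in (item["src"], item["tgt"]):
        # Only pure-ASCII text is treated as English and given a SAMPA suffix
        sampa = fake_sampa(text) if text.isascii() else ""
        parts.append(text + ("/%s/" % sampa if sampa else ""))
    return " --> ".join(parts)

sample = json.dumps({"translateResult": [[{"src": "command", "tgt": "命令"}]]})
print(format_result(sample))
```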

  
 

3.2.2 Run results

靠谱 --> By spectrum/baI "spEktr@m/
command/k@"m{nd/ --> 命令

  
 
python --> python
input/"In%pUt/ --> 输入
python --> python
cmd/cmd*/ --> cmd/cmd*/

append/@"pEnd/ --> 附加

input/"In%pUt/ --> 输入
python --> python
command/k@"m{nd/ --> 命令

命令 --> The command/D@ k@"m{nd/
事情 --> things/TINz/
input/"In%pUt/ --> 输入

backup/"b{%k@p/ --> 备份
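
  The line cmd/cmd*/ in the results shows a word that was not transcribed: the text comes back unchanged with a trailing asterisk. Assuming the asterisk marks a word that eng_to_ipa could not find (which is what this output suggests), the SAMPA part can be hidden for such words. In this sketch sampa_suffix is a hypothetical helper that takes the already converted string:

```python
def sampa_suffix(converted):
    """Return '/.../' for a converted string, or '' if any word carries the '*' marker."""
    # Assumption: a trailing '*' on a word means no transcription was found for it.
    if not converted or any(word.endswith("*") for word in converted.split()):
        return ""
    return "/%s/" % converted

print("cmd" + sampa_suffix("cmd*"))         # cmd
print("append" + sampa_suffix('@"pEnd'))    # append/@"pEnd/
```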

  
 

 

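※ Summary ※


  To obtain and display the pronunciation of English words, eng_to_ipa is used to get each word's IPA, and a self-written conversion program then converts the IPA to SAMPA for display.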



Source: zhuoqing.blog.csdn.net; author: 卓晴. Copyright belongs to the original author; please contact the author before reposting.

Original link: zhuoqing.blog.csdn.net/article/details/122996325
