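How to get the pronunciation of English words? Adding IPA-SAMPA
Abstract: To obtain and display the pronunciation of English words, eng_to_ipa is used to get the IPA of a word, and a self-written conversion program then converts the IPA to SAMPA for display.
Keywords: sampa, ipa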
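§01 Word pronunciation
Learning an English word works better when the word is paired with its pronunciation. In "Testing Chinese-English translation with Python", the Youdao online translation service was used to add English lookup to the Python add-on programs of the TEASOFT software, but pronunciation hints are still missing. The sections below look at several helper tools to see whether this feature can be filled in.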
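1.1 Text to speech
"Python (9) - Audio-text conversion" lists several speech modules:
- pyttsx3: a package that uses the system's built-in speech engine for pronunciation; it does not generate an MP3 file;
- win32com: uses the speech engine built into the Windows operating system to read text aloud.
"Updating pip3 and pyttsx3 text-to-speech" points out that pypiwin32 must be installed before pyttsx3.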
1.1.1 Installing the packages
(1) Install pypiwin32
python -m pip install pypiwin32
(2) Install pyttsx3
python -m pip install pyttsx3
1.1.2 Testing the packages
Ⅰ. Chinese pronunciation
import pyttsx3
engine = pyttsx3.init()
engine.say('人类真帅')
engine.runAndWait()
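The phrase "人类真帅" can be heard as spoken output.
Ⅱ. English pronunciation
engine.say('Windows is an OS.')
The English sentence ("Windows is an OS.") is read aloud as well.
However, the output has an obvious robotic tone and does not sound very natural.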
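1.2 wordnet
The vocabulary package is described as a way to obtain information about a word, but installing vocabulary failed with an error. The following installs nltk, whose wordnet corpus is tested next.
python -m pip install nltk
1.2.1 Testing wordnet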
from nltk.corpus import wordnet

# Look up the synsets (groups of synonyms) for the word 'car'
syns = wordnet.synsets('car')
print(syns)
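Test result: the test raised an error and could not run.
1.3 eng-to-ipa
"Convert English text into the Phonetics using Python" describes using "eng-to-ipa" to obtain the pronunciation of English words.
This package converts English words into IPA (International Phonetic Alphabet).
For details, see English to IPA (eng_to_ipa).
1.3.1 Installing the package
python -m pip install eng-to-ipa
▲ Figure 1.3.1 python -m pip install eng-to-ipa
1.3.2 Testing the package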
import eng_to_ipa as p

ipa = p.convert("Hello Geeks.")

# Write the result to a UTF-8 text file so the IPA characters can be viewed
filename = r'd:\temp\1.txt'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(ipa)

print('\a')  # ring the terminal bell when done
▲ Figure 1.3.2 Display of the phonetic symbols
ipa = p.convert("Using ipa-list() instead of convert()")
▲ Figure 1.3.3 Phonetic symbols displayed
ipa = p.ipa_list("Yes I am geeks, How are you.")
▲ Figure 1.3.4 Phonetic symbols displayed
1.3.3 get_rhymes
py = p.get_rhymes('test')
print(py)
['abreast', 'acquiesced', 'addressed', 'addwest', 'arrest', 'assessed', 'attest', 'behest', 'bequest', 'best', 'beste', 'blessed', 'blest', 'breast', 'brest', 'bud-test', "c'est", 'caressed', 'celeste', 'charest', 'chest', 'chrest', 'coalesced', 'compressed', 'confessed', 'congest', 'contest', 'crest', "d'allest", 'depressed', 'dest', 'detest', 'digest', 'digressed', 'dispossessed', 'distressed', 'divest', 'dressed', 'eastern-west', 'est', 'expressed', 'farwest', 'fessed', 'fest', 'finessed', 'gest', 'guessed', 'guest', 'impressed', 'indigest', 'infest', 'ingest', 'intrawest', 'invest', 'jest', 'key_west', 'lest', 'messed', 'mest', 'midwest', 'molest', 'natwest', 'nest', 'neste', 'northwest', 'norwest', 'obsessed', 'oppressed', 'penwest', 'pest', 'possessed', 'pressed', 'prest', 'professed', 'progressed', 'protest', 'quest', 'rearrest', 'reassessed', 'recessed', 'reinvest', 'repossessed', 'repressed', 'request', 'rest', 'retest', 'self-professed', 'southwest', 'stateswest', 'stressed', 'suggest', 'suppressed', 'sylvest', 'telequest', 'telewest', 'transgressed', 'trest', 'unaddressed', 'undressed', 'unimpressed', 'unrest', 'vest', 'west', 'wrest', 'yest', 'yoest', 'zest']
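1.3.4 How to display IPA?
An awkward problem is that the text output by eng_to_ipa contains many characters that cannot be displayed directly. How can it be converted to ASCII?
ipapy 0.0.9.0 provides a Python package for working with IPA.
(1) Install IPAPY
python -m pip install ipapy
This package was tested, but it did not seem able to perform the conversion.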
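To see which characters make the eng_to_ipa output hard to display, it can help to list everything that falls outside ASCII. The sketch below uses only the standard library. The sample string is an assumed transcription typed in by hand for illustration, not captured output from eng_to_ipa, and non_ascii_chars is a hypothetical helper name.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for c in text:
        if ord(c) >= 0x80 and c not in seen:
            seen.append(c)
    return [(c, "U+%04X" % ord(c), unicodedata.name(c, "UNKNOWN")) for c in seen]

# Assumed sample transcription, for illustration only
sample = "jɛs aɪ æm"
for char, code, name in non_ascii_chars(sample):
    print(char, code, name)
```

Each character reported this way needs either a font that can render it or an ASCII replacement, which is what the SAMPA conversion in the next section provides.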
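§02 Displaying IPA
The current TEASOFT software cannot yet display IPA characters. How can IPA be converted to standard ASCII?
"IPA to plain simple English translator" raises a question on this topic, namely converting IPA into the notation that the American Heritage Dictionary uses.
2.1 Conversion program
ipa_converter provides a program that converts IPA to SAMPA - computer readable phonetic alphabet.
2.1.1 Conversion code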
import eng_to_ipa as p
import string

#------------------------------------------------------------
# Load the table: column 1 is the SAMPA symbol, column 2 the IPA symbol
sampafile = r'D:\Temp\sampa.cfg'

table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue

        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

#------------------------------------------------------------
ipa = p.convert("Yes I am geeks, How are you.")

# Map each IPA character to SAMPA; drop any other character that is not printable ASCII
out = []
for c in ipa:
    if c in table: c = table[c]
    elif c not in string.printable: c = ''
    out.append(c)

print(''.join(out))
2.1.2 Test results
ipa = p.convert("Yes I am geeks, How are you.")
jEs aI {m giks, haU @r ju.
ipa = p.convert("instead")
%In"stEd
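The result for "instead" can be traced character by character. The sketch below hard-codes only the four table rows this word needs, taken from the sampa.cfg table in section 2.3, and assumes that eng_to_ipa returned ˌɪnˈstɛd for the word. That IPA string is inferred from the SAMPA output above rather than captured from the program.

```python
import string

# Rows copied from sampa.cfg: IPA symbol -> SAMPA symbol
table = {
    "ˌ": "%",   # secondary stress
    "ɪ": "I",   # small cap I
    "ˈ": '"',   # primary stress
    "ɛ": "E",   # epsilon
}

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table:
            c = table[c]
        elif c not in string.printable:
            c = ""          # drop symbols that have no table entry
        out.append(c)
    return "".join(out)

# Assumed IPA for "instead" (inferred, not captured output)
ipa = "ˌɪnˈstɛd"
for c in ipa:
    print(repr(c), "->", repr(ipa2sampa(c)))
print(ipa2sampa(ipa))   # prints %In"stEd
```

Plain ASCII letters such as n, s, t and d pass through unchanged, so only the stress marks and the two vowels are rewritten.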
2.2 gruut-ipa
The package is installed from Gruut IPA.
▲ Figure 2.1.1 SAMPA correspondence table
2.2.1 Installing the package
python -m pip install gruut-ipa
IPA | eSpeak | Sampa | Description |
---|---|---|---|
i | i | i | close front unrounded vowel |
y | y | y | close front rounded vowel |
ɨ | i | 1 | close central unrounded vowel |
ʉ | u | } | close central rounded vowel |
ɯ | u- | M | close back unrounded vowel |
u | u | u | close back rounded vowel |
ɪ | I | I | near-close near-front unrounded vowel |
ʏ | I. | Y | near-close near-front rounded vowel |
ʊ | U | U | near-close near-back rounded vowel |
e | e | e | close-mid front unrounded vowel |
ø | Y | 2 | close-mid front rounded vowel |
ɘ | @ | @ | close-mid central unrounded vowel |
ɵ | @. | 8 | close-mid central rounded vowel |
ɤ | o- | 7 | close-mid back unrounded vowel |
o | o | o | close-mid back rounded vowel |
ɛ | E | E | open-mid front unrounded vowel |
œ | W | 9 | open-mid front rounded vowel |
ɜ | V | 3 | open-mid central unrounded vowel |
ɞ | O | 3 | open-mid central rounded vowel |
ʌ | V | V | open-mid back unrounded vowel |
ɔ | O | O | open-mid back rounded vowel |
æ | a | { | near-open front unrounded vowel |
ɐ | V | 6 | near-open central unrounded vowel |
a | a | a | open front unrounded vowel |
ɶ | W | & | open front rounded vowel |
ɑ | A | A | open back unrounded vowel |
ɒ | A. | Q | open back rounded vowel |
m | m | m | voiced bilabial nasal |
ɱ | M | F | voiced labio-dental nasal |
n | n | n | voiced alveolar nasal |
ɳ | n. | n` | voiced retroflex nasal |
ŋ | N | N | voiced velar nasal |
ɴ | n | N | voiced uvular nasal |
p | p | p | voiceless bilabial plosive |
b | b | b | voiced bilabial plosive |
t | t | t | voiceless alveolar plosive |
d | d | d | voiced alveolar plosive |
ʈ | t. | t` | voiceless retroflex plosive |
ɖ | d. | d` | voiced retroflex plosive |
c | c | c | voiceless palatal plosive |
ɟ | J | J | voiced palatal plosive |
k | k | k | voiceless velar plosive |
ɡ | g | g | voiced velar plosive |
g | g | g | voiced velar plosive |
q | q | q | voiceless uvular plosive |
ɢ | G | G | voiced uvular plosive |
ʡ | > | voiceless pharyngeal plosive | |
ʔ | ? | ? | voiceless glottal plosive |
p͡f | pf | pf | voiceless labio-dental affricate |
b͡v | bv | bv | voiced dental affricate |
t̪͡s | ts | t ds | voiceless dental affricate |
t͡s | ts | ts | voiceless alveolar affricate |
d͡z | dz | dz | voiced alveolar affricate |
t͡ʃ | tS | tS | voiceless post-alveolar affricate |
d͡ʒ | dZ | dZ | voiced post-alveolar affricate |
ʈ͡ʂ | tS | ts` | voiceless retroflex affricate |
ɖ͡ʐ | dz | dz` | voiced retroflex affricate |
t͡ɕ | tS; | ts | voiceless palatal affricate |
d͡ʑ | dZ; | dz | voiced palatal affricate |
k͡x | k | k x | voiceless velar affricate |
ɸ | F | p | voiceless bilabial fricative |
β | B | B | voiced bilabial fricative |
f | f | f | voiceless labio-dental fricative |
v | v | v | voiced labio-dental fricative |
θ | T | T | voiceless dental fricative |
ð | D | D | voiced dental fricative |
s | s | s | voiceless alveolar fricative |
z | z | z | voiced alveolar fricative |
ʃ | S | S | voiceless post-alveolar fricative |
ʒ | Z | Z | voiced post-alveolar fricative |
ʂ | s. | s` | voiceless retroflex fricative |
ʐ | z. | z` | voiced palatal fricative |
ç | C | C | voiceless palatal fricative |
x | x | x | voiceless velar fricative |
ɣ | Q | G | voiced velar fricative |
χ | X | X | voiceless uvular fricative |
ʁ | g | R | voiced uvular fricative |
ħ | H | X | voiceless pharyngeal fricative |
h | h | h | voiceless glottal fricative |
ɦ | h<?> | h | voiced glottal fricative |
w | w | w | voiced bilabial approximant |
ʋ | v# | v | voiced labio-dental approximant |
ɹ | r | r | voiced alveolar approximant |
ɻ | r. | r ` | voiced retroflex approximant |
j | j | j | voiced palatal approximant |
ɰ | Q | M | voiced velar approximant |
ⱱ | ⱱ | ⱱ | voiced labio-dental flap |
ɾ | * | 4 | voiced alveolar flap |
ɽ | *. | r` | voiced retroflex flap |
ʙ | b | B | voiced bilabial trill |
r | r | r | voiced alveolar trill |
ʀ | r | R | voiced uvular trill |
l | l | l | voiced alveolar lateral-approximant |
ɫ | l | 5 | voiced alveolar lateral-approximant |
ɭ | l. | l` | voiced retroflex lateral-approximant |
ʎ | l^ | L | voiced palatal lateral-approximant |
ʟ | L | L | voiced velar lateral-approximant |
ə | @ | @ | schwa |
ɚ | 3 | @` | r-coloured schwa |
ɝ | 3 | @` | r-coloured schwa |
ɹ̩ | r- | r ̩ | voiced alveolar approximant |
This package seems to be usable only from the command line.
2.3 SAMPA symbol table
The table below was copied and pasted from sampa.cfg.
A ɑ script a open back unrounded, Cardinal 5, Eng. start
{ æ ae ligature near-open front unrounded, Eng. trap
6 ɐ turned a open schwa, Ger. besser
Q ɒ turned script a open back rounded, Eng. lot
E ɛ epsilon open-mid front unrounded, C3, Fr. même
@ ə turned e schwa, Eng. banana
3 ɜ rev. epsilon long mid central, Eng. nurse
I ɪ small cap I lax close front unrounded, Eng. kit
O ɔ turned c open-mid back rounded, Eng. thought
2 ø o-slash close-mid front rounded, Fr. deux
9 œ oe ligature open-mid front rounded, Fr. neuf
& ɶ s.c. OE lig. open front rounded
U ʊ upsilon lax close back rounded, Eng. foot
} ʉ barred u close central rounded, Swedish sju
V ʌ turned v open-mid back unrounded, Eng. strut
Y ʏ small cap Y lax [y], Ger. hübsch
B β beta voiced bilabial fricative, Sp. cabo
C ç c-cedilla voiceless palatal fricative, Ger. ich
D ð eth voiced dental fricative, Eng. then
G ɣ gamma voiced velar fricative, Sp. fuego
L ʎ turned y palatal lateral, It. famiglia
J ɲ left-tail n palatal nasal, Sp. año
N ŋ eng velar nasal, Eng. thing
R ʁ inv. s.c. R vd. uvular fric. or trill, Fr. roi
S ʃ esh voiceless palatoalveolar fricative, Eng. ship
T θ theta voiceless dental fricative, Eng. thin
H ɥ turned h labial-palatal semivowel, Fr. huit
Z ʒ ezh (yogh) vd. palatoalveolar fric., Eng. measure
? ʔ dotless ? glottal stop, Ger. Verein, also Danish stød
: ː length mark length mark
" ˈ vertical stroke primary stress *
% ˌ low vert. str. secondary stress
s' ʂ Added for Russian support (ш)
s\ ɕ Added for Russian support (щ)
s' ʐ Added for Russian support (ж)
z\ ʑ Added for Russian support (ж)
1 ɨ Added for Russian support (и, sometimes ы)
8 ɵ Added for Russian support (ё)
_j ʲ Added for Russian support (ь)
Paste the text above into Notepad and save it with 'utf-8' encoding.
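The save step can also be done from Python, which sidesteps encoding surprises: some editors prepend a byte-order mark when saving as UTF-8, and that mark would end up glued to the first SAMPA symbol. The sketch below writes a three-row excerpt of the table to a temporary file and reads it back into an IPA-to-SAMPA dictionary. The temporary file name and the helper name load_table are assumptions for illustration.

```python
import os
import tempfile

EXCERPT = (
    "A ɑ script a open back unrounded\n"
    "{ æ ae ligature near-open front unrounded\n"
    "E ɛ epsilon open-mid front unrounded\n"
)

def load_table(path):
    """Read a sampa.cfg-style file into a dict mapping IPA symbol -> SAMPA symbol."""
    table = {}
    # utf-8-sig also accepts files that start with a byte-order mark
    with open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            row = line.split()
            if len(row) < 2:
                continue          # skip blank or malformed lines
            table[row[1]] = row[0]
    return table

path = os.path.join(tempfile.gettempdir(), "sampa_excerpt.cfg")
with open(path, "w", encoding="utf-8") as f:
    f.write(EXCERPT)

print(load_table(path))
```

Reading with utf-8-sig works for files with or without the mark, so the same loader handles a file saved from Notepad and one written by the code above.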
#!/usr/bin/python
# Original ipa_converter script. It targets Python 2 (note the str.decode calls).
from __future__ import print_function
from string import printable
import sys, locale

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser(description='Converts IPA input to ASCII output. Does not handle certain cases: nasals, tones, syllabic consonants, and non-SAMPA representable phonetic symbols. Writes to stdout.')
    parser.add_argument('-c','--config',help='Name of the configuration table. Defaults to "sampa.cfg" in the working directory.',default='sampa.cfg')
    parser.add_argument('source',help='IPA symbols to convert.')
    args = parser.parse_args(sys.argv[1:])
    config = args.config
    source = args.source.decode(locale.getpreferredencoding())
    print(source)

    # Build the IPA -> SAMPA lookup table from the configuration file
    config = open(config,'r')
    table = {}
    for line in config:
        line=line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0].decode('utf-8')
        ipa_symb = row[1].decode('utf-8')
        table[ipa_symb] = sampa_symb

    # Convert character by character, dropping unmapped non-ASCII symbols
    out = []
    for c in source:
        if c in table: c = table[c]
        elif c not in printable: c = ''
        out.append(c)
    print(''.join(out))
§03 Adapting the program
3.1 Materials for the adaptation
- sampa.cfg: the symbol mapping table, a text file in "utf-8" format.
- Code:
import string

sampafile = r'D:\Python\Cmd\sampa.cfg'

# Load the table: column 1 is the SAMPA symbol, column 2 the IPA symbol
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue

        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''
        out.append(c)

    return ''.join(out)
3.2 Adapting the utility programs
The utility programs to adapt are mainly:
- cal
- cdtm
3.2.1 Adding IPA-SAMPA
#------------------------------------------------------------
import eng_to_ipa as ipa
import string

# Load the IPA -> SAMPA table from sampa.cfg (UTF-8 text)
sampafile = r'D:\Python\Cmd\sampa.cfg'

table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue

        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''
        out.append(c)

    return ''.join(out)

#------------------------------------------------------------
import json
import requests

def translate(word):
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'

    key = {
        'type': "AUTO",
        'i': word,
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "ue": "UTF-8",
        "action": "FY_BY_CLICKBUTTON",
        "typoResult": "true"
    }

    response = requests.post(url, data=key)
    if response.status_code == 200:
        return response.text
    else:
        return None

def get_reuslt(response):
    result = json.loads(response)
    origin = result['translateResult'][0][0]['src']
    target = result['translateResult'][0][0]['tgt']

    # A text is treated as English only if every character is ASCII
    originflag = 1
    targetflag = 1
    for c in origin:
        if ord(c) >= 0x80:
            originflag = 0
            break
    for c in target:
        if ord(c) >= 0x80:
            targetflag = 0
            break

    # Build the /SAMPA/ annotation; discard it if any character is not printable
    originsampa = ''
    targetsampa = ''
    if originflag > 0:
        originsampa = "/%s/" % ipa2sampa(ipa.convert(origin))
        for c in originsampa:
            if c not in string.printable:
                originsampa = ''
                break
    if targetflag > 0:
        targetsampa = "/%s/" % ipa2sampa(ipa.convert(target))
        for c in targetsampa:
            if c not in string.printable:
                targetsampa = ''
                break

    print("%s%s --> %s%s" % (origin, originsampa, target, targetsampa))
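The source text and the target text go through the same two checks in get_reuslt, so the logic could be factored into one helper. The following is a minimal sketch rather than the author's code: annotate is a hypothetical name, str.isascii requires Python 3.7 or later and is slightly looser than the string.printable test, and the converter is passed in as a parameter so that the sketch runs without eng_to_ipa or sampa.cfg. The stand-in converter is a placeholder that knows a single word.

```python
def annotate(text, to_sampa):
    """Append /SAMPA/ to text when it is pure ASCII (treated as English)."""
    if not text.isascii():
        return text                      # e.g. Chinese text: no annotation
    sampa = to_sampa(text)
    if not sampa.isascii():
        return text                      # conversion left undisplayable symbols
    return "%s/%s/" % (text, sampa)

def fake_to_sampa(word):
    # Placeholder for ipa2sampa(ipa.convert(word)); knows a single word
    return {"command": 'k@"m{nd'}.get(word, word)

print("%s --> %s" % (annotate("command", fake_to_sampa),
                     annotate("命令", fake_to_sampa)))
```

With such a helper, the final line of get_reuslt would reduce to formatting annotate(origin, ...) and annotate(target, ...) with the real converter in place of the placeholder.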
3.2.2 Run results
靠谱 --> By spectrum/baI "spEktr@m/
command/k@"m{nd/ --> 命令
python --> python
input/"In%pUt/ --> 输入
python --> python
cmd/cmd*/ --> cmd/cmd*/
append/@"pEnd/ --> 附加
input/"In%pUt/ --> 输入
python --> python
command/k@"m{nd/ --> 命令
命令 --> The command/D@ k@"m{nd/
事情 --> things/TINz/
input/"In%pUt/ --> 输入
backup/"b{%k@p/ --> 备份
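In the results above, cmd comes back as /cmd*/. The eng_to_ipa documentation notes that a word not found in its dictionary is reprinted with an asterisk, so this annotation carries no pronunciation. One possible refinement, sketched here with hypothetical helper names, is to drop the annotation whenever the converted text contains such a marker.

```python
def has_unknown_word(converted):
    """True if any word in the converted text ends with the '*' marker."""
    return any(token.rstrip('.,!?;:').endswith('*') for token in converted.split())

def format_annotation(word, sampa):
    # Skip the /.../ part when the conversion contains an unknown word
    if has_unknown_word(sampa):
        return word
    return "%s/%s/" % (word, sampa)

print(format_annotation("cmd", "cmd*"))          # cmd
print(format_annotation("append", '@"pEnd'))     # append/@"pEnd/
```

This treats a phrase as unknown if any single word in it is unknown, which is a deliberate simplification; annotating only the known words would need per-word handling.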
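※ Summary ※
To obtain and display the pronunciation of English words, eng_to_ipa is used to get the IPA of a word, and a self-written conversion program then converts the IPA to SAMPA for display.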
■ Related references:
- Testing Chinese-English translation with Python
- Python (9) - Audio-text conversion
- Updating pip3 and pyttsx3 text-to-speech
- vocabulary
- Convert English text into the Phonetics using Python
- English to IPA (eng_to_ipa)
- ipapy 0.0.9.0
- IPA to plain simple English translator
- ipa_converter
- SAMPA - computer readable phonetic alphabet
- Gruut IPA
- sampa.cfg
Source: zhuoqing.blog.csdn.net. Author: 卓晴. Copyright belongs to the original author; please contact the author before reposting.
Original link: zhuoqing.blog.csdn.net/article/details/122996325