
How to Get the Pronunciation of English Words? Adding IPA-SAMPA

Posted by tsinghuazhuoqing on 2022/02/19 22:08:08

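Introduction: To obtain and display the pronunciation of English words, eng_to_ipa is used to get a word's IPA, and a self-written conversion program then converts the IPA to SAMPA for display.

Keywords: SAMPA, IPA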

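§01 Word Pronunciation

Learning an English word is more effective when the word is paired with its pronunciation. In "Testing Chinese-English translation done with Python", Youdao online translation was used to add English lookup to the Python add-on programs of the TEASOFT software, but a pronunciation hint is still missing. The tools below are examined to see whether they can fill this gap.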

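1.1 Text to speech

The post "Python (9) - Audio and Text Conversion" lists several speech modules:

pyttsx3: a package that speaks through the system's built-in speech engine; it does not generate an MP3;
win32com: speaks text through the speech engine built into the Windows operating system;

The post "Updating pip3 and pyttsx3 text-to-speech" points out that pypiwin32 must be installed before pyttsx3.

1.1.1 Installing the packages

(1) Install pypiwin32

python -m pip install pypiwin32

(2) Install pyttsx3

python -m pip install pyttsx3

1.1.2 Testing the package

Ⅰ. Chinese speech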

import pyttsx3
engine = pyttsx3.init()
engine.say('人类真帅')
engine.runAndWait()

  
 

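The phrase "人类真帅" can be heard as spoken output.

Ⅱ. English speech

engine.say('Windows is an OS.')
engine.runAndWait()

The English sentence ("Windows is an OS.") is read aloud as well.

However, the output has an obvious "robotic" tone and does not sound very natural.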

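1.2 wordnet

The post on the vocabulary package describes how to obtain information about a word with vocabulary. However, installing vocabulary failed with an error, so nltk was installed instead.

python -m pip install nltk

1.2.1 Testing wordnet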

from nltk.corpus import wordnet

# Look up all synsets (groups of synonyms) recorded for the word 'car'
syns = wordnet.synsets('car')
print(syns)

  
 

Test result: the test raised an error and did not work.
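
The cause of the error is not recorded here. One common cause, offered only as a guess, is that the WordNet corpus data has not been downloaded yet, in which case NLTK raises LookupError on first use. The sketch below tells the possible cases apart; the helper name wordnet_status is hypothetical and not part of NLTK.

```python
def wordnet_status(word="car"):
    """Report whether WordNet lookups can work in this environment."""
    try:
        from nltk.corpus import wordnet
    except ImportError:
        return "nltk-missing"      # the nltk package itself is not installed
    try:
        wordnet.synsets(word)      # the corpus is loaded lazily on first use
    except LookupError:
        # Corpus data not found; nltk.download('wordnet') fetches it
        return "corpus-missing"
    return "ok"

print(wordnet_status())
```

If the result is "corpus-missing", running nltk.download('wordnet') once and repeating the test is the usual next step.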

1.3 eng-to-ipa

"Convert English text into the Phonetics using Python" describes how to use "eng-to-ipa" to obtain the pronunciation of English words.

This package converts English words into IPA (International Phonetic Alphabet).

For a detailed introduction, see: English to IPA (eng_to_ipa).

1.3.1 Installing the package

python -m pip install eng-to-ipa

▲ Figure 1.3.1 python -m pip install eng-to-ipa

1.3.2 Testing the package

import eng_to_ipa as p

# Convert an English sentence to its IPA transcription
ipa = p.convert("Hello Geeks.")

# IPA contains non-ASCII characters, so write the file as UTF-8
filename = r'd:\temp\1.txt'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(ipa)

print('\a')  # ring the terminal bell to signal completion

  
 

▲ Figure 1.3.2 Display of the corresponding phonetic symbols

ipa = p.convert("Using ipa-list() instead of convert()")

  
 

▲ Figure 1.3.3 Phonetic symbol display

ipa = p.ipa_list("Yes I am geeks, How are you.")

  
 

▲ Figure 1.3.4 Phonetic symbol display

1.3.3 get_rhymes

py = p.get_rhymes('test')
print(py)

['abreast', 'acquiesced', 'addressed', 'addwest', 'arrest', 'assessed', 'attest', 'behest', 'bequest', 'best', 'beste', 'blessed', 'blest', 'breast', 'brest', 'bud-test', "c'est", 'caressed', 'celeste', 'charest', 'chest', 'chrest', 'coalesced', 'compressed', 'confessed', 'congest', 'contest', 'crest', "d'allest", 'depressed', 'dest', 'detest', 'digest', 'digressed', 'dispossessed', 'distressed', 'divest', 'dressed', 'eastern-west', 'est', 'expressed', 'farwest', 'fessed', 'fest', 'finessed', 'gest', 'guessed', 'guest', 'impressed', 'indigest', 'infest', 'ingest', 'intrawest', 'invest', 'jest', 'key_west', 'lest', 'messed', 'mest', 'midwest', 'molest', 'natwest', 'nest', 'neste', 'northwest', 'norwest', 'obsessed', 'oppressed', 'penwest', 'pest', 'possessed', 'pressed', 'prest', 'professed', 'progressed', 'protest', 'quest', 'rearrest', 'reassessed', 'recessed', 'reinvest', 'repossessed', 'repressed', 'request', 'rest', 'retest', 'self-professed', 'southwest', 'stateswest', 'stressed', 'suggest', 'suppressed', 'sylvest', 'telequest', 'telewest', 'transgressed', 'trest', 'unaddressed', 'undressed', 'unimpressed', 'unrest', 'vest', 'west', 'wrest', 'yest', 'yoest', 'zest']

  
 

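1.3.4 How to display IPA?

An awkward problem is that the text output by eng_to_ipa contains many characters that cannot be displayed directly. How can it be converted to ASCII?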
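
To see which characters cause the trouble, it can help to list the non-ASCII characters in a transcription together with their Unicode names. The sketch below uses only the standard library; the sample string and the helper name non_ascii_chars are illustrative assumptions, not output captured from eng_to_ipa.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [(c, "U+%04X" % ord(c), unicodedata.name(c, "UNKNOWN"))
            for c in text if ord(c) > 127]

# A sample IPA string used purely as illustrative input
for item in non_ascii_chars("hɛˈloʊ"):
    print(item)
```

Every character reported this way needs either an ASCII replacement or removal before the text can be shown in an ASCII-only display.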

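ipapy 0.0.9.0 provides a Python package for working with IPA.

(1) Install IPAPY

python -m pip install ipapy

I tested this package, but it did not seem able to perform the conversion.

§02 Displaying IPA

The current TEASOFT software cannot yet display IPA characters. How can IPA be converted to plain ASCII?

The question "IPA to plain simple English translator" asks about exactly this, that is, converting IPA into the notation that the American Heritage Dictionary uses.

2.1 Conversion program

ipa_converter provides a program that converts IPA to SAMPA, a computer-readable phonetic alphabet.

2.1.1 Conversion program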

import eng_to_ipa as p
import string

#------------------------------------------------------------
sampafile = r'D:\Temp\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

#------------------------------------------------------------
ipa = p.convert("Yes I am geeks, How are you.")

out = []
for c in ipa:
    if c in table: c= table[c]
    elif c not in string.printable: c = ''

    out.append(c)

print(''.join(out))

2.1.2 Test results

ipa = p.convert("Yes I am geeks, How are you.")

  
 

jEs aI {m giks, haU @r ju.

  
 

ipa = p.convert("instead")

  
 

%In"stEd
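
The two results above can be reproduced without the configuration file by inlining the handful of table rows they need. In this sketch the IPA input strings are assumptions reconstructed to match the outputs shown, and the name ipa_to_sampa is hypothetical.

```python
import string

# Subset of the sampa.cfg mapping (IPA symbol -> SAMPA symbol), inlined for the sketch
TABLE = {"ɛ": "E", "ɪ": "I", "æ": "{", "ʊ": "U", "ə": "@", "ˈ": '"', "ˌ": "%"}

def ipa_to_sampa(ipa, table=TABLE):
    """Map each character through the table; drop other non-printable-ASCII characters."""
    out = []
    for c in ipa:
        if c in table:
            c = table[c]
        elif c not in string.printable:
            c = ""                 # symbol with no entry in the table
        out.append(c)
    return "".join(out)

print(ipa_to_sampa("ˌɪnˈstɛd"))                    # %In"stEd
print(ipa_to_sampa("jɛs aɪ æm giks, haʊ ər ju."))  # jEs aI {m giks, haU @r ju.
```

Note that the stress marks come through as " (primary) and % (secondary), which is why the SAMPA text contains quotation and percent characters.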

  
 

2.2 gruut-ipa

Install the package from Gruut IPA.

▲ Figure 2.1.1 SAMPA correspondence table

2.2.1 Installing the package

python -m pip install gruut-ipa

IPA	eSpeak	Sampa	Description
i	i	i	close front unrounded vowel
y	y	y	close front rounded vowel
ɨ	i	1	close central unrounded vowel
ʉ	u	}	close central rounded vowel
ɯ	u-	M	close back unrounded vowel
u	u	u	close back rounded vowel
ɪ	I	I	near-close near-front unrounded vowel
ʏ	I.	Y	near-close near-front rounded vowel
ʊ	U	U	near-close near-back rounded vowel
e	e	e	close-mid front unrounded vowel
ø	Y	2	close-mid front rounded vowel
ɘ	@	@	close-mid central unrounded vowel
ɵ	@.	8	close-mid central rounded vowel
ɤ	o-	7	close-mid back unrounded vowel
o	o	o	close-mid back rounded vowel
ɛ	E	E	open-mid front unrounded vowel
œ	W	9	open-mid front rounded vowel
ɜ	V	3	open-mid central unrounded vowel
ɞ	O	3	open-mid central rounded vowel
ʌ	V	V	open-mid back unrounded vowel
ɔ	O	O	open-mid back rounded vowel
æ	a	{	near-open front unrounded vowel
ɐ	V	6	near-open central unrounded vowel
a	a	a	open front unrounded vowel
ɶ	W	&	open front rounded vowel
ɑ	A	A	open back unrounded vowel
ɒ	A.	Q	open back rounded vowel
m	m	m	voiced bilabial nasal
ɱ	M	F	voiced labio-dental nasal
n	n	n	voiced alveolar nasal
ɳ	n.	n`	voiced retroflex nasal
ŋ	N	N	voiced velar nasal
ɴ	n	N	voiced uvular nasal
p	p	p	voiceless bilabial plosive
b	b	b	voiced bilabial plosive
t	t	t	voiceless alveolar plosive
d	d	d	voiced alveolar plosive
ʈ	t.	t`	voiceless retroflex plosive
ɖ	d.	d`	voiced retroflex plosive
c	c	c	voiceless palatal plosive
ɟ	J	J	voiced palatal plosive
k	k	k	voiceless velar plosive
ɡ	g	g	voiced velar plosive
g	g	g	voiced velar plosive
q	q	q	voiceless uvular plosive
ɢ	G	G	voiced uvular plosive
ʡ		>	voiceless pharyngeal plosive
ʔ	?	?	voiceless glottal plosive
p͡f	pf	pf	voiceless labio-dental affricate
b͡v	bv	bv	voiced dental affricate
t̪͡s	ts	t ds	voiceless dental affricate
t͡s	ts	ts	voiceless alveolar affricate
d͡z	dz	dz	voiced alveolar affricate
t͡ʃ	tS	tS	voiceless post-alveolar affricate
d͡ʒ	dZ	dZ	voiced post-alveolar affricate
ʈ͡ʂ	tS	ts`	voiceless retroflex affricate
ɖ͡ʐ	dz	dz`	voiced retroflex affricate
t͡ɕ	tS;	ts	voiceless palatal affricate
d͡ʑ	dZ;	dz	voiced palatal affricate
k͡x	k	k x	voiceless velar affricate
ɸ	F	p	voiceless bilabial fricative
β	B	B	voiced bilabial fricative
f	f	f	voiceless labio-dental fricative
v	v	v	voiced labio-dental fricative
θ	T	T	voiceless dental fricative
ð	D	D	voiced dental fricative
s	s	s	voiceless alveolar fricative
z	z	z	voiced alveolar fricative
ʃ	S	S	voiceless post-alveolar fricative
ʒ	Z	Z	voiced post-alveolar fricative
ʂ	s.	s`	voiceless retroflex fricative
ʐ	z.	z`	voiced palatal fricative
ç	C	C	voiceless palatal fricative
x	x	x	voiceless velar fricative
ɣ	Q	G	voiced velar fricative
χ	X	X	voiceless uvular fricative
ʁ	g	R	voiced uvular fricative
ħ	H	X	voiceless pharyngeal fricative
h	h	h	voiceless glottal fricative
ɦ	h<?>	h	voiced glottal fricative
w	w	w	voiced bilabial approximant
ʋ	v#	v	voiced labio-dental approximant
ɹ	r	r	voiced alveolar approximant
ɻ	r.	r `	voiced retroflex approximant
j	j	j	voiced palatal approximant
ɰ	Q	M	voiced velar approximant
ⱱ	ⱱ	ⱱ	voiced labio-dental flap
ɾ	*	4	voiced alveolar flap
ɽ	*.	r`	voiced retroflex flap
ʙ	b	B	voiced bilabial trill
r	r	r	voiced alveolar trill
ʀ	r	R	voiced uvular trill
l	l	l	voiced alveolar lateral-approximant
ɫ	l	5	voiced alveolar lateral-approximant
ɭ	l.	l`	voiced retroflex lateral-approximant
ʎ	l^	L	voiced palatal lateral-approximant
ʟ	L	L	voiced velar lateral-approximant
ə	@	@	schwa
ɚ	3	@`	r-coloured schwa
ɝ	3	@`	r-coloured schwa
ɹ̩	r-	r ̩	voiced alveolar approximant

This package appears to be usable only from the command line.
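
Several rows of the table above key on multi-character IPA sequences, such as the affricate written with a tie bar (t͡ʃ maps to tS), and a character-by-character lookup cannot match those. The following sketch shows one way to handle them with a greedy longest-match scan over a few inlined rows. It does not use the gruut-ipa package, and the function name convert_longest_match is hypothetical.

```python
# A few IPA -> SAMPA rows from the table above, inlined for the sketch.
# "\u0361" is the combining tie bar used in affricates.
IPA_TO_SAMPA = {
    "t\u0361ʃ": "tS",
    "d\u0361ʒ": "dZ",
    "ʃ": "S",
    "ʒ": "Z",
    "ɪ": "I",
    "ɛ": "E",
}

def convert_longest_match(ipa, table=IPA_TO_SAMPA):
    """Greedy conversion: at each position try the longest table key first."""
    max_len = max(len(key) for key in table)
    out, i = [], 0
    while i < len(ipa):
        for size in range(min(max_len, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(ipa[i])     # no table entry: keep the character unchanged
            i += 1
    return "".join(out)

print(convert_longest_match("t\u0361ʃɪp"))   # tSIp
```

Trying longer keys first keeps the tie-bar sequence from being split into separate lookups for t and ʃ.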

2.3 SAMPA character correspondence table

The table below is copied from sampa.cfg.

A       ɑ       script a        open back unrounded, Cardinal 5, Eng. start
{       æ       ae ligature     near-open front unrounded, Eng. trap
6       ɐ       turned a        open schwa, Ger. besser
Q       ɒ       turned script a open back rounded, Eng. lot
E       ɛ       epsilon         open-mid front unrounded, C3, Fr. même
@       ə       turned e        schwa, Eng. banana
3       ɜ       rev. epsilon    long mid central, Eng. nurse
I       ɪ       small cap I     lax close front unrounded, Eng. kit
O       ɔ       turned c        open-mid back rounded, Eng. thought
2       ø       o-slash         close-mid front rounded, Fr. deux
9       œ       oe ligature     open-mid front rounded, Fr. neuf
&       ɶ       s.c. OE lig.    open front rounded
U       ʊ       upsilon         lax close back rounded, Eng. foot
}       ʉ       barred u        close central rounded, Swedish sju
V       ʌ       turned v        open-mid back unrounded, Eng. strut
Y       ʏ       small cap Y     lax [y], Ger. hübsch
B       β       beta            voiced bilabial fricative, Sp. cabo
C       ç       c-cedilla       voiceless palatal fricative, Ger. ich
D       ð       eth             voiced dental fricative, Eng. then
G       ɣ       gamma           voiced velar fricative, Sp. fuego
L       ʎ       turned y        palatal lateral, It. famiglia
J       ɲ       left-tail n     palatal nasal, Sp. año
N       ŋ       eng             velar nasal, Eng. thing
R       ʁ       inv. s.c. R     vd. uvular fric. or trill, Fr. roi
S       ʃ       esh             voiceless palatoalveolar fricative, Eng. ship
T       θ       theta           voiceless dental fricative, Eng. thin
H       ɥ       turned h        labial-palatal semivowel, Fr. huit
Z       ʒ       ezh (yogh)      vd. palatoalveolar fric., Eng. measure
?       ʔ       dotless ?       glottal stop, Ger. Verein, also Danish stød
:       ː       length mark     length mark
"       ˈ       vertical stroke primary stress *
%       ˌ       low vert. str.  secondary stress
s'      ʂ       Added for Russian support (ш)
s\      ɕ       Added for Russian support (щ)
s'      ʐ       Added for Russian support (ж)
z\      ʑ       Added for Russian support (ж)
1       ɨ       Added for Russian support (и, sometimes ы)
8       ɵ       Added for Russian support (ё)
_j      ʲ       Added for Russian support (ь)

  
 

Paste the text above into Notepad and save it in 'utf-8' encoding.
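
As an alternative to Notepad, the file can be written and read back from Python with the encoding stated explicitly. The sketch below uses three rows from the table and a temporary path, both chosen only for illustration; load_table is a hypothetical helper. Reading with the "utf-8-sig" codec is a precaution: it also accepts files that begin with a byte order mark, which some editors add when saving as UTF-8.

```python
import os
import tempfile

# Three rows in the sampa.cfg layout: SAMPA symbol, IPA symbol, description
ROWS = (
    "A       ɑ       script a\n"
    "{       æ       ae ligature\n"
    "@       ə       turned e\n"
)

def load_table(path):
    """Parse a sampa.cfg-style file into an {IPA symbol: SAMPA symbol} dict."""
    table = {}
    with open(path, "r", encoding="utf-8-sig") as f:   # tolerates a leading BOM
        for line in f:
            row = line.split()
            if len(row) >= 2:                          # skip blank or malformed lines
                table[row[1]] = row[0]
    return table

# Write the rows explicitly as UTF-8 from Python instead of via Notepad
path = os.path.join(tempfile.mkdtemp(), "sampa.cfg")
with open(path, "w", encoding="utf-8") as f:
    f.write(ROWS)

print(load_table(path))
```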

#!/usr/bin/python

from __future__ import print_function
from string import printable
import sys, locale

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser(description='Converts IPA input to ASCII output. Does not handle certain cases: nasals, tones, syllabic consonants, and non-SAMPA representable phonetic symbols. Writes to stdout.')
    parser.add_argument('-c','--config',help='Name of the configuration table. Defaults to "sampa.cfg" in the working directory.',default='sampa.cfg')
    parser.add_argument('source',help='IPA symbols to convert.')
    args = parser.parse_args(sys.argv[1:])
    config = args.config
    source = args.source.decode(locale.getpreferredencoding())
    print(source)

    config = open(config,'r')
    table = {}
    for line in config:
        line=line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0].decode('utf-8')
        ipa_symb = row[1].decode('utf-8')
        table[ipa_symb] = sampa_symb

    out = []
    for c in source:
        if c in table: c = table[c]
        elif c not in printable: c = ''
        out.append(c)
    print(''.join(out))

  
 

§03 Adapting the Program

3.1 Materials for the adaptation

sampa.cfg: the correspondence table, a text file in "utf-8" encoding.
Code:

import string

sampafile = r'D:\Python\Cmd\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''

        out.append(c)

    return ''.join(out)

  
 

3.2 Adapting the applets

The applets to be adapted are mainly:

cal
cdtm

3.2.1 Adding IPA-SAMPA

#------------------------------------------------------------
import eng_to_ipa as ipa
import string

sampafile = r'D:\Python\Cmd\sampa.cfg'
table = {}
with open(sampafile, 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if line == '': continue
        row = line.split()
        sampa_symb = row[0]
        ipa_symb = row[1]
        table[ipa_symb] = sampa_symb

def ipa2sampa(ipa):
    out = []
    for c in ipa:
        if c in table: c = table[c]
        elif c not in string.printable: c = ''

        out.append(c)

    return ''.join(out)

#------------------------------------------------------------
import json
import requests

def translate(word):
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
    key = {
        'type': "AUTO",
        'i': word,
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "ue": "UTF-8",
        "action": "FY_BY_CLICKBUTTON",
        "typoResult": "true"
    }
    response = requests.post(url, data=key)
    if response.status_code == 200:
        return response.text
    else:
        return None

def get_result(response):
    result = json.loads(response)

    origin = result['translateResult'][0][0]['src']
    target = result['translateResult'][0][0]['tgt']

    originflag = 1
    targetflag = 1

    for c in origin:
        if ord(c) >= 0x80:
            originflag = 0
            break

    for c in target:
        if ord(c) >= 0x80:
            targetflag = 0
            break

    originsampa = ''
    targetsampa = ''

    if originflag > 0:
        originsampa = "/%s/"%ipa2sampa(ipa.convert(origin))
        for c in originsampa:
            if c not in string.printable:
                originsampa = ''
                break

    if targetflag > 0:
        targetsampa = "/%s/"%ipa2sampa(ipa.convert(target))
        for c in targetsampa:
            if c not in string.printable:
                targetsampa = ''
                break

    print("%s%s --> %s%s" % (result['translateResult'][0][0]['src'], originsampa,
                             result['translateResult'][0][0]['tgt'], targetsampa))
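
The code above reads the source and target text from result['translateResult'][0][0]['src'] and ['tgt'], and attaches a /SAMPA/ tag only to text that is pure ASCII. The sketch below walks through that logic offline. The JSON string is hand-written to match those field accesses (an assumption, not a captured Youdao response), FAKE_SAMPA is a stub standing in for ipa2sampa(ipa.convert(text)) with its value taken from the run results in 3.2.2, and format_result is a hypothetical name. It uses str.isascii(), which needs Python 3.7 or later.

```python
import json

# Hand-written sample with the shape the code above indexes into (an assumption)
SAMPLE = '{"translateResult": [[{"src": "command", "tgt": "命令"}]]}'

# Stub in place of the real IPA lookup and SAMPA conversion
FAKE_SAMPA = {"command": 'k@"m{nd'}

def format_result(response_text, lookup=FAKE_SAMPA.get):
    """Build 'src/sampa/ --> tgt/sampa/', tagging only ASCII text."""
    entry = json.loads(response_text)["translateResult"][0][0]
    parts = []
    for text in (entry["src"], entry["tgt"]):
        sampa = lookup(text) if text.isascii() else None
        parts.append(text + ("/%s/" % sampa if sampa else ""))
    return " --> ".join(parts)

print(format_result(SAMPLE))   # command/k@"m{nd/ --> 命令
```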

  
 

3.2.2 Run results

靠谱 --> By spectrum/baI "spEktr@m/
command/k@"m{nd/ --> 命令

  
 

python --> python
input/"In%pUt/ --> 输入
python --> python
cmd/cmd*/ --> cmd/cmd*/

append/@"pEnd/ --> 附加

input/"In%pUt/ --> 输入
python --> python
command/k@"m{nd/ --> 命令

命令 --> The command/D@ k@"m{nd/
事情 --> things/TINz/
input/"In%pUt/ --> 输入

backup/"b{%k@p/ --> 备份
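
In the results above, cmd is shown as cmd/cmd*/. As far as I know, eng_to_ipa returns a word it cannot find in its dictionary unchanged with an asterisk appended, so the tag carries no pronunciation. A small filter, sketched here under that assumption with the hypothetical helper pronunciation_tag, would drop such tags before display.

```python
def pronunciation_tag(sampa):
    """Return '/sampa/' for a usable transcription, or '' if a word is unresolved."""
    if not sampa or "*" in sampa:
        return ""
    return "/%s/" % sampa

# Sample values taken from the run results above
for word, sampa in [("cmd", "cmd*"), ("append", '@"pEnd')]:
    print(word + pronunciation_tag(sampa))
```

With this filter the first line prints cmd with no tag, while append keeps its transcription.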

  
 

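※ Summary ※

To obtain and display the pronunciation of English words, eng_to_ipa is used to get a word's IPA, and a self-written conversion program then converts the IPA to SAMPA for display.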

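● Related figure links:

Figure 1.3.1 python -m pip install eng-to-ipa
Figure 1.3.2 Display of the corresponding phonetic symbols
Figure 1.3.3 Phonetic symbol display
Figure 1.3.4 Phonetic symbol display
Figure 2.1.1 SAMPA correspondence table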

Source: zhuoqing.blog.csdn.net. Author: 卓晴. Copyright belongs to the original author; to repost, please contact the author.

Original link: zhuoqing.blog.csdn.net/article/details/122996325
