abydos.phonetic package¶
abydos.phonetic.
The phonetic package includes classes for phonetic algorithms, including:
Robert C. Russell's Index (RussellIndex)
American Soundex (Soundex)
Refined Soundex (RefinedSoundex)
Daitch-Mokotoff Soundex (DaitchMokotoff)
NYSIIS (NYSIIS)
Match Rating Algorithm (phonetic.MRA)
Metaphone (Metaphone)
Double Metaphone (DoubleMetaphone)
Caverphone (Caverphone)
Alpha Search Inquiry System (AlphaSIS)
Fuzzy Soundex (FuzzySoundex)
Phonex (Phonex)
Phonem (Phonem)
Phonix (Phonix)
PHONIC (PHONIC)
Standardized Phonetic Frequency Code (SPFC)
Statistics Canada (StatisticsCanada)
LEIN (LEIN)
Roger Root (RogerRoot)
Eudex phonetic hash (phonetic.Eudex)
Parmar-Kumbharana (ParmarKumbharana)
Davidson's Consonant Code (Davidson)
SoundD (SoundD)
PSHP Soundex/Viewex Coding (PSHPSoundexFirst and PSHPSoundexLast)
Dolby Code (Dolby)
NRL English-to-phoneme (NRL)
Ainsworth grapheme to phoneme (Ainsworth)
Beider-Morse Phonetic Matching (BeiderMorse)
There are also language-specific phonetic algorithms for German:
For French:
FONEM (FONEM)
an early version of Henry Code (HenryEarly)
For Spanish:
Phonetic Spanish (PhoneticSpanish)
Spanish Metaphone (SpanishMetaphone)
For Swedish:
For Norwegian:
Norphone (Norphone)
For Brazilian Portuguese:
SoundexBR (SoundexBR)
And there are some hybrid phonetic algorithms that employ multiple underlying phonetic algorithms:
Oxford Name Compression Algorithm (ONCA) (ONCA)
MetaSoundex (MetaSoundex)
Each class has an encode method to return the phonetically encoded string. Classes for which encode returns a numeric value generally have an encode_alpha method that returns an alphabetic version of the phonetic encoding, as demonstrated below:
>>> rus = RussellIndex()
>>> rus.encode('Abramson')
'128637'
>>> rus.encode_alpha('Abramson')
'ABRMCN'
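Because every class exposes the same encode method, code that consumes phonetic keys can be written against that shared interface. The sketch below groups names by their code. TableEncoder is a hypothetical stand-in (not part of abydos) that simply returns the ONCA values listed in the examples further down this page, so the snippet runs without the library installed; with abydos available, an encoder instance could presumably be passed in its place.

```python
from collections import defaultdict


class TableEncoder:
    """Hypothetical stand-in for an abydos encoder (not part of the library)."""

    # Codes copied from the ONCA examples on this page.
    _codes = {
        'Christopher': 'C623',
        'Niall': 'N400',
        'Smith': 'S530',
        'Schmidt': 'S530',
    }

    def encode(self, word: str) -> str:
        return self._codes[word]


def group_by_code(encoder, names):
    """Map each phonetic code to the names that produce it, in input order."""
    groups = defaultdict(list)
    for name in names:
        groups[encoder.encode(name)].append(name)
    return dict(groups)


groups = group_by_code(TableEncoder(), ['Smith', 'Niall', 'Schmidt'])
print(groups)
```

Names that land in the same group are candidates for being phonetic matches under the chosen algorithm.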
-
class
abydos.phonetic.
Ainsworth
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Ainsworth's grapheme to phoneme converter.
Based on the ruleset listed in [Ain73].
New in version 0.4.1.
-
encode
(word: str) → str[source]¶ Return the phonemic representation of a word.
- Parameters
word (str) -- The word to transform
- Returns
The phonemic representation in IPA
- Return type
str
Examples
>>> pe = Ainsworth()
>>> pe.encode('Christopher')
'tʃrɪstofɜ'
>>> pe.encode('Niall')
'nɪɔl'
>>> pe.encode('Smith')
'smɪð'
>>> pe.encode('Schmidt')
'skmɪdt'
New in version 0.4.1.
-
-
class
abydos.phonetic.
AlphaSIS
(max_length: int = 14)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Alpha-SIS.
The Alpha Search Inquiry System code is defined in [Cor73]. This implementation is based on the description in [MKTM77].
New in version 0.3.6.
Initialize AlphaSIS instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 14)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the IBM Alpha Search Inquiry System code for a word.
Because a single word can have multiple values, the codes are returned together in one comma-separated string. Their order is significant: the first value is the primary coding.
- Parameters
word (str) -- The word to transform
- Returns
The Alpha-SIS value
- Return type
str
Examples
>>> pe = AlphaSIS()
>>> pe.encode('Christopher')
'06401840000000,07040184000000,04018400000000'
>>> pe.encode('Niall')
'02500000000000'
>>> pe.encode('Smith')
'03100000000000'
>>> pe.encode('Schmidt')
'06310000000000'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
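Since the result is a single comma-separated string whose first value is the primary coding, callers will often want to separate the primary code from the alternates. A minimal sketch, using a value copied from the examples above; split_codes is a hypothetical helper, not an abydos function.

```python
def split_codes(encoded: str):
    """Split a comma-separated result into (primary, alternates)."""
    primary, *alternates = encoded.split(',')
    return primary, alternates


# Value taken from the AlphaSIS example for 'Christopher' above.
primary, alternates = split_codes(
    '06401840000000,07040184000000,04018400000000'
)
print(primary)
print(alternates)
```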
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Alpha-SIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Alpha-SIS value
- Return type
str
Examples
>>> pe = AlphaSIS()
>>> pe.encode_alpha('Christopher')
'JRSTFR,KSRSTFR,RSTFR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'MT'
>>> pe.encode_alpha('Schmidt')
'JMT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
-
class
abydos.phonetic.
BeiderMorse
(language_arg: Union[str, int] = 0, name_mode: str = 'gen', match_mode: str = 'approx', concat: bool = False, filter_langs: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Beider-Morse Phonetic Matching.
The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.
New in version 0.3.6.
Initialize BeiderMorse instance.
- Parameters
language_arg (str or int) --
The language of the term; supported values include:
any
arabic
cyrillic
czech
dutch
english
french
german
greek
greeklatin
hebrew
hungarian
italian
latvian
polish
portuguese
romanian
russian
spanish
turkish
name_mode (str) --
The name mode of the algorithm:
gen -- general (default)
ash -- Ashkenazi
sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Beider-Morse Phonetic Matching encoding(s) of a term.
- Parameters
word (str) -- The word to transform
- Returns
The Beider-Morse phonetic value(s)
- Return type
str
- Raises
ValueError -- Unknown language
Examples
>>> pe = BeiderMorse()
>>> pe.encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir', 'xrQstYfir', 'xristofir', 'xristYfir', 'xristopi', 'xritopir', 'xritopi', 'xristofi', 'xritofir', 'xritofi', 'tzristopir', 'tzristofir', 'zristopir', 'zristopi', 'zritopir', 'zritopi', 'zristofir', 'zristofi', 'zritofir', 'zritofi']
>>> pe.encode('Niall')
'nial,niol'
>>> pe.encode('Smith')
'zmit'
>>> pe.encode('Schmidt')
'zmit,stzmit'
>>> BeiderMorse(language_arg='German').encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir', 'xrQstYfir', 'xristofir', 'xristYfir']
>>> BeiderMorse(language_arg='English').encode(
... 'Christopher').split(',')
['tzristofir', 'tzrQstofir', 'tzristafir', 'tzrQstafir', 'xristofir', 'xrQstofir', 'xristafir', 'xrQstafir']
>>> BeiderMorse(language_arg='German',
... name_mode='ash').encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir', 'xrQstYfir', 'xristofir', 'xristYfir']
>>> BeiderMorse(language_arg='German',
... match_mode='exact').encode('Christopher')
'xriStopher,xriStofer,xristopher,xristofer'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made comma-separated instead of space-separated output
-
class
abydos.phonetic.
Caverphone
(version: int = 2)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Caverphone.
A description of version 1 of the algorithm can be found in [Hoo02].
A description of version 2 of the algorithm can be found in [Hoo04].
New in version 0.3.6.
Initialize Caverphone instance.
- Parameters
version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode('Christopher')
'KRSTFA1111'
>>> pe.encode('Niall')
'NA11111111'
>>> pe.encode('Smith')
'SMT1111111'
>>> pe.encode('Schmidt')
'SKMT111111'
>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode('Christopher')
'KRSTF1'
>>> pe_1.encode('Niall')
'N11111'
>>> pe_1.encode('Smith')
'SMT111'
>>> pe_1.encode('Schmidt')
'SKMT11'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode_alpha('Christopher')
'KRSTFA'
>>> pe.encode_alpha('Niall')
'NA'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SKMT'
>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode_alpha('Christopher')
'KRSTF'
>>> pe_1.encode_alpha('Niall')
'N'
>>> pe_1.encode_alpha('Smith')
'SMT'
>>> pe_1.encode_alpha('Schmidt')
'SKMT'
New in version 0.4.0.
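In the Caverphone examples, the value returned by encode looks like the encode_alpha value padded on the right with 1s to a fixed width (10 characters for version 2, 6 for version 1). The sketch below checks that relationship against values copied from those examples. It is an observation about the listed outputs, not a description of the library's internals, and strip_padding is a hypothetical helper.

```python
def strip_padding(code: str, pad_char: str = '1') -> str:
    """Remove trailing padding characters from a fixed-width code."""
    return code.rstrip(pad_char)


# (padded code, alphabetic code) pairs copied from the Caverphone examples.
pairs = [
    ('KRSTFA1111', 'KRSTFA'),
    ('NA11111111', 'NA'),
    ('SMT1111111', 'SMT'),
    ('SKMT111111', 'SKMT'),
    ('KRSTF1', 'KRSTF'),
    ('N11111', 'N'),
]
all_match = all(strip_padding(padded) == alpha for padded, alpha in pairs)
print(all_match)
```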
-
class
abydos.phonetic.
DaitchMokotoff
(max_length: int = 6, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Daitch-Mokotoff Soundex.
Based on Daitch-Mokotoff Soundex [Mok97]. Because a single word can have multiple values, the values are returned together (as a comma-separated str since version 0.6.0).
New in version 0.3.6.
Initialize DaitchMokotoff instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
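The effect described for zero_pad can be pictured with a small standalone helper. This is a sketch of the padding step only, under the assumption that padding is applied after the code has been cut to max_length; finish_code is a hypothetical name, not an abydos function.

```python
def finish_code(code: str, max_length: int = 6, zero_pad: bool = True) -> str:
    """Truncate a code to max_length and optionally right-pad it with 0s."""
    code = code[:max_length]
    if zero_pad:
        code = code.ljust(max_length, '0')
    return code


padded = finish_code('68')
unpadded = finish_code('68', zero_pad=False)
print(padded, unpadded)
```

With zero_pad disabled, short codes keep their natural length, which is why the 20-character example further down returns codes shorter than 20 characters.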
-
encode
(word: str) → str[source]¶ Return the Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Daitch-Mokotoff Soundex value
- Return type
str
Examples
>>> pe = DaitchMokotoff()
>>> pe.encode('Christopher')
'494379,594379'
>>> pe.encode('Niall')
'680000'
>>> pe.encode('Smith')
'463000'
>>> pe.encode('Schmidt')
'463000'
>>> DaitchMokotoff(max_length=20,
... zero_pad=False).encode('The quick brown fox')
'35457976754,3557976754'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Daitch-Mokotoff Soundex value
- Return type
str
Examples
>>> pe = DaitchMokotoff()
>>> pe.encode_alpha('Christopher')
'SRSTPR,KRSTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> DaitchMokotoff(max_length=20,
... zero_pad=False).encode_alpha('The quick brown fox')
'TKSKPRPNPKS,TKKPRPNPKS'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
-
class
abydos.phonetic.
Davidson
(omit_fname: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Davidson Consonant Code.
This is based on the name compression system described in [Dav62].
[Dol70] identifies this as having been the name compression algorithm used by SABRE.
New in version 0.3.6.
Initialize Davidson instance.
- Parameters
omit_fname (bool) -- Set to True to completely omit the first character of the first name
New in version 0.4.0.
-
encode
(lname: str, fname: str = '.') → str[source]¶ Return Davidson's Consonant Code.
- Parameters
lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.
- Returns
Davidson's Consonant Code
- Return type
str
Example
>>> pe = Davidson()
>>> pe.encode('Gough')
'G   .'
>>> pe.encode('pneuma')
'PNM .'
>>> pe.encode('knight')
'KNGT.'
>>> pe.encode('trice')
'TRC .'
>>> pe.encode('judge')
'JDG .'
>>> pe.encode('Smith', 'James')
'SMT J'
>>> pe.encode('Wasserman', 'Tabitha')
'WSRMT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
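Judging from the examples above, the code appears to occupy four character positions for the surname (padded with spaces) followed by one position for the first-name initial. The sketch below reproduces only that layout step. The consonant strings are taken as given, the fixed width of four is an assumption drawn from the examples, and layout_code is a hypothetical helper rather than part of abydos.

```python
def layout_code(consonants: str, initial: str = '.') -> str:
    """Place up to four consonants in a 4-wide field, then the initial."""
    return consonants[:4].ljust(4) + initial[:1]


with_initial = layout_code('SMT', 'J')
without_initial = layout_code('KNGT')
print(repr(with_initial), repr(without_initial))
```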
-
class
abydos.phonetic.
Dolby
(max_length: int = -1, keep_vowels: bool = False, vowel_char: str = '*')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Dolby Code.
This follows "A Spelling Equivalent Abbreviation Algorithm For Personal Names" from [Dol70] and [C+69].
New in version 0.3.6.
Initialize Dolby instance.
- Parameters
max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (defaults to *)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode('Hansen')
'H*NSN'
>>> pe.encode('Larsen')
'L*RSN'
>>> pe.encode('Aagaard')
'*GR'
>>> pe.encode('Braaten')
'BR*DN'
>>> pe.encode('Sandvik')
'S*NVK'
>>> pe_6 = Dolby(max_length=6)
>>> pe_6.encode('Hansen')
'H*NS*N'
>>> pe_6.encode('Larsen')
'L*RS*N'
>>> pe_6.encode('Aagaard')
'*G*R  '
>>> pe_6.encode('Braaten')
'BR*D*N'
>>> pe_6.encode('Sandvik')
'S*NF*K'
>>> pe.encode('Smith')
'SM*D'
>>> pe.encode('Waters')
'W*DRS'
>>> pe.encode('James')
'J*MS'
>>> pe.encode('Schmidt')
'SM*D'
>>> pe.encode('Ashcroft')
'*SKRFD'
>>> pe_6.encode('Smith')
'SM*D  '
>>> pe_6.encode('Waters')
'W*D*RS'
>>> pe_6.encode('James')
'J*M*S '
>>> pe_6.encode('Schmidt')
'SM*D  '
>>> pe_6.encode('Ashcroft')
'*SKRFD'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode_alpha('Hansen')
'HANSN'
>>> pe.encode_alpha('Larsen')
'LARSN'
>>> pe.encode_alpha('Aagaard')
'AGR'
>>> pe.encode_alpha('Braaten')
'BRADN'
>>> pe.encode_alpha('Sandvik')
'SANVK'
New in version 0.4.0.
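Comparing the encode and encode_alpha examples for Dolby suggests that the alphabetic form is the default code with the vowel marker replaced by the letter A. The sketch below tests that reading against the listed values. It is an inference from those examples only, and to_alpha is a hypothetical helper, not the library's method.

```python
def to_alpha(code: str, vowel_char: str = '*') -> str:
    """Replace the vowel marker with 'A' (pattern observed in the examples)."""
    return code.replace(vowel_char, 'A')


# (marker code, alphabetic code) pairs copied from the Dolby examples.
pairs = [
    ('H*NSN', 'HANSN'),
    ('L*RSN', 'LARSN'),
    ('*GR', 'AGR'),
    ('BR*DN', 'BRADN'),
    ('S*NVK', 'SANVK'),
]
consistent = all(to_alpha(marked) == alpha for marked, alpha in pairs)
print(consistent)
```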
-
class
abydos.phonetic.
DoubleMetaphone
(max_length: int = -1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Double Metaphone.
Based on Lawrence Philips' (Visual) C++ code from 1999 [Phi00].
New in version 0.3.6.
Initialize DoubleMetaphone instance.
- Parameters
max_length (int) -- The maximum length of the returned Double Metaphone code (defaults to -1)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Double Metaphone value(s)
- Return type
str
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode('Christopher')
'KRSTFR,'
>>> pe.encode('Niall')
'NL,'
>>> pe.encode('Smith')
'SM0,XMT'
>>> pe.encode('Schmidt')
'XMT,SMT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
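The comma-separated result holds a primary and a secondary code, and the secondary part may be empty (as in 'KRSTFR,'). One way to compare two names is to check whether their results share any non-empty code. A minimal sketch using values from the examples above; shared_codes is a hypothetical helper, and treating any shared code as a match is an assumption about how one might use the output.

```python
def shared_codes(encoded_a: str, encoded_b: str) -> set:
    """Return the non-empty codes that two comma-separated results share."""
    codes_a = {code for code in encoded_a.split(',') if code}
    codes_b = {code for code in encoded_b.split(',') if code}
    return codes_a & codes_b


# Values from the examples above for 'Smith' and 'Schmidt'.
common = shared_codes('SM0,XMT', 'XMT,SMT')
print(common)
```

Filtering out empty strings matters here: without it, two words that both lack a secondary code would appear to share one.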
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Double Metaphone value(s)
- Return type
str
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode_alpha('Christopher')
'KRSTFR,'
>>> pe.encode_alpha('Niall')
'NL,'
>>> pe.encode_alpha('Smith')
'SMÞ,XMT'
>>> pe.encode_alpha('Schmidt')
'XMT,SMT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
-
class
abydos.phonetic.
Eudex
(max_length: int = 8)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Eudex hash.
This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].
Further details can be found at [Tic16].
New in version 0.3.6.
Initialize Eudex instance.
- Parameters
max_length (int) -- The length in bits of the code returned (default 8)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the eudex phonetic hash of a word.
- Parameters
word (str) -- The word to transform
- Returns
The eudex hash
- Return type
str
Examples
>>> pe = Eudex()
>>> pe.encode('Colin')
'432345564238053650'
>>> pe.encode('Christopher')
'433648490138894409'
>>> pe.encode('Niall')
'648518346341351840'
>>> pe.encode('Smith')
'720575940412906756'
>>> pe.encode('Schmidt')
'720589151732307997'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str instead of int
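Since version 0.6.0 the hash is returned as a decimal string, so any bit-level comparison first needs a conversion back to int. The sketch below counts differing bits between two hashes copied from the examples above. Bitwise comparison is one common way to use such hashes; whether it suits a given application is an assumption here, and hamming_distance is a hypothetical helper, not an abydos function.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two decimal-string hashes."""
    return bin(int(hash_a) ^ int(hash_b)).count('1')


smith = '720575940412906756'    # from the 'Smith' example above
schmidt = '720589151732307997'  # from the 'Schmidt' example above
niall = '648518346341351840'    # from the 'Niall' example above

print(hamming_distance(smith, schmidt), hamming_distance(smith, niall))
```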
-
class
abydos.phonetic.
FONEM
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
FONEM.
FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].
Guillaume Plique's Javascript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the FONEM code of a word.
- Parameters
word (str) -- The word to transform
- Returns
The FONEM code
- Return type
str
Examples
>>> pe = FONEM()
>>> pe.encode('Marchand')
'MARCHEN'
>>> pe.encode('Beaulieu')
'BOLIEU'
>>> pe.encode('Beaumont')
'BOMON'
>>> pe.encode('Legrand')
'LEGREN'
>>> pe.encode('Pelletier')
'PELETIER'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
FuzzySoundex
(max_length: int = 5, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Fuzzy Soundex.
Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].
New in version 0.3.6.
Initialize FuzzySoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode('Christopher')
'K6931'
>>> pe.encode('Niall')
'N4000'
>>> pe.encode('Smith')
'S5300'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode_alpha('Christopher')
'KRSTP'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
class
abydos.phonetic.
Haase
(primary_only: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Haase Phonetik.
Based on the algorithm described at [Pra15].
Based on the original [HH00].
New in version 0.3.6.
Initialize Haase instance.
- Parameters
primary_only (bool) -- If True, only the primary code is returned
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Haase Phonetik (numeric output) code for a word.
While the output code is numeric, it is nevertheless a str.
- Parameters
word (str) -- The word to transform
- Returns
The Haase Phonetik value as a numeric string
- Return type
str
Examples
>>> pe = Haase()
>>> pe.encode('Joachim')
'9496'
>>> pe.encode('Christoph')
'4798293,8798293'
>>> pe.encode('Jörg')
'974'
>>> pe.encode('Smith')
'8692'
>>> pe.encode('Schmidt')
'8692,4692'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Haase Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Haase Phonetik value
- Return type
str
Examples
>>> pe = Haase()
>>> pe.encode_alpha('Joachim')
'AKAN'
>>> pe.encode_alpha('Christoph')
'KRASTAF,SRASTAF'
>>> pe.encode_alpha('Jörg')
'ARK'
>>> pe.encode_alpha('Smith')
'SNAT'
>>> pe.encode_alpha('Schmidt')
'SNAT,KNAT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
-
class
abydos.phonetic.
HenryEarly
(max_length: int = 3)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Henry code, early version.
The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].
New in version 0.3.6.
Initialize HenryEarly instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 3)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Calculate the early version of the Henry code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The early Henry code
- Return type
str
Examples
>>> pe = HenryEarly()
>>> pe.encode('Marchand')
'MRC'
>>> pe.encode('Beaulieu')
'BL'
>>> pe.encode('Beaumont')
'BM'
>>> pe.encode('Legrand')
'LGR'
>>> pe.encode('Pelletier')
'PLT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
Koelner
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Kölner Phonetik.
Based on the algorithm defined by [Pos69].
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Kölner Phonetik (numeric output) code for a word.
While the output code is numeric, it is still a str because 0s can lead the code.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as a numeric string
- Return type
str
Example
>>> pe = Koelner()
>>> pe.encode('Christopher')
'478237'
>>> pe.encode('Niall')
'65'
>>> pe.encode('Smith')
'862'
>>> pe.encode('Schmidt')
'862'
>>> pe.encode('Müller')
'657'
>>> pe.encode('Zimmermann')
'86766'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
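The remark that the code stays a str because 0s can lead it is easy to demonstrate: converting such a code to int and back silently drops the leading digit. The code value below is hypothetical and chosen only to show the effect.

```python
code = '0657'  # hypothetical code with a leading 0, for illustration only

as_int = int(code)        # the leading 0 carries no value as a number
round_trip = str(as_int)  # converting back does not restore it

print(round_trip)
```

For the same reason, codes are better stored in text columns or string fields than in numeric ones.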
-
encode_alpha
(word: str) → str[source]¶ Return the Kölner Phonetik (alphabetic output) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as an alphabetic string
- Return type
str
Examples
>>> pe = Koelner()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Müller')
'NLR'
>>> pe.encode_alpha('Zimmermann')
'SNRNN'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
LEIN
(max_length: int = 4, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
LEIN code.
This is Michigan LEIN (Law Enforcement Information Network) name coding, described in [MKTM77].
New in version 0.3.6.
Initialize LEIN instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode_alpha('Christopher')
'CLKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
-
class
abydos.phonetic.
MRA
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Western Airlines Surname Match Rating Algorithm.
A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the MRA personal numeric identifier (PNI) for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MRA PNI
- Return type
str
Examples
>>> pe = MRA()
>>> pe.encode('Christopher')
'CHRPHR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHMDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
MetaSoundex
(lang: str = 'en')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
MetaSoundex.
This is based on [KV17]. Only English ('en') and Spanish ('es') languages are supported, as in the original.
New in version 0.3.6.
Initialize MetaSoundex instance.
- Parameters
lang (str) -- Either en for English or es for Spanish
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode('Smith')
'4500'
>>> pe.encode('Waters')
'7362'
>>> pe.encode('James')
'1520'
>>> pe.encode('Schmidt')
'4530'
>>> pe.encode('Ashcroft')
'0261'
>>> pe = MetaSoundex(lang='es')
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6754'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode_alpha('Smith')
'SN'
>>> pe.encode_alpha('Waters')
'WTRK'
>>> pe.encode_alpha('James')
'JNK'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKRP'
>>> pe = MetaSoundex(lang='es')
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NKLS'
New in version 0.4.0.
-
class
abydos.phonetic.
Metaphone
(max_length: int = -1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Metaphone.
Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].
New in version 0.3.6.
Initialize Metaphone instance.
- Parameters
max_length (int) -- The maximum length of the returned Metaphone code (the signature default of -1 corresponds to the documented default of 64; in Philips' original implementation this was 4)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Metaphone value
- Return type
str
Examples
>>> pe = Metaphone()
>>> pe.encode('Christopher')
'KRSTFR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SM0'
>>> pe.encode('Schmidt')
'SKMTT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
NRL
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Naval Research Laboratory English-to-phoneme encoder.
This is defined by [EJMS76].
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Naval Research Laboratory phonetic encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The NRL phonetic encoding
- Return type
str
Examples
>>> pe = NRL()
>>> pe.encode('the')
'DHAX'
>>> pe.encode('round')
'rAWnd'
>>> pe.encode('quick')
'kwIHk'
>>> pe.encode('eaten')
'IYtEHn'
>>> pe.encode('Smith')
'smIHTH'
>>> pe.encode('Larsen')
'lAArsEHn'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
NYSIIS
(max_length: int = 6, modified: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
NYSIIS Code.
The New York State Identification and Intelligence System algorithm is defined in [Taf70].
The modified version of this algorithm is described in Appendix B of [LA77].
New in version 0.3.6.
Initialize NYSIIS instance.
- Parameters
max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the NYSIIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The NYSIIS value
- Return type
str
Examples
>>> pe = NYSIIS()
>>> pe.encode('Christopher')
'CRASTA'
>>> pe.encode('Niall')
'NAL'
>>> pe.encode('Smith')
'SNAT'
>>> pe.encode('Schmidt')
'SNAD'
>>> NYSIIS(max_length=-1).encode('Christopher')
'CRASTAFAR'
>>> pe_8m = NYSIIS(max_length=8, modified=True)
>>> pe_8m.encode('Christopher')
'CRASTAFA'
>>> pe_8m.encode('Niall')
'NAL'
>>> pe_8m.encode('Smith')
'SNAT'
>>> pe_8m.encode('Schmidt')
'SNAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
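The NYSIIS examples suggest that max_length acts as a simple cut-off and that -1 disables it: the default of 6 yields 'CRASTA', while max_length=-1 yields the full 'CRASTAFAR'. The sketch below models just that behaviour. It is an interpretation of the examples, and limit_length is a hypothetical helper rather than the library's code.

```python
def limit_length(code: str, max_length: int = 6) -> str:
    """Cut a code to max_length; a negative value leaves it unchanged."""
    if max_length < 0:
        return code
    return code[:max_length]


full = 'CRASTAFAR'  # NYSIIS(max_length=-1) result for 'Christopher' above

print(limit_length(full), limit_length(full, -1))
```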
-
class
abydos.phonetic.
Norphone
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Norphone.
The reference implementation by Lars Marius Garshol is available in [Gar15].
Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. This function incorporates the "not implemented" rules from the above file's rule set.
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Norphone code.
- Parameters
word (str) -- The word to transform
- Returns
The Norphone code
- Return type
str
Examples
>>> pe = Norphone()
>>> pe.encode('Hansen')
'HNSN'
>>> pe.encode('Larsen')
'LRSN'
>>> pe.encode('Aagaard')
'ÅKRT'
>>> pe.encode('Braaten')
'BRTN'
>>> pe.encode('Sandvik')
'SNVK'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
ONCA
(max_length: int = 4, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Oxford Name Compression Algorithm (ONCA).
This is the Oxford Name Compression Algorithm, based on [Gil97].
I can find no complete description of the "anglicised version of the NYSIIS method" identified as the first step in this algorithm, so this is likely not a precisely correct implementation, in that it employs the standard NYSIIS algorithm.
New in version 0.3.6.
Initialize ONCA instance.
- Parameters
max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Oxford Name Compression Algorithm (ONCA) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic ONCA code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode_alpha('Christopher')
'CRKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
class
abydos.phonetic.
PHONIC
(max_length: int = 5, zero_pad: bool = True, extended: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PHONIC code.
PHONIC is a Soundex-like algorithm defined in [Taf70].
New in version 0.4.1.
Initialize PHONIC instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
extended (bool) -- If True, this uses Taft's 'Extended PHONIC coding' mode, which simply omits the first character of the code.
New in version 0.4.1.
-
encode
(word: str) → str[source]¶ Return the PHONIC code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The PHONIC code
- Return type
str
Examples
>>> pe = PHONIC()
>>> pe.encode('Christopher')
'C6401'
>>> pe.encode('Niall')
'N2500'
>>> pe.encode('Smith')
'S0310'
>>> pe.encode('Schmidt')
'S0631'
New in version 0.4.1.
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic PHONIC code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic PHONIC value
- Return type
str
Examples
>>> pe = PHONIC()
>>> pe.encode_alpha('Christopher')
'JRSTF'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SJMT'
New in version 0.4.1.
-
class
abydos.phonetic.
PSHPSoundexFirst
(max_length: int = 4, german: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a first name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class, PSHPSoundexLast, is used for last names.
New in version 0.3.6.
Initialize PSHPSoundexFirst instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
-
encode
(fname: str) → str[source]¶ Calculate the PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W352'
>>> pe.encode('James')
'J700'
>>> pe.encode('Schmidt')
'S500'
>>> pe.encode('Ashcroft')
'A220'
>>> pe.encode('John')
'J500'
>>> pe.encode('Colin')
'K400'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Sally')
'S400'
>>> pe.encode('Jane')
'J500'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(fname: str) → str[source]¶ Calculate the alphabetic PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The alphabetic PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTNK'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SN'
>>> pe.encode_alpha('Ashcroft')
'AKK'
>>> pe.encode_alpha('John')
'JN'
>>> pe.encode_alpha('Colin')
'KL'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Sally')
'SL'
>>> pe.encode_alpha('Jane')
'JN'
New in version 0.4.0.
-
class
abydos.phonetic.
PSHPSoundexLast
(max_length: int = 4, german: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a last name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class,
PSHPSoundexFirst
is used for first names.
New in version 0.3.6.
Initialize PSHPSoundexLast instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
-
encode
(lname: str) → str[source]¶ Calculate the PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W350'
>>> pe.encode('James')
'J500'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Ashcroft')
'A225'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(lname: str) → str[source]¶ Calculate the alphabetic PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP alphabetic Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTN'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKKN'
New in version 0.4.0.
-
class
abydos.phonetic.
ParmarKumbharana
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Parmar-Kumbharana code.
This is based on the phonetic algorithm proposed in [PK14].
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Parmar-Kumbharana encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Parmar-Kumbharana encoding
- Return type
str
Examples
>>> pe = ParmarKumbharana()
>>> pe.encode('Gough')
'GF'
>>> pe.encode('pneuma')
'NM'
>>> pe.encode('knight')
'NT'
>>> pe.encode('trice')
'TRS'
>>> pe.encode('judge')
'JJ'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
Phonem
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonem.
Phonem is defined in [GM88].
This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].
Phonem is intended chiefly for German names/words.
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Phonem code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonem value
- Return type
str
Examples
>>> pe = Phonem()
>>> pe.encode('Christopher')
'CRYSDOVR'
>>> pe.encode('Niall')
'NYAL'
>>> pe.encode('Smith')
'SMYD'
>>> pe.encode('Schmidt')
'CMYD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
class
abydos.phonetic.
Phonet
(mode: int = 1, lang: str = 'de')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonet code.
phonet ("Hannoveraner Phonetik") was developed by Jörg Michael and documented in [Mic99].
This is a port of Jesper Zedlitz's code, which is licensed LGPL [Zed15].
That is, in turn, based on Michael's C code, which is also licensed LGPL [Mic07].
New in version 0.3.6.
Initialize Phonet instance.
- Parameters
mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the phonet code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The phonet value
- Return type
str
Examples
>>> pe = Phonet()
>>> pe.encode('Christopher')
'KRISTOFA'
>>> pe.encode('Niall')
'NIAL'
>>> pe.encode('Smith')
'SMIT'
>>> pe.encode('Schmidt')
'SHMIT'
>>> pe2 = Phonet(mode=2)
>>> pe2.encode('Christopher')
'KRIZTUFA'
>>> pe2.encode('Niall')
'NIAL'
>>> pe2.encode('Smith')
'ZNIT'
>>> pe2.encode('Schmidt')
'ZNIT'
>>> pe_none = Phonet(lang='none')
>>> pe_none.encode('Christopher')
'CHRISTOPHER'
>>> pe_none.encode('Niall')
'NIAL'
>>> pe_none.encode('Smith')
'SMITH'
>>> pe_none.encode('Schmidt')
'SCHMIDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
PhoneticSpanish
(max_length: int = -1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PhoneticSpanish.
This follows the coding described in [AmonME12] and [delPAngelesEGGM15].
New in version 0.3.6.
Initialize PhoneticSpanish instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6454'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NSLS'
New in version 0.4.0.
-
class
abydos.phonetic.
Phonex
(max_length: int = 4, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonex code.
Phonex is an algorithm derived from Soundex, defined in [LR96].
New in version 0.3.6.
Initialize Phonex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode_alpha('Christopher')
'CRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SSNT'
New in version 0.4.0.
-
class
abydos.phonetic.
Phonix
(max_length: int = 4, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonix code.
Phonix is a Soundex-like algorithm defined in [Gad90].
This implementation is based on [Pfe00], [Chr11], and [Kollar].
New in version 0.3.6.
Initialize Phonix instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode('Christopher')
'K683'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode_alpha('Christopher')
'KRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
class
abydos.phonetic.
RefinedSoundex
(max_length: int = -1, zero_pad: bool = False, retain_vowels: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Refined Soundex.
This is Soundex, but with more character classes. It was defined at [Boy98].
New in version 0.3.6.
Initialize RefinedSoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
New in version 0.4.0.
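As a rough illustration of "Soundex, but with more character classes", the sketch below uses the commonly cited Refined Soundex letter classes. The function name refined_soundex_sketch is hypothetical, and the collapsing and vowel-handling order is an assumption chosen to match the documented example outputs; it is not taken from the abydos source.

```python
# Commonly cited Refined Soundex classes; vowels plus H, W, Y map to '0'.
REFINED = {}
for letters, digit in (
    ("AEIOUYHW", "0"), ("BP", "1"), ("FV", "2"), ("CKS", "3"), ("GJ", "4"),
    ("QXZ", "5"), ("DT", "6"), ("L", "7"), ("MN", "8"), ("R", "9"),
):
    for letter in letters:
        REFINED[letter] = digit


def refined_soundex_sketch(word, max_length=-1, zero_pad=False,
                           retain_vowels=False):
    """Illustrative Refined Soundex; not the abydos implementation."""
    letters = [c for c in word.upper() if c in REFINED]
    if not letters:
        return ""
    digits = []
    for c in letters[1:]:
        d = REFINED[c]
        # Collapse runs of the same digit (assumed to happen before
        # vowels are dropped, so a vowel separates equal consonants).
        if not digits or digits[-1] != d:
            digits.append(d)
    tail = "".join(digits)
    if not retain_vowels:
        tail = tail.replace("0", "")
    code = letters[0] + tail
    if max_length > 0:  # a negative max_length is treated as unlimited
        if zero_pad:
            code = code.ljust(max_length, "0")
        code = code[:max_length]
    return code


print(refined_soundex_sketch("Christopher"))
print(refined_soundex_sketch("Schmidt"))
```

Compared with American Soundex, the nine consonant classes keep 'Smith' and 'Schmidt' apart, at the cost of fewer matches between loosely similar names.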
-
encode
(word: str) → str[source]¶ Return the Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode('Christopher')
'C93619'
>>> pe.encode('Niall')
'N7'
>>> pe.encode('Smith')
'S86'
>>> pe.encode('Schmidt')
'S386'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode_alpha('Christopher')
'CRKTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
-
class
abydos.phonetic.
RethSchek
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Reth-Schek Phonetik.
This algorithm is proposed in [vonRethS77].
Since I couldn't secure a copy of that document (maybe I'll look for it next time I'm in Germany), this implementation is based on what I could glean from the implementations published by the German Record Linkage Center (www.record-linkage.de).
Rules that are unclear:
Should 'C' become 'G' or 'Z'? (PPRL has both, 'Z' rule blocked)
Should 'CC' become 'G'? (PPRL has blocked 'CK', which may be a typo)
Should 'TUI' -> 'ZUI' rule exist? (PPRL has rule, but I can't think of a German word with '-tui-' in it.)
Should we really change 'SCH' -> 'CH' and then 'CH' -> 'SCH'?
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Reth-Schek Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Reth-Schek Phonetik code
- Return type
str
Examples
>>> pe = RethSchek()
>>> pe.encode('Joachim')
'JOAGHIM'
>>> pe.encode('Christoph')
'GHRISDOF'
>>> pe.encode('Jörg')
'JOERG'
>>> pe.encode('Smith')
'SMID'
>>> pe.encode('Schmidt')
'SCHMID'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
RogerRoot
(max_length: int = 5, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Roger Root code.
This is Roger Root name coding, described in [MKTM77].
New in version 0.3.6.
Initialize RogerRoot instance.
- Parameters
max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode('Christopher')
'06401'
>>> pe.encode('Niall')
'02500'
>>> pe.encode('Smith')
'00310'
>>> pe.encode('Schmidt')
'06310'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode_alpha('Christopher')
'JRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'JMT'
New in version 0.4.0.
-
class
abydos.phonetic.
RussellIndex
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Russell Index.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
New in version 0.3.6.
-
encode
(word: str) → str[source]¶ Return the Russell Index (integer output) of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe.encode('Christopher')
'3813428'
>>> pe.encode('Niall')
'715'
>>> pe.encode('Smith')
'3614'
>>> pe.encode('Schmidt')
'3614'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str
-
encode_alpha
(word: str) → str[source]¶ Return the Russell Index (alphabetic output) for the word.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value as an alphabetic string
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe.encode_alpha('Christopher')
'CRACDBR'
>>> pe.encode_alpha('Niall')
'NAL'
>>> pe.encode_alpha('Smith')
'CMAD'
>>> pe.encode_alpha('Schmidt')
'CMAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
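The relationship between the numeric and alphabetic Russell Index outputs can be sketched as a digit-for-letter substitution. The table below is inferred solely from the documented example pairs (for instance '3813428' and 'CRACDBR'); the name russell_to_alpha is hypothetical, and abydos may derive the alphabetic form differently.

```python
# Digit-to-letter table inferred from the documented examples;
# an assumption, not copied from the abydos source.
RUSSELL_DIGIT_TO_LETTER = str.maketrans("12345678", "ABCDLMNR")


def russell_to_alpha(code):
    """Translate a numeric Russell Index string to its alphabetic form."""
    return code.translate(RUSSELL_DIGIT_TO_LETTER)


print(russell_to_alpha("3813428"))
print(russell_to_alpha("715"))
```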
-
-
class
abydos.phonetic.
SPFC
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Standardized Phonetic Frequency Code (SPFC).
Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].
New in version 0.3.6.
-
encode
(word: Union[str, Sequence[str]]) → str[source]¶ Return the Standardized Phonetic Frequency Code (SPFC) of a word.
- Parameters
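word (str or sequence of str) -- The name to transform: a string with a space or period dividing the first and last names, or a tuple/list consisting of the first and last names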
- Returns
The SPFC value
- Return type
str
- Raises
AttributeError -- Word attribute must be a string with a space or period dividing the first and last names or a tuple/list consisting of the first and last names
Examples
>>> pe = SPFC()
>>> pe.encode('Christopher Smith')
'01160'
>>> pe.encode('Christopher Schmidt')
'01160'
>>> pe.encode('Niall Smith')
'01660'
>>> pe.encode('Niall Schmidt')
'01660'
>>> pe.encode('L.Smith')
'01960'
>>> pe.encode('R.Miller')
'65490'
>>> pe.encode(('L', 'Smith'))
'01960'
>>> pe.encode(('R', 'Miller'))
'65490'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
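The accepted input forms (a string divided by a space or period, or a two-item tuple/list) can be illustrated with a small normalizing helper. The name split_name is hypothetical, and the order in which the separators are tried is an assumption; this is only a sketch of the input contract, not the abydos code.

```python
def split_name(word):
    """Return (first, last) from any of the documented input forms.

    Hypothetical helper for illustration; not the abydos implementation.
    """
    if isinstance(word, str):
        # Try a period first ('L.Smith'), then a space ('Niall Smith').
        for separator in (".", " "):
            parts = word.split(separator, 1)
            if len(parts) == 2:
                return parts[0].strip(), parts[1].strip()
    elif isinstance(word, (tuple, list)) and len(word) == 2:
        return word[0], word[1]
    raise AttributeError(
        "word must be a string with a space or period dividing the first "
        "and last names, or a tuple/list of the first and last names"
    )


print(split_name("L.Smith"))
print(split_name("Christopher Smith"))
print(split_name(("R", "Miller")))
```

A bare surname such as 'Smith' falls through to the AttributeError, mirroring the documented failure mode.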
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic SPFC of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SPFC value
- Return type
str
Examples
>>> pe = SPFC()
>>> pe.encode_alpha('Christopher Smith')
'SDCMS'
>>> pe.encode_alpha('Christopher Schmidt')
'SDCMS'
>>> pe.encode_alpha('Niall Smith')
'SDMMS'
>>> pe.encode_alpha('Niall Schmidt')
'SDMMS'
>>> pe.encode_alpha('L.Smith')
'SDEMS'
>>> pe.encode_alpha('R.Miller')
'EROES'
>>> pe.encode_alpha(('L', 'Smith'))
'SDEMS'
>>> pe.encode_alpha(('R', 'Miller'))
'EROES'
New in version 0.4.0.
-
-
class
abydos.phonetic.
SfinxBis
(max_length: int = -1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SfinxBis code.
SfinxBis is a Soundex-like algorithm defined in [Axe09].
This implementation follows the reference implementation: [Sjoo09].
SfinxBis is intended chiefly for Swedish names.
New in version 0.3.6.
Initialize SfinxBis instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The SfinxBis value
- Return type
str
Examples
>>> pe = SfinxBis()
>>> pe.encode('Christopher')
'K68376'
>>> pe.encode('Niall')
'N4'
>>> pe.encode('Smith')
'S53'
>>> pe.encode('Schmidt')
'S53'
>>> pe.encode('Johansson')
'J585'
>>> pe.encode('Sjöberg')
'#162'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SfinxBis value
- Return type
str
Examples
>>> pe = SfinxBis()
>>> pe.encode_alpha('Christopher')
'KRSTFR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Johansson')
'JNSN'
>>> pe.encode_alpha('Sjöberg')
'ŠPRK'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
-
class
abydos.phonetic.
SoundD
(max_length: int = 4)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SoundD code.
SoundD is defined in [VB12].
New in version 0.3.6.
Initialize SoundD instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode('Gough')
'2000'
>>> pe.encode('pneuma')
'5500'
>>> pe.encode('knight')
'5300'
>>> pe.encode('trice')
'3620'
>>> pe.encode('judge')
'2200'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode_alpha('Gough')
'K'
>>> pe.encode_alpha('pneuma')
'NN'
>>> pe.encode_alpha('knight')
'NT'
>>> pe.encode_alpha('trice')
'TRK'
>>> pe.encode_alpha('judge')
'KK'
New in version 0.4.0.
-
class
abydos.phonetic.
Soundex
(max_length: int = 4, var: str = 'American', reverse: bool = False, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Soundex.
Three variants of Soundex are implemented:
'American' follows the American Soundex algorithm, as described at [Sta07] and in [Knu98]; this is also called Miracode
'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
'Census' follows the rules laid out in GIL 55 [Sta97] by the US Census, including coding prefixed and unprefixed versions of some names
New in version 0.3.6.
Initialize Soundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
var (str) --
The variant of the algorithm to employ (defaults to American):
American follows the American Soundex algorithm, as described at [Sta07] and in [Knu98]; this is also called Miracode
special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
Census follows the rules laid out in GIL 55 [Sta97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); this results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
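The difference between the American and special variants, together with the reverse and zero_pad options, can be seen in a minimal sketch. The function soundex_sketch is hypothetical and is not the abydos implementation: it omits the Census variant, and it assumes that a negative max_length is capped at 64 characters, as the length of the documented max_length=-1 output suggests.

```python
# Standard Soundex consonant classes.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex_sketch(word, max_length=4, var="American", reverse=False,
                   zero_pad=True):
    """Illustrative Soundex covering the 'American' and 'special' variants."""
    if var not in ("American", "special"):
        raise ValueError("this sketch does not cover variant %r" % var)
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if reverse:
        letters.reverse()  # Reverse Soundex: encode the word backwards
    if not letters:
        return ""
    if max_length < 0:
        max_length = 64  # assumption based on the documented example
    digits = []
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif c in "HW" and var == "American":
            # American: H and W are skipped, so equal codes on either
            # side of them still merge.
            continue
        else:
            # Vowels (and H/W in 'special') separate equal codes.
            prev = ""
    code = letters[0] + "".join(digits)
    if zero_pad:
        code = code.ljust(max_length, "0")
    return code[:max_length]


print(soundex_sketch("Ashcroft"))
print(soundex_sketch("Ashcroft", var="special"))
print(soundex_sketch("Christopher", reverse=True))
```

In 'Ashcroft' the s and c share a code; the American rule merges them across the h, while the special rule keeps both, which is the only place the two variants diverge in this sketch.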
-
encode
(word: str, **kwargs: Any) → str[source]¶ Return the Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode("Christopher")
'C623'
>>> pe.encode("Niall")
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
>>> Soundex(max_length=-1).encode('Christopher')
'C623160000000000000000000000000000000000000000000000000000000000'
>>> Soundex(max_length=-1, zero_pad=False).encode('Christopher')
'C62316'
>>> Soundex(reverse=True).encode('Christopher')
'R132'
>>> pe.encode('Ashcroft')
'A261'
>>> pe.encode('Asicroft')
'A226'
>>> pe_special = Soundex(var='special')
>>> pe_special.encode('Ashcroft')
'A226'
>>> pe_special.encode('Asicroft')
'A226'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode_alpha("Christopher")
'CRKT'
>>> pe.encode_alpha("Niall")
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
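One way to picture the alphabetic form is as the numeric code with padding zeros dropped and each digit replaced by a representative letter of its class. The sketch below is hypothetical: the letters for 2 through 6 are inferred from the examples above, the letter for 1 is an assumption based on the SoundexBR examples elsewhere in this package, and soundex_to_alpha is not an abydos function.

```python
# Representative letter per Soundex class; inferred from documented
# examples ('1' -> 'P' is an assumption), not from the abydos source.
DIGIT_TO_LETTER = {"1": "P", "2": "K", "3": "T", "4": "L", "5": "N", "6": "R"}


def soundex_to_alpha(code):
    """Convert a numeric Soundex code such as 'C623' to letters only."""
    head, tail = code[:1], code[1:]
    # Zeros are padding and carry no phonetic information, so skip them.
    return head + "".join(DIGIT_TO_LETTER[d] for d in tail if d != "0")


print(soundex_to_alpha("C623"))
print(soundex_to_alpha("N400"))
```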
-
class
abydos.phonetic.
SoundexBR
(max_length: int = 4, zero_pad: bool = True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SoundexBR.
This is based on [Mar15].
New in version 0.3.6.
Initialize SoundexBR instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode('Oliveira')
'O416'
>>> pe.encode('Almeida')
'A453'
>>> pe.encode('Barbosa')
'B612'
>>> pe.encode('Araújo')
'A620'
>>> pe.encode('Gonçalves')
'G524'
>>> pe.encode('Goncalves')
'G524'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode_alpha('Oliveira')
'OLPR'
>>> pe.encode_alpha('Almeida')
'ALNT'
>>> pe.encode_alpha('Barbosa')
'BRPK'
>>> pe.encode_alpha('Araújo')
'ARK'
>>> pe.encode_alpha('Gonçalves')
'GNKL'
>>> pe.encode_alpha('Goncalves')
'GNKL'
New in version 0.4.0.
-
class
abydos.phonetic.
SpanishMetaphone
(max_length: int = 6, modified: bool = False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Spanish Metaphone.
This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].
Modified version based on [delPAngelesBailonM16].
New in version 0.3.6.
Initialize SpanishMetaphone instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Spanish Metaphone of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Spanish Metaphone code
- Return type
str
Examples
>>> pe = SpanishMetaphone()
>>> pe.encode('Perez')
'PRZ'
>>> pe.encode('Martinez')
'MRTNZ'
>>> pe.encode('Gutierrez')
'GTRRZ'
>>> pe.encode('Santiago')
'SNTG'
>>> pe.encode('Nicolás')
'NKLS'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
StatisticsCanada
(max_length: int = 4)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Statistics Canada code.
The original description of this algorithm could not be located, and may only have been specified in an unpublished technical report. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the "Census modified Statistics Canada name coding procedure".
The modified version of this algorithm is described in Appendix B of [MKTM77].
New in version 0.3.6.
Initialize StatisticsCanada instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
New in version 0.4.0.
-
encode
(word: str) → str[source]¶ Return the Statistics Canada code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Statistics Canada name code value
- Return type
str
Examples
>>> pe = StatisticsCanada()
>>> pe.encode('Christopher')
'CHRS'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHM'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
class
abydos.phonetic.
Waahlin
(encoder: Optional[abydos.phonetic._phonetic._Phonetic] = None)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Wåhlin code.
Wåhlin's first-letter coding is based on the description in [Eri97].
New in version 0.3.6.
Initialize Waahlin instance.
- Parameters
encoder (_Phonetic) -- An initialized phonetic algorithm object
New in version 0.4.0.
-
encode
(word: str, alphabetic: bool = False) → str[source]¶ Return the Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
alphabetic (bool) -- If True, the encoder will apply its alphabetic form (.encode_alpha rather than .encode)
- Returns
The Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode('Christopher')
'KRISTOFER'
>>> pe.encode('Niall')
'NJALL'
>>> pe.encode('Smith')
'SMITH'
>>> pe.encode('Schmidt')
'*MIDT'
New in version 0.4.0.
-
encode_alpha
(word: str) → str[source]¶ Return the alphabetic Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode_alpha('Christopher')
'KRISTOFER'
>>> pe.encode_alpha('Niall')
'NJALL'
>>> pe.encode_alpha('Smith')
'SMITH'
>>> pe.encode_alpha('Schmidt')
'ŠMIDT'
New in version 0.4.0.