Part of the text to speech and AI image generation communities
Here are a few words from the phoneme dictionary in BeSTspeech:
.data:100964B4 unk_100964B4 db 0Ch ; jh
.data:100964B5 db 28h ; O
.data:100964B6 db 36h ; '
.data:100964B7 db 12h ; r
.data:100964B8 db 0Ch ; jh
.data:100964B9 db 0h ;
.data:100964BA db 0h ;
.data:100964BB db 0h ;
.data:100964BC unk_100964BC db 1h ; f
.data:100964BD db 12h ; r
.data:100964BE db 26h ; E
.data:100964BF db 36h ; '
.data:100964C0 db 14h ; d
.data:100964C1 db 0h ;
.data:100964C2 db 0h ;
.data:100964C3 db 0h ;
.data:100964C4 unk_100964C4 db 14h ; d
.data:100964C5 db 1Eh ; e
.data:100964C6 db 36h ; '
.data:100964C7 db 4h ; v
.data:100964C8 db 24h ; =
.data:100964C9 db 14h ; d
.data:100964CA db 0h ;
.data:100964CB db 0h ;
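To make the listing easier to read, here is a rough Python sketch that decodes those bytes back into phoneme symbols. It assumes each entry is a fixed 8-byte record padded with zeros (the labels above sit 8 bytes apart) and that a zero byte ends the entry. The names `PHONEME_CODES`, `decode_entry`, and `decode_table` are made up for this example, and the map only holds the codes visible in the dump, not the full inventory.

```python
# Phoneme codes seen in the listing above; the rest of the inventory is unknown.
PHONEME_CODES = {
    0x01: "f", 0x04: "v", 0x0C: "jh", 0x12: "r", 0x14: "d",
    0x1E: "e", 0x24: "=", 0x26: "E", 0x28: "O", 0x36: "'",
}

RECORD_SIZE = 8  # assumption: entries are zero-padded to 8 bytes


def decode_entry(record: bytes) -> list[str]:
    """Translate one record into phoneme symbols, stopping at the first zero byte."""
    symbols = []
    for code in record:
        if code == 0:  # assumption: 0 marks the end of the entry
            break
        # Unknown codes are kept visible as ?XX instead of being dropped.
        symbols.append(PHONEME_CODES.get(code, f"?{code:02X}"))
    return symbols


def decode_table(blob: bytes) -> list[list[str]]:
    """Split a raw dump into fixed-size records and decode each one."""
    return [
        decode_entry(blob[i:i + RECORD_SIZE])
        for i in range(0, len(blob), RECORD_SIZE)
    ]


# Bytes copied from 100964B4..100964CB in the listing.
DUMP = bytes([
    0x0C, 0x28, 0x36, 0x12, 0x0C, 0x00, 0x00, 0x00,
    0x01, 0x12, 0x26, 0x36, 0x14, 0x00, 0x00, 0x00,
    0x14, 0x1E, 0x36, 0x04, 0x24, 0x14, 0x00, 0x00,
])

for entry in decode_table(DUMP):
    print(" ".join(entry))
# jh O ' r jh
# f r E ' d
# d e ' v = d
```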
The phoneme dictionary for BeSTspeech has been found in the DLL.
Each entry, for words such as "one", "two", "three" and so on, is an array whose bytes are simply indexes into the phoneme inventory table. This is looking promising.
BeSTspeech uses bit masking to retrieve phoneme segmental features from a lookup table. Here are my notes so far:
first bit is v/c switch (0 - vowel, 1 - consonant)
second bit is plosive flag
third bit is voicing flag
fourth bit is alveolar flag
fifth bit is affricate flag
sixth bit is nasal flag
seventh bit is strident flag (pertains to sibilants and some fricative sounds)
eighth bit is v/c switch but in reverse (0 - consonant, 1 - vowel)
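The notes above can be sketched as a flag set in Python. This assumes "first bit" means the least significant bit, which the notes do not confirm; if the DLL counts from the high end, the shifts would be mirrored. `SegFeature` and `describe` are names invented for this example, and the sample byte is a hand-built value, not one read from the lookup table.

```python
from enum import IntFlag


class SegFeature(IntFlag):
    # Assumption: "first bit" is bit 0 (least significant).
    CONSONANT = 1 << 0   # v/c switch: 0 = vowel, 1 = consonant
    PLOSIVE = 1 << 1
    VOICED = 1 << 2
    ALVEOLAR = 1 << 3
    AFFRICATE = 1 << 4
    NASAL = 1 << 5
    STRIDENT = 1 << 6    # sibilants and some fricatives
    VOWEL = 1 << 7       # reversed v/c switch: 0 = consonant, 1 = vowel


def describe(feature_byte: int) -> list[str]:
    """Return the names of all features set in one table byte."""
    flags = SegFeature(feature_byte & 0xFF)
    return [f.name for f in SegFeature if f in flags]


# Hypothetical byte for a voiced alveolar plosive such as /d/.
sample = 0b00001111
print(describe(sample))
# ['CONSONANT', 'PLOSIVE', 'VOICED', 'ALVEOLAR']

# Testing a single feature is one AND against the mask.
print(bool(sample & SegFeature.NASAL))
# False
```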
So it turns out BeSTspeech has a couple of feature translation tables for ASCII. One of them is a bit-masked letter class table, which determines whether an ASCII symbol is a vowel, letter, number, or punctuation.
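A class table like that could look roughly like the Python sketch below: one byte per ASCII code, with one bit per class. The bit values and the names `IS_LETTER`, `IS_VOWEL`, `IS_DIGIT`, `IS_PUNCT`, `build_class_table`, and `has_class` are all hypothetical; the actual layout in the DLL may well differ.

```python
import string

# Hypothetical bit assignments, chosen only for illustration.
IS_LETTER = 0x01
IS_VOWEL = 0x02
IS_DIGIT = 0x04
IS_PUNCT = 0x08


def build_class_table() -> list[int]:
    """Build a 128-entry table with one class byte per ASCII code."""
    table = [0] * 128
    for ch in string.ascii_letters:
        table[ord(ch)] |= IS_LETTER
    for ch in "aeiouAEIOU":
        table[ord(ch)] |= IS_VOWEL   # vowels carry both LETTER and VOWEL
    for ch in string.digits:
        table[ord(ch)] |= IS_DIGIT
    for ch in string.punctuation:
        table[ord(ch)] |= IS_PUNCT
    return table


CLASS_TABLE = build_class_table()


def has_class(ch: str, mask: int) -> bool:
    """Check one character against a class mask; non-ASCII is never a match."""
    code = ord(ch)
    return code < 128 and bool(CLASS_TABLE[code] & mask)


print(has_class("a", IS_VOWEL), has_class("b", IS_VOWEL), has_class("7", IS_DIGIT))
# True False True
```

Packing the classes into bits means a single lookup answers several questions at once, for example `CLASS_TABLE[ord("a")] & (IS_LETTER | IS_VOWEL)`.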
Someone mistook my GitHub profile for a business yesterday. A freelancer sent me an email introducing himself. I assume he was looking for a freelance gig.