625 points by pain_perdu 3 days ago | 154 comments
derHackerman 2 days ago |
armcat 2 days ago |
lukebechtel 2 days ago |
Just made it an MCP server so Claude can tell me when it's done with something :)
NoSalt 1 day ago |
Ok, who knows where I can get those high-quality recordings of Majel Barrett's voice that she made before she died?
singpolyma3 2 days ago |
It says MIT license, but the README then has a separate section on prohibited use that may add restrictions making it non-free? Not sure of the legal implications here.
pain_perdu 2 days ago |
mgaudet 2 days ago |
So, on my M1 Mac, I ran `uvx pocket-tts serve` and plugged in:
> It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only
(the beginning of A Tale of Two Cities)
But the problem is that Javert skips over parts of sentences! E.g., it starts:
> "It was the best of times, it was the worst of times, it was the age of wisdom, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the spring of hope, it was the winter of despair, we had everything before us, ..."
Notice how it skips over "it was the age of foolishness," and "it was the season of Darkness,".
Which... doesn't exactly inspire faith in a TTS system.
(Marius seems better; posted https://github.com/kyutai-labs/pocket-tts/issues/38)
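A rough way to catch dropped clauses like these automatically, assuming you already have a transcript of what was actually spoken (typed out by hand or from a speech recognizer), is to split both texts on commas and diff the clause lists. A minimal sketch using only the standard library's `difflib`; the helper names `clauses` and `dropped_clauses` are hypothetical and not part of pocket-tts:

```python
import difflib
import re


def clauses(text: str) -> list[str]:
    """Split on commas, semicolons and em dashes; normalise case and spacing."""
    parts = re.split(r"[,;\u2014]", text)
    return [" ".join(p.split()).lower() for p in parts if p.strip()]


def dropped_clauses(source: str, heard: str) -> list[str]:
    """Return clauses of `source` that are missing or altered in `heard`."""
    src, out = clauses(source), clauses(heard)
    matcher = difflib.SequenceMatcher(a=src, b=out, autojunk=False)
    missing: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" = clause absent from the transcript;
        # "replace" = clause said differently (also worth flagging).
        if tag in ("delete", "replace"):
            missing.extend(src[i1:i2])
    return missing


if __name__ == "__main__":
    source = ("It was the best of times, it was the worst of times, "
              "it was the age of wisdom, it was the age of foolishness, "
              "it was the epoch of belief")
    heard = ("It was the best of times, it was the worst of times, "
             "it was the age of wisdom, it was the epoch of belief")
    print(dropped_clauses(source, heard))  # → ['it was the age of foolishness']
```

Splitting on commas is a crude assumption that suits this comma-heavy passage but not prose in general; rewritten clauses are reported alongside skipped ones.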
dale_glass 1 day ago |
Like, what if I want to graft TTS onto an existing text chat system and give each person a unique, randomly generated voice? Or want to try to get something that's not quite human, like some sort of alien or monster?
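Short of generating brand-new voices, one stopgap for the chat use case is to map each user deterministically onto a preset voice, so the same person always sounds the same. A sketch, assuming the voice names mentioned elsewhere in this thread (Javert, Marius, Fantine, Jean, Azelma) are usable presets; the lowercase identifiers and the `voice_for` helper are hypothetical:

```python
import hashlib
from collections.abc import Sequence

# Hypothetical identifiers: voice names mentioned in this thread. Check the
# project's README for the names it actually accepts.
PRESET_VOICES: tuple[str, ...] = ("javert", "marius", "fantine", "jean", "azelma")


def voice_for(user_id: str, voices: Sequence[str] = PRESET_VOICES) -> str:
    """Deterministically map a chat user ID to one of the preset voices."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer and reduce it
    # modulo the number of voices; stable across runs and machines.
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index]


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", voice_for(user))
```

`hashlib` is used instead of the built-in `hash()` because string hashing is randomised per process, so `hash()` would reassign voices on every restart. With only a handful of presets, distinct users will share voices; voice cloning from varied reference clips would presumably be needed for truly unique ones.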
GaggiX 2 days ago |
Another recent example: https://github.com/supertone-inc/supertonic
Evidlo 2 days ago |
nmstoker 1 day ago |
Imustaskforhelp 2 days ago |
I've seen some agentic models at around 4B that punch above their weight, and even some basic models that do. I can definitely see them being used in a home lab without costing too much money.
I think unmute.sh, at least, is comparable to/competes with ChatGPT's voice model. It's crazy how good (and effective) open-source models are from top to bottom. There's something for almost everyone.
I feel like the only true moat might be in coding models. Some are pretty good, but it's the only area where people might pay 10x-20x more for the best (MiniMax/Z.ai subscription fees vs. Claude Code).
It will be interesting to see whether we get another DeepSeek moment in AI that beats Claude Sonnet or similar. I think DeepSeek has DeepSeek 4, so it will be interesting to see how/if it can beat Sonnet.
(Sorry for going off-topic)
dust42 2 days ago |
febin about 13 hours ago |
https://github.com/jamesfebin/pocket-tts-candle
The port supports:
- Native compilation with zero Python runtime dependency
- Streaming inference
- Metal acceleration for macOS
- Voice cloning (with the mimi feature)
Note: This was vibecoded (AI-assisted), but features were manually tested.
akx 2 days ago |
All too often, new models' codebases are just a dump of code that installs half the universe in dependencies for no reason, etc.
d4rkp4ttern 1 day ago |
claude plugin marketplace add pchalasani/claude-code-tools
claude plugin install voice@cctools-plugins
More here: https://github.com/pchalasani/claude-code-tools?tab=readme-o...
Paul_S 2 days ago |
donpdonp 2 days ago |
snvzz 2 days ago |
smallerfish 1 day ago |
OfflineSergio 2 days ago |
butz 1 day ago |
britannio 2 days ago |
https://gist.github.com/britannio/481aca8cb81a70e8fd5b7dfa2f...
exceptione 1 day ago |
agentifysh 2 days ago |
lykahb 2 days ago |
syntaxing 2 days ago |
_ache_ 2 days ago |
In English it's perfect, and it's so funny in other languages. It sounds exactly like someone who doesn't actually speak the language but is giving it a go anyway.
I don't know why Fantine is just better than the others in other languages. Javert seems to be the worst.
Try Jean in Spanish, « ¡Es lo suficientemente pequeño como para caber en tu bolsillo! » ("It's small enough to fit in your pocket!"): it sounds a lot like someone who doesn't understand the language.
Azelma in French, on the other hand, « C'est suffisamment petit pour tenir dans ta poche. » ("It's small enough to fit in your pocket."), is very good. Granted, half the words come out in a Québécois accent and half in a French one, but hey, it's correct French.
Però non capisce l'italiano. (But it doesn't understand Italian.)
tschellenbach 2 days ago |
aki237 1 day ago |
I just tried some sample verses; it sounds natural.
But there might be a bug? Just for fun, I asked it to read the lyrics of "The Real Slim Shady", and it always seems to add one extra "please stand up" in the chorus. Has anyone else seen that?
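To check whether the extra chorus line is really in the audio rather than in the source text, one could count a phrase in the lyrics versus in a transcript of the output. A minimal sketch, assuming a transcript is available; the helper names are hypothetical, and the normalisation (lowercase, punctuation and hyphens dropped so "stand-up" matches "stand up") is a guess at what should count as the same phrase:

```python
import re


def _normalise(text: str) -> str:
    """Lowercase and keep only word tokens, so "Stand-up," becomes "stand up"."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def phrase_count(text: str, phrase: str) -> int:
    """Count whole-word, case-insensitive occurrences of `phrase` in `text`."""
    pattern = r"\b" + re.escape(_normalise(phrase)) + r"\b"
    return len(re.findall(pattern, _normalise(text)))


def extra_repeats(script: str, transcript: str, phrase: str) -> int:
    """Positive when the audio repeats `phrase` more often than the script does."""
    return phrase_count(transcript, phrase) - phrase_count(script, phrase)


if __name__ == "__main__":
    script = "Please stand up, please stand up"
    transcript = "please stand up please stand up please stand up"
    print(extra_repeats(script, transcript, "please stand-up"))  # → 1
```

Word boundaries (`\b`) keep "please" from matching inside "pleased", and back-to-back repeats are each counted because the phrase itself is matched rather than a space-padded copy.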
kreelman 1 day ago |
aidenn0 2 days ago |
indigodaddy 2 days ago |
g947o 1 day ago |
anonymous344 1 day ago |
maxglute 2 days ago |
undefined 2 days ago |
grahamrr 2 days ago |
Zardoz84 2 days ago |
oybng 2 days ago |
fuzzer371 2 days ago |
tempaccountabcd 1 day ago |
https://github.com/lukasmwerner/pocket-reader