As an aside, while this tool can be used to create an audiobook from a book you have in text format, for your private consumption, having an author employ something like this to create files for distribution is extremely risky, even if they acknowledge its use and intend those files to only be available on their website.
Indie authors struggle a lot to promote their works, and the new normal is that potential readers, the polite ones[^1], use the slightest hint of AI usage to discard their title and move on...as they are entitled to, since there are so many books.
I in particular have started to hire voice actors that have good acting skills and good diction but for whom English is their second language, or it's their first language but they speak something else at home; sometimes I even ask them to go a notch up with their accents. It helps with the non-AI recognition, and it also increases the appeal of the book for people who would like to try out something new. Once, I did an audition for a project and was pleasantly surprised with how much life people from around the Mediterranean basin were able to inject into their renderings, compared with people from Britain and North America.
[^1] Impolite readers set the town on fire, and then go about and spread that fire to neighboring towns, for good measure.
baxtr 121 days ago [-]
I am a big-time user of Amazon’s WhisperSync feature. With that feature I can simultaneously read the book and listen to it.
This is especially helpful when you’re on the go but still want to have a visual now and then or highlight text for later.
The problem is that many books don’t offer that feature. There is a built-in read function now in the kindle app, but it’s crap.
So, if you ask me, I’d prefer a good human-written book with an additional AI voice on top to enable that feature for me.
OfflineSergio 120 days ago [-]
I'm obsessed with "simultaneously read the book and listen to it".
That's why I built WithAudio. You can check out the demo here: https://desktop.with.audio/reader-demo
I'd love to hear any feedback you have. "prefer a good human-written book with an additional AI voice on top to enable that feature for me" is exactly what I prefer when it comes to reading.
montag 120 days ago [-]
Very surprising that you're offering this without subscription. Huge selling point. Next time the need arises, I'll come back for WithAudio.
baxtr 120 days ago [-]
Thanks! Will definitely check it out
em-bee 120 days ago [-]
yeah, i don't see the problem. using a generated voice, no matter how, only affects the audiobook, not the actual book. if i don't like the voice i can ignore it. i am part of a group that occasionally gets email from new authors wanting us to review their books. and some of them sound really interesting, and i'd love to read them, but i can only do audiobook, so i would be very happy if the author went through the effort to generate an audiobook that i could listen to.
I've been meaning to use its position sync protocol with KoReader, but it's not trivial.
crazygringo 120 days ago [-]
> and the new normal is that potential readers, the polite ones[^1], use the slightest hint of AI usage to discard their title and move on...
Is that the new normal?
My impression is that when it comes to reading text, nobody cares as long as the final product is good.
People don't want AI-written books, but people have been comfortably listening to AI voices reading text for a long time now. Text-to-speech isn't really a controversial thing for listening to articles or books.
(Which is very different from voice acting, for example, which requires acting not just reading.)
dsign 120 days ago [-]
Sadly, it is common enough these days. There are reports of authors that set up a stand to physically sell their books, and members of the public ask "is this AI-written? Who made the cover? Is this cover AI?"
> My impression is that when it comes to reading text, nobody cares as long as the final product is good.
Maybe so, but it only takes a vocal minority to ruin things for an author. And that minority tends to use anything, even a non-overtly-critical mention of AI in social media.
The consensus right now is that using LLMs for fixing grammar and typos is acceptable. I personally use them for word completion (especially the devil incarnate that is the prepositions on/in), but tend to discard suggestions that improve flow, sentence structure and readability, because those increase the odds of triggering "AI detectors". In fact, I've found a renewed taste for unconventional sentence structure and unconventional punctuation; things that three years ago, before the LLM boom, I really didn't care for.
m_sahaf 120 days ago [-]
I imagine a pipeline between Calibre-Web[0] and audiobookshelf[1] going through Abogen, where Calibre-Web supplies the books, Abogen generates the audio version of it, and Audiobookshelf serves them. Great solution for the hearing impaired.
Does it turn it into spoken word or an audiobook?
Because good audiobooks often have voice actors that read the characters with different emphasis and dialects.
I imagine tools like chatgpt could do this for a few sentences but what about an 8-20 hour audiobook?
I think there are still basic hurdles to take before we can go epub to audiobook in a quality that can compete with current state of the art.
Or am I missing something?
jamilton 121 days ago [-]
Elevenlabs has a feature for a "full cast"-type generation, where different characters will get different voices. It's certainly not automatically sensitive to dialect though.
It's probably possible with current systems to do though. I believe there are TTS systems that can use context/prompting to change emphasis and other speech qualities, though I'm not sure how reliably.
tummler 120 days ago [-]
I’m sure it’s doable. I think you’d want to break it into a few discrete steps for the best quality. First process the book and identify key info like genre, tone, etc. Use that to determine the best voice(s) and reading style, assign actors for multiple characters/subjects. Maybe output some examples to spot check for approval. Tweak based on that then generate the audio. Prob a couple other steps in there and maybe a bit of custom work to optimize in key areas. If someone wants to do this as a side project I can help scope the architecture and process but I don’t want to code it. :p
vorgol 121 days ago [-]
Have you heard results from it? How does it know for example, when there is a romantic scene in the book, which voice to read out as?
It's definitely an excited voice, but is it read out as in a battle or as in a romantic scene?
fudged71 120 days ago [-]
I don't think they do it automatically, though. I think you need to piece apart the transcript in their tool to decide which voice to use where.
pyman 121 days ago [-]
Is it open source?
JSR_FDED 121 days ago [-]
[flagged]
Thorrez 121 days ago [-]
I don't see a link to Elevenlabs. So I'll post one: https://elevenlabs.io/ . It doesn't look open source to me.
pyman 120 days ago [-]
Thanks for the link.
Not sure why my question got downvoted. We were talking about Abogen which is FOSS.
BenGosub 121 days ago [-]
There are a few character voices that also can be mixed using the mixer, achieving different nuances. You can then write your own code to use different voices for different characters.
parineum 120 days ago [-]
> Because good audiobooks often have voice actors that read the characters with different emphasis and dialects.
I actually hate this. I like quotes to be read with the tone and inflection implied by the context but I don't like the different voices.
crazygringo 120 days ago [-]
I'm with you. It's as if a book decided to use a different font for each character's speech. It's distracting, not helpful.
frumiousirc 120 days ago [-]
This needs to be run from an environment where `pip` is available as that tool is used during the running of the abogen app. Using `uv tool run abogen` gets you started but then the app hangs at model install time. `uv venv && uv pip install pip && source .venv/bin/activate && abogen` lets it run properly.
Otherwise, it's a nicely packaged GUI. Well done!
I tried a PDF and the UI to select pages or sections is good and generation is fast on my laptop's GTX 1650.
The result is an .ogg audio and .ass subtitle file. Played with mpv allows listening and reading along in the terminal. Only issue I have with the result is that visual line breaks from the PDF are preserved resulting in long pauses "randomly" in the middle of sentences. This greatly interrupts understanding of the audio.
Edit: enabling the skipping of single newlines helps!
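For anyone preprocessing text by hand before feeding it to a TTS tool, the idea behind skipping single newlines can be sketched roughly as below. This is a minimal illustration, not abogen's actual implementation; the function name `join_wrapped_lines` is made up for the example, and it assumes that blank lines mark real paragraph breaks while lone newlines are only visual wraps left over from the PDF.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Replace single newlines with spaces, keeping blank-line paragraph breaks."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline with no newline directly before or after it is treated as a
    # visual wrap and becomes a space; runs of two or more newlines are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse any runs of spaces or tabs created by the join.
    return re.sub(r"[ \t]{2,}", " ", text)


if __name__ == "__main__":
    sample = "This greatly\ninterrupts understanding\nof the audio.\n\nNext paragraph."
    print(join_wrapped_lines(sample))
```

Words hyphenated across a line break are left alone here, since rejoining them safely needs a dictionary check.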
nnashrat 120 days ago [-]
I just converted a 110 page book to wav in about an hour with a RTX 4060.
I didn't have the newlines enabled though so it was pretty useless.
Enabling makes this pretty awesome.
af_heart is a great voice to me while af_jessica I find annoying. That is the main issue I have with audiobooks: the randomness of liking the voice actor or not matters almost as much as what the book says for me.
I knew this day was coming soon and I really am blown away. I have got so used to audiobooks that it is hard for me to actually sit and read a full book. I have about 20 books to convert that would never have a market to bother having someone read the book and in a voice I really like. Incredible.
logicprog 120 days ago [-]
I've been using this to try to make audiobooks out of various philosophy books I've been wanting to read, for accessibility reasons, and I ran into a critical problem: if the input text fed to Kokoro is too long, it'll start skipping words at the end or in the middle, or fade out at the end; and abogen chunks the text it feeds to Kokoro by sentence, so sentences of arbitrary length are fed to Kokoro without any guarding. This produces unusable audiobooks for me. I'm working on "vibe coding" my own Kokoro-based tkinter personal GUI app for the same purpose that uses nltk and some regex magic for better splitting.
gavinray 120 days ago [-]
I use "kokoro-tts" CLI, which has better chunking/splitting.
I've been meaning to write a post on this workflow because it's incredibly useful
logicprog 120 days ago [-]
I'll look into this! But I have to say I'm a bit attached to the little app I've ended up having AI make for myself lol. It's so cute, and it's mine!
denizsafak 120 days ago [-]
Hey, can you share an example book or text so I can test it?
Regarding "abogen chunks the text it feeds to Kokoro by sentence", that's not quite correct, it actually splits subtitles by sentence, not the chunks sent to Kokoro.
This might be happening because the "Replace single newlines with spaces" option isn’t enabled. Some books require that setting to work correctly. Could you try enabling it and see if it fixes the issue?
logicprog 118 days ago [-]
> Hey, can you share an example book or text so I can test it?
> Regarding "abogen chunks the text it feeds to Kokoro by sentence", that's not quite correct, it actually splits subtitles by sentence, not the chunks sent to Kokoro.
> This might be happening because the "Replace single newlines with spaces" option isn’t enabled. Some books require that setting to work correctly. Could you try enabling it and see if it fixes the issue?
I tried that, as well as doing it myself, and it didn't seem to help.
denizsafak 104 days ago [-]
I tested the files that you mentioned and found no issues. There are no missing words. Please check your files.
RicoElectrico 120 days ago [-]
I just can't stand how non-deterministic many deep learning TTSes are. At least the classical ones have predictable pronunciation which can be worked around if needed.
ethan_smith 120 days ago [-]
You could try implementing a character count limit per chunk instead of sentence-based splitting. A hybrid approach that breaks at sentence boundaries but enforces a maximum chunk size of ~150-200 characters would likely solve the word-skipping issue while maintaining natural speech flow.
logicprog 118 days ago [-]
That's precisely what I'm doing. I'm splitting by sentences, and then for each sentence that's still too long, I split them by natural breakpoints like colons, semicolons, commas, dashes, and conjunctions, and if any of /those/ are still too long, I then break by greedy-filling words. Then I do some fun manipulation on the raw audio tensors to maintain flow.
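The tiered splitting described above (sentences first, then punctuation breakpoints, then greedy word filling) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from either tool: the 200-character limit is taken from the "~150-200 characters" suggestion earlier in the thread, all function names are hypothetical, the sentence splitter is a naive regex rather than nltk, and the dash/conjunction breakpoints and the audio-tensor stitching are left out.

```python
import re

MAX_CHARS = 200  # assumed limit, from the ~150-200 character suggestion


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_on_breakpoints(sentence: str) -> list[str]:
    # Break after colons, semicolons and commas, keeping the punctuation.
    return [p for p in re.split(r"(?<=[:;,])\s+", sentence) if p]


def greedy_fill(words: list[str], limit: int) -> list[str]:
    # Pack as many whole words as fit into each chunk.
    chunks, current = [], ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > limit:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    chunks: list[str] = []
    for sentence in split_sentences(text):
        if len(sentence) <= limit:
            chunks.append(sentence)
            continue
        for part in split_on_breakpoints(sentence):
            if len(part) <= limit:
                chunks.append(part)
            else:
                chunks.extend(greedy_fill(part.split(), limit))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("Short one. " + " ".join(["word"] * 100) + ".", limit=50):
        print(len(chunk), chunk)
```

A single word longer than the limit still ends up as its own oversized chunk, and short comma-separated parts are not merged back together, which a real tool would probably want for more natural pacing.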
TOGoS 121 days ago [-]
The demo video doesn't seem to have any audio in it! At least none that either ffmpeg or whatever Firefox uses can recognize.
noisem4ker 121 days ago [-]
It's probably due to the unusual sound format, 24kHz mono PCM, and the fact that it was somehow forced into a WebM container, which only supports Vorbis and Opus officially.
It looks like the author created it using the "higher quality" ffmpeg command line, except for the "webm" final extension, producing the opposite of what's described as "an MP4 file that's compatible with more devices".
Yeah, I've run a local Kokoro instance, and it doesn't work with Firefox. This uses Kokoro under the hood.
noisem4ker 121 days ago [-]
The demo clip is static and has the Kokoro output encoded as the audio track. It's not Kokoro running and generating it in your browser in real time.
jamilton 121 days ago [-]
Same here, but it worked when I opened it in Chrome. What a weird error - you would think that playing an embedded mp4 with audio wouldn't differ from browser to browser.
mnmalst 121 days ago [-]
I was surprised by this as well at first but thinking about it, it would make sense when they use an audio codec which is not supported on the target system. In that case the video can still play but the audio can't. I wasn't aware tho that audio can be disabled separately.
frumiousirc 121 days ago [-]
Thanks for this. I thought I had some local issue with waterfox. Pasting the (long) video URL to the terminal lets mpv play it with audio.
Daunk 121 days ago [-]
Same on my end, no audio in the video.
huseyinkeles 121 days ago [-]
I can hear it on safari
gman83 120 days ago [-]
I love audiobooks, but I'm a stickler for good narration. I've stopped listening to plenty of audiobooks because I didn't like the narrator. I guess it will be a long time before I can use something like this.
NBJack 120 days ago [-]
I recall one series where R. C. Bray had been doing the narration for several books, then for undisclosed reasons they replaced him with another narrator. The drop in quality was so bad I eventually gave up trying to finish the series (though admittedly the author(s) didn't seem to be helping much with the content).
Some narrators, like Wil Wheaton, are so entertaining to me I actively search by what they have voiced.
In general, I have to agree the narrator can make or break a series.
ratelimitsteve 120 days ago [-]
Coming from the other side of this, I've had a good narrator sell me entire series in the past. The Grim Noir Chronicles is the first that comes to mind (idk if anyone remembers the terrible sitcom Perfect Strangers but he played Balki and in real life he has a buttery smooth baritone voice that I just adore) and anything that Soundbooth Theater touches, partly for Jeff Hays and partly because of the full-cast adaptations they do that feel like old school radio plays to me. I see no reason to use this over existing text to speech features. If i just want to mechanically turn patterns of light into patterns of vibration there's non-AI tech that will do that for free, and AI narration doesn't do what human narration does yet.
crazygringo 120 days ago [-]
> I've stopped listening to plenty of audiobooks because I didn't like the narrator.
I'm with you on this, but my reaction is the opposite -- I'm wondering if there are some books I couldn't stand to listen to, that now I could with a nice neutral narration voice? Instead of the weird untrained voice with weird vocal tics that was the official narration?
criddell 120 days ago [-]
What are some of your favorite audiobooks?
scotty79 121 days ago [-]
I think the quality of the voice is super important for audiobooks and I think we are just closing in on the required quality with TTS.
I played a bit with ElevenLabs voices and, while they aren't bad, when I tried to make them read a fragment of a text that I wrote, it sounded chaotic, boring, quite terrible, for anything longer than a sentence or two. But when I tried their v3 voices, which they are currently in the process of rolling out, the same text sounded consistent, emotional, engaging, simply amazing. I think we are just crossing the vocal uncanny valley.
porker 121 days ago [-]
Strong agree that voice quality (and voice acting) is important. I listen to a lot of fiction audiobooks, and will listen to the end of a middling book with a good narrator, but if the narration is flat or out of keeping with the characters I'll stop after a chapter or two.
8s2ngy 121 days ago [-]
I've been using Kokoro TTS with the CLI app, audiblez, mentioned in the "Similar Projects" section of the README. The model is fast and delivers impressive quality for its small size. Some issues I have faced, however, are:
a) It doesn't distinguish periods at the end of sentences from the dots in abbreviations such as "Mr." or "Mrs." The result is an awkward pause between "Mr." and the name.
b) It doesn't handle ellipses well.
c) Words are pronounced the same way regardless of context.
beboplifa 120 days ago [-]
I fixed that here: https://github.com/cpttripzz/audiblez
The main problem with Kokoro is how flat and lifeless it sounds. But it is very fast. I prefer Chatterbox tts but it is around 20 times slower and will not work without a GPU
fudged71 120 days ago [-]
Look into SSML phoneme tags. Some TTS supports it. That way you can use a powerful LLM to fix these issues ahead of TTS.
rkagerer 121 days ago [-]
The Mr. / Mrs. thing feels like it would be a pretty easy fix, at least to eliminate a lot of the more common cases.
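A rough sketch of what such a fix for the common cases might look like, applied to the text before it reaches the TTS engine. Everything here is an assumption for illustration: the abbreviation table is deliberately tiny, the name `expand_titles` is made up, and the rule (only expand when a capitalized word follows) will get ambiguous cases wrong, for example "Dr." meaning "Drive".

```python
import re

# Hypothetical, deliberately short table; a real one would need many more entries.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus", "Dr.": "Doctor"}

# Match a known abbreviation only when whitespace and a capitalized word follow,
# so the period is not mistaken for the end of a sentence.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")(?=\s+[A-Z])"
)


def expand_titles(text: str) -> str:
    """Spell out common titles so the TTS engine does not pause after them."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


if __name__ == "__main__":
    print(expand_titles("Mr. Darcy met Mrs. Bennet."))
```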
hombre_fatal 121 days ago [-]
^ A thought that everyone has had at one point when processing human text before learning the hard way (like end of sentence detection). :P
The difference is that even weak LLMs are good at magically doing this, so I wonder what the problem is for the TTS mentioned above.
leobg 121 days ago [-]
Kokoro is small and fast because all the text -> phoneme conversion is done by “dumb code” and only the phoneme -> sound part is done using a neural net.
amaccuish 121 days ago [-]
Amazing, but I'm personally waiting for the one that generates a well-formatted ePub from a PDF.
floppyd 121 days ago [-]
I tried Kokoro for voicing blog posts and articles and wasn't impressed to be honest. Right now Gemini 2.5 Flash TTS is a much more capable system with generous free limits (about 10 minutes per generation and about 90 minutes per day). Voices are not very consistent between generations, but for shorter pieces it's not a big deal (but will obviously be for books)
ekianjo 121 days ago [-]
Kokoro is fine for TTS, but it lacks emotion. But for a model of this size, that is kind of given.
robin_reala 121 days ago [-]
Ironic given the name: kokoro is Japanese for heart or sentiment.
SirHumphrey 120 days ago [-]
I played with ebook generation a bunch and find that (at least for English text) around 1B is needed to get something usable emotionally (Chatterbox is 0.5B, Orpheus is 3B).
xtracto 120 days ago [-]
I assume it doesn't work well for books that have non-text structured elements (code, diagrams, etc.) or images (which is expected).
I wonder, is there some open source NN that can consume PDF pages and produce a "pure prose" version of it. Say, a page with mixed text and an image of a car engine would be output to the text and then a detailed description of the image, or what it is depicting.
numb7rs 120 days ago [-]
You will want to reconsider the name if you plan to have a presence in Australia or New Zealand. "Abo" is an ethnic slur similar in offensiveness to the N-word.
denizsafak 120 days ago [-]
Thank you for bringing this to my attention, I genuinely appreciate you taking the time to share this important feedback. As the repository owner, I want to clarify that "abogen" is simply a shortened form of "audiobook generator," which is what this project does.
I completely understand your concern, and I'm grateful that you pointed this out. It's clear that your comment comes from a place of wanting to help, and I really value community members like you who look out for potential issues.
The name was chosen purely based on the technical functionality (audiobook generation), and I had no awareness of the unfortunate similarity you've mentioned. As English is not my native language, I sometimes miss these cultural nuances that native speakers would naturally catch. I appreciate your understanding that this was entirely unintentional.
Thank you again for the thoughtful heads-up
isaacremuant 120 days ago [-]
Don't worry. Australia will probably ban it for some reason anyway. Better to be free.
Btw, Don't look into the name of a famous python formatter or you might be offended.
dumbasrocks 120 days ago [-]
"Abo" carries the same cultural impact as "Paki" in the UK, and isn't a word in its own right, let alone a very commonly used one. Completely different kettle of fish. Seeing as you want to derail this into some free speech rant - it's just distasteful. You really think anybody who wants legislative control over freedom of speech in Australia gives a damn about Aboriginals getting offended?
isaacremuant 120 days ago [-]
Intent matters and your phrasing "Seeing as you want to derail this into some free speech rant" is fundamentally offensive to me. Censor yourself.
Go fight for chat control under the guise of caring about others. It's been really successful so far. "Free speech rant"...
yepyip 120 days ago [-]
Everyone has a gimmick these days. If abos are that sensitive, let them not have access to solutions—no need to worry about them.
someperson 120 days ago [-]
The project presumably is a portmanteau of "audio book generator".
I agree that the project need not be renamed to remove the single syllable that may be an obscure slur, especially since every syllable may be an obscure slur in some language and you can't expect somebody to learn them all just to avoid them.
But there was no need to use that syllable as a slur.
dumbasrocks 120 days ago [-]
Having zero cultural sensitivity like this is incredibly embarrassing
perfect, I was looking for something like that ! is it gui only, or is there an api available ? I would like to be able to share a link or a text from my phone and get back the audio
dumbasrocks 120 days ago [-]
Would you call a network packet generation tool Pakigen?
u_sama 120 days ago [-]
AudioBOokGENerator -> abogen
One should stop assuming everyone is versed in all slangs (or slurs for that matter) existing in all languages in the world. The author seems to have a Turkish name, thus I will assume he is Turkish and so I would guess he didn't think much about the name.
leansensei 119 days ago [-]
"Abo" in German means "subscription" (from the French abonnement). Show some cultural sensitivity about these languages.
nikolayasdf123 121 days ago [-]
can I choose any voice? would love to read software engineering books in voice of Morgan Freeman, or maybe even better, Scarlett Johansson
Because the Stephen Hawking voice spends a quarter of its time joking/complaining how it never got a Nobel Prize.
hajimuz 121 days ago [-]
Yeah, could be a buff like 500% brain supercharge.
lynx97 120 days ago [-]
DAISY would be a desirable output format.
obfuscator 120 days ago [-]
My biggest selling point for this would be that the volume is probably the same throughout the whole text. I am listening to audiobooks to fall asleep, and many voice actors go from very quiet to loud in conversations. It may be good narration, but it's sometimes too quiet to understand, so I need to increase the volume only to be woken up by some loud lines later.
So I imagine generated audiobooks to be good in that regard. Another option would be to have a "normalize volume" setting at audible, or other services.
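To make the "normalize volume" idea concrete, here is a minimal sketch of RMS-based loudness normalization over a segment of audio samples. It assumes samples are floats in the range -1.0 to 1.0, and the target level of 0.1 and the function names are arbitrary choices for the example; real players and services typically use perceptual loudness measures rather than plain RMS.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a segment (0.0 for an empty segment)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples: list[float], target_rms: float = 0.1,
                  peak_limit: float = 1.0) -> list[float]:
    """Scale a segment toward target_rms without letting any peak exceed peak_limit."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence stays silence
    gain = target_rms / current
    peak = max(abs(s) for s in samples)
    gain = min(gain, peak_limit / peak)  # back off if the gain would clip
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 100
    loud = [0.5, -0.5] * 100
    print(rms(normalize_rms(quiet)), rms(normalize_rms(loud)))
```

Applied per segment (say, per sentence or per few seconds), this pulls whispered and shouted lines toward the same level, at the cost of flattening the narrator's intended dynamics.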
leke 121 days ago [-]
How big is this app?
[0] https://github.com/janeczku/calibre-web
[1] https://github.com/advplyr/audiobookshelf
https://github.com/nazdridoy/kokoro-tts
It generates a directory of audio files, along with a metadata file for ebook chapters
You have to use m4b-tool to stitch the audio files together into an audiobook and include the chapter metadata, but it works great:
https://github.com/sandreas/m4b-tool
I was running into issues with this one: https://theanarchistlibrary.org/library/kevin-carson-studies..., this one: https://files.libcom.org/files/Accelerate%20-%20Robin%20Mack... (converted to plain text using MinerU, double checked to make sure the text was clean).
Ah, that's odd. So I don't know why abogen'd be doing the weird fading out and skipping words thing then when my tool (https://github.com/alexispurslane/kokoro-audiobook-reliable/) isn't.
https://github.com/denizsafak/abogen/tree/main/demo#for-high...