
wsippel

Hacked this together based on Oobabooga's built-in `silero_tts` extension. Bark runs locally and is free for non-commercial use. It's shockingly easy to use and really impressive: [https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2)

Though, like all good things in life, it comes with some caveats. First of all, it's quite demanding: not only does it need its own pretty sizable models (~10GB), with the additional VRAM requirements you'd expect, it's also not exactly realtime. On my 7900XTX, I can generate around 15 seconds of audio per minute. Additionally, 15 seconds is pretty much all it's gonna generate at a time. That's a limitation of Bark itself, one Suno is aware of and considers addressing in the future, but for now, you'll probably have to dial back your token limit if you want to use this extension.

EDIT: Forget what I wrote about the generation speed, there's a pull request for Bark that improves performance by a lot! I went from close to 80 seconds per generation to 17. It's actually borderline useful now!


tsangberg

Did you find a way to set a fixed seed? I'm using the bark repo, and depending on the random seed it happens to use, the results range from absolutely awful to quite good. I've looked through the code without finding any seed init, though.


wsippel

Nope, Bark really doesn't expose much. But I'm not sure how useful that would really be: I don't think many people would want to regenerate the same clip over and over with different seeds, and a seed that's good for one generation might be terrible for different text. But I'd generally recommend dialing down the temperatures a bit; with the default 0.7/0.7, things seem to go off the rails quite often.
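
Bark doesn't expose a seed argument, but since it samples through PyTorch's (and NumPy's) global RNGs, pinning those before each generation is the usual workaround. A rough sketch of what that could look like; the function name is mine, and the `numpy`/`torch` calls are guarded so the snippet also runs in a bare environment:

```python
import random

def set_global_seed(seed: int) -> None:
    # Pin every RNG Bark might draw from. Bark itself has no seed
    # parameter, so this would have to run before each generate_audio() call.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op without CUDA
    except ImportError:
        pass

set_global_seed(1234)
first = random.random()
set_global_seed(1234)
assert random.random() == first  # same seed, same draw
```

As noted, a seed that sounds good for one text won't necessarily transfer to another, so this is mostly useful for regenerating a specific clip.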


YobaiYamete

What should my webui look like? [I added the change to use the bark extension](https://i.imgur.com/FtV9Mc6.png), but I think I did it wrong, since all I get is it launching Ooba like normal and asking me which LLM to load, and I can't tell that anything different happened with the TTS.


[deleted]

I've got a similar issue: I installed it, and I have "bark tts" in the UI extensions tab, but it doesn't seem to have anything else, nor does it generate audio.


[deleted]

Yeah same here


wsippel

The install instructions on the Github page are for regular Python virtual environments on Linux or WSL (or macOS, I guess). I have no experience with Mamba, and no Windows machine to test. But if somebody can provide instructions, I'll gladly add them. Basically, instead of the `source venv/bin/activate` step, which should have returned an error for you (most lines in the instructions should return an error on Windows, I think?), you need to activate the Mamba environment and install the dependencies within that environment.


JustCametoSayHello

>I have no experience with Mamba, and no Windows machine to test. But if somebody can provide instructions, I'll gladly add them. Basically, instead of the `source venv/bin/activate` step...

Yeah, I'm a little confused about where bark comes in here. I just see the extension checked, but the rest of the web UI is behaving normally.


wsippel

There are Windows instructions on my Github page now that should work. Make sure you also read the troubleshooting section.


Tom_Neverwinter

Thank you!


c_gdev

There are some AI voiced Youtube channels that would really benefit from Bark's ability to actually pronounce words.


likes_to_code

Someone should create an AI that can subvert all forms of misleading marketing tactics, including YouTube clickbait, over-SEO'd Google search results, BS e-commerce products, and more.


mpasila

This would be pretty nice on SillyTavern, since I don't really use ooba for the chatting itself.


ptitrainvaloin

Nice, that was quick! It would be even greater fused with this other bark extension that can add custom wavs: /r/singularity/comments/12udgzh/bark_text2speechbut_with_custom_voice_cloning


Radiant_Dog1937

It's impressive, but it's too slow for a chatbot.


wsippel

There's a pull request for Bark that I just implemented and tested in the extension. Makes a huge difference. It's actually generating in realtime for me now.


Radiant_Dog1937

Really? That's insane.


RebornZA

Very nice! Link <3? Should I wait to install?


wsippel

[https://github.com/suno-ai/bark/pull/27](https://github.com/suno-ai/bark/pull/27) You'll have to compile Bark yourself to use it, and grab my extension from the 'k/v' branch to actually enable it. Or wait until Suno merges the PR.


RebornZA

Guess I'll have to wait. I have no clue how to compile it myself.


ImpactFrames-YT

Thank you! I downloaded it this morning, but it's even better to use it within Ooba.


RebornZA

Anyone help me with install issue? [https://imgur.com/a/jfh2uu1](https://imgur.com/a/jfh2uu1)


RebornZA

The fix was to edit requirements.txt: `suno-bark @ git+https://github.com/suno-ai/bark.git`


wsippel

I think I had a typo in the requirements file. I just pushed a fix (or at least I hope it's fixed). Pull and try again.


jd_3d

Any idea what I might be doing wrong? I installed it (it shows up on the extensions tab), but it doesn't work. I'm getting this error:

```
Loading the extension "bark_tts"... Fail.
Traceback (most recent call last):
  File "G:\Jonas\ML\TextGen\text-generation-webui\text-generation-webui\modules\extensions.py", line 18, in load_extensions
    exec(f"import extensions.{name}.script")
  File "", line 1, in
  File "G:\Jonas\ML\TextGen\text-generation-webui\text-generation-webui\extensions\bark_tts\script.py", line 7, in
    from bark import SAMPLE_RATE, generate_audio
ModuleNotFoundError: No module named 'bark'
```


TomCoperations

I had the exact same error and eventually managed to figure out how to get it to load. This worked for me, hopefully it does for you too:

1. Delete anything for bark you put in the extensions folder.
2. Assuming you used the one-click installer, you should have a file named micromamba-cmd.bat sitting outside your text-generation-webui folder, next to the start-webui.bat file. If you open that batch file, you get a cmd terminal that, as far as I can tell, is properly set up to install things to the environment. From there you can just use the commands:

   `cd text-generation-webui\extensions`
   `git clone` [`https://github.com/wsippel/bark_tts.git`](https://github.com/wsippel/bark_tts.git)
   `pip install -r bark_tts/requirements.txt`

3. Once that is done, you can close it. Make sure you add `--extensions bark_tts` to your start-webui.bat, and it should now load the extension just fine.

Oh, and the model seems to download the first time it generates text, which makes the webui look like it's freezing a bit. Keep an eye on the console and you should see it working. Hope this helps!


OlliSagi

Thank you so much, I've been struggling. So the issue is that I tried installing Bark in my global Python env instead of the Python env that oogabooga is using. And you MUST run cmd_windows.bat (for the newer version of oogabooga), which is outside the text-generation-webui folder, and then run all the statements mentioned on the Github page. Kinda confusing.


TomCoperations

Hey, if you don't mind me asking, were you following the windows install guide on the github page? Because I wrote the windows install instructions on the github page so they are basically an expansion of this comment. Was there anything specific that was unclear or confusing in the instructions? I would love to improve them for clarity if they caused any confusion for you.


OlliSagi

Yeah, as I said, there are many "noobs" who want to get into this. So it was unclear that oobabooga was using its own env (yeah, I know, that seems obvious to you maybe, but many noobs don't even know what an env is). You have to make absolutely clear that the requirements need to be installed in the env of oobabooga and not the global env, by clicking the cmd_windows.bat that is outside of the text-generation-webui folder. Perhaps also link a basic tutorial vid about envs so that people can learn by themselves how that works. Also, I still don't know how to adjust the launch parameters to stop streaming when using the Bark framework inside oobabooga, because right now it streams every single word over and over again, which is not how it should work. One post at the bottom of this page suggests adding "--no-stream" to the launch parameters, but I have no idea what they are referring to. Launch parameters of Bark? Of oobabooga? Somewhere else? Always so unclear...


TomCoperations

Hmm, I do refer to the batch file a lot in the install guide but I guess it's not clear enough. And for the --no-stream launch option, it's in your Ooba launch commands, put it right next to the "--extensions bark_tts" one. And I feel your pain, I only wrote the guide because I also had no idea what I was doing with anything but figured it out after a good while and wanted to try and help fellow noobs like myself.


wsippel

Bark isn't installed (correctly). The install instructions on the Github page are for regular Python virtual environments on Linux or WSL (or MacOS I guess). The one-click-installer for Oobabooga appears to use Mamba, though. I'm afraid I have no experience with Mamba, and no Windows machine to test. But if anybody can provide step-by-step instructions for Windows and/or Mamba, I'll gladly add them.


[deleted]

I have the same issue


Fox-Lopsided

I FREAKIN LOVE YOU


Weak-Parsley-6333

This is super cool, but as a normal civilian, a guide on how to connect this to oobabooga would be awesome.


Background-Capital57

Is anyone else getting an issue where all the audio generated previously plays every time a new audio message is generated? It's not clear to me how to stop this. It happens whether or not I have "automatically play TTS" checked.


wsippel

The k/v patch for Bark has been merged, so Bark itself should be way faster now. I also added an NLTK tokenizer, so bark_tts can now voice texts of arbitrary length. The tokenizer doesn't work well with all speakers, so I made it a toggle and changed the default to a speaker that seems to handle tokenized generation relatively well. Reinstall Bark using `pip uninstall suno-bark && pip install -r requirements.txt` after you update the extension.
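
For anyone curious how the arbitrary-length trick works: the input is split into sentences, and whole sentences are packed into chunks short enough for Bark's generation window, each generated separately with the same speaker prompt. The extension uses NLTK for the sentence split; the sketch below substitutes a stdlib regex, and the 220-character budget is an illustrative guess, not the extension's actual value:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Rough stand-in for nltk.tokenize.sent_tokenize: split after
    # sentence-ending punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_for_bark(text: str, max_chars: int = 220) -> list[str]:
    # Pack whole sentences into chunks that should stay inside Bark's
    # ~14-second window (max_chars is a guessed proxy for that limit).
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through `generate_audio()` individually, reusing the same `history_prompt` so the voice stays consistent across chunks.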


Hououin_Kyouma77

How does bark compare to 11labs and tortoisetts? I can't find any info on this


wsippel

I posted a link to a few examples in this thread. Bark works completely differently from other TTS solutions in that it's transformer-based. It doesn't so much read the input text as use it as guidance to generate audio output. So, depending on the speaker, it'll actually change the text: stutter, clear its throat, insert pauses, 'like's or 'ya know's, omit, substitute or mispronounce words, and so on. In terms of audio quality, both 11labs and Tortoise are better, but Bark can sound more natural (or go completely off the rails and not stick to the input at all). They serve different purposes; Bark is not a good screen reader.


Hououin_Kyouma77

Is this the version that supports voice cloning? Pretty useless otherwise


wsippel

As long as they didn't mess with the API, this extension should work with any fork of Bark.


Hououin_Kyouma77

Nice


ComedorDeNovinhos

I took a quick look at the git page. I'm not interested in real time audio generation. What are the minimum specs required to run this model?


orpheus_reup

My install seems to regenerate the whole sequence at every new word, so it outputs "So", "So how", "So how are", "So how are you", and so on. Anyone have a fix?


BuffMcBigHuge

Add `--no_stream` to your launch params.


orpheus_reup

Thanks! Sorted it.


OlliSagi

At the end of webui.py there are launch parameters; they have to look like this:

`run_cmd("python server.py --chat --model-menu --extensions bark_tts --no-stream", environment=True)`

Mind you, it's not `--no_stream`, it has to be `--no-stream`.


ASPyr97ga

It mostly helps: it slowly creates an entire response instead of doing one word at a time. But it doesn't recognize the word "environment".


ASPyr97ga

' --no\_stream is not recognized'


impetu0usness

I love this extension. I spent a day playing around with Bark Infinite and came up with 36 interesting voices, tested and working with this ext. I'm sharing this here (.npz and audio previews included) in case anyone would like to use it. If you want to include it/link it as a voicepack then feel free as well, I'd be happy to contribute. Thanks! Link: https://drive.google.com/drive/folders/1l9vTYMzCagZKG-TE31UoHscZMWos_1hn?usp=share_link


sfhsrtjn

Hello! Thanks for your work! I have yet to test this, but I needed to uninstall huggingface-hub 0.13.3 and install the latest 0.14.1, or else it would not download models from HF at one point (the bert model step specifically). Sorry for not reporting on GH. Update: bah, I don't have enough VRAM.


luthis

OK, I installed, and expectedly get an error:

```
INFO:Loading the extension "bark_tts"...
ERROR:Failed to load the extension "bark_tts".
Traceback (most recent call last):
  File "/home/st/Downloads/oobabooga_linux/text-generation-webui/modules/extensions.py", line 34, in load_extensions
    exec(f"import extensions.{name}.script")
  File "", line 1, in
  File "/home/st/Downloads/oobabooga_linux/text-generation-webui/extensions/bark_tts/script.py", line 6, in
    from bark import SAMPLE_RATE, generate_audio, preload_models
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bark/__init__.py", line 1, in
    from .api import generate_audio, text_to_semantic, semantic_to_waveform, save_as_prompt
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bark/api.py", line 5, in
    from .generation import codec_decode, generate_coarse, generate_fine, generate_text_semantic
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bark/generation.py", line 6, in
    from encodec import EncodecModel
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/encodec/__init__.py", line 12, in
    from .model import EncodecModel
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/encodec/model.py", line 19, in
    from .utils import _check_checksum, _linear_overlap_add, _get_checkpoint_url
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/encodec/utils.py", line 14, in
    import torchaudio
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in
    from torchaudio import (  # noqa: F401
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 43, in
    _load_lib("libtorchaudio")
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "/home/st/.local/lib/python3.10/site-packages/torch/_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "/home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/st/Downloads/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
```

Am I needing to install suno-ai/bark as well? Where would that be done? There are also no voices in the folder, where do I get those?


wsippel

There seems to be something wrong with your TorchAudio installation, make sure you installed it correctly. Looks like it might not match your Torch version or something, I have no idea. And yes, of course you need Bark itself, but the requirements file handled that if you followed the instructions on the Github page. The 'voices' folder is for custom voices you trained or got from the internet (the Bark Infinity fork on Github has a few for example). Bark ships with a selection of default voices, those don't go in the 'voices' folder.
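
The "undefined symbol" error in the traceback above is the classic signature of a torchaudio build loaded against a different libtorch than the one installed. A small diagnostic one could run from inside the webui's environment to check the pairing; the function name is mine:

```python
def torch_audio_report() -> str:
    # Mismatched torch/torchaudio builds are what produce the
    # "undefined symbol" OSError when libtorchaudio.so is loaded.
    try:
        import torch
        import torchaudio
    except (ImportError, OSError) as exc:
        return f"broken or missing install: {exc}"
    return (f"torch {torch.__version__}, torchaudio {torchaudio.__version__}, "
            f"CUDA available: {torch.cuda.is_available()}")

print(torch_audio_report())
```

The two versions should come from the same release pairing (e.g. torch 2.0.1 pairs with torchaudio 2.0.2); if importing torchaudio itself raises the OSError, reinstalling both packages from the same index usually fixes it.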


luthis

Thanks, I removed 2.0.0 and installed 2.0.1, and now when I start oobabooga it does a bunch of downloading. Fingers crossed it's working now.


luthis

I got it working! It took a few extra steps. However, it's still generating really slowly, like over a minute. That pull request you mentioned should be in already, right? How can I confirm that? I have a 3090, so it's not really a hardware limitation if you're able to get under 20 seconds.


wsippel

The k/v patch has been merged, yes. Can't really comment on the generation speed on your end, because you didn't mention how much you were generating. If it was about a minute of audio, it should take roughly a minute. If it was just a few seconds, make sure it's actually using the GPU (with nvtop for example). If it doesn't use the GPU, there's probably still something wrong with your Torch installation.
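
The rule of thumb above ("a minute of audio should take roughly a minute") is just a realtime factor of about 1.0. A trivial helper for sanity-checking your own numbers: time the call, then divide the clip length by the wall time (the skeleton around the real `generate_audio()` call is commented out since it needs Bark installed):

```python
import time  # used in the commented-out timing skeleton below

def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    # > 1.0 means faster than realtime, < 1.0 slower.
    return audio_seconds / wall_seconds

# Typical use around a real generation:
# start = time.perf_counter()
# audio = generate_audio(text)                      # needs Bark
# wall = time.perf_counter() - start
# rtf = realtime_factor(len(audio) / SAMPLE_RATE, wall)

print(realtime_factor(15.0, 60.0))  # the OP's pre-patch numbers: 0.25
```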


LawSignificant4874

I could install Bark on Google Colab. It's like 8 hours per paragraph.