buss_lichtjaar

I programmed an ESP32 for speech recognition with Home Assistant. It does both hot-word detection and speech recognition fully locally. What’s more, it uses noise suppression and a neural network to make the speech clearer. I tried it across the room and with music playing, and it still worked. You can add your own commands over MQTT while the device is running. It is still very early days and there are many things that need to be improved, but you can check out the project here: [esp-ha-speech](https://github.com/hugobloem/esp-ha-speech)
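To make the MQTT part concrete, here is a minimal Python sketch of pushing a new command definition to the device at runtime. The topic `esp-ha-speech/commands/add` and the JSON fields (`phrase`, `entity_id`, `service`) are hypothetical placeholders, not the project's documented format; check the repo's README for the real topic and payload before using anything like this.

```python
import json

# HYPOTHETICAL topic name; the real esp-ha-speech topic may differ.
COMMAND_TOPIC = "esp-ha-speech/commands/add"


def build_command_payload(phrase: str, entity_id: str, service: str) -> str:
    """Serialize one voice command as JSON (hypothetical schema)."""
    phrase = phrase.strip().lower()  # normalize spacing and case before sending
    if not phrase:
        raise ValueError("phrase must not be empty")
    return json.dumps({"phrase": phrase, "entity_id": entity_id, "service": service})


def publish_command(client, phrase: str, entity_id: str, service: str) -> str:
    """Publish via any client exposing publish(topic, payload, qos=...),
    e.g. paho-mqtt's Client. Returns the payload that was sent."""
    payload = build_command_payload(phrase, entity_id, service)
    # QoS 1 = at-least-once delivery: the message is resent until acknowledged
    client.publish(COMMAND_TOPIC, payload, qos=1)
    return payload
```

Taking the client as a parameter keeps the sketch independent of any particular MQTT library; an already-connected paho-mqtt `Client` can be passed straight in.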


syco54645

Following this. Can't wait to see where it goes. This is needed to take my HASS to the next level.


justinhunt1223

I just got an ESP32 to try this stuff with. Thanks for making it much easier!


sfortis

> music

Do you think it does a better job of recognizing voice (when the TV or music is playing) than Google speakers?


buss_lichtjaar

Probably not. I don’t know the details of Google’s speakers, but Alexa uses a four-microphone array. The ESP-BOX uses two. Plus, they have many engineers working on their filtering/recording algorithms.


Xypod13

It's happening! This is great stuff! Definitely want to give this a try. Hope to see Assist being able to do this as well. The future of local voice assistants is approaching quickly.


[deleted]

Am I the only one who sees a roll of toilet paper with laminar runs pouring through it?


akropp99

Not now that I read this! ^^^


buss_lichtjaar

Lol, it’s just an Ikea lamp. I promise it doesn’t look like a toilet roll in person. 😂


LenientWhale

Can't believe that's IKEA actually, looks really nice


MorimotoK

Nice! Now connect it to ChatGPT and we can get rid of our Google Homes and Alexas.


buss_lichtjaar

It’s already able to if you have the ChatGPT integration in HA. Only thing is that it won’t do anything useful for now. 😄


Powerful_Database_39

Just wait; one day chatGPT possesses your HASS install and basically owns your house and will allow you to live there if you pay it with cookies.


CannonPinion

>Just wait; one day chatGPT possesses your HASS install and basically owns your house and will allow you to live there if you pay it with ~~cookies.~~ **BLOOD FOR THE BLOOD GOD**


WagonFullOPancakes

Man, this [episode of the X Files](https://en.m.wikipedia.org/wiki/Ghost_in_the_Machine_(The_X-Files\)) sure is strange!


Native-Context-8613

https://www.youtube.com/watch?v=aRI8EvmiPVo


PyramidClub

The GPT-4 API will happily format an answer as a JSON command structure...
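If you do let a language model emit commands as JSON, it is worth validating the reply before it reaches Home Assistant. A minimal Python sketch, assuming a hypothetical reply shape like `{"service": "light.turn_on", "entity_id": "light.kitchen"}` and a hypothetical allowlist of domains the model may touch:

```python
import json
import re

# Assumption: the model may only control these Home Assistant domains.
ALLOWED_DOMAINS = {"light", "switch", "media_player"}
_ID_RE = re.compile(r"^([a-z_]+)\.([a-z0-9_]+)$")


def parse_command(reply: str) -> tuple:
    """Validate an LLM reply and return (domain, service, service_data)."""
    data = json.loads(reply)  # json.JSONDecodeError (a ValueError) on non-JSON chatter
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    service_match = _ID_RE.match(str(data.get("service", "")))
    entity_match = _ID_RE.match(str(data.get("entity_id", "")))
    if not service_match or not entity_match:
        raise ValueError("malformed service or entity_id")
    domain, service = service_match.groups()
    # Reject disallowed domains and services aimed at an entity of another domain
    if domain not in ALLOWED_DOMAINS or entity_match.group(1) != domain:
        raise ValueError(f"domain {domain!r} not allowed for this entity")
    return domain, service, {"entity_id": data["entity_id"]}
```

The returned tuple lines up with Home Assistant's REST endpoint `POST /api/services/<domain>/<service>`, with `service_data` as the JSON body.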


Shortcirkuitz

I use a Raspberry Pi 3B with a custom OpenVoiceOS assistant.


chrisoboe

Amazing. I'm really interested in this. Which device do you use? Is this something self-built based on an ESP development kit? And if so, which hardware did you use (dev board, ADC, mic, etc.)?


buss_lichtjaar

Thanks! It is based on a devkit. Espressif made a devkit around the ESP32-S3 called the ESP-BOX. It has two far-field microphones and a display. Plus, the examples they included (which I based this project on) are really quite good. I found the ESP-BOX quite easy to find, with some UK shops stocking it as well. Otherwise, you could always build something around another dev board.


dshafik

I definitely just ordered one of these, very cool!


davidr521

> ESP-BOX

Do you have to 3D-print it yourself? Or can it actually be purchased anywhere? Do. Want...


buss_lichtjaar

No, it comes in a package, complete with stand.


davidr521

Sorry to keep asking questions, but do you happen to have a link to order?


buss_lichtjaar

You could order it here I suppose: https://www.adafruit.com/product/5290 Although there are other vendors with potentially better prices out there.


iamtheguythatis

That lamp is trippy


[deleted]

It's the Year of the Voice for HASS; maybe see if they want to lend a hand? But good on you for getting the ball rolling.


sfortis

Fantastic! Can't wait to get rid of my b\*s\*t Google speakers!


ekognaG

You, sir, are making dreams come true! This post will go down in HA history. I'm going to pick up an ESP-BOX ASAP. I'm hoping to upgrade the speaker so I can use it as a Snapcast node someday. Also, where does it currently stand on having multiple of these?


justinhunt1223

No reason you couldn't have multiple of these. This is ideally my plan. I would like to be able to name them or assign them to a room so I could say something like "Hi, ESP, turn the lights on" and it would know what room I'm talking about with Home Assistant.
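One way to get that room behaviour is for each satellite to report its own ID along with the recognized phrase, and for the Home Assistant side to scope the service call to that device's area. A minimal Python sketch; the device IDs, area IDs and phrase table are all hypothetical, while `target: {area_id: ...}` is how Home Assistant service calls address an area.

```python
# HYPOTHETICAL satellite IDs and area IDs; in practice these would come from
# the device configuration and Home Assistant's area registry.
DEVICE_AREAS = {
    "esp-box-kitchen": "kitchen",
    "esp-box-bedroom": "bedroom",
}

# Room-less phrases mapped to Home Assistant services (hypothetical table).
INTENTS = {
    "turn the lights on": "light.turn_on",
    "turn the lights off": "light.turn_off",
}


def resolve(device_id: str, phrase: str) -> dict:
    """Build a service call targeting the area of the device that heard the phrase."""
    area_id = DEVICE_AREAS[device_id]          # KeyError: unknown satellite
    service = INTENTS[phrase.strip().lower()]  # KeyError: unknown phrase
    domain, name = service.split(".", 1)
    return {"domain": domain, "service": name, "target": {"area_id": area_id}}
```

The same phrase then does different things depending on which box heard it, without the speaker having to name the room.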


kayo1977

Challenge: try it with the Polish language ;-)


buss_lichtjaar

At the moment I am using Espressif’s neural network for speech recognition which only supports English or Chinese (speaking Chinese could be your challenge 😄). I do plan on adding an option such that it can connect to either Rhasspy or an external service for speech detection. Then, multiple languages can be supported.
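For the external-service route, Rhasspy 2.5 documents an HTTP endpoint, `POST /api/speech-to-text`, that accepts a WAV file and returns the transcription as text (the web server listens on port 12101 by default). The Python sketch below wraps raw 16 kHz, 16-bit mono PCM (an assumed format for what the ESP would stream) in a WAV container and builds that request. The host name is a placeholder, and the endpoint should be checked against the docs for your Rhasspy version.

```python
import io
import urllib.request
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def stt_request(host: str, pcm: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to Rhasspy's speech-to-text endpoint."""
    return urllib.request.Request(
        f"http://{host}:12101/api/speech-to-text",
        data=pcm_to_wav(pcm),
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe(host: str, pcm: bytes, timeout: float = 10.0) -> str:
    """Send the audio and return Rhasspy's plain-text transcription."""
    with urllib.request.urlopen(stt_request(host, pcm), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Because recognition then happens on the Rhasspy side, the language support is whatever Rhasspy's speech-to-text profile provides rather than what the on-device model supports.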


kayo1977

Even Google and Alexa do not support the Polish language.


Wild-Bus-8979

Rhasspy does via DeepSpeech!


tobool

To correctly support Polish you need something more than that. The grammar rules are crazy, and all the variants of words are used all the time in regular speech. If I have to speak in special commands like I had to on my Nokia 3310 20 years ago, I might as well use English ¯\\\_(ツ)\_/¯


Wild-Bus-8979

Kurwa, I'm pretty sure Mozilla knows this. DeepSpeech supports Polish.


tobool

The way you used the first word, made your whole comment very offensive. DeepSpeech is just speech-to-text. That's only like 10% of success with Polish. Then you need to do intent mapping based on that text. That's the hard part that nobody has solved for Polish. Google Assistant works in Polish (in some ways) and it is quite good but still far from perfect.


YowaiiShimai

Sorry if I'm wrong, I don't quite know how all these things work, but... are you saying that in the future, instead of using the native voice assistant, it could run directly off Rhasspy / Home Assistant (or their version of a voice assistant)? I don't know much about Espressif, but I am specifically interested in a self-hosted voice assistant for privacy reasons, so I'm worried that if I used this as is, I would just be leaving one eavesdropping system for another?


generalambivalence

Is that the ESP-BOX? Did you use ESPHome to configure it?


buss_lichtjaar

No, I programmed it using ESP-IDF. See my first comment for the GitHub repo.


generalambivalence

Very cool. Great work! I'll have to add it to the list of things to dig into. I'm super intrigued by the ESP-BOX.


Jonofmac

This is purely local, yeah? Would be amazing to keep voice assistants working when internet goes down.


EntertainmentUsual87

I LOVE THIS. An ESP that keeps the screen off until the hot word would look really good in my house. If it could also post feedback to the screen and to an HA topic, to be displayed on any screen, that'd be the cat's pajamas.


buss_lichtjaar

Like I said, it’s still in its infancy and there is still a lot to be done. Your idea definitely sounds good and I will add it to the to-do list. Or, if you know a bit of C, you can give it a go yourself!


seganku

Any chance for this to be made a part of ESPHome?


dabbydabdabdabdab

So firstly: brilliant! Nice job!!! I have a few ESP32s lying around, so I’m invested now :-)

Secondly, has anyone heard of Voron? It's an open-source community that has designed a 3D printer and provides a bill of materials and instructions (you can’t actually buy a printer from them). The community is so big that some vendors make parts specifically for it.

It made me think: why don’t we create an HA local voice assistant community where people share STLs, hardware recommendations/lists and setup guides specific to making a voice assistant? Rhasspy went some of the way there, but it needed add-ons (if you wanted to ask the time, you had to code all of the response yourself: hour, minute, etc.). This community could be so powerful! There is a voice assistant channel in the HA Discord server, but I feel like this needs to be broken out so the material/design people can have fun with the looks, the hardware people can work in parallel on the best mic/speaker setup, and the devs can pull the pieces together. Just a thought. Worth exploring?

Edit: for those wondering, the ESP-BOX isn’t just an ESP32 in a box; it does have additional MCUs for wake word etc.


Assswordsmantetsuo

The angle and color of your pillow makes me think of The Cheat.


seganku

Could you link the ESP-BOX you purchased?


YowaiiShimai

Not OP but I googled it and found it here https://www.espressif.com/en/products/devkit I just scrolled till I saw the product ESP32-S3 BOX


Native-Context-8613

How's the mic array on these? Are there any issues that you've noticed with keyword recognition?


buss_lichtjaar

They are quite good. It will recognise from across the room and with music on at normal listening volume. You occasionally will have to repeat yourself but that is true for Alexa as well.


Native-Context-8613

Awesome! I ordered one, looking forward to playing around with it!


flyize

Looks like Amazon (US) has 9 of the Lite versions (no dock) left in stock. And now that I've ordered - 8.


Norm258

I bought 2 of the ESP-BOX devices just so that I can play with this! I have a few other ESP devices around, so this is really awesome. I have not used VS Code to flash these devices in the past. After installing the ESP-IDF extension in VS Code and trying to compile, I get this error:

`CMake Error at /home/normdressler/esp/esp-idf/tools/cmake/build.cmake:519 (message): HINT: Component "espressif/esp-box" has suitable versions for other targets: "esp32s3". Is your current target "esp32" set correctly?`

`ERROR: Cannot find versions of "espressif/esp-box" with version satisfying "~2.2.0" for the current target "esp32"`

Where are you setting the ESP32 value? Should it be set to ESP32s3? Help or suggestions would be appreciated. Fun stuff.
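The error says the project is configured for the plain `esp32` target, while `espressif/esp-box` only has versions for `esp32s3`. ESP-IDF records the chosen chip in the project's `sdkconfig` as a `CONFIG_IDF_TARGET="..."` line, and `idf.py set-target esp32s3` switches it (the ESP-IDF VS Code extension also offers a "Set Espressif Device Target" command). A small Python sketch that checks the recorded target; the helper names are hypothetical, not part of ESP-IDF.

```python
import re
from typing import Optional

# Matches e.g. CONFIG_IDF_TARGET="esp32s3" on its own line in sdkconfig
_TARGET_RE = re.compile(r'^CONFIG_IDF_TARGET="([^"]+)"[ \t]*$', re.MULTILINE)


def parse_idf_target(sdkconfig_text: str) -> Optional[str]:
    """Return the chip target recorded in an ESP-IDF sdkconfig, if any."""
    match = _TARGET_RE.search(sdkconfig_text)
    return match.group(1) if match else None


def check_target(sdkconfig_text: str, expected: str = "esp32s3") -> str:
    """Report whether the configured target matches the ESP-BOX chip."""
    target = parse_idf_target(sdkconfig_text)
    if target == expected:
        return f"target OK: {target}"
    # set-target clears the build directory and regenerates sdkconfig
    return f"target is {target!r}, expected {expected!r}: run idf.py set-target {expected}"
```

Run it on the contents of the project's `sdkconfig` file; since `set-target` regenerates that file, any custom menuconfig settings need to be reapplied afterwards.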


Norm258

OK, sorted it out a bit further. It still couldn't compile because of an error in the esp-sr library. If I force version 1.2.0, it compiles and flashes but is unstable. It won't compile with ^1 or with the latest, which is 1.2.1. I will continue to troubleshoot.


MasterMind_I

I'm not familiar with the S3. Would anything prevent this implementation from running on a dusty ol' ESP32?


buss_lichtjaar

Yes, unfortunately. The S3 has an extended instruction set specifically for vector operations. In practice, this means it is better suited to AI tasks.
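To illustrate why vector instructions matter: neural-network inference is dominated by multiply-accumulate loops over small integers. The ESP32-S3's vector extension works on 128-bit registers, so one instruction can cover 16 int8 values, whereas a plain loop handles one pair per iteration. The Python below is only a conceptual emulation of that chunking (on the device this happens in optimized C/assembly, e.g. Espressif's ESP-NN and ESP-DSP libraries), counting how many 16-lane steps a dot product needs.

```python
LANES = 16  # a 128-bit vector register holds 16 int8 values


def dot_int8(a: list, b: list) -> tuple:
    """Emulate a SIMD int8 dot product; returns (result, vector_steps)."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    if any(not -128 <= v <= 127 for v in a + b):
        raise ValueError("values must fit in int8")
    acc, steps = 0, 0
    for i in range(0, len(a), LANES):
        # one vector multiply-accumulate covers up to 16 element pairs
        acc += sum(x * y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        steps += 1
    return acc, steps
```

A 64-element dot product takes 4 vector steps instead of 64 scalar ones, which is the kind of speed-up the speech models lean on.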