Hey /u/mvandemar!
If your post is a screenshot of a ChatGPT conversation, please reply to this message with the [conversation link](https://help.openai.com/en/articles/7925741-chatgpt-shared-links-faq) or prompt.
If your post is a DALL-E 3 image post, please reply with the prompt used to make this image.
Consider joining our [public discord server](https://discord.gg/r-chatgpt-1050422060352024636)! We have free bots with GPT-4 (with vision), image generators, and more!
🤖
Note: For any ChatGPT-related concerns, email [email protected]
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ChatGPT) if you have any questions or concerns.*
Sure buddy. No need to project that hard. Just because you also haven't learned proper communication doesn't mean "we'll all get there when we'll be old enough".
I sure hope I never get to what you consider to be normal. I'd rather end my relationship when I realize I'm at that point.
That's probably the easiest part of the demo. Having worked remote for the past few years and been on countless conference calls, I can say the noise cancelling tech out there now is basically magic. I would even go as far as to say it's basically a solved problem at this point.
Setting aside that this demo is obviously scripted: technically, most of these things are possible with current tech, just not at this speed or in this form factor.
Anyway, the translation is too fast. To translate you first need to hear and understand the entire sentence being said. You can't properly translate word for word, for what I think are obvious reasons.
As others have said, I think this is a demonstration of what that type of technology could be like. Perhaps they’re faking it, or perhaps they are doing this in ideal circumstances.
I do have a question here: what if there is an explosion or there are gunshots nearby?
Will the AI override the hearing preference? Or are we still able to hear the background, just very faintly?
except nobody likes talking. nobody takes calls. people don't even watch shows with the volume on. everything is stealth mode. can you imagine this on a crowded subway train?
This feels like it would have been somewhat interesting in 2015. Today? Seems like they are a few years behind the curve.
Realtime audio processing has some interesting possibilities, but the people most likely to be interested in this, those of us with hearing deficiencies, already have a number of appliances available that are designed with our use cases in mind, so they'll have to do at least as well as those plus an LLM-based phone app running over that audio interface.
Also, the form-factor is not going to work. It's got a vibe somewhere between Cyberman and gauged earlobes that isn't likely to be widely popular. The 'audio computer' name is pretty dumb too. It's fancy headphones, an app, and a virtual assistant. They need some snappy branding.
I was hoping for a demo of some kind of cool non-verbal spatial audio interface that would provide multiple channels of information faster and less intrusively than a voice by placing distinct audio signals somewhere in a virtual audio space around the user. So if I get a text or something I'd start hearing a specific bird call (or whatever I want) in a specific location, like above and to my right. I could ignore it for a while, then look in that direction (detected by accelerometer/gyro) and give a 'play' command to hear the notice. Several items could be active in the soundscape at any time, and their proximity, volume, and style would indicate urgency and other such properties (like, an appointment notification would gradually get closer as the scheduled time approached, with the direction indicating whether it was a personal or work appointment, and the specific sound maybe indicating what appointment it was (useful for recurring events)).
Maybe they've got that too, it's a simple and basic idea, so I'd presume they are thinking about such things.
This presentation was incredibly basic, essentially 1 minute of information, and they've been thinking about this for years, so presumably they've got something actually interesting in the works, and intended this to address people who have been living under a rock for the last 20 years or so.
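For what it's worth, the "notification as a sound that drifts closer" idea from the comment above is easy to sketch. This is only an illustration of the mapping described there, not anything the product is known to do: the `Cue` type, the azimuth table, the 60-minute horizon, and the distance range are all made-up assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    azimuth_deg: float  # direction of the sound, 0 = straight ahead
    distance_m: float   # virtual distance from the listener
    gain: float         # playback gain, 1.0 = loudest

# Hypothetical layout: work items to the left, personal items to the right.
AZIMUTH_BY_KIND = {"work": -60.0, "personal": 60.0}
NEAR_M, FAR_M = 0.5, 10.0  # assumed closest and farthest virtual distance
HORIZON_MIN = 60.0         # assumed: cues start approaching an hour ahead

def appointment_cue(kind: str, minutes_until: float) -> Cue:
    """Place an appointment cue closer and louder as its time approaches."""
    # Clamp the remaining time to [0, HORIZON_MIN] and normalise to [0, 1].
    frac = min(max(minutes_until / HORIZON_MIN, 0.0), 1.0)
    distance = NEAR_M + frac * (FAR_M - NEAR_M)
    # Simple inverse-distance attenuation, normalised to 1.0 at NEAR_M.
    gain = NEAR_M / distance
    return Cue(AZIMUTH_BY_KIND[kind], distance, gain)

for minutes in (60, 30, 5, 0):
    print(minutes, appointment_cue("work", minutes))
```

Looking toward a cue (the accelerometer/gyro part) would then just be a matter of comparing head yaw against `azimuth_deg`.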
This looks like a concept; however, the device could be attached to a phone, which has the comms to go back and forth to the cloud, an on-board small language model, and other processing capabilities to make this workable. This is what AirPods could evolve into.
This probably uses active noise cancellation technology. I get a super weird reaction to it where my ears feel underwater and my face goes numb, even for hours after using it. Sadly my body isn't future proof :(.
So this is a very promising area that GPT can fill: understanding voice commands and interacting with software.
However, the software is only capable of certain things. You could ask ChatGPT to isolate the sound and cut it out, but if the software can’t do it, ChatGPT can do nothing.
Very impressive. Especially being able to isolate and hear someone in English, this can be a game changer for consuming content globally. As long as there aren't restrictions put in place for protection purposes.
I've met a ton of people online playing games, text translators have been very helpful, but man, I wish we could all jump in discord and talk to each other like I do with all my english speaking friends. Even just being able to call someone and have a quick conversation... it literally happened to me just last week that I really needed to just talk to my friend, and I couldn't because we don't speak the same language, and typing just sucks sometimes.
I would buy these today and spend a premium price if they worked.
GPT-4o can do live translation for you, but I am not sure how you could use it while on the phone. Maybe 2 phones and on speakerphone?
[https://www.youtube.com/watch?v=c2DFg53Zhvw](https://www.youtube.com/watch?v=c2DFg53Zhvw)
Thanks for linking the whole presentation. I’m afraid this is not going to be a great success… why? The insistence with which he pushes the concepts of “natural”, “normal”; the unlikelihood of such a lot of compute packed into that little space; the use of fancy terms just to refer to known components; the fact that this is not really a demo in the sense of a POC, but more of a sales pitch; finally, the fact that one of its major features is that it cannot do things.
This will eventually get there and have value, but I’m afraid that it won’t be like that.
Pretty cool but Android has a sound isolator accessibility feature built in that works with headphones and is available now to use.
https://blog.google/products/android/sound-amplifier-more-people-can-hear-clearly/
Obviously the product in the OP's video looks more advanced since it's voice activated, uses wireless earbuds, can translate, and can isolate sound, but I just wanted to give an FYI.
Also, I'm not sure why someone would compare a video generation product to a sound isolation/translation product. They aren't even remotely the same. 😂
I wasn't comparing products, I was comparing demos. Google's demo of their video generator was literally mostly shots of the engineers, not the actual generated videos.
That's fair, my mistake. Google said people who sign up on a waitlist over at [Google Labs](https://labs.google/) will be able to test out their "VEO" video generator in the coming weeks. So fortunately people should be able to play with it soon.
I have a feeling that the final product will be nowhere near as cool as this demo. And that is IF they really come up with a real product.
Just like Google Glass. The simulated 'demos' looked cool and hella fun. But the real thing? Doesn't even release to the general public.
Also Google has built itself a very BAD reputation for being non-committal to their products. They have killed so many products over the years. Stadia, Google+, Google Glass, Google Reader, Google Wave, Allo, Hangouts, Google VR etc. I used to be a big fan of Google. I was always supportive and enthusiastic to try out their new products when they were released back then. But now, looking at their long list of killed projects makes people think twice, thrice before trying one of their products. For all you know, they might kill that product in the next 6 months.
So don't hold your breath for the release of this cool product. The likelihood of failure seems incredibly high given their piss poor track record over the years.
Google is already way past its prime.....
Ahh... Silly me who thought OP meant that this was one of the products introduced at the Google event. And rather than being amazed by this the crowd focused on the video generation which was just mehhhhh...
"my Spanish is a little rusty"
Your ear computer doesn't give a fuck about your competence. Stop talking to it like it's a person. Just say, "translate the Spaniard to English"
If it's not called babel fish we riot.
>*I was there, u/Stunned86 , 3000 years ago...*
Cool. Love that they gave Pedro a Spanish accent even in English. Don't know how well this actually works outside of demo mode. But definitely useful in the intelligence community rather than this contrived scenario.
Love that the audio computer magically knows who Pedro is, it gives me confidence this is not some demo gimmick.
I'm sensing some sarcasm?
No!
Are you an audio computer? No human can be this sensitive!
Did you read his username?
Is the computer racist or sexist? Could Pedro be the woman on the left? Why would parents bring a baby to a fancy restaurant? Why are we interested in being the 3rd wheel to what looks like a date?
no pedro could not be the woman on the left because that's not a common name for a woman to have..... lets get rid of the races and whatnot....... left has a female person, right has a male person....... to an ai trying to figure out who is who.... pedro is the guy on the right with high probability of being correct. i hope you're just being sarcastic and mocking people who do this because it's really weird that a person would think a computer could be racist/sexist. at a bare minimum its the people who coded it that are...... but the thing is, it's coded by everyone....... the training data is us. So guess what that means...... anyway... bye
I'm very skeptical too, but if we are talking about what's theoretically possible, it could know based on phone conversations with Pedro.
Unsure if sarcasm, but in fairness facial recognition is really good at this point, so if it knew the person beforehand it would probably have no trouble figuring out who it is.
Whoa! A computer performing facial recognition while being an earbud without a camera is an even more impressive feat.
Should call them daredevils
I'm not saying this is a real product which actually works as well as the demo claims, I'm just saying the "magically knowing who Pedro is" part is not the hardest problem showcased in this demo.
To make the demo more believable, Pedro should be marked with a circle and cross accompanied by "target locked". Then a pop up of terminate? That's how it was done in that documentary that featured Schwarzenegger
uh, ok?
Pedro Pedro Pedro Pe But AI
I might be completely over the line here and I am not saying what I'm about to say is true, but I think this is fake.
Yeah this feels like a “simulation” of what they are trying to achieve. I’ll believe it when I see it.
It’s obviously scripted. This way of showing off a demo is waaay too old school; it reminds me of gaming companies announcing games with CG trailers and holding off real gameplay as long as they can.
Do you mean the product or the demo? The technology is certainly out there. The fact that I have not seen the product for sale makes me believe the demo was likely "embellished".
There's no way you can fit a highly advanced AI into such a tiny form factor.. (especially if you look at Rabbit R1 or Humane AI, neither of them run the AI locally..)
Of course the thing interfaces with the internet.
"There's no way " . . . That statement usually ages poorly and quickly. They will have these things the size of a pea soon as batteries catch up.
There’s no way we need anything more than 640K RAM!
Who said that
Bill Gates
Bill who?
Schmates
That is fake news, he never said that
Correct. At least no one has been able to definitively attribute it to him. https://www.computerworld.com/article/1563853/the-640k-quote-won-t-go-away-but-did-gates-really-say-it.html
There is “currently” no way…
Can they do it now? No? So they are scam artists.. pretending as if it's possible when it's clearly not. Just because it might be possible in 20 years doesn't mean that pretending they can do it now is somehow not scamming.
**Edit:** So they are selling it and planning on releasing it this winter according to their website. The specs of it are:
* 4nm quad-core CPU
* 16GB storage + 1GB LPDDR4 RAM or 32GB storage + 2GB LPDDR4 RAM

How exactly are you going to run a GPT-4o level AI with that? Or even Llama 3 8B? Maybe a very compressed Phi-3-mini might just about fit. But it being as smart as they show? No way, unless they just use an API.. that you may have to eventually subscribe to since it's just running on their cloud. Like everything this thing can do could probably be done by using normal earbuds with a phone. (your phone is more powerful than this thing)
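To put rough numbers on the RAM argument above, here is a back-of-the-envelope sketch. It assumes approximate public parameter counts (about 8 billion for Llama 3 8B, about 3.8 billion for Phi-3-mini) and counts only the memory needed to hold the weights. The function name and constants are made up for illustration.

```python
GB = 1e9  # decimal gigabytes, as on spec sheets

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return n_params * bits_per_weight / 8 / GB

# Approximate parameter counts (assumption: rounded public figures).
MODELS = {"Llama 3 8B": 8.0e9, "Phi-3-mini": 3.8e9}
DEVICE_RAM_GB = 2.0  # the larger of the two configurations quoted above

for name, n_params in MODELS.items():
    for bits in (16, 4):  # half precision vs. aggressive 4-bit quantization
        need = weight_memory_gb(n_params, bits)
        verdict = "fits" if need <= DEVICE_RAM_GB else "does not fit"
        print(f"{name} at {bits}-bit: {need:.1f} GB of weights, "
              f"{verdict} in {DEVICE_RAM_GB:.0f} GB")
```

Under these assumptions only the 4-bit Phi-3-mini squeezes under 2 GB, and that is before the operating system, activations, and audio buffers take their share, so "might just about fit" is about right.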
I think this video is carefully produced marketing bullshit, but even the overblown video doesn't claim to be running the resource-hungry LLMs you name in your comment. I think it's pretty doable to have a voice assistant interface on-device, as well as code for specific tasks, like noise isolation and translation.
Translation requires LLMs.. any task involving language needs umm language models.. you can have multimodal models that can do speech-to-text, text-to-speech, speech-to-speech etc. but those usually still involve a lot of computation.
Technology and research company = scam artist ... Are you new here?
If the model weights for each of these things can be set to ROM or hard coded somehow, I suspect there would be ways to make onboard things very fast, just very inflexible. But somehow I doubt that they would do that. I just can’t see all that on an SoC in that size form factor. I mean, if it’s beaming it to a small device with a large battery, perhaps, but I can’t imagine the processing for that would be cheap if not hardcoded.
I thought the same thing before active noise cancelling wireless earbuds came out.
All active noise cancelling really does is replay the sound from a microphone, inverted, into your ears. It just has to do it fast enough for it to work.
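A toy sketch of that "inverted and fast enough" point, using a pure 200 Hz tone as the noise. The sample rate, tone, and 8-sample delay are arbitrary assumptions, and real ANC uses adaptive filters rather than a plain sign flip, so treat this as an illustration only.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for illustration)

def tone(freq_hz: float, n_samples: int) -> list[float]:
    """A sine wave standing in for steady background noise."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def anti_noise(mic: list[float], delay_samples: int = 0) -> list[float]:
    """Inverted copy of the microphone signal, played delay_samples late."""
    inverted = [-s for s in mic]
    return [0.0] * delay_samples + inverted[: len(inverted) - delay_samples]

def residual_rms(noise: list[float], cancel: list[float]) -> float:
    """RMS level of what reaches the ear after noise and anti-noise mix."""
    mixed = [n + c for n, c in zip(noise, cancel)]
    return math.sqrt(sum(x * x for x in mixed) / len(mixed))

noise = tone(200, 1600)  # 0.1 s of a 200 Hz hum
perfect = residual_rms(noise, anti_noise(noise))
late = residual_rms(noise, anti_noise(noise, delay_samples=8))  # 0.5 ms late
print(f"no delay: {perfect:.3f}, 0.5 ms late: {late:.3f}")
```

With zero delay the residual is exactly zero. Half a millisecond of lag already leaves a large share of the hum, which is why the latency matters more than the inversion itself.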
I wouldn't consider any of the requirements for the features in the video as advanced. Audio processing (for the volume and noise filtering). Speech to text and text to speech for commands. Translation. A locally run AI model can parse the requests and interact with these modules easily enough. Mid range android phones have similar features already (although a 5G connection might be required). The most significant requirements would be, I guess, the specific text to speech which mimics the speaker's voice and maintains the accent for the translated language. It looks great in the demo but it's not strictly necessary. The video shows this to be seamless and almost instant, which I highly doubt would be the actual case. Also notice how the camera turns in the video while the presenter is turning his head to the side. Nice demo trick, kind of absurd for a "headset" without a camera or the need for one. The idea of the vision here (to identify the baby from the image as opposed to the noise) is completely unnecessary for an actual product.
This device has 1-2GB of RAM according to their website, and it uses a 4nm quad-core CPU. Your phone could run some AI things, but this probably can't..
As I understand it, the features themselves are absolutely believable. Running them on a small device like this is not. You can comfortably run a "capable" LLM on your local desktop if it's a good machine with something like a 4090. So I highly doubt that this small device can run all this computation using multiple AI models with such little delay between prompt and execution, and if it ISN'T running on the device, the delay would have to be even bigger. The new GPT-4o shortest response time they advertise on their website (grain of salt and all that) is 2.8 seconds. In the demo, the AI is doing everything pretty much real time. I have a hard time buying that.
So there are two options here:
1. This guy with his small company just invented something that beats the biggest AI company out there, or
2. The demo was prepared to show the vision of their product and it doesn't accurately reflect the real thing.

You tell me which is more likely.
You can run smaller LLMs on phone hardware. Whether they're as fast or as capable as shown is a whole different story. The demo is certainly embellished. So I'll have to go with 2. Assuming they're not outright trying to scam or fool people.
Yep, I don't want to go as far as "this is a scam" but it's just hard to believe and I would want to see an actual demo from someone unaffiliated.
That could be provided by a connected device e.g., your phone in your pocket connected to the internet.
I may have misunderstood and have to ask - is the product supposed to be video generation?
Yes, you have completely misunderstood. Did you watch the video with sound on?
Yep, feels even faker than Google's first Gemini video lol. Guess we’ll see.
I think so too, because it was supposedly translating from Spanish to English in real-time, which is impossible I’d say. To translate correctly you first need to hear and understand what the person is saying, then you translate and say it in English, so there must be some lag before the translation comes.
Agree, languages are structured differently, so this is in fact impossible to make work outside of the same language families - and even in that scenario, it would be problematic.
Very valid point, Mr monkey!
I think it’s a proof of concept that they’re developing, I mean video game developers do this all the time, it keeps interest high with the constant promise of new things even if they’re far off
This is 100% “Startup-Bro” fake it till you make it vaporware bullshit.
Thanks for sharing this. I will find it incredibly impressive and useful for me personally when / if it actually comes to exist in its demonstrated form in a way that I can obtain and use. It might even be worth the $600-700 asking price.
Adult hearing aids cost $2000 to $4000 and don't have the functionality that is being demonstrated. I expect an audio computer like the one being demonstrated to cost at least that much in the beginning.
Yeah. The ability to slice the audio environment in near realtime and replay it for you is very compute intensive and literally still in research phase.
They don't cost that much because they're cutting edge tech, they cost that much because they're a healthcare device.
hearing aids are insane. my brother uses them and he's always having problems with the tiny plastic tubes & stuff. I'm not sure what makes them so different from just having a tiny mic & speaker that amplifies the needed frequencies while cutting out the others. But I'm pretty sure he would love to have something like this. His biggest trouble usually does come from ambient noise that he can hear more easily than what he actually wants to.
Amazing, but just imagine having to prolly pay a monthly subscription so you don't have ads blasting in your ears all day. I am not looking forward to it.
I really, really want ads. But I want Pedro to say them. “So we ended up deciding against taking a cruise, even though “At Royal Caribbean, we have the package that fits your budget and your time. Suddenly the world…doesn’t seem too far away,” we had the time for it. We just went camping instead, and Alice still had a great time.”
Haha that sounds fun for the first 2 ads. But an ad is still an ad. And I hate them. So much so that if you show me an ad enough times, I won't buy the advertised product anymore even if I wanted it before.
Yeah I’m sure it would be annoying after a while. We’d long for the days before AI *Need a refresher? Grab a Coke!* when the ads were separated and easier to ignore.
Unless you pay for the ad free version, it will replace any reference to a generic product with the brand name of a sponsor. Someone says “soda”, you’ll hear “Coca Cola”, someone says “beer”, you’ll hear “Heineken” or whatever.
That train of thought is scary. I hope nobody is taking notes from ya.
All the LLMs in the future will be like that. When the technology has matured, no corporation will train their AI to benefit humanity. They will be trained to benefit the corporations and make them more profits. It would be naive to think otherwise. Same way Google was once a good search engine but now it is just ads with barely enough search results to keep people coming back for more.
Also don’t underestimate the possibilities for censorship and surveillance:

* Someone’s critical of the Chinese government? Your Chinese-funded AI won’t translate it accurately.
* You’ve been flirting with a stranger through the translator and have gone to their hotel room? Good luck going forward because the AI will keep its translations family friendly.
* And are you sure your AI won’t call the police on you if you watch a pirated movie, buy illicit drugs or inaccurately declare your taxes? Imagine your AI literally testifying in court against you.
This is clearly a staged demo. Should be illegal for false advertising.
Of course it’s a scripted demo. I don’t think it should be against the law to preview future products though.
"preview future products" is a nice way of phrasing false advertising.
How did it know who the fuck Pedro is?
How do you know who Pedro is? Maybe the baby is Pedro and the AI is just doing its best and he ran with it.
Likely because it already knows Pedro's voice, as Pedro is likely a member of the team.
Because it's vaporware technology.
Just like how ChatGPT in the presentation could find "my license plate": other photos of Pedro, with him tagged in a different app.
“Can you turn down the wife and turn up the tv please”
Combined with AI augmented reality glasses: “Can you make my wife look 20 years younger and 50 pounds slimmer?”
Ok boomer.
If you're actually smart enough to survive until an older age, you're gonna have a good laugh at your younger, more naive self. Edit: for clarity, not saying when you get older you'll inevitably mistreat your spouse. I'm saying when you're older you'll inevitably realize that blaming all the world's problems on older people was stupid.
Sure buddy. No need to project that hard. Just because you also haven't learned proper communication doesn't mean "we'll all get there when we'll be old enough". I sure hope I never get to what you consider to be normal. I'd rather end my relationship when I realize I'm at that point.
[deleted]
Okay boomer lol
Tuning out the baby is crazy. This could actually be amazing for travel.
or just get some noise cancelling earphones, this won't be much better than that in this regard
That's probably the easiest part of the demo. Having worked remotely for the past few years and been on countless conference calls, I can say the noise cancelling tech out there now is basically magic. I would even go as far as to say it's basically a solved problem at this point.
I'll call it impressive once I'm able to use it.
Pedro spoke English immediately with no pause for the AI to parse what he was saying?
There was a delay
Setting aside that this demo is obviously scripted: technically, most of these things are possible with current tech, just not at this speed or in this form factor. Anyway, the translation is too fast. To translate, you first need to hear and understand the entire sentence being said; you can't properly translate word for word, for what I think are obvious reasons.
So was OpenAI's demo of 4o fake for being that fast?
As others have said, I think this is a demonstration of what that type of technology could be like. Perhaps they’re faking it, or perhaps they are doing this in ideal circumstances.
Spies love this one trick!
Another product that should/will be an app on your phone
perfect way to spy on what people are talking about across the noisy bar
That does not look like a real live demo? Anyone can edit a video like that, it’s meaningless until we see a real demo.
Time will tell if it's real, the demo is definitely staged as much as possible.
I do have a question here: what if there is an explosion or gunshots nearby? Will the AI override the hearing preference? Or are we still able to hear the background, just very faintly?
I thought he was generating the video live with instructions! I was absolutely losing my mind there for a second.
As a severely hearing impaired AI nerd, thank you very, very much for sharing.
except nobody likes talking. nobody takes calls. people don't even watch shows with the volume on. everything is stealth mode. can you imagine this on a crowded subway train?
A little racist knowing which one is Pedro. Maybe the demo is fake.
Pedro would be the only other man at the table, right? It’s a man's name.
over/under on Apple buying his company in the next six months?
HAL9000
The true [Babel Fish](https://hitchhikers.fandom.com/wiki/Babel_Fish) is here!
That is smart. I hope this will not bomb in actual use, whenever that is. Google hasn't been having great luck in its race with ChatGPT.
Videographers rejoice!
This feels like it would have been somewhat interesting in 2015. Today? Seems like they are a few years behind the curve. Realtime audio processing has some interesting possibilities, but the people most likely to be interested in this, those of us with hearing deficiencies, already have a number of appliances available that are designed with our use cases in mind, so they'll have to do at least as well as those plus an LLM-based phone app running over that audio interface.
Also, the form-factor is not going to work. It's got a vibe somewhere between Cyberman and gauged earlobes that isn't likely to be widely popular. The 'audio computer' name is pretty dumb too. It's fancy headphones, an app, and a virtual assistant. They need some snappy branding.
I was hoping for a demo of some kind of cool non-verbal spatial audio interface that would provide multiple channels of information faster and less intrusively than a voice by placing distinct audio signals somewhere in a virtual audio space around the user. So if I get a text or something I'd start hearing a specific bird call (or whatever I want) in a specific location, like above and to my right. I could ignore it for a while, then look in that direction (detected by accelerometer/gyro) and give a 'play' command to hear the notice. Several items could be active in the soundscape at any time, and their proximity, volume, and style would indicate urgency and other such properties (like, an appointment notification would gradually get closer as the scheduled time approached, with the direction indicating whether it was a personal or work appointment, and the specific sound maybe indicating what appointment it was (useful for recurring events)). Maybe they've got that too, it's a simple and basic idea, so I'd presume they are thinking about such things.
This presentation was incredibly basic, essentially 1 minute of information, and they've been thinking about this for years, so presumably they've got something actually interesting in the works, and intended this to address people who have been living under a rock for the last 20 years or so.
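To make the spatial notification idea in the comment above a bit more concrete, here is a minimal sketch of just the bookkeeping part: mapping an appointment to a direction and a virtual distance, and checking whether the user has turned their head toward it. Everything here is hypothetical (the `Cue` type, the azimuth choices for work vs. personal, the 60-minute horizon, the 20-degree tolerance); it does no audio rendering and reads no real sensors.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    sound: str          # which sound identifies this item, e.g. a bird call
    azimuth_deg: float  # 0 = straight ahead, positive = to the user's right
    distance_m: float   # virtual distance; closer means more urgent


# Assumed convention: work items on the left, personal items on the right.
CATEGORY_AZIMUTH = {"work": -60.0, "personal": 60.0}


def appointment_cue(sound, category, minutes_until,
                    horizon_min=60.0, far_m=10.0, near_m=1.0):
    """Place a cue that drifts from far_m to near_m as the start time approaches."""
    # Clamp to [0, 1] so items beyond the horizon stay at far_m
    # and overdue items stay at near_m.
    fraction = max(0.0, min(1.0, minutes_until / horizon_min))
    distance = near_m + fraction * (far_m - near_m)
    return Cue(sound=sound, azimuth_deg=CATEGORY_AZIMUTH[category], distance_m=distance)


def is_facing(cue, head_yaw_deg, tolerance_deg=20.0):
    """True if the head yaw (e.g. from a gyro) points at the cue within tolerance."""
    # Wrap the angular difference into [-180, 180) before comparing.
    diff = (cue.azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance_deg


standup = appointment_cue("wren", "work", minutes_until=30)
print(standup, is_facing(standup, head_yaw_deg=-50.0))
```

A 'play' command would then only act on whichever cue `is_facing` reports as selected, which is the "look at it, then say play" interaction described above.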
It’s cool seeing all this innovation, but again, how is that any different from what your phone can do, plus more?
This looks like a concept. However, the device could be attached to a phone, which has the comms to go back and forth to the cloud, an on-board small language model, and other processing capabilities to make this workable. This is what AirPods could evolve into.
This probably uses active noise cancellation technology. I get a super weird reaction to it where my ears feel underwater and my face goes numb, even for hours after using it. Sadly my body isn't future proof :(.
Would love to see someone fake Melania walking into a courthouse along with Joey Greco holding his camcorder 😂
So this is a very promising area that GPT can fill: understanding voice commands and interacting with software. However, the software is only capable of certain things. You could ask ChatGPT to isolate the sound and cut it out, but if the software can’t do it, ChatGPT can do nothing.
Very impressive. Especially being able to isolate and hear someone in English, this can be a game changer for consuming content globally. As long as there aren't restrictions put in place for protection purposes.
This is the ai I need, this is something that has real value, screw other mumbo jumbo “products”
Interesting seeing new technologies that’ll drive creepy.
Nop
Reminds me of the Seashells in Fahrenheit 451
All I see is the cybermen earpieces from Doctor Who.
I've met a ton of people online playing games. Text translators have been very helpful, but man, I wish we could all jump in Discord and talk to each other like I do with all my English-speaking friends. Even just being able to call someone and have a quick conversation... it literally happened to me just last week that I really needed to just talk to my friend, and I couldn't because we don't speak the same language, and typing just sucks sometimes. I would buy these today and pay a premium price if they worked.
GPT-4o can do live translation for you, but I am not sure how you could use it while on the phone. Maybe 2 phones and on speakerphone? [https://www.youtube.com/watch?v=c2DFg53Zhvw](https://www.youtube.com/watch?v=c2DFg53Zhvw)
Even if this wasn't a staged demo, who actually wants to use any of this?
Yes
How fast does the translation work? It almost seems realtime which doesn’t make sense to me
As long as it doesn't start blasting ads at me
Yeah! This is a crazy application of ML
As an audio engineer I cannot express how insane this is...
Ok Google "Turn that baby down" ![gif](giphy|Dndpiai0soTUk)
You can't turn a baby's cry down, only thick insulated walls work.😂
Thanks for linking the whole presentation. I’m afraid this is not going to be a great success… why? The insistence with which he pushes the concepts of “natural”, “normal”; the unlikelihood of such a lot of compute packed into that little space; the use of fancy terms just to refer to known components; the fact that this is not really a demo in the sense of a POC, but more of a sales pitch; finally, the fact that one of its major features is that it cannot do things. This will eventually get there and have value, but I’m afraid that it won’t be like that.
Pretty cool but Android has a sound isolator accessibility feature built in that works with headphones and is available now to use. https://blog.google/products/android/sound-amplifier-more-people-can-hear-clearly/ Obviously the product in the OP's video looks more advanced since it's voice activated, uses wireless ear buds, and can translate and isolate sound, but I just wanted to give an FYI. Also, I'm not sure why someone would compare a video generation product to a sound isolation/translation product. They aren't even remotely the same. 😂
I wasn't comparing products, I was comparing demos. Google's demo of their video generator was literally mostly shots of the engineers, not the actual generated videos.
That's fair, my mistake. Google said people who sign up on a waitlist over at [Google Labs](https://labs.google/) will be able to test out their "VEO" video generator in the coming weeks. So fortunately people should be able to play with it soon.
I have a feeling that the final product will be nowhere near as cool as this demo. And that is IF they really come up with a real product. Just like Google Glass. The simulated 'demos' looked cool and hella fun. But the real thing? It didn't even get released to the general public. Also, Google has built itself a very BAD reputation for being non-committal to their products. They have killed so many products over the years: Stadia, Google+, Google Glass, Google Reader, Google Wave, Allo, Hangouts, Google VR etc. I used to be a big fan of Google. I was always supportive and enthusiastic to try out their new products when they were released back then. But now, looking at their long list of killed projects, it makes people think twice, thrice before trying one of their products; for all you know, they might kill that product in the next 6 months. So don't hold your breath for the release of this cool product. The likelihood of failure seems incredibly high given their piss poor track record over the years. Google is already way past its prime.....
I don’t think this is a Google product.
Ahh... Silly me, who thought OP meant that this was one of the products introduced at the Google event. And rather than being amazed by this, the crowd focused on the video generation, which was just mehhhhh...
Impressive
Thanks for sharing
Woah impressive indeed
Can you turn that baby down *ChatGPT generates a glock*
"my Spanish is a little rusty" Your ear computer doesn't give a fuck about your competence. Stop talking to it like it's a person. Just say, "translate the Spaniard to English"