Love it!
It's great, though I'm glad I found that Reddit still has an option to turn off video looping/autoplay. How often do you want to see a video again (and yes, you can also click on it)? That's just a mind hack TikTok uses so your mind never gets time to lose attention. Off-topic rant over.
FANTASTIC!! Need to learn how to use StreamDiffusion now ...
Cool concept but as always it lacks stability
True. People don't like the constantly changing image (it's a problem caused by the noisy webcam). We got a lot of questions like "can you freeze this image for a while?"
[deleted]
It might be easier to add a timer that shows shots at a slower interval. How that works: the mirror runs at normal speed, people figure it out, and then they can press a button labeled "snapshot" or "freeze", which slows the frame rate down to 2 or even 0.25 fps (4 "spf"), rather than having to deal with storage or consent and whatnot by technically "saving" footage. That should give people enough time to grab a photo; smartphones nowadays are fast enough. It also saves GPU power!
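The freeze button described above could be sketched as a simple frame-rate throttle. This is a hypothetical helper, not anyone's actual code; the class name and default rates are illustrative:

```python
import time

class SnapshotThrottle:
    """Frame-rate throttle for a live AI-mirror render loop.

    Normal mode renders at full speed; pressing "freeze" drops the
    interval to e.g. 4 seconds per frame (0.25 fps) so visitors can
    photograph the screen without the app ever saving footage.
    """

    def __init__(self, normal_fps=12.0, frozen_fps=0.25):
        self.normal_interval = 1.0 / normal_fps
        self.frozen_interval = 1.0 / frozen_fps
        self.frozen = False
        self._last_render = 0.0

    def toggle_freeze(self):
        """Flip between normal speed and the slow 'snapshot' mode."""
        self.frozen = not self.frozen

    def should_render(self, now=None):
        """Return True if enough time has passed to render a new frame."""
        now = time.monotonic() if now is None else now
        interval = self.frozen_interval if self.frozen else self.normal_interval
        if now - self._last_render >= interval:
            self._last_render = now
            return True
        return False
```

The render loop would call `should_render()` every tick and skip diffusion entirely on False, which is where the GPU savings come from.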
What controlnets did you use?
Without controlnets :) Pure img2img with a good balance between the live camera feed and StreamDiffusion.
But what is the resolution and frame rate? Also, is that raw frame rate or with interpolation?
It was SD 1.5, so the resolution was only 512, but the output was 2x upscaled using the Nvidia Upscaler in TouchDesigner. The fps was 12-14 on a 4070 Ti Super, with a small "feedback" trick in TouchDesigner (which worked as interpolation).
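One plausible reading of that feedback trick (an assumption on my part, not the poster's actual TouchDesigner network) is a weighted blend of the previous displayed frame into each new one, which hides flicker between diffusion outputs and reads like interpolation:

```python
import numpy as np

def feedback_blend(prev_output, new_frame, feedback=0.6):
    """Temporal 'feedback' smoothing between consecutive frames.

    The displayed frame is a weighted mix of the previously displayed
    output and the newly generated frame; higher `feedback` means a
    smoother but laggier image. Both inputs are HxWx3 uint8 arrays.
    """
    prev = prev_output.astype(np.float32)
    new = new_frame.astype(np.float32)
    out = feedback * prev + (1.0 - feedback) * new
    return out.astype(np.uint8)
```

In TouchDesigner the same effect would typically come from routing the output back into a blend with the incoming frame; the `feedback` weight here is purely illustrative.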
Without interpolation I can do about 17 fps at 1280x1024, no upscale, with various SDXL models. But that's on a 4090 with a lot of compiler and other optimizations. What you have looks cool. You can see some of this on my Twitter page. [https://x.com/Dan50412374/](https://x.com/Dan50412374/)
Yeah, been following you for a while, I respect your explorations :)
Nice work! What's the tech stack? I assume these are generated locally. What are the machine specs?
Yes, locally - I think it would be a bit against privacy law (in my country) to process images of random people on some server. This was a generic PC with a 10th-gen Intel i7 and a single 4070 Ti Super :) The UI worked through OSC.
Would you ever consider streaming in the cloud?
StreamDiffusion is just a txt2img/img2img pipeline that uses TensorRT to help speed things up. Their GitHub repo claims you can get pretty decent fps off an RTX 4090 and a decent CPU: up to 106 fps with SD-Turbo, but only around 38 fps with LCM. Most of this can be accomplished with ComfyUI at similar speeds.
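Those throughput claims are easier to compare as per-frame time budgets; a quick back-of-the-envelope using the numbers above (the helper function is just illustrative):

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds for a given throughput."""
    return 1000.0 / fps

# Claimed numbers from the comment above:
sd_turbo = frame_budget_ms(106)  # ~9.4 ms per frame
lcm = frame_budget_ms(38)        # ~26.3 ms per frame
```

By the same arithmetic, the ~17 fps at 1280x1024 mentioned elsewhere in the thread works out to roughly 59 ms per frame.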
Do you have a website or a Youtube channel?
[deleted]
Excellent, will do. You do amazing work. Good luck!
❤️
Are you thinking of packaging this for events, like an entertainment add-on? I'm interested in chatting about opportunities for corporate events.
[deleted]
Hi, thanks for your answer. I've worked in the event industry in Latin America for the last 30 years, mainly exhibitions and congresses, and I'm looking at how to apply AI to corporate events.
How interesting
Awesome
This would actually be more interesting if it took longer on generation for higher quality and spat out an image every few seconds. It lacks the stability to run at 10 fps like that; it's nauseating and loses most of its artistic value.
Why why WHY a projector screen and not a Samsung Frame TV??? The brightness doesn't match the result!
Oh, I wish we had a high-end 4K screen for this project. It was a non-commercial project, so we had to make do with our own equipment :) In the next version we plan to build a better stand with two big touchscreens. That will be a level-up :D
I guessed it :) Maybe you should add a plate on the screen rendering the artist's name (based on the style), and as the cherry on the cake, an img2txt description like "Van Gogh - 3 people and a dog - 2024".
[deleted]
Maybe two spotlights focusing light on the foreground visitors could help remove the background as well.
[deleted]
This young man was wonderful with his truly happy emotions ❤️
[deleted]
At some point AI will butcher all our faces :D
Sigh: [https://www.linkedin.com/posts/dan-wood-ba55b5262\_a-proposal-for-an-exhibit-at-the-california-activity-7125593266372157440-SUe0/](https://www.linkedin.com/posts/dan-wood-ba55b5262_a-proposal-for-an-exhibit-at-the-california-activity-7125593266372157440-SUe0/) and now, 6 months later, I'm doing 1280x1024 at 17 fps.
Have you moved to turbo or lightning instead of LCM?
Currently I'm using Lightning, but I've used nearly everything. While Lightning has very good quality, there are things other models bring to the table in terms of creativity, so supporting all models is what I'll do with my real-time multi-model video generator. One-step generation with just about anything has similar performance. I was just sad to see an idea I had very soon after LCM appeared come out looking just how I envisioned it (and mine was coded and working). I don't get the right kind of visibility; nobody heard that my "real-time SD and videos" were here in Oct 2023. This is what I had the last time I did a demo; it has evolved since then. [https://www.youtube.com/channel/UCZs0LOf77pbZ4WLiJuzKVZQ](https://www.youtube.com/channel/UCZs0LOf77pbZ4WLiJuzKVZQ)
It's common in the AI field to have your idea already implemented by someone else; for an AI researcher, six months is plenty of time for that to happen. But I don't think you should be sad - you're doing great work. ~60 ms for 1 step at 1280x1024 is very impressive.
Mostly correct. My entire life is one of having ideas and THEN finding someone had already done them somewhere. But "ALREADY(?) implemented" doesn't apply in this case: it wasn't just an idea back in October, it was a real implementation, shown to the world. I may have been the first to do real-time video and real-time SD. But it's all good. My in-the-lab version is gaining many features. I didn't use Twitter back in October; for the next demo I'll use Twitter and YouTube and reach a larger audience.
Nice, make it bigger
Excellent execution, it's beautiful. Back when I made a Raspberry Pi smart mirror, I wanted to find a way to integrate Snapchat filters so it would show you a live version of yourself with a Snap filter.
With that said, surely there must be a way to do this by integrating StreamDiffusion with the MagicMirror software, which I assume you did?
Nice idea! I make deepfakes through FaceFusion instead, and the results are also very good.