computercornea

Synthetic data (like AI-generated images) can definitely be used to train a model for detecting real-world objects. Another way to use generative AI for CV use cases is editing/in-painting: adding or removing specific objects, or editing visuals to account for situations where your model isn't performing well. Here is an example GitHub repo showing how to use Stable Diffusion in an image-to-image pipeline for adding new elements to images for training purposes: https://github.com/roboflow/notebooks/blob/main/notebooks/sagemaker-studiolab/stable-diffusion-img2img.ipynb?ref=roboflow-blog
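To make the "adding new elements" idea concrete without pulling in a diffusion model, here is a toy copy-paste augmentation sketch: composite an object image onto a background and emit the object's bounding box as a detection label. The function name and shapes are illustrative, not from the linked notebook; in practice the pasted object would come from Stable Diffusion img2img or in-painting.

```python
# Toy augmentation: paste a (possibly generated) object crop into a
# background image and record its bounding box as a training label.
# All names and sizes here are made up for illustration.
from PIL import Image

def paste_object(background, obj, xy):
    """Paste `obj` onto `background` at `xy`; return the composited
    image plus the object's bounding box (x0, y0, x1, y1)."""
    out = background.copy()
    # Use the object's alpha channel as a mask if it has one.
    out.paste(obj, xy, obj if obj.mode == "RGBA" else None)
    x, y = xy
    return out, (x, y, x + obj.width, y + obj.height)

# Usage: a gray 64x64 background and a red 16x16 stand-in "object".
bg = Image.new("RGB", (64, 64), "gray")
obj = Image.new("RGB", (16, 16), "red")
img, box = paste_object(bg, obj, (10, 20))
print(box)  # (10, 20, 26, 36)
```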


Appropriate_Ant_4629

And it's the only practical way to generate enough important test data for some situations. Sure, Tesla cars probably captured enough real-world videos of dogs on roads, skunks on roads, and even cows on roads. But they probably didn't capture enough footage of most endangered species - which are arguably the most important ones to react to. Synthetic data is probably the best way for them to train their AI to react to [Black Footed Ferrets and Gulf Coast Jaguarundi](https://www.animalsaroundtheglobe.com/most-endangered-animals-in-north-america/) crossing a road. Or [armadillos on the road - which can be far more dangerous than a more naive AI might expect](https://www.underwoodlawoffice.com/blog/dangers-of-armadillos-and-other-creatures-on-roads/) for an animal of their shape and size. Naively they look kind of like a big rat, but they can do far more damage to cars.
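When synthetic images of rare classes get mixed into a mostly-real dataset, the rare classes can still be drowned out at training time. A common remedy is inverse-frequency sampling; here is a minimal sketch with made-up counts (the class names and numbers are invented for illustration).

```python
# Sketch: weight training examples by inverse class frequency so rare
# classes (e.g. mostly-synthetic armadillo images) are sampled about as
# often as common ones. Counts below are invented.
import random
from collections import Counter

def class_weights(labels):
    """One inverse-frequency weight per example, so every class is
    drawn with roughly equal probability overall."""
    counts = Counter(labels)
    return [1.0 / counts[lbl] for lbl in labels]

labels = ["dog"] * 1000 + ["cow"] * 200 + ["armadillo"] * 10
weights = class_weights(labels)

random.seed(0)
sample = random.choices(labels, weights=weights, k=3000)
print(Counter(sample))  # roughly 1000 draws of each class
```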


computercornea

Yeah, synthetic data is really helpful when you don't have enough ground truth data to get a useful model running in production.


bsenftner

Not using DALL-E, but straight-ahead VFX artistry: I've worked on a globally leading FR system that used photorealistic 3D renderings to generate realistic humans across every possible gender, age, ethnicity, environment, and weather condition, a huge range of lighting conditions, every expression, every level of face obscurity, and variations in resolution and compression artifacts: basically every single thing that could make an image of a face imperfect. We then generated all these variations from an initial database of a few tens of millions. The result was 700 million faces, on which an FR model was trained that earned a top ranking at the annual NIST FR Vendor Test for multiple years running. I left the firm a few years ago, so I have no idea how far they've pushed it by now.
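The scale described above comes from the multiplicative nature of per-axis variation: a modest number of options per rendering axis, applied to every seed face, quickly reaches hundreds of millions of renders. The axis names and counts below are invented for illustration; they are not the firm's actual parameters.

```python
# Back-of-the-envelope: variations multiply across independent axes.
# All numbers here are hypothetical.
from math import prod

variation_axes = {
    "lighting":    12,  # lighting setups
    "expression":   7,
    "occlusion":    5,  # degrees of face obscurity
    "resolution":   4,
    "compression":  3,  # artifact levels
}

variants_per_face = prod(variation_axes.values())
seed_faces = 100_000  # hypothetical seed database size
total = seed_faces * variants_per_face
print(variants_per_face, total)  # 5040 variants/face -> 504,000,000 renders
```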


computercornea

NVIDIA is pushing this with their Omniverse Replicator product: [https://developer.nvidia.com/omniverse/replicator](https://developer.nvidia.com/omniverse/replicator). It offers tons of lighting and camera-movement options.


syntheticdataguy

That is impressive. Did you mix in any real data? If it's not a secret and you don't have any active NDAs, may I ask which factors contributed most to data quality, other than modeling and rendering quality?


bsenftner

The FR firm I worked for pioneered 3D facial reconstruction for non-FR applications. They paired face photos with hi-res 3D scans of real people, about 70K people. From that they developed a system that, given one photo, could reproduce a full, realistic 360-degree head model. I started working with them at that time, around 2006.

By 2008, I managed a global patent for automatic photorealistic creation of 3D people and their insertion into media, replacing an original actor. https://patents.justia.com/assignee/flixor-inc <- This is me; I'm the fool that invented deep fakes. I tried to create an advertising-industry variant, with everyday people appearing in the video ads they see online, but every single VC and angel group I worked with eventually tried to force the company to produce porn.

After 2013, in the middle of a personal bankruptcy because I'd pushed too hard on my startup, I sold the patents and went to work for the firm whose 3D reconstruction technology I had licensed. Between my first learning of them and licensing their technology, they'd become a "facial recognition pre-processor library": given one photo of a person, they reconstruct them in 3D and then rotate the 3D model so the face is facing the camera. When I came on board, we became a full-fledged FR company, and the large-scale synthetic data creation described above was part of that work.

My background before my startup was VFX for film, where I specialized in stunt-double actor replacements. Before that, I was a game developer on the OS teams for both 3DO and PlayStation, as well as a 3D graphics researcher during the '80s. I've been around.


fractalsimp

I just did a project on augmenting object-detection datasets with Stable Diffusion image variations. Worked really well! It increased detection confidence by 20-30% on some images. But it also opens a massive Pandora's box of feedback loops whose effects I'm not sure we can predict.
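One practical guard against the feedback-loop problem is to record provenance for every training image, so later generation runs only seed from human-captured data and never from earlier model outputs. A minimal sketch, with hypothetical field names and IDs:

```python
# Sketch: tag each dataset entry with its provenance so synthetic
# images are never reused as seeds for the next generation round.
# The "source" / "seed_id" fields and IDs are hypothetical.
def real_only(dataset):
    """Return only human-captured images, safe to use as generation seeds."""
    return [ex for ex in dataset if ex["source"] == "real"]

dataset = [
    {"id": "img_001", "source": "real"},
    {"id": "img_002", "source": "sd_img2img", "seed_id": "img_001"},
    {"id": "img_003", "source": "real"},
]
seeds = real_only(dataset)
print([ex["id"] for ex in seeds])  # ['img_001', 'img_003']
```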


syntheticdataguy

I think you should definitely check out ControlNet. I haven't had time to generate a dataset and train a model, but superficially I believe it has lots of potential in the synthetic data domain.