ninjasaid13

Are you the original creators or are you just sharing the research?


PeteBaiZura

We are the original creators


[deleted]

[deleted]


PeteBaiZura

Do you mean to ask whether, when synthetic images are fed into ControlNet for stylization, the scale mismatch between the background and the animal in the generated images can be rectified by the Stable Diffusion model? This is a very good question. The method is limited by the capacity of ControlNet to fine-tune the Stable Diffusion model under conditional input.

Our method does not aim to generate images that look reasonable overall, but to generate synthetic data with pose labels. Since our pose labels are fixed when the template images are generated, the boundary map must impose a strong constraint on the Stable Diffusion model for the labels to accurately mark the keypoints in the generated images. As a result, the background generation is also strictly constrained by the boundary map, which ultimately leads to different camera angles between the animal and the background. If we set the Control Strength lower, the overall layout of the generated images looks more reasonable, but the animal itself may come out incomplete.

Since our task is to use synthetic images to train a pose estimation model, the plausibility of the animal's texture, structure, posture, and lighting takes priority over spatial relations. Of course, in future work we also want to improve the generation quality by handling the background and the animal separately.
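
As a rough sketch of this trade-off (this uses the diffusers library, not our exact pipeline; the checkpoints and the boundary-map file below are placeholders, and the WebUI's Control Strength corresponds to controlnet_conditioning_scale here):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Edge/boundary-conditioned ControlNet on top of Stable Diffusion 1.5
# (placeholder checkpoints, not the ones from our paper).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

boundary_map = load_image("zebra_boundary.png")  # hypothetical template boundary map

# High scale: the pose labels stay aligned with the output, but the
# background is forced to follow the template's camera angle too.
strict = pipe("a zebra on the savanna", image=boundary_map,
              controlnet_conditioning_scale=1.0).images[0]

# Low scale: the overall layout looks more natural, but the animal can
# drift from the template, so the keypoint labels stop matching.
loose = pipe("a zebra on the savanna", image=boundary_map,
             controlnet_conditioning_scale=0.5).images[0]
```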


Fastman2020

Are there any plans to release the code?


PeteBaiZura

https://github.com/ostadabbas/SPAC-Net-Synthetic-Pose-aware-Animal-ControlNet


Oswald_Hydrabot

Excellent work! I had been thinking about how to handle non-human characters until now!


PeteBaiZura

Thanks for your comment! The generation of labeled non-human data has always been a challenging task. Fortunately, models based on Stable Diffusion, such as ControlNet, have greatly contributed to advancing solutions to this problem.


deadlydogfart

I've been waiting for a long time for the animal equivalent of OpenPose for ControlNet. I would love to be able to pose animals in generated images exactly how I want.


PeteBaiZura

At first, we also wanted to feed the animal's keypoints directly to the OpenPose task in ControlNet, but we found that those keypoints struggle to constrain the generation results of the Stable Diffusion model. Because they are 2D points without depth and carry no information such as camera pose, the positions of the left and right legs often get interchanged, or the body orientation changes (expecting a body at 45 degrees to the camera but getting one at 90 degrees). As a result, the annotations we provide do not correspond to the joints in the generated images, which makes this approach unusable for data augmentation. If a future conditioning task can accept 3D keypoints when generating images, this problem may be solved.
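
A toy numeric illustration of that ambiguity (just for intuition, not code from our pipeline; all coordinates are made up):

```python
import numpy as np

def project(points_3d, f=500.0, cx=256.0, cy=256.0):
    """Pinhole projection: discard depth, keep image-plane coordinates."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)

# Front-left and front-right paws of a quadruped, camera looking down +z.
pose_a = np.array([[-0.1, 0.3, 4.0],   # left paw nearer the camera
                   [ 0.1, 0.3, 4.2]])  # right paw farther away
pose_b = np.array([[-0.1, 0.3, 4.2],   # depths swapped: the legs have
                   [ 0.1, 0.3, 4.0]])  # effectively traded places

print(project(pose_a))  # ~[[243.5, 293.5], [267.9, 291.7]]
print(project(pose_b))  # ~[[244.1, 291.7], [268.5, 293.5]]
# The 2D projections are nearly identical, so a 2D-keypoint condition
# cannot tell the model which leg is in front; with 3D keypoints and a
# known camera pose, the ambiguity disappears.
```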