SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

Binyuan Huang1*†,   Yuqing Wen2*†,   Yucheng Zhao3*,   Yaosi Hu4*,   Yingfei Liu3,   Fan Jia3,   Weixin Mao3,   Tiancai Wang3‡,   Chi Zhang5,   Chang Wen Chen4,   Zhenzhong Chen1,   Xiangyu Zhang3

1Wuhan University   2University of Science and Technology of China   3MEGVII Technology   4The Hong Kong Polytechnic University   5Mach Drive  

*Equal Contribution   †This work was done during an internship at MEGVII   ‡Corresponding author


🔥   Stronger Generative Scalability for Autonomous Driving   🔥

Overview of the proposed SubjectDrive framework and its effectiveness in enhancing BEV perception tasks. (a) The traditional data generation framework, which uses the control sequence and sampling noise to generate synthetic data. (b) Compared with the traditional framework, SubjectDrive introduces additional synthesis diversity by incorporating extra subject control. (c)-(d) Detection and tracking performance as the amount of generated training data scales. (e) Using the SubjectDrive framework to produce perception training data for autonomous driving.

   Controllable and Multi-View Generative Framework With Subject Control For Autonomous Driving   

Details of SubjectDrive. (a) The diffusion training process of SubjectDrive, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b) The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within each individual view, cross-view attention for exchanging information with adjacent views, and cross-frame attention for temporal processing. (c) The controllable module for integrating diverse control signals. Image conditions are derived from a frozen VAE encoder and combined with the diffused noise. Text prompts are processed by a frozen CLIP encoder, while BEV sequences are handled by a ControlNet. (d) Details of the BEV layout sequences, including projected bounding boxes, object depths, road maps and camera poses.
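To make the decomposition in (b) concrete, here is a minimal NumPy sketch of one decomposed 4D attention block over a latent tensor of shape (frames T, views V, tokens per view N, channels C). It is an illustration under stated assumptions, not SubjectDrive's implementation: attention is single-head without learned query/key/value projections, "adjacent views" is assumed to mean the left and right neighbors on a ring of cameras (as with the six nuScenes cameras), and the three attentions are assumed to run in the listed order with residual connections. All function names are hypothetical.

```python
import numpy as np


def attention(q: np.ndarray, kv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention (no learned projections).

    q: (..., Lq, C) queries; kv: (..., Lk, C) used as both keys and values.
    """
    scores = q @ np.swapaxes(kv, -1, -2) / np.sqrt(q.shape[-1])  # (..., Lq, Lk)
    scores = scores - scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ kv                                          # (..., Lq, C)


def intra_view(x: np.ndarray) -> np.ndarray:
    """x: (T, V, N, C). Tokens attend only within their own view and frame."""
    return attention(x, x)


def cross_view(x: np.ndarray) -> np.ndarray:
    """Each view attends to itself and its left/right neighbors (assumed camera ring)."""
    T, V, N, C = x.shape
    out = np.empty_like(x)
    for v in range(V):
        neighbors = [(v - 1) % V, v, (v + 1) % V]
        kv = x[:, neighbors].reshape(T, 3 * N, C)  # tokens of 3 views, same frame
        out[:, v] = attention(x[:, v], kv)
    return out


def cross_frame(x: np.ndarray) -> np.ndarray:
    """Each (view, token) position attends across the T frames."""
    xt = np.transpose(x, (1, 2, 0, 3))                     # (V, N, T, C)
    return np.transpose(attention(xt, xt), (2, 0, 1, 3))   # back to (T, V, N, C)


def decomposed_4d_block(x: np.ndarray) -> np.ndarray:
    """Intra-view, then cross-view, then cross-frame attention, each with a residual."""
    x = x + intra_view(x)
    x = x + cross_view(x)
    x = x + cross_frame(x)
    return x


# Example: 8 frames, 6 camera views, 16 tokens per view, 32 channels.
latents = np.random.default_rng(0).standard_normal((8, 6, 16, 32))
print(decomposed_4d_block(latents).shape)  # (8, 6, 16, 32)
```

Factorizing attention this way keeps each softmax over N, 3N or T tokens rather than all T·V·N tokens at once, which is the usual motivation for decomposed spatio-temporal attention.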

🎬   Subject-Controlled Video Generation   🎬

Subject-controlled videos generated by SubjectDrive. Given an image of a reference subject, SubjectDrive generates layout-aligned driving videos featuring the desired subject. By using reference subjects as control signals, SubjectDrive provides a mechanism for injecting external diversity into the generated data.
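The reference subject acts as one more axis of variation on top of the BEV layout and the sampling noise. A minimal sketch of planning such a generation run is shown below. It assumes a hypothetical `generate(layout, subject, seed)` callable standing in for the SubjectDrive model, and it assumes the layout's boxes can be reused directly as perception labels because the generated videos are layout-aligned. `plan_jobs`, `build_dataset`, `GenerationJob` and the file names are illustrative only.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class GenerationJob:
    layout_id: str   # which BEV layout sequence to follow
    subject_id: str  # which reference-subject image to condition on
    seed: int        # sampling-noise seed


def plan_jobs(layout_ids: List[str], subject_bank: List[str],
              subjects_per_layout: int = 2, seeds_per_subject: int = 1,
              rng_seed: int = 0) -> List[GenerationJob]:
    """Pair every layout with several distinct reference subjects and noise seeds."""
    rng = random.Random(rng_seed)
    k = min(subjects_per_layout, len(subject_bank))
    jobs = []
    for layout_id in layout_ids:
        for subject_id in rng.sample(subject_bank, k):  # sampled without replacement
            for _ in range(seeds_per_subject):
                jobs.append(GenerationJob(layout_id, subject_id, rng.randrange(2**31)))
    return jobs


def build_dataset(jobs: List[GenerationJob], layouts: Dict[str, list],
                  generate: Callable[[list, str, int], object]) -> List[dict]:
    """Run the (hypothetical) generator; reuse each layout's boxes as labels."""
    samples = []
    for job in jobs:
        video = generate(layouts[job.layout_id], job.subject_id, job.seed)
        samples.append({"video": video,
                        "boxes": layouts[job.layout_id],
                        "subject": job.subject_id})
    return samples
```

With two subjects and three seeds per layout, each BEV layout yields six training clips in this sketch, instead of the three that varying the noise seed alone would give.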

🎬   Controllable Video Generation   🎬

Controllable multi-view videos generated by SubjectDrive. The visualization shows that the generated synthetic data closely follows the specified BEV conditions, demonstrating strong layout control and alignment.
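Layout alignment requires expressing the BEV conditions in each camera's image plane, as with the projected bounding boxes and object depths listed in the framework description. As a minimal sketch, assuming a pinhole camera model, a 4×4 world-to-camera extrinsic matrix, a 3×3 intrinsic matrix and an x-right, y-down, z-forward camera frame, a 3D box can be projected as follows. The function names are illustrative, and SubjectDrive's own rendering of these layout maps may differ.

```python
import numpy as np


def box_corners(center, size, yaw):
    """Return the 8 corners (8, 3) of a 3D box rotated by `yaw` about the vertical z-axis."""
    l, w, h = size
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    z = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (rot @ np.stack([x, y, z])).T + np.asarray(center, dtype=float)


def project(points_world, world_to_cam, intrinsics, eps=1e-6):
    """Project (N, 3) world points into the image.

    Returns pixel coordinates (N, 2), depths along the optical axis (N,)
    and a mask of points lying in front of the camera.
    """
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])  # homogeneous (N, 4)
    pts_cam = (world_to_cam @ pts_h.T)[:3]          # (3, N) in the camera frame
    depth = pts_cam[2]
    uvw = intrinsics @ pts_cam                      # (3, N)
    denom = np.where(np.abs(uvw[2]) < eps, eps, uvw[2])  # avoid division by zero
    uv = (uvw[:2] / denom).T                        # perspective divide
    return uv, depth, depth > eps
```

Only points whose mask is `True` should be drawn; a real renderer would also clip box edges that cross the image plane, which this sketch omits.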

🎬   Consistent Multi-View Video Generation   🎬

Multi-view videos generated by SubjectDrive. For six-view, eight-frame videos on the nuScenes validation set, SubjectDrive produces results that are consistent both across time and across views.

Contact

Feel free to contact us at huangbinyuan AT megvii.com or wangtiancai AT megvii.com
