Guangcong Zheng (郑光聪)

Contact Me
I am currently a PhD student at Zhejiang University and expect to graduate in June 2026.

Reviewer Experience:

CVPR, NeurIPS, SIGGRAPH Asia, ICML, ECCV, ICCV, ACM MM, ICLR, AISTATS, AAAI, ICPR

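Recent research: DiT-based video generation models with controllable camera movement and scene transitions; RealCam-Vid, jointly open-sourced by Zhejiang University and Huawei and actively maintained, the first high-quality video dataset of dynamic scenes with metric-scale camera annotations. I have recently been working on FlashAttention, attention sparsification (low-level CUDA implementation), and parallel inference.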

Past research areas:

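1. Controllable video and image generation/editing. Engineering experience with multi-resolution training for mainstream image/video generation methods, and command of a range of control methods including adapters, LoRA, gradient backpropagation, and noise inversion.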

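2. Transformer-based diffusion models: before the release of Sora, I had already spent a long time researching the shift to Transformer-architecture diffusion models. Research experience with U-ViT, DiT, MaskDiT, discrete VQDiT, autoregressive MaskGiT, LFQ compression, VAE compression, and TikTok compression. Proficient in key techniques such as RMSNorm, QK Norm, bidirectional attention, ZeroAdaLn, GemmaMlp, noise strategies, and eps/x0/v-prediction, with extensive engineering experience training Pixart and DiT.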

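3. Positional encoding and scale extrapolation: before RoPE was widely adopted in generative modeling, I independently implemented 2D RoPE for DiT image generation and reproduced training-free scale extrapolation techniques such as NTK RoPE, ReRoPE, and Leaky ReRoPE.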

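4. Acceleration: command of general distillation and pruning techniques as well as diffusion-specific acceleration techniques. Before the official code was released, I independently implemented one-step/few-step generation acceleration methods including w-embedding CFG distillation, Consistency Model, LCM, PCM, UFOGen, and DMD. I participated in and mentored a junior labmate in publishing the acceleration work Target-Driven Distillation, which is used at Xiaohongshu; its Hugging Face model has been downloaded 5.35k times.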

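5. 3D generation: extended research experience with diffusion-based multi-view generation built on MVDream, Wonder3D, and Zero123, and with one-step feed-forward native 3D foundation models such as MeshGPT and LRM.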

6. Reinforcement learning: research experience using DPO and PPO to automatically search for the optimal CFG at different timesteps of DiT models.

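7. Autonomous driving BEV: some research experience with 4D Gaussian generation for autonomous-driving scenes, and research experience with BEV generation.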

Contributed to Open Source GitHub Repositories

Rank #6 contributor of CogVideoX

Finetuning Code for CogVideoX (Video Diffusion Model)

LayoutDiffusion

The first open-sourced diffusion model for layout-to-image generation

RealCam-Vid

The first open-sourced, large-scale, high-quality video dataset of dynamic scenes with camera movement, built for camera control

RealCam-I2V

Camera-controlled Video Diffusion Model For Real-World Application

CamI2V

Camera-controlled Video Diffusion Model

ED-DPM

DDPM in image space; ranked #1 in class-conditional image generation at 128x128 and 256x256 in 2022

Publications [Full List]

(* equal contribution, # corresponding author)

Controllable Video Generation

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

Teng Li*, Guangcong Zheng*, Rui Jiang, Shuigen Zhan, Tao Wu, Yehao Lu, Yining Lin, Xi Li#

arXiv preprint: 2502.10059
Project Page  Paper (arXiv)  Codes

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Guangcong Zheng*, Teng Li*, Rui Jiang, Yehao Lu, Tao Wu, Xi Li#

arXiv preprint: 2410.15957
Project Page  Paper (arXiv)  Codes

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

Tao Wu, Yong Zhang, Xiaodong Cun, Zhongang Qi, Junfu Pu, Huanzhang Dou, Guangcong Zheng, Ying Shan, Xi Li#

https://arxiv.org/abs/2412.19645
Project Page  Paper (arXiv)  Codes

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Tao Wu, Yong Zhang, Xintao Wang, Xianpan Zhou, Guangcong Zheng*, Zhongang Qi, Ying Shan, Xi Li#

arXiv preprint: 2408.13239
AAAI, 2025.   Project Page  Paper (arXiv)  Codes

Controllable Image Generation

A Survey of Multimodal Controllable Diffusion Models

Rui Jiang*, Guang-Cong Zheng*, Teng Li, Tian-Rui Yang, Jing-Dong Wang, Xi Li#

JCST, CCF B, 2024.  

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

Rui Jiang, Xinghe Fu, Guangcong Zheng, Teng Li, Taiping Yao, Xi Li#

AAAI, 2025.  

LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation

Guangcong Zheng*, Xianpan Zhou*, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li#

arXiv preprint: 2303.17189
CVPR, 2023.   Project Page  Paper (arXiv)  Codes

Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation

Guangcong Zheng*, Shengming Li*, Hui Wang, Taiping Yao, Yang Chen, Shoudong Ding, Xi Li#

ECCV, 2022.   Project Page  Paper (arXiv)  Codes

Incremental Learning + Scene Graph Generation

Bird's-Eye View

BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, Jingdong Wang, Gaoang Wang, Xi Li#

CVPR, 2024.