AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Affordance Correspondence

1 Shanghai Qi Zhi Institute, 2 Tsinghua University, 3 Fudan University, 4 UC Berkeley
CVPR 2026
AffordGen teaser showing large-scale affordance-aware demonstration generation

AffordGen scales a handful of source demonstrations into large, affordance-grounded manipulation datasets that generalize to unseen objects and new object categories.

Abstract

Despite the recent success of modern imitation learning methods in robot manipulation, their performance is often constrained by geometric variations due to limited data diversity. Leveraging powerful 3D generative models and vision foundation models (VFMs), the proposed AffordGen framework overcomes this limitation by using the semantic correspondence of meaningful keypoints across large-scale 3D meshes to generate new robot manipulation trajectories. The resulting large-scale, affordance-aware dataset is then used to train a robust, closed-loop visuomotor policy, combining the semantic generalizability of affordances with the reactive robustness of end-to-end learning. Experiments in simulation and the real world show that policies trained with AffordGen achieve high success rates and generalize zero-shot to unseen objects, significantly improving data efficiency in robot learning.

Key Contributions

  • Uses affordance correspondence as a generative source for synthesizing semantically meaningful robot demonstrations.
  • Scales a few human demonstrations to thousands of trajectories across novel objects and full 6D pose variations.
  • Trains closed-loop visuomotor policies that improve unseen-object and cross-category generalization in both simulation and the real world.
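
As an illustration of the 6D pose variation mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that a demonstration is stored as a sequence of 4x4 end-effector poses in the world frame and that the object pose is known; the function names `random_rotation`, `random_pose`, and `retarget_trajectory` and the translation range are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Sample a random 3x3 rotation matrix using a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # normalize column signs
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def random_pose(rng, max_offset=0.2):
    """Sample a random 6D object pose as a 4x4 homogeneous transform.

    max_offset is an assumed translation bound in meters.
    """
    T = np.eye(4)
    T[:3, :3] = random_rotation(rng)
    T[:3, 3] = rng.uniform(-max_offset, max_offset, size=3)
    return T

def retarget_trajectory(ee_poses_world, T_obj_src, T_obj_new):
    """Move a recorded trajectory so it keeps the same pose relative to the object.

    ee_poses_world: (N, 4, 4) end-effector poses from the source demonstration.
    T_obj_src, T_obj_new: 4x4 object poses in the source and generated scenes.
    """
    delta = T_obj_new @ np.linalg.inv(T_obj_src)
    return delta @ ee_poses_world    # broadcasts over the N poses

# Usage: a straight-line approach replayed at a randomly sampled object pose.
rng = np.random.default_rng(0)
traj = np.tile(np.eye(4), (5, 1, 1))
traj[:, 0, 3] = np.linspace(0.0, 0.1, 5)
T_src = np.eye(4)
T_new = random_pose(rng)
new_traj = retarget_trajectory(traj, T_src, T_new)
```

Expressing the trajectory relative to the object is what lets a single demonstration be replayed under many object poses; a real pipeline would additionally need motion planning to reach the first retargeted pose, which this sketch leaves out.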

Method Overview

AffordGen first decomposes a source demonstration, establishes semantic keypoint correspondence on 3D meshes, and then replays grasp and skill segments to generate diverse demonstrations for new objects.

AffordGen pipeline overview
Pipeline. The framework preprocesses a source demonstration, predicts affordance/function correspondences, and generates diverse in-category and cross-category demonstrations.
AffordGen keypoint correspondence illustration
Affordance correspondence. Key semantic points are transferred in canonical 3D space, enabling manipulation knowledge to move across object instances and categories.
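
To make the idea of transferring manipulation knowledge through corresponding keypoints concrete, here is a minimal sketch under strong assumptions. It assumes matched 3D keypoints on the source and target objects are already available, and it aligns them with a single rigid transform (the Kabsch algorithm), which is a simplification chosen for illustration and not the method described in the paper. The names `fit_rigid_transform` and `transfer_grasp` are hypothetical.

```python
import numpy as np

def fit_rigid_transform(src_pts, dst_pts):
    """Least-squares rigid fit (Kabsch): find R, t with dst ≈ R @ src + t.

    src_pts, dst_pts: (N, 3) arrays of corresponding keypoints, N >= 3.
    """
    src_c = src_pts.mean(axis=0)
    dst_c = dst_pts.mean(axis=0)
    H = (src_pts - src_c).T @ (dst_pts - dst_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def transfer_grasp(grasp_pose_src, src_kps, dst_kps):
    """Map a 4x4 grasp pose from the source object onto the target object."""
    R, t = fit_rigid_transform(src_kps, dst_kps)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T @ grasp_pose_src

# Usage: the target keypoints are the source keypoints rotated 90 degrees
# about z and shifted, so the grasp should land on the matching keypoint.
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, 0.2, 0.3])
src_kps = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
dst_kps = src_kps @ R_true.T + t_true
grasp_src = np.eye(4)
grasp_src[:3, 3] = src_kps[1]                     # grasp at the second keypoint
grasp_dst = transfer_grasp(grasp_src, src_kps, dst_kps)
```

A rigid fit only captures differences in object pose. Objects with different shapes or sizes, especially across categories, would likely need scale handling or a non-rigid mapping on top of the keypoint correspondence.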

Results

  • 24.1% average improvement in simulation
  • 24.3% average improvement in real-world tasks
  • Zero-shot generalization to unseen and cross-category objects
In-category generalization results
In-category generalization. AffordGen consistently improves success on unseen objects over DemoGen and CPGen across teapot pouring, mug hanging, knife cutting, and shoe organizing.
Cross-category generalization results
Cross-category transfer. AffordGen transfers behaviors such as Teapot → Mug, Mug → Handbag, and Knife → Saw, where baselines struggle to achieve non-trivial success.

Qualitative Tasks

Each task below is shown as a paired simulation and real-world example of the same manipulation objective.

In-Category Tasks

Teapot Pouring

Simulation teapot pouring
Simulation
Real-world teapot pouring
Real world

Mug Hanging

Simulation mug hanging
Simulation
Real-world mug hanging
Real world

Knife Cutting

Simulation knife cutting
Simulation
Real-world knife cutting
Real world

Shoe Organizing

Simulation shoe organizing
Simulation
Real-world shoe organizing
Real world

Cross-Category Transfer

Teapot → Mug

Simulation teapot to mug transfer
Simulation
Real-world teapot to mug transfer
Real world

Mug → Handbag

Simulation mug to handbag transfer
Simulation
Real-world mug to handbag transfer
Real world

Knife → Saw

Simulation knife to saw transfer
Simulation
Real-world knife to saw transfer
Real world

BibTeX

@inproceedings{zhang2026affordgen,
  title={AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Affordance Correspondence},
  author={Zhang, Jiawei and Hu, Kaizhe and Huang, Yingqian and Ju, Yuanchen and Xue, Zhengrong and Xu, Huazhe},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
  url={https://arxiv.org/abs/2604.10579}
}