AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Affordance Correspondence

Zhang, Jiawei; Hu, Kaizhe; Huang, Yingqian; Ju, Yuanchen; Xue, Zhengrong; Xu, Huazhe

Jiawei Zhang¹, Kaizhe Hu²,¹, Yingqian Huang¹,³, Yuanchen Ju⁴, Zhengrong Xue²,¹, Huazhe Xu²,¹

¹Shanghai Qi Zhi Institute ²Tsinghua University ³Fudan University ⁴UC Berkeley

CVPR 2026

AffordGen teaser showing large-scale affordance-aware demonstration generation

AffordGen scales a handful of source demonstrations into large, affordance-grounded manipulation datasets that generalize to unseen objects and new object categories.

Abstract

Despite the recent success of modern imitation learning methods in robot manipulation, their performance is often constrained by geometric variations because the training data lacks diversity. Leveraging powerful 3D generative models and vision foundation models (VFMs), the proposed AffordGen framework overcomes this limitation by using the semantic correspondence of meaningful keypoints across large-scale 3D meshes to generate new robot manipulation trajectories. This large-scale, affordance-aware dataset is then used to train a robust, closed-loop visuomotor policy that combines the semantic generalizability of affordances with the reactive robustness of end-to-end learning. Experiments in simulation and the real world show that policies trained with AffordGen achieve high success rates and generalize zero-shot to unseen objects, significantly improving data efficiency in robot learning.

Key Contributions

Uses affordance correspondence as a generative source for synthesizing semantically meaningful robot demonstrations.
Scales a few human demonstrations to thousands of trajectories across novel objects and full 6D pose variations.
Trains closed-loop visuomotor policies that improve unseen-object and cross-category generalization in both simulation and the real world.
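
To make "full 6D pose variations" concrete, the sketch below draws random object poses as 4x4 homogeneous transforms. It is only an illustration under stated assumptions: the function names, the workspace bounds, and the choice of uniform sampling are hypothetical, and this page does not describe how AffordGen actually samples poses.

```python
import numpy as np

def random_rotation(rng):
    """Draw a uniformly random 3x3 rotation from a normalized Gaussian quaternion."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def sample_object_pose(rng, xyz_low, xyz_high):
    """Sample a 6D pose: random orientation plus a position inside a box.

    xyz_low / xyz_high are hypothetical workspace bounds in meters.
    """
    T = np.eye(4)
    T[:3, :3] = random_rotation(rng)
    T[:3, 3] = rng.uniform(xyz_low, xyz_high)
    return T

# Example: draw a batch of candidate object poses for data generation.
rng = np.random.default_rng(0)
poses = [sample_object_pose(rng, [-0.1, -0.1, 0.0], [0.1, 0.1, 0.2]) for _ in range(4)]
```

Each sampled pose could then be applied to an object mesh before a demonstration is generated for it; a real pipeline would likely also reject poses that are unstable or unreachable.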

Method Overview

AffordGen first decomposes a source demonstration, establishes semantic keypoint correspondence on 3D meshes, and then replays grasp and skill segments to generate diverse demonstrations for new objects.

AffordGen pipeline overview — **Pipeline.** The framework preprocesses a source demonstration, predicts affordance/function correspondences, and generates diverse in-category and cross-category demonstrations.
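
One common way to replay a demonstration segment on a new object is to estimate a rigid transform between matched keypoints and apply it to the recorded end-effector poses. The sketch below uses the Kabsch algorithm for that alignment. This is an assumed stand-in, not the method confirmed by this page; `estimate_rigid_transform` and `retarget_segment` are hypothetical names.

```python
import numpy as np

def estimate_rigid_transform(src_pts, tgt_pts):
    """Kabsch: least-squares rigid transform (4x4) mapping src_pts onto tgt_pts.

    src_pts, tgt_pts: (N, 3) arrays of matched keypoints, N >= 3, not collinear.
    """
    src_c = src_pts.mean(axis=0)
    tgt_c = tgt_pts.mean(axis=0)
    H = (src_pts - src_c).T @ (tgt_pts - tgt_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tgt_c - R @ src_c
    return T

def retarget_segment(ee_poses, src_keypoints, tgt_keypoints):
    """Map a segment of end-effector poses (N, 4, 4) from the source object to the target."""
    T = estimate_rigid_transform(src_keypoints, tgt_keypoints)
    return T @ ee_poses  # matmul broadcasts T over the leading trajectory axis
```

A rigid transform only captures a change of pose. Objects with a different shape or scale would need the keypoints themselves to carry the adaptation, for example by defining the grasp pose relative to a transferred handle keypoint.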

AffordGen keypoint correspondence illustration — **Affordance correspondence.** Key semantic points are transferred in canonical 3D space, enabling manipulation knowledge to move across object instances and categories.
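
As a rough illustration of semantic keypoint transfer, the sketch below matches each source keypoint to the target point whose feature descriptor has the highest cosine similarity. It assumes per-point descriptors (for example, features from a vision foundation model lifted onto the mesh) are already available. The function name and the nearest-neighbor matching rule are hypothetical, not taken from the paper.

```python
import numpy as np

def transfer_keypoints(src_feats, tgt_feats, tgt_points):
    """Transfer keypoints by nearest-neighbor matching in feature space.

    src_feats:  (K, D) descriptors of the K source keypoints
    tgt_feats:  (M, D) descriptors of M points sampled on the target mesh
    tgt_points: (M, 3) 3D coordinates of those target points
    Returns the matched target coordinates (K, 3) and their similarity scores (K,).
    """
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    sim = src @ tgt.T                # (K, M) cosine similarities
    idx = sim.argmax(axis=1)         # best target point per source keypoint
    scores = sim[np.arange(len(idx)), idx]
    return tgt_points[idx], scores
```

The returned scores can serve as a simple confidence check: a low best-match similarity suggests that the target object has no clear counterpart for that keypoint.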

Results

24.1% average improvement in simulation

24.3% average improvement in real-world tasks

Zero-shot generalization to unseen and cross-category objects

In-category generalization results — **In-category generalization.** AffordGen consistently improves success on unseen objects over DemoGen and CPGen across teapot pouring, mug hanging, knife cutting, and shoe organizing.

Cross-category generalization results — **Cross-category transfer.** AffordGen transfers behaviors such as *Teapot → Mug*, *Mug → Handbag*, and *Knife → Saw*, where baselines struggle to achieve non-trivial success.

Qualitative Tasks

Each card shows paired simulation and real-world examples generated from the same manipulation objective.

In-Category Tasks

Teapot Pouring

Mug Hanging

Knife Cutting

Shoe Organizing

Cross-Category Transfer

Teapot → Mug

Simulation teapot to mug transfer — Simulation

Real-world teapot to mug transfer — Real world

Mug → Handbag

Simulation mug to handbag transfer — Simulation

Real-world mug to handbag transfer — Real world

Knife → Saw

Simulation knife to saw transfer — Simulation

Real-world knife to saw transfer — Real world

BibTeX

@inproceedings{zhang2026affordgen,
  title={AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Affordance Correspondence},
  author={Zhang, Jiawei and Hu, Kaizhe and Huang, Yingqian and Ju, Yuanchen and Xue, Zhengrong and Xu, Huazhe},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
  url={https://arxiv.org/abs/2604.10579}
}