DHAGrasp: Synthesizing Affordance-Aware Dual-Hand Grasps with Text Instructions


Quanzhou Li1    Zhonghua Wu2    Jingbo Wang3    Chen Change Loy1†    Bo Dai4

1S-Lab, Nanyang Technological University    2SenseTime Research    3Shanghai AI Laboratory    4The University of Hong Kong

† Corresponding author




Abstract


Learning to generate dual-hand grasps that respect object semantics is essential for robust hand–object interaction but remains largely underexplored due to dataset scarcity. Existing grasp datasets predominantly focus on single-hand interactions and contain only limited semantic part annotations. To address these challenges, we introduce a pipeline, SymOpt, that constructs a large-scale dual-hand grasp dataset by leveraging existing single-hand datasets and exploiting object and hand symmetries. Building on this, we propose a text-guided dual-hand grasp generator, DHAGrasp, that synthesizes Dual-Hand Affordance-aware Grasps for unseen objects. Our approach incorporates a novel dual-hand affordance representation and follows a two-stage design, which enables effective learning from a small set of segmented training objects while scaling to a much larger pool of unsegmented data. Extensive experiments demonstrate that our method produces diverse and semantically consistent grasps, outperforming strong baselines in both grasp quality and generalization to unseen objects.


Method


Our work consists of two main phases: data generation and grasp synthesis. In the data generation phase, we first mirror right-hand grasps across the pseudo-symmetry plane of the object to obtain left-hand grasp proposals. We then combine the right- and left-hand grasps and apply an energy-based optimization scheme to construct dual-hand grasps. In the grasp synthesis phase, we propose a dual-hand contact representation, as illustrated in the second box, and a two-stage grasp generation pipeline, outlined in the third box. In the first stage of the generator, we design a diffusion model, Text2Dir, conditioned on the object and text, that predicts the affordance directions of a grasp. The object and the predicted affordance directions are subsequently passed to our Dir2Grasp model to generate the final grasp.
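The mirroring step amounts to a standard reflection about the object's pseudo-symmetry plane. Below is a minimal NumPy sketch, assuming grasps are represented as 3D hand keypoints plus an optional global wrist rotation; the function names and this representation are illustrative, not the released pipeline.

import numpy as np

def mirror_points(points, plane_point, plane_normal):
    """Reflect Nx3 points across the plane through `plane_point` with
    normal `plane_normal` (Householder reflection)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n          # signed distance to the plane
    return points - 2.0 * d[:, None] * n    # step across the plane twice

def mirror_rotation(R, plane_normal):
    """Conjugate a 3x3 rotation by the reflection M = I - 2 n n^T.
    The result is again a proper rotation, but it orients the mirrored,
    opposite-handed hand (e.g. a left hand obtained from a right hand)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    M = np.eye(3) - 2.0 * np.outer(n, n)
    return M @ R @ M

# Example: reflect a right-hand grasp about the pseudo-symmetry plane to
# obtain a left-hand proposal (all values below are placeholders).
right_keypoints = np.random.rand(21, 3)      # hypothetical 21 hand joints
plane_point = np.zeros(3)                    # a point on the symmetry plane
plane_normal = np.array([1.0, 0.0, 0.0])     # plane normal (assumed known)
left_proposal = mirror_points(right_keypoints, plane_point, plane_normal)

If grasps are instead stored as parameters of a handed model such as MANO, the reflected pose belongs to the opposite hand and should be re-targeted to the left-hand model rather than kept on the right-hand one.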



Video




Qualitative results


Grasps generated by DHAGrasp on unseen objects.



Citation

@article{li2025dhagrasp,
  title     = {DHAGrasp: Synthesizing Affordance-Aware Dual-Hand Grasps with Text Instructions},
  author    = {Li, Quanzhou and Wu, Zhonghua and Wang, Jingbo and Loy, Chen Change and Dai, Bo},
  journal   = {arXiv preprint arXiv:2509.22175},
  year      = {2025},
  doi       = {10.48550/arXiv.2509.22175},
  url       = {https://doi.org/10.48550/arXiv.2509.22175}
}