UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

CVPR 2024

Junsheng Zhou1*      Weiqi Zhang1*      Baorui Ma1,2      Kanle Shi3      Yu-Shen Liu1      Zhizhong Han4

* Equal Contribution

1School of Software, Tsinghua University, 2BAAI,
3Kuaishou Technology, 4Wayne State University

Abstract

Diffusion models have shown remarkable results for image generation, editing, and inpainting. Recent works explore diffusion models for 3D shape generation with neural implicit functions, i.e., signed distance functions and occupancy functions. However, these representations are limited to shapes with closed surfaces, which prevents them from generating diverse 3D real-world content containing open surfaces. In this work, we present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) that is capable of generating textured 3D shapes with open surfaces from text conditions or unconditionally. Our key idea is to generate UDFs in the spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation. Specifically, instead of selecting an appropriate wavelet transformation by hand, which requires expensive manual effort and still leads to large information loss, we propose a data-driven approach to learn the optimal wavelet transformation for UDFs. We evaluate UDiFF and demonstrate its advantages through numerical and visual comparisons with the latest methods on widely used benchmarks.
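The optimal wavelet transformation mentioned above is learned rather than hand-picked: the filter taps are treated as parameters and optimized so that decomposing and then inverting a UDF volume reproduces it as closely as possible. Below is a minimal PyTorch sketch of this idea, assuming a single-level separable 3D wavelet; the class name LearnableWavelet3D, the Haar initialization, the quadrature-mirror high-pass rule, and the toy training loop are our illustrative assumptions, not the paper's released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableWavelet3D(nn.Module):
    """Single-level separable 3D wavelet with learnable filter taps (a sketch)."""

    def __init__(self, num_taps: int = 4):
        super().__init__()
        lo = torch.zeros(num_taps)
        lo[:2] = 2.0 ** -0.5  # Haar initialization: [1/sqrt(2), 1/sqrt(2), 0, 0]
        self.lo = nn.Parameter(lo)

    def filters(self) -> torch.Tensor:
        lo = self.lo
        signs = (-1.0) ** torch.arange(lo.numel(), device=lo.device)
        hi = lo.flip(0) * signs  # quadrature-mirror high-pass companion
        banks = [
            torch.einsum('i,j,k->ijk', a, b, c)
            for a in (lo, hi) for b in (lo, hi) for c in (lo, hi)
        ]  # the 8 separable sub-band filters: LLL (coarse) ... HHH (fine)
        return torch.stack(banks).unsqueeze(1)  # shape (8, 1, t, t, t)

    def decompose(self, udf: torch.Tensor) -> torch.Tensor:
        # (B, 1, D, H, W) -> (B, 8, ~D/2, ~H/2, ~W/2) coefficient volumes
        return F.conv3d(udf, self.filters(), stride=2)

    def invert(self, coeffs: torch.Tensor) -> torch.Tensor:
        # Transposed convolution with the same bank; exact inversion only
        # holds for an orthonormal filter bank, hence the loss below.
        return F.conv_transpose3d(coeffs, self.filters(), stride=2)


# Optimize the taps by minimizing the UDF self-reconstruction error.
wavelet = LearnableWavelet3D()
opt = torch.optim.Adam(wavelet.parameters(), lr=1e-3)
udf_batch = torch.rand(2, 1, 64, 64, 64)  # stand-in batch of UDF grids
for step in range(200):
    recon = wavelet.invert(wavelet.decompose(udf_batch))
    loss = F.mse_loss(recon, udf_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()

After this optimization, the decomposition parameters are frozen and reused to convert every training UDF into its coarse and fine coefficient volumes, as described in the method overview below.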

Method

Overview of UDiFF. (a) We propose a data-driven approach to attain the optimal wavelet transformation for UDF generation. We optimize the wavelet filter parameters through decomposition and inversion by minimizing the UDF self-reconstruction error. (b) We fix the learned decomposition wavelet parameters and leverage them to prepare the data as a compact representation of UDFs, consisting of pairs of coarse and fine coefficient volumes. (c) The architecture of the generator in the diffusion model, where text conditions are introduced with cross-attention. (d) The diffusion process of UDiFF. We train the generator to produce coarse coefficient volumes from random noise guided by input texts, and train the fine predictor to predict fine coefficient volumes from the coarse ones. Following the green arrows for inference, we start from random noise and an input text and leverage the trained generator to produce a coarse coefficient volume. The trained fine predictor then predicts the fine coefficient volume. Together with the coarse one, we recover the UDF with the fixed, pre-optimized inversion wavelet filter parameters. Finally, we extract surfaces from the UDF and texture them with the guiding text.
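To make the green-arrow inference path concrete, here is a hedged PyTorch sketch of the two-stage sampling: a denoising generator produces the coarse coefficient volume from noise under text guidance, a fine predictor maps it to the detail sub-bands, and the fixed inverse wavelet recovers the UDF. CoarseGenerator, FinePredictor, the alpha-bar schedule, and the DDIM-style update are stand-ins we invented for illustration; text conditioning is reduced to a global bias instead of the paper's cross-attention, and LearnableWavelet3D refers to the sketch above.

import torch
import torch.nn as nn


class CoarseGenerator(nn.Module):
    """Stand-in denoiser over the coarse coefficient volume; the paper
    conditions on text via cross-attention, reduced here to a bias."""

    def __init__(self, txt_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.SiLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )
        self.txt = nn.Linear(txt_dim, 1)

    def forward(self, x_t, t, text_emb):
        # t (the timestep index) is unused in this toy network.
        return self.net(x_t) + self.txt(text_emb).view(-1, 1, 1, 1, 1)


class FinePredictor(nn.Module):
    """Stand-in mapping from the coarse sub-band to the 7 detail sub-bands."""

    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(1, 7, 3, padding=1)

    def forward(self, coarse):
        return self.net(coarse)


@torch.no_grad()
def generate(generator, fine_predictor, wavelet, text_emb, steps=50, res=32):
    """Noise + text -> coarse coefficients -> fine coefficients -> UDF."""
    x = torch.randn(1, 1, res, res, res)             # start from random noise
    a_bar = torch.linspace(1e-3, 0.999, steps + 1)   # toy alpha-bar schedule
    for i in range(steps):                           # DDIM-style update, eta = 0
        eps = generator(x, i, text_emb)              # predicted noise
        x0 = (x - (1 - a_bar[i]).sqrt() * eps) / a_bar[i].sqrt()
        x = a_bar[i + 1].sqrt() * x0 + (1 - a_bar[i + 1]).sqrt() * eps
    coarse = x
    fine = fine_predictor(coarse)                    # predict detail sub-bands
    coeffs = torch.cat([coarse, fine], dim=1)        # (1, 8, res, res, res)
    # Fixed, pre-optimized inverse wavelet (LearnableWavelet3D above).
    return wavelet.invert(coeffs)


text_emb = torch.randn(1, 512)  # stand-in for a text encoder's embedding
udf = generate(CoarseGenerator(), FinePredictor(), LearnableWavelet3D(), text_emb)

The returned UDF grid is what the surface extraction and texturing steps in (d) consume.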

Visualization Results

Category-Conditional Generations

Outfit Designs with UDiFF-Generated Garments

Generation Results

DeepFashion3D

Category-conditional generations.


ShapeNet

Unconditional generations.

Comparison Results

BibTeX

@inproceedings{udiff,
  title={UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion},
  author={Zhou, Junsheng and Zhang, Weiqi and Ma, Baorui and Shi, Kanle and Liu, Yu-Shen and Han, Zhizhong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}