Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

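Abstract
We present GAUDI, a generative model that captures the distribution of complex, realistic 3D scenes and supports immersive rendering from a moving camera. We address this challenging problem with a scalable yet powerful approach: we first optimize a latent representation that disentangles radiance fields from camera poses, and then use that latent representation to learn a generative model, which enables both unconditional and conditional generation of 3D scenes. Unlike prior work that focuses on single objects, GAUDI drops the assumption that the camera pose distribution can be shared across samples, which significantly improves the model's ability to generalize. Experiments show that GAUDI achieves state-of-the-art performance on unconditional generation across multiple datasets, and that it can generate 3D scenes conditioned on variables such as sparse image observations or text describing the scene.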
Code repository
apple/ml-gaudi (official)
Benchmarks
| Benchmark | Method | FID | FID (SwAV) |
|---|---|---|---|
| image-generation-on-arkitscenes | GRAF | 87.06 | 13.44 |
| image-generation-on-arkitscenes | π-GAN | 134.8 | 15.58 |
| image-generation-on-arkitscenes | GAUDI | 37.35 | 4.14 |
| image-generation-on-arkitscenes | GSN | 79.54 | 10.21 |
| image-generation-on-replica | GRAF | 65.37 | 5.76 |
| image-generation-on-replica | GSN | 41.75 | 4.14 |
| image-generation-on-replica | π-GAN | 166.55 | 13.17 |
| image-generation-on-replica | GAUDI | 18.75 | 1.76 |
| image-generation-on-vizdoom | GAUDI | 33.7 | 3.24 |
| image-generation-on-vizdoom | GSN | 37.21 | 4.56 |
| image-generation-on-vizdoom | GRAF | 47.5 | 5.44 |
| image-generation-on-vizdoom | π-GAN | 143.55 | 15.26 |
| image-generation-on-vln-ce | GSN | 43.32 | 6.19 |
| image-generation-on-vln-ce | π-GAN | 151.26 | 14.07 |
| image-generation-on-vln-ce | GAUDI | 18.52 | 3.63 |
| image-generation-on-vln-ce | GRAF | 90.43 | 8.65 |
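
The table rows are not sorted, so the ranking per benchmark is easy to misread. The following is a minimal Python sketch that ranks the methods by FID on each benchmark (lower FID is better) and reports how much the best method reduces FID relative to the runner-up. The FID values are copied from the table; the names `fid`, `rank_methods`, and `summary` are illustrative and not part of the GAUDI code base.

```python
# FID scores copied from the benchmark table (lower is better).
fid = {
    "image-generation-on-arkitscenes": {"GRAF": 87.06, "π-GAN": 134.8, "GAUDI": 37.35, "GSN": 79.54},
    "image-generation-on-replica": {"GRAF": 65.37, "GSN": 41.75, "π-GAN": 166.55, "GAUDI": 18.75},
    "image-generation-on-vizdoom": {"GAUDI": 33.7, "GSN": 37.21, "GRAF": 47.5, "π-GAN": 143.55},
    "image-generation-on-vln-ce": {"GSN": 43.32, "π-GAN": 151.26, "GAUDI": 18.52, "GRAF": 90.43},
}


def rank_methods(scores):
    """Return (method, fid) pairs sorted from best (lowest FID) to worst."""
    return sorted(scores.items(), key=lambda pair: pair[1])


# Hypothetical summary structure: benchmark -> (best, runner-up, % FID reduction).
summary = {}
for benchmark, scores in fid.items():
    ranked = rank_methods(scores)
    (best, best_fid), (second, second_fid) = ranked[0], ranked[1]
    # Relative FID reduction of the best method with respect to the runner-up.
    reduction = 100.0 * (second_fid - best_fid) / second_fid
    summary[benchmark] = (best, second, reduction)
    print(f"{benchmark}: best={best} ({best_fid}), runner-up={second} ({second_fid}), "
          f"FID reduction={reduction:.1f}%")
```

The same loop can be pointed at the FID (SwAV) column by swapping in those values; only the numbers in the dictionary change.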