🤗 Paper Upvotes: 7 | cs.CV, cs.AI
Authors:
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
Title:
Scaling Properties of Diffusion Models for Perceptual Tasks
Arxiv:
http://arxiv.org/abs/2411.08034v2
Abstract:
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve competitive performance to state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io .
Fler avsnitt av Daily Paper Cast
Visa alla avsnitt av Daily Paper CastDaily Paper Cast med Jingwen Liang, Gengyu Wang finns tillgänglig på flera plattformar. Informationen på denna sida kommer från offentliga podd-flöden.
