Aalto computer scientists at CVPR 2024
The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual computer vision event, comprising the main conference and several co-located workshops and short courses. The conference takes place on 17-21 June 2024 at the Seattle Convention Center.
One of the accepted papers, "Analyzing and Improving the Training Dynamics of Diffusion Models", was selected for oral presentation (top 1% of all submissions).
Accepted papers
In alphabetical order.
Analyzing and Improving the Training Dynamics of Diffusion Models
Selected for oral presentation (top 1% of all submissions)
Authors
Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine
Abstract
Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations and weights over the course of training, we redesign the network layers to preserve activation, weight, and update magnitudes on expectation. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity. Our modifications improve the previous record FID of 2.41 in ImageNet-512 synthesis to 1.81, achieved using fast deterministic sampling.
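The idea of preserving magnitudes "on expectation" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight scale, and the sample counts are assumptions made for illustration. It relies only on the standard fact that a dot product with a unit-norm weight vector keeps the variance of uncorrelated, unit-variance inputs at one.

```python
import math
import random


def normalize_weights(row):
    """Rescale one weight vector to unit Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in row))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero weight vector")
    return [w / norm for w in row]


def output_std(row, n_samples, rng):
    """Estimate the standard deviation of dot(row, x) for unit-variance Gaussian x."""
    outputs = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in row]
        outputs.append(sum(w * xi for w, xi in zip(row, x)))
    mean = sum(outputs) / n_samples
    return math.sqrt(sum((o - mean) ** 2 for o in outputs) / n_samples)


rng = random.Random(0)
# Hypothetical weights that have drifted to a large scale during training.
raw = [rng.gauss(0.0, 5.0) for _ in range(32)]
print("raw weights:       ", round(output_std(raw, 5000, rng), 2))
print("normalized weights:", round(output_std(normalize_weights(raw), 5000, rng), 2))
```

With the raw weights the output magnitude grows with the weight scale, while the normalized weights keep it close to the input magnitude regardless of how far the raw values have drifted.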
As an independent contribution, we present a method for setting the exponential moving average (EMA) parameters post-hoc, i.e., after completing the training run. This allows precise tuning of EMA length without the cost of performing several training runs, and reveals its surprising interactions with network architecture, training time, and guidance.
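For readers unfamiliar with EMA, the sketch below shows the conventional update that is normally fixed before training starts. It does not implement the paper's post-hoc method; the helper names and the toy one-parameter trajectory are assumptions for illustration only. It shows why the averaging length matters: the same sequence of weights yields different averaged results for different decay values.

```python
def ema_update(average, params, beta):
    """One EMA step: blend the running average with the newest parameters."""
    return [beta * a + (1.0 - beta) * p for a, p in zip(average, params)]


def run_ema(snapshots, beta):
    """Average a sequence of parameter vectors, starting from the first one."""
    average = list(snapshots[0])
    for params in snapshots[1:]:
        average = ema_update(average, params, beta)
    return average


# Hypothetical one-parameter "training run" moving from 0.0 towards 1.0.
trajectory = [[step / 10.0] for step in range(11)]
for beta in (0.5, 0.9, 0.99):
    print(beta, run_ema(trajectory, beta))
```

In this conventional setup, trying another decay value means rerunning the averaging alongside training, which is the cost the abstract says the post-hoc approach avoids.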
Authors
Shuzhe Wang, Juho Kannala, and Daniel Barath
Abstract
Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest due to its low memory requirements, inherent privacy preservation, and reduced need for expensive 3D model maintenance compared to visual descriptor-based methods. However, existing algorithms often compromise on performance, resulting in a significant deterioration compared to their descriptor-based counterparts. In this paper, we introduce DGC-GNN, a novel algorithm that employs a global-to-local Graph Neural Network (GNN) that progressively exploits geometric and color cues to represent keypoints, thereby improving matching accuracy. Our procedure encodes both Euclidean and angular relations at a coarse level, forming the geometric embedding to guide the point matching. We evaluate DGC-GNN on both indoor and outdoor datasets, demonstrating that it not only doubles the accuracy of the state-of-the-art visual descriptor-free algorithm but also substantially narrows the performance gap between descriptor-based and descriptor-free methods.
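The "Euclidean and angular relations" mentioned in the abstract can be pictured with a small sketch. This is not DGC-GNN's actual embedding: the function names and the choice of raw distances and angles as features are assumptions for illustration. It only computes the two kinds of geometric quantities between points, which depend on point positions alone and need no visual descriptors.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.dist(p, q)


def angle_at(p, q, r):
    """Angle in radians at p between the directions p->q and p->r."""
    u = [b - a for a, b in zip(p, q)]
    v = [b - a for a, b in zip(p, r)]
    norm_u = math.dist(p, q)
    norm_v = math.dist(p, r)
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined when a neighbour coincides with p")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


# Hypothetical 3D points: one anchor and two neighbours.
anchor, first, second = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)
print(euclidean_distance(anchor, first), euclidean_distance(anchor, second))
print(math.degrees(angle_at(anchor, first, second)))
```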