VEGS : View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
ECCV 2024

Sungwon Hwang*
KAIST
Min-Jung Kim*
KAIST
Taewoong Kang
KAIST
Jayeon Kang
Ghent University
Jaegul Choo
KAIST
(*: equal contribution)
Paper
arXiv
Code
Data (KITTI-360)

Our method achieves high-fidelity renderings from views distanced from the training camera distribution.

Our method jointly reconstructs the static scene and dynamic objects such as cars.

Paper Summary at a Glance


Problem Statement

We tackle the Extrapolated View Synthesis (EVS) problem on views such as looking left, right, or downward from the training camera distribution.


Overall Pipeline

We initialize Gaussian means using a dense LiDAR map and a point cloud from SfM. We leverage prior scene knowledge, such as surface normal estimation and large-scale diffusion models, to improve rendering quality for EVS.
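A minimal sketch of this initialization step, assuming hypothetical file names and a simple voxel-grid downsampling of the LiDAR map; the paper's exact merging strategy may differ:

```python
# Sketch only: file names, the voxel size, and the downsampling strategy
# are illustrative assumptions, not the paper's exact configuration.
import numpy as np

def load_points(path: str) -> np.ndarray:
    """Load an (N, 3) array of XYZ points saved with np.save (hypothetical format)."""
    return np.load(path).astype(np.float32)

def voxel_downsample(points: np.ndarray, voxel: float) -> np.ndarray:
    """Keep one representative point per voxel to thin the dense LiDAR map."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

# Merge the dense LiDAR map with the SfM point cloud, then use the result
# as the initial 3D Gaussian means (one Gaussian per point).
lidar_pts = load_points("lidar_map.npy")  # dense accumulated LiDAR sweeps
sfm_pts = load_points("sfm_points.npy")   # sparse SfM reconstruction
gaussian_means = np.concatenate(
    [voxel_downsample(lidar_pts, voxel=0.1), sfm_pts], axis=0)
```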



Covariance Guidance with Surface Normal Prior


The Lazy Covariance Optimization (LCO) Problem

The LCO problem refers to the case where a covariance is trained to cover the frustum of a training pixel with minimal optimization effort. As a result, these covariances are prone to produce unwanted cavities on the underlying scene surface.

Covariance Guidance Loss

Our key idea is to guide the orientation and shape of each covariance so that it conforms to the underlying scene surface. Specifically, we propose \(\mathcal{L}_{cov} = \mathcal{L}_{axis} + \mathcal{L}_{scale}\), where \(\mathcal{L}_{axis}\) aligns a covariance axis to the surface normal vector and \(\mathcal{L}_{scale}\) minimizes the scale along the covariance axis aligned with the surface normal.
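A hedged PyTorch sketch of such a loss, assuming each Gaussian's rotation matrix and per-axis scales are available together with estimated unit surface normals; selecting the axis already closest to the normal and weighting the two terms equally are our assumptions, not necessarily the paper's exact formulation:

```python
import torch

def covariance_guidance_loss(rotations: torch.Tensor,  # (N, 3, 3) rotation matrices R
                             scales: torch.Tensor,     # (N, 3) per-axis scales (diag of S)
                             normals: torch.Tensor):   # (N, 3) unit surface normals
    # Rows of R^T are the three principal axes of each covariance.
    axes = rotations.transpose(1, 2)                        # (N, 3, 3)
    cos = torch.einsum('nij,nj->ni', axes, normals).abs()   # |<axis_k, n>| per axis, (N, 3)
    k = cos.argmax(dim=1)                                   # axis closest to the normal
    # L_axis: push the selected axis into full alignment with the normal.
    l_axis = (1.0 - cos.gather(1, k[:, None]).squeeze(1)).mean()
    # L_scale: shrink the scale along that (near-)normal axis, flattening
    # the Gaussian onto the surface.
    l_scale = scales.gather(1, k[:, None]).squeeze(1).abs().mean()
    return l_axis + l_scale
```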

Comparison: rendering without \(\mathcal{L}_{cov}\) vs. VEGS.


Score Distillation from Large-scale Diffusion Model


The noise predicted by a diffusion model \( \textbf{s}_{\theta} \) approximates the negative score, i.e., the gradient of the log-density of a prior distribution \( p(\textbf{x}) \): \( \textbf{s}_{\theta}(\textbf{x}_{\tau}, \tau) \approx - \nabla_{\textbf{x}} \log p(\textbf{x}) \). Thus, optimizing the rendering \( \textbf{x} \) to reduce the predicted noise pushes it toward high-density regions of the prior \( p(\cdot) \). We model our prior distribution using Stable Diffusion fine-tuned with LoRA.
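A minimal sketch of a score-distillation update under this view, assuming an epsilon-prediction UNet `unet(x_t, t, cond)` and its noise schedule `alphas_cumprod`; the timestep range and the weighting `w` are common choices assumed here, and the LoRA fine-tuning of Stable Diffusion is taken as given:

```python
import torch

def sds_loss(latents, unet, alphas_cumprod, cond, device="cuda"):
    """One score-distillation step; d(loss)/d(latents) == w * (eps_hat - eps)."""
    t = torch.randint(50, 950, (latents.shape[0],), device=device)  # random timestep
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise  # forward diffusion
    with torch.no_grad():
        eps_hat = unet(noisy, t, cond)  # predicted noise ~ negative score (up to scale)
    w = 1 - a_t                         # a common SDS weighting (an assumption here)
    grad = w * (eps_hat - noise)
    # Detached-gradient trick: this loss backpropagates exactly `grad` to `latents`.
    return (grad.detach() * latents).sum()
```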
Comparison: rendering without the score loss vs. with the score loss.


Comparison to Baseline


EVS-D and EVS-LR refer to extrapolated views facing downward and left/right, respectively.

Qualitative comparison of MARS, BlockNeRF++, 3DGS, and VEGS on EVS-D (first two rows) and EVS-LR (last row).