Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields

What problem with NeRFs does Mip-NeRF solve?


The original NeRF's rendering procedure samples a single ray per pixel, which can lead to aliasing artifacts when rendering content at multiple resolutions (e.g. zoomed in or out).

Example: NeRF (left) vs Mip-NeRF (right):

How does Mip-NeRF extend NeRF's representation?


Mip-NeRF extends NeRF to represent the scene at a continuously-valued scale by efficiently rendering anti-aliased conical frustums instead of rays.

(Figure: rays.png, comparing NeRF's rays with Mip-NeRF's conical frustums.)

In Mip-NeRF, how are the conical frustums modelled such that they can be efficiently used in NeRFs?


The conical frustums are approximated with multivariate Gaussians. These Gaussians are fully characterized by the mean distance along the ray $\mu_t$, the variance along the ray $\sigma_t^2$, and the variance perpendicular to the ray $\sigma_r^2$:

$$\mu_t = t_\mu + \frac{2 t_\mu t_\delta^2}{3 t_\mu^2 + t_\delta^2}$$

$$\sigma_t^2 = \frac{t_\delta^2}{3} - \frac{4 t_\delta^4 \left(12 t_\mu^2 - t_\delta^2\right)}{15 \left(3 t_\mu^2 + t_\delta^2\right)^2}$$

$$\sigma_r^2 = r^2 \left( \frac{t_\mu^2}{4} + \frac{5 t_\delta^2}{12} - \frac{4 t_\delta^4}{15 \left(3 t_\mu^2 + t_\delta^2\right)} \right)$$

with $t_\mu = (t_{\text{start}} + t_{\text{end}})/2$, $t_\delta = (t_{\text{end}} - t_{\text{start}})/2$, and the radius $r$ the width of the pixel in world coordinates scaled by $2/\sqrt{12}$. Applying this to a ray with origin $\mathbf{o}$ and direction $\mathbf{d}$ gives the final Gaussian characteristics:

$$\boldsymbol{\mu} = \mathbf{o} + \mu_t \mathbf{d}, \qquad \Sigma = \sigma_t^2 \left(\mathbf{d}\mathbf{d}^\top\right) + \sigma_r^2 \left( \mathbf{I} - \frac{\mathbf{d}\mathbf{d}^\top}{\|\mathbf{d}\|_2^2} \right)$$

See paper appendix for full derivation of the above formulas.
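The frustum-to-Gaussian conversion above can be sketched directly in NumPy. This is a minimal illustration of the formulas, not the paper's reference implementation; the function name and argument layout are assumptions.

```python
import numpy as np

def conical_frustum_to_gaussian(o, d, t_start, t_end, r):
    """Approximate the conical frustum along the ray o + t*d on
    [t_start, t_end] with a 3D Gaussian (mean, covariance).

    r is the base radius of the cone: the pixel width in world
    coordinates scaled by 2/sqrt(12).
    """
    t_mu = (t_start + t_end) / 2.0     # midpoint along the ray
    t_delta = (t_end - t_start) / 2.0  # half-width of the interval

    denom = 3.0 * t_mu**2 + t_delta**2
    # Mean distance and variances of the frustum (along / perpendicular).
    mu_t = t_mu + (2.0 * t_mu * t_delta**2) / denom
    sigma_t2 = (t_delta**2) / 3.0 \
        - (4.0 * t_delta**4 * (12.0 * t_mu**2 - t_delta**2)) / (15.0 * denom**2)
    sigma_r2 = r**2 * (t_mu**2 / 4.0 + 5.0 * t_delta**2 / 12.0
                       - (4.0 * t_delta**4) / (15.0 * denom))

    # Lift the 1D statistics onto the ray to get a 3D Gaussian.
    mean = o + mu_t * d
    dd = np.outer(d, d)
    cov = sigma_t2 * dd + sigma_r2 * (np.eye(3) - dd / np.dot(d, d))
    return mean, cov
```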

How are the multivariate Gaussians in Mip-NeRF converted to inputs for the NeRF?


The multivariate Gaussians are transformed to integrated positional encodings (IPE). This is the expected value of the positional encoding of samples drawn from the multivariate Gaussian:

$$\gamma(\boldsymbol{\mu}, \Sigma) = \mathbb{E}_{\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_\gamma, \Sigma_\gamma)}\left[\gamma(\mathbf{x})\right]$$

γ(μ,Σ)=[sin(μγ)exp(12diag(Σγ))cos(μγ)exp(12diag(Σγ))]\gamma(\boldsymbol{\mu}, \Sigma) = \begin{bmatrix} \sin(\boldsymbol{\mu}_\gamma) \circ \exp\left(-\frac{1}{2} \operatorname{diag}\left(\Sigma_\gamma\right)\right) \\ \cos(\boldsymbol{\mu}_\gamma) \circ \exp\left(-\frac{1}{2} \operatorname{diag}\left(\Sigma_\gamma\right)\right)\end{bmatrix}

Machine Learning Research Flashcards is a collection of flashcards associated with scientific research papers in the field of machine learning. Best used with Anki or Obsidian. Edit MLRF on GitHub.