Neural Image Representations
for Multi-Image Fusion and Layer Separation

Seonghyeon Nam     Marcus A. Brubaker     Michael S. Brown
York University


We propose a framework for aligning and fusing multiple images into a single coordinate-based neural representation. Our framework targets burst images that are misaligned due to camera ego motion and small changes in the scene. We describe different alignment strategies depending on the assumed scene motion, namely, perspective planar motion (i.e., homography), optical flow with minimal scene change, and optical flow with notable occlusion and disocclusion. Our framework effectively combines the multiple inputs into a single neural implicit function without the need to select one of the images as a reference frame. We demonstrate how to use this multi-frame fusion framework for various layer separation tasks.

Overview - Multi-Image Fusion


Figure 1. Overview of the neural image representations for multi-image fusion. Assuming that f(x, y) learns a canonical view that summarizes all input images, the rendering of each image is formulated as a projection of the canonical view onto the view of that image. Depending on the assumptions made about the scene, we use different parameterizations of motion: (a) homography-based neural representations, (b) occlusion-free flow-based neural representations, and (c) occlusion-aware flow-based neural representations. Unlike conventional multi-image fusion, which works on discrete 2D grids, our method fuses multiple images in a continuous image space. In addition, our method does not rely on a reference image manually selected from the input images.
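The homography-based case (a) can be illustrated with a minimal sketch. It assumes one 3x3 homography per frame that maps frame pixel coordinates into the canonical view, and a small coordinate MLP with sine activations standing in for f(x, y). The names make_mlp, mlp, apply_homography, and render_pixel are hypothetical, the weights are random rather than trained, and how the homographies are parameterized and optimized is not specified here, so this is not the authors' implementation.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Random weights for a tiny coordinate MLP (an untrained stand-in for f)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp(layers, x, y):
    """Evaluate f(x, y); hidden layers use sine activations (an assumption)."""
    h = [x, y]
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [math.sin(v) for v in h]
    return tuple(h)


def apply_homography(H, x, y):
    """Map frame pixel (x, y) into canonical coordinates with a 3x3 matrix H."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # perspective divide


def render_pixel(layers, H_t, x, y):
    """Render one pixel of frame t by sampling the canonical view f."""
    cx, cy = apply_homography(H_t, x, y)
    return mlp(layers, cx, cy)


# Usage: two frames share one canonical view; only their homographies differ.
canonical = make_mlp([2, 16, 3], seed=0)
H_frame0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # identity
H_frame1 = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.25], [0.0, 0.0, 1.0]]  # pure shift
rgb0 = render_pixel(canonical, H_frame0, 0.25, 0.5)
rgb1 = render_pixel(canonical, H_frame1, 0.25, 0.5)
```

Because every frame is rendered by querying the same continuous function, no input image has to serve as the reference frame. In a real fit, the MLP weights and the per-frame homographies would be optimized jointly against the observed pixels.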

Visualization of Canonical View


Figure 2. Visualization of the learned canonical view. We capture 9 consecutive images (left) and fit a homography-based neural representation to them. Our method automatically stitches all the images into the canonical view (right) learned by the neural representation.

Overview - Multi-Image Layer Separation


Figure 3. Overview of our two-stream neural representations for multi-image layer separation. The goal of our method is to separate the underlying scene and the interference, which move differently across the images, into two layers, each stored in its own neural representation. To this end, we simultaneously train two neural image representations. f1 is parameterized by our homography-based or flow-based neural representations so that it learns the underlying scene, which moves according to the explicit motion model. In contrast, the interference layer, which is difficult to model with the motion model, is stored in f2. The generic form of image formation is a linear combination of both networks, but it varies according to the task. We also use a few regularization terms for optimization, which are described in detail in the paper.
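The generic two-stream image formation can be sketched as follows. This is a minimal illustration, assuming an additive combination in which the scene layer f1 is sampled at motion-warped coordinates and the interference layer f2 is sampled directly at (x, y, t). The function compose_pixel, the weights alpha and beta, and the toy layers are all hypothetical; the task-specific formation models and the regularization terms are described in the paper and are not reproduced here.

```python
def compose_pixel(scene_fn, interference_fn, warp, x, y, t, alpha=1.0, beta=1.0):
    """Linear combination of two layers for pixel (x, y) of frame t.

    scene_fn(cx, cy)         -> colour of the underlying scene (f1), queried in
                                canonical coordinates given by the motion model.
    interference_fn(x, y, t) -> colour of the interference layer (f2), which is
                                not tied to the motion model.
    alpha, beta              -> hypothetical mixing weights (assumption).
    """
    cx, cy = warp(x, y, t)
    scene = scene_fn(cx, cy)
    interference = interference_fn(x, y, t)
    return tuple(alpha * s + beta * i for s, i in zip(scene, interference))


# Toy stand-ins for the two networks and the motion model (not trained models).
def toy_scene(cx, cy):
    return (cx, cy, 0.5)


def toy_interference(x, y, t):
    return (0.25 * t, 0.0, 0.0)


def toy_warp(x, y, t):
    return (x + t, y)  # the scene shifts by one unit per frame


pixel = compose_pixel(toy_scene, toy_interference, toy_warp, 1.0, 2.0, 2)
```

Keeping the motion model inside the scene stream only is what encourages the separation: content that follows the explicit motion is cheap for f1 to explain, while the remainder has to be absorbed by f2.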

Application 1: Reflection Removal

Input | Li et al., 2013 | Double DIP | Liu et al., 2020 | Ours

Figure 4. Comparison of reflection removal methods on real images. We used the baseline results reproduced by this, where video results are not available.

Application 2: Fence Removal

Input | Liu et al., 2020 | Ours

Figure 5. Comparison of fence removal methods on real images. We used the baseline results reproduced by this, where video results are not available. Note that the gray pixels in the fence layer of Liu et al. indicate empty regions, which correspond to the black pixels in our result.

Application 3: Rain Removal

Input | FastDeRain | Ours

Figure 6. Comparison of rain removal methods on real images. All results are visualized in videos.

Application 4: Moiré Removal

Input | AFN | C3Net | Double DIP | Ours

Figure 7. Comparison of moiré removal methods on real images. All results are visualized in videos.