Nonlocal Based Super Resolution with Rotation Invariance and Search Window Relocation

Published at ICASSP, March 2012.

Overview

Multi-frame super resolution (SR) reconstruction aims to fuse a set of observed low resolution (LR) images into one high resolution (HR) image. Because of subpixel shifts between them, the observed LR images contain complementary information, and given the shifts they can be combined to remove aliasing and produce a higher resolution image. Conventional multi-frame algorithms therefore require the subpixel displacements between the LR images, which makes accurate motion estimation critical. However, motion estimation errors are unavoidable in practice and lead to disturbing artifacts.

    To avoid motion estimation, Protter et al. [1] generalized nonlocal means (NLM) from a denoising algorithm into a motion-estimation-free SR method, which reconstructs each center pixel as a weighted average of its neighbors, with weights given by patch similarity. However, NLM SR only accounts for translational motion, whereas natural videos usually contain complex motion; even within one frame, textured patches appear rotated in some places. This reduces the number of similar patches that NLM SR can find.
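The following is a minimal single-frame sketch (Python/NumPy) of the NLM weighting described above; it is not the multi-frame implementation of [1]. Each pixel in the search window is weighted by a Gaussian of the squared L2 distance between its patch and the reference patch. The function names and the parameters patch_radius, search_radius and h are illustrative assumptions. Because the distance compares pixels at identical positions, a rotated copy of a textured patch is treated as dissimilar.

```python
import numpy as np


def patch_weight(p, q, h):
    """Unnormalized NLM weight: Gaussian of the squared L2 patch distance."""
    return float(np.exp(-np.sum((p - q) ** 2) / h ** 2))


def nlm_pixel_estimate(img, k, l, patch_radius=3, search_radius=10, h=10.0):
    """Estimate pixel (k, l) as a patch-similarity-weighted average of the
    pixels in a (2*search_radius+1)^2 window (single frame, translation only)."""
    img = img.astype(float)
    size = 2 * patch_radius + 1
    # After padding, the patch centered on (i, j) is padded[i:i+size, j:j+size].
    padded = np.pad(img, patch_radius, mode="reflect")
    ref = padded[k:k + size, l:l + size]
    rows, cols = img.shape
    num = den = 0.0
    for i in range(max(0, k - search_radius), min(rows, k + search_radius + 1)):
        for j in range(max(0, l - search_radius), min(cols, l + search_radius + 1)):
            w = patch_weight(padded[i:i + size, j:j + size], ref, h)
            num += w * img[i, j]
            den += w
    return num / den
```

Since the reference patch always matches itself with weight 1, the denominator never vanishes. For a textured patch p, patch_weight(p, p, h) is 1 while patch_weight(p, np.rot90(p), h) is typically much smaller, which is the limitation the rotation invariant measure addresses.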

    In this work, we perform SR reconstruction by finding patches similar to a reference patch and combining their information to reconstruct the center pixel of the reference patch. To do so, we first locate potential similar patches and then assign them suitable weights for averaging. Our analysis shows that the failure to find enough similar patches has two causes: complex motion and the limited extent of the search window. Based on this analysis, we propose a novel rotation invariant (RI) similarity measure that combines local gradient and intensity information, together with block-based search window relocation (SWR).

Implementation

Search window relocation

Because of object or camera motion, a fixed search window fails to capture potential similar patches once the object moves out of the window, as in the region highlighted by the yellow rectangle in Fig.1. The proposed SWR approach keeps the search window size fixed but centers the window at a different location in each frame. To keep the block-matching search from being trapped in a local minimum, we use a predicted motion vector (MV), assuming that motion is spatially and temporally continuous. Fig.1 shows the locations of the search windows used by our algorithm and by NLM SR in the 15th and 17th frames; in the 17th frame, the relocated search windows are highlighted with blue rectangles.

Fig.1. Location of search windows in two frames: (a) two search windows in the 15th frame (reference frame); (b) corresponding search windows in the 17th frame.
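Below is a hedged sketch of block-based search window relocation in Python/NumPy. The text does not specify the MV predictor or the matching cost, so this sketch assumes a component-wise median of neighboring MVs (spatial and temporal) as the prediction, followed by a small sum of absolute differences (SAD) search around it. predict_mv, refine_mv, relocated_window_center and the refine range are hypothetical names and parameters. The fixed-size search window in the target frame is then centered at the block center displaced by the refined MV.

```python
import numpy as np


def predict_mv(neighbor_mvs):
    """Predicted MV: component-wise median of candidate MVs, e.g. from
    spatially adjacent blocks and the co-located block in the previous
    frame (hypothetical predictor)."""
    mvs = np.asarray(neighbor_mvs, dtype=float)
    return tuple(int(v) for v in np.round(np.median(mvs, axis=0)))


def refine_mv(ref, tgt, top, left, size, pred, refine=2):
    """SAD block matching restricted to +/-refine pixels around pred, so the
    search starts near the predicted motion instead of searching globally."""
    block = ref[top:top + size, left:left + size].astype(float)
    best, best_sad = pred, np.inf
    for dy in range(pred[0] - refine, pred[0] + refine + 1):
        for dx in range(pred[1] - refine, pred[1] + refine + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > tgt.shape[0] or x + size > tgt.shape[1]:
                continue  # candidate block would leave the target frame
            sad = np.abs(tgt[y:y + size, x:x + size].astype(float) - block).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best


def relocated_window_center(top, left, size, mv):
    """Center of the fixed-size search window in the target frame: the block
    center displaced by the estimated MV."""
    return (top + size // 2 + mv[0], left + size // 2 + mv[1])
```

For example, if the target frame is the reference frame shifted by (3, 2) pixels, an 8×8 block at (20, 20) with a prediction of (2, 2) is refined to (3, 2), and its relocated search window is centered at (27, 26).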

Rotation invariance similarity measure

In this work, we extract local structure and intensity information to build an RI descriptor at each pixel. We then measure the similarity between patches by comparing their RI descriptors.

    To describe local structure, we simplify the scale-invariant feature transform (SIFT) [2] to obtain a local RI structure descriptor. The local intensity descriptor captures the neighborhood intensity at each pixel: all pixels within radius r that have the same Manhattan distance from the center pixel are grouped into one cluster, and the mean intensity of each cluster is computed, giving a vector of (r+1) elements at each pixel. The similarity weight, a product of Gaussian functions of the structure distance and the intensity distance, is defined as follows:

w(k,l,i,j)= \frac{1}{C(k,l)}\exp\bigg\{-\frac{\|P(i,j,r)-P(k,l,r)\|_2^2}{\sigma_1^2}\bigg\}\exp\bigg\{-\frac{\|I(i,j,r)-I(k,l,r)\|_2^2}{\sigma_2^2}\bigg\},

where P(i,j,r) is the structure vector built from local gradients, I(i,j,r) is the intensity vector built from local intensities, and C(k,l) is the normalization constant defined as

C(k,l)=\sum_{(i,j)\in N(k,l)}\exp\bigg\{-\frac{\|P(i,j,r)-P(k,l,r)\|_2^2}{\sigma_1^2}\bigg\}\exp\bigg\{-\frac{\|I(i,j,r)-I(k,l,r)\|_2^2}{\sigma_2^2}\bigg\},

and \sigma_1 and \sigma_2 control the influence of the structure distance and the intensity distance, respectively. Fig.2 shows the patches most similar to a center patch that NLM and our algorithm find in one image. With the proposed similarity measure, rotated patches that NLM misses are also found.
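A sketch of the two per-pixel descriptors in Python/NumPy. intensity_descriptor follows the description above: it groups the (2r+1)×(2r+1) neighborhood into rings of equal Manhattan distance 0..r and returns the r+1 ring means. The paper does not detail its simplified SIFT, so structure_descriptor is a hypothetical stand-in: a gradient-orientation histogram weighted by gradient magnitude and circularly shifted so that the dominant orientation comes first, the usual SIFT way of normalizing for rotation. Reflective padding at image borders and nbins=8 are also assumptions.

```python
import numpy as np


def intensity_descriptor(img, i, j, r):
    """Mean intensity of each Manhattan-distance ring 0..r around (i, j);
    returns a vector of r+1 elements."""
    padded = np.pad(img.astype(float), r, mode="reflect")
    win = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]  # centered on (i, j)
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    ring = np.abs(dy) + np.abs(dx)  # Manhattan distance to the center pixel
    return np.array([win[ring == d].mean() for d in range(r + 1)])


def structure_descriptor(img, i, j, r, nbins=8):
    """HYPOTHETICAL stand-in for the paper's simplified SIFT: a gradient
    orientation histogram over the (2r+1)x(2r+1) window, weighted by gradient
    magnitude, rolled so the dominant orientation bin comes first."""
    padded = np.pad(img.astype(float), r, mode="reflect")
    gy, gx = np.gradient(padded)
    wy = gy[i:i + 2 * r + 1, j:j + 2 * r + 1]
    wx = gx[i:i + 2 * r + 1, j:j + 2 * r + 1]
    mag = np.hypot(wx, wy)
    ang = np.arctan2(wy, wx)  # orientation in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * nbins).astype(int) % nbins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)
    hist = np.roll(hist, -int(np.argmax(hist)))  # rotation normalization
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

Rotating the neighborhood by 90° about the center pixel maps every Manhattan ring onto itself, so intensity_descriptor(img, 3, 3, 3) and intensity_descriptor(np.rot90(img), 3, 3, 3) agree for a 7×7 image; for arbitrary angles the invariance of the discrete rings is only approximate.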

Fig.2. Similar patches found by NLM and RI: (a) center patch; (b) similar patches found by NLM; (c) similar patches found with the proposed similarity measure.
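Given descriptors for the reference pixel (k,l) and for every candidate (i,j) in the search window, the normalized weights and the weighted average can be sketched as follows (Python/NumPy). The Gaussian kernel here decays with distance, exp(-d/σ²). The exponents are combined in the log domain and shifted by their maximum before exponentiation, which leaves the normalized weights unchanged but avoids 0/0 when every raw weight underflows. The names ri_weights and fuse_center_pixel are hypothetical, and averaging plain candidate pixel values is a simplification that ignores how candidates are mapped onto the HR grid.

```python
import numpy as np


def ri_weights(P_ref, I_ref, P_cand, I_cand, sigma1, sigma2):
    """Normalized weights w(k,l,i,j) for every candidate (i,j).

    P_ref, I_ref: structure / intensity descriptors of the reference pixel.
    P_cand, I_cand: one candidate descriptor per row.
    """
    d_struct = np.sum((P_cand - P_ref) ** 2, axis=1)
    d_int = np.sum((I_cand - I_ref) ** 2, axis=1)
    # Log of the product of the two decaying Gaussians.
    log_w = -d_struct / sigma1 ** 2 - d_int / sigma2 ** 2
    # Subtracting the max cancels in the normalization but prevents 0/0.
    w = np.exp(log_w - log_w.max())
    return w / w.sum()  # division by C(k,l)


def fuse_center_pixel(weights, candidate_values):
    """Weighted average of the candidates' center pixel values."""
    return float(np.dot(weights, candidate_values))
```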

    Since we explicitly separate local structure and intensity, \sigma_1 and \sigma_2 must be chosen carefully to balance the structure term and the intensity term. In this work, we fix \sigma_2 and choose \sigma_1 adaptively according to the minimum structure distance from the reference patch, using the piecewise-constant function:

\sigma_1=\sigma_0+\mathrm{step}\cdot \left\lfloor \frac{ \min_{(i,j)\in N(k,l)}\big\{\|P(i,j,r)-P(k,l,r)\|_2^2\big\} }{L} \right\rfloor,

where \sigma_0 is the initial value of \sigma_1, L is the length of each piece, and step controls how fast \sigma_1 grows as the minimum structure distance from the reference patch increases. With this adaptive selection of \sigma_1, most mismatches are eliminated.
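Adaptive σ1 selection as a short Python/NumPy sketch. One assumption is needed: the reference patch itself must be excluded from the candidate set, otherwise the minimum distance is always 0 and σ1 never leaves σ0. The function name is hypothetical, and the paper's values of sigma0, step and L are not given here, so the ones in the usage note are arbitrary.

```python
import numpy as np


def adaptive_sigma1(P_ref, P_cand, sigma0, step, L):
    """sigma1 = sigma0 + step * floor(min squared structure distance / L).

    P_cand must NOT contain the reference pixel itself (assumption), otherwise
    the minimum distance is always 0 and sigma1 always equals sigma0.
    """
    d = np.sum((np.asarray(P_cand) - np.asarray(P_ref)) ** 2, axis=1)
    return sigma0 + step * np.floor(d.min() / L)
```

For example, adaptive_sigma1(np.zeros(2), np.array([[1.0, 0.0], [3.0, 0.0]]), sigma0=10.0, step=5.0, L=0.5) returns 20.0: the minimum distance 1.0 spans two pieces of length 0.5, so σ1 grows by two steps.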

Experimental results

The experiments are performed with MATLAB R2008a on a 2.4 GHz Intel Core CPU running Microsoft Windows. All test images are blurred with a 3×3 uniform mask, decimated by a factor of 3 in each axis, and then contaminated by additive noise with standard deviation 2.
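The degradation used to generate the test inputs can be sketched in Python/NumPy as below. Edge replication at the borders, sampling every third pixel starting from the first, and zero-mean Gaussian noise are assumptions; the text specifies only the 3×3 uniform blur, the decimation factor of 3 and the noise standard deviation of 2.

```python
import numpy as np


def degrade(hr, factor=3, noise_std=2.0, rng=None):
    """3x3 uniform blur -> keep every `factor`-th pixel per axis -> add noise.

    Edge replication, decimation phase (starting at pixel 0) and Gaussian
    noise are assumptions, not stated in the text.
    """
    hr = hr.astype(float)
    h, w = hr.shape
    padded = np.pad(hr, 1, mode="edge")
    # Mean of the 9 shifted copies = 3x3 uniform (box) filter.
    blurred = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    lr = blurred[::factor, ::factor]
    rng = np.random.default_rng() if rng is None else rng
    return lr + rng.normal(0.0, noise_std, size=lr.shape)
```

With these choices, a 255×255 image yields an 85×85 LR image.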

    We first perform single-image SR on the 255×255 Monarch image, without search window relocation, to show that our RI similarity measure finds more similar patches and therefore eliminates block artifacts. The patch size is 7×7 for NLM SR, the radius r is set to 3 for our algorithm, and the search window is 21×21 for all the tests.

Monarch

Original: ground truth
Bicubic: PSNR = 22.64 dB
NLM SR: PSNR = 22.30 dB
RI: PSNR = 22.58 dB
Adaptive RI (ARI): PSNR = 23.00 dB

    Finally, we evaluate our algorithm on two real video sequences, Soccer and Ice. To reduce computation, we fuse only 11 frames, rather than the whole sequence, to estimate each frame. The patch size is 13×13 for NLM SR, the radius r is set to 6 for our algorithm, and the search window is 27×27 for all the tests.
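Selecting the 11 frames fused for each target frame can be sketched as follows (Python). Centering the window on the target frame and shifting it inward at the ends of the sequence are assumptions, as is the name temporal_window; the text states only that 11 frames are fused.

```python
def temporal_window(t, num_frames, n_fuse=11):
    """Indices of the n_fuse frames fused to estimate frame t: centered on t,
    shifted inward near the sequence ends (assumption)."""
    half = n_fuse // 2
    start = min(max(0, t - half), max(0, num_frames - n_fuse))
    return list(range(start, min(num_frames, start + n_fuse)))
```

For a 100-frame sequence, frame 50 uses frames 45 to 55, while frame 0 uses frames 0 to 10.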

Soccer

Original: ground truth
Bicubic: PSNR = 28.4499 dB
NLM SR: PSNR = 27.8068 dB
ARI-SWR: PSNR = 28.6145 dB

Ice

Original: ground truth
Bicubic: PSNR = 28.3791 dB
NLM SR: PSNR = 28.3717 dB
ARI-SWR: PSNR = 28.8852 dB

Soccer sequence: Bicubic vs. ARI-SWR (side-by-side comparison)

Ice sequence: Bicubic vs. ARI-SWR (side-by-side comparison)

References

[1] M. Protter, M. Elad, H. Takeda, P. Milanfar, “Generalizing the Nonlocal-Means to Super-Resolution Reconstruction”, IEEE Trans. on Image Processing, vol. 18, no. 1, pp. 36-51, 2009.

[2] D. G. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.