Illumination-Invariant Non-Local Means Based Video Denoising

Publication Bibtex

@Inproceedings{RZL+2012,
        author = {Jie Ren and Yue Zhuo and Jiaying Liu and Zongming Guo},
        title = {Illumination-Invariant Non-Local Means Based Video Denoising},
        booktitle = {Image Processing (ICIP), 2012 19th IEEE International Conference on},
        year = {2012}  }

Published in ICIP, September 2012.

Introduction

Image and video denoising is a long-standing research area in the image processing community. It aims to recover a high-quality clean image (or image sequence) from its noisy version, which may have been captured, for example, by a low-end imaging device and/or under limited capture conditions. A key to image and video denoising is exploiting prior information. For video denoising, the main focus of this paper, the temporal coherence of the image sequence is an important ingredient in designing an efficient algorithm. To exploit temporal information, motion-compensated image sequence filters have been proposed. However, motion estimation is a difficult problem, mainly because of the aperture problem in textureless regions. Filtering along inaccurate trajectories can lead to blur and information loss.

Instead of relying on robust and reliable motion estimation, Non-Local Means (NLM) filtering was proposed by Buades et al. [1]. The idea of NLM is that patches with similar structural patterns can be spatially far from each other, so they can be collected from across the whole image. The original NLM video denoising framework and its variants implicitly assume that "intrinsically" similar local structures in the image sequence share a coherent illumination condition. Under this assumption, the intensities of an image patch do not change significantly and can be used directly as the feature vector of the local structure. However, video scenes often undergo illumination changes during capture, e.g., the flashlight effect. As shown in Fig. 1, when a neighboring frame has a different illumination condition, patch matching may miss the "intrinsically" similar patches even though they are visually very similar.

Fig. 1. Non-local patch search in a spatio-temporal volume consisting of search windows located in three adjacent frames, in which the rightmost frame has an illumination change due to the flashlight. The colored boxes mark patches that the non-local search found to be similar to the current patch (yellow box). The second row shows the weight distribution of each temporal slice of the 3D spatio-temporal volume.

Therefore, we argue that NLM-based video denoising needs special handling to remain robust when illumination changes occur. We use NLM as the backbone of our system and propose several techniques to address the illumination invariance issue. After analyzing and comparing these techniques, we integrate the histogram processing based technique into the NLM framework because of its robustness to noise, scene changes, and illumination changes.

Multi-frame NLM vs. Single-frame NLM

In [1], the NLM-based image denoising algorithm was extended to video denoising by aggregating patches in a spatio-temporal volume, which avoids explicit motion estimation. Owing to the temporal coherence of the image sequence, more similar patches can be found than by searching within the current frame alone. As shown in Fig. 2, multi-frame NLM-based denoising (using three adjacent frames) performs much better than single-frame NLM-based denoising.

However, when adjacent frames have different illumination conditions, as with a flashlight, the redundancy of similar patches found by the original NLM-based method drops sharply, as the example in Fig. 1 illustrates. This motivates us to improve NLM-based video denoising by making the search for similar patches more robust to illumination changes.

Fig. 2. PSNR curves of the multi-frame and single-frame NLM-based denoising methods on the Crew sequence.
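The spatio-temporal aggregation described in this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the parameter defaults (`patch_r`, `search_r`, `temporal_r`, `h`), and the use of a plain mean squared patch difference are all assumptions, and frames are plain nested lists to keep the example dependency-free. The toy data at the end mimics a flashlight by brightening one neighboring frame.

```python
import math


def get_patch(frame, y, x, r):
    """Flattened (2r+1)x(2r+1) patch centered at (y, x); must lie inside the frame."""
    return [frame[yy][xx]
            for yy in range(y - r, y + r + 1)
            for xx in range(x - r, x + r + 1)]


def nlm_pixel(frames, t, y, x, patch_r=1, search_r=2, temporal_r=1, h=10.0):
    """Denoise pixel (y, x) of frames[t] over a spatio-temporal search volume.

    Returns the weighted average and the total weight collected in each frame.
    Setting temporal_r=0 reduces this to single-frame NLM.
    """
    height, width = len(frames[t]), len(frames[t][0])
    ref = get_patch(frames[t], y, x, patch_r)
    num = den = 0.0
    slice_weights = {}
    for k in range(max(0, t - temporal_r), min(len(frames), t + temporal_r + 1)):
        total = 0.0
        for yy in range(max(patch_r, y - search_r),
                        min(height - patch_r, y + search_r + 1)):
            for xx in range(max(patch_r, x - search_r),
                            min(width - patch_r, x + search_r + 1)):
                cand = get_patch(frames[k], yy, xx, patch_r)
                d2 = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
                w = math.exp(-d2 / (h * h))  # similar patches get weight near 1
                num += w * frames[k][yy][xx]
                total += w
        slice_weights[k] = total
        den += total
    return num / den, slice_weights


def flat_frame(value, size=7):
    """Hypothetical test frame with constant intensity."""
    return [[value] * size for _ in range(size)]


# Three identical frames versus the same scene with a brightened third frame.
clean_value, clean_weights = nlm_pixel([flat_frame(100)] * 3, t=1, y=3, x=3)
flash_value, flash_weights = nlm_pixel(
    [flat_frame(100), flat_frame(100), flat_frame(140)], t=1, y=3, x=3)
```

In the second call, the brightened frame contributes almost no weight even though its structure is identical, which is the loss of redundancy that motivates the illumination-invariant search.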

Illumination-Invariant Improvement Strategies

Within the NLM-based framework, processing can take place at three scales: patch scale, search window scale, and frame scale. The patch scale has fewer pixels and is therefore more sensitive to noise, while the frame scale captures the scene structure but may not adapt to local illumination changes. The search window scale is a good tradeoff between the two.

Let S(i,t) be the search window centered at pixel i in the t-th frame, and let S(i,t+k) denote the search window in the k-th adjacent frame. We define a filter F that processes S(i,t+k) so that its visual appearance becomes similar to that of S(i,t). The strategies below make use of the minimum, maximum, and average intensity levels of S(i,t).

We propose three strategies to implement the filter F:

(a). Direct linear mapping function (i.e., contrast-stretching transformation);

(b). Linear mapping function preserving the average intensity level of S(i,t);

(c). Histogram specification processing, which works on the probability density functions of the intensity levels r and z.
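The three strategies can be illustrated with a small sketch. The exact formulas are not available in this text, so the forms below are assumptions based on standard textbook definitions rather than the authors' definitions: (a) maps the intensity range of the adjacent window onto the range of the current window, (b) uses the same gain but pivots on the mean so that the output mean equals that of the current window, and (c) matches cumulative histograms. The function names and the toy windows are hypothetical.

```python
def stretch(src, ref):
    """(a) Linearly map the intensity range of src onto the range of ref."""
    s_min, s_max = min(src), max(src)
    r_min, r_max = min(ref), max(ref)
    gain = (r_max - r_min) / (s_max - s_min) if s_max > s_min else 0.0
    return [r_min + gain * (v - s_min) for v in src]


def stretch_keep_mean(src, ref):
    """(b) Same gain as (a), shifted so the output mean equals the mean of ref."""
    s_min, s_max = min(src), max(src)
    gain = (max(ref) - min(ref)) / (s_max - s_min) if s_max > s_min else 0.0
    s_mean = sum(src) / len(src)
    r_mean = sum(ref) / len(ref)
    return [r_mean + gain * (v - s_mean) for v in src]


def hist_spec(src, ref, levels=256):
    """(c) Histogram specification: match the cumulative histogram of src to ref."""
    def cumulative(values):
        counts = [0] * levels
        for v in values:
            counts[v] += 1
        for level in range(1, levels):
            counts[level] += counts[level - 1]
        return counts

    c_src, c_ref = cumulative(src), cumulative(ref)
    lut, z = [0] * levels, 0
    for r in range(levels):
        # Smallest level z whose cumulative share in ref reaches that of r in src
        # (compared as integers by cross-multiplying with the window sizes).
        while z < levels - 1 and c_ref[z] * len(src) < c_src[r] * len(ref):
            z += 1
        lut[r] = z
    return [lut[v] for v in src]


# Toy flattened 8-bit windows. `adjacent` shares the dark structure of `current`
# but contains one bright pixel that is absent from `current`, loosely
# mimicking a local scene change.
current = [10, 12, 14, 16, 18, 20, 22, 24]
adjacent = [10, 12, 14, 16, 18, 20, 22, 250]

out_a = stretch(adjacent, current)
out_b = stretch_keep_mean(adjacent, current)
out_c = hist_spec(adjacent, current)
```

On this toy input, (a) and (b) squeeze the seven dark pixels into a range less than one gray level wide because the single bright pixel dominates the gain, while (c) returns the intensities of `current` unchanged.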

We verify the three strategies in different scenarios in Fig.3.

Fig. 3. Comparison of the filtering effects of the three filtering methods. (a) Search windows in three adjacent frames with an illumination change in the middle frame; (b) search windows in three adjacent frames without illumination changes.

In Fig. 3(a), a flashlight causes an illumination change in the m-th frame. The linear mapping function of strategy (a) can adjust the local contrast, but cannot guarantee that the filtered image keeps the same average intensity. Although strategy (b) addresses this problem, neither (a) nor (b) is robust to local scene changes. For example, in Fig. 3(b), bright pixels (within green circles) appear in the (n-1)-th frame but disappear in the n-th and (n+1)-th frames. In this situation, strategies (a) and (b) produce unsatisfactory filtering results, because a few bright pixel intensities cause the majority of pixel intensities to be scaled into a low-contrast range.

In both cases, strategy (c) produces plausible visual results and fulfills the above requirements. Therefore, we select histogram specification, strategy (c), as the core block of our proposed denoising system. A processing example is shown in Fig. 4, which compares the non-local search results before and after histogram processing on the k-th frame with the flashlight effect.

Fig. 4. Comparison of non-local patch searching before and after filtering of the kth frame with flashlight.
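The before/after comparison in Fig. 4 can be imitated on toy data. The sketch below rests on several assumptions and is not the method's actual pipeline: it swaps histogram specification for rank-order matching (which yields the same histogram when both windows contain the same number of pixels), treats a single 3x3 window as the patch, ignores noise, and models the flashlight as a hypothetical gain and offset. All names are illustrative.

```python
import math


def match_by_rank(src, ref):
    """Give each src pixel the ref intensity of equal rank (equal-size windows)."""
    order = sorted(range(len(src)), key=lambda i: src[i])
    ref_sorted = sorted(ref)
    out = [0] * len(src)
    for rank, i in enumerate(order):
        out[i] = ref_sorted[rank]
    return out


def weight(p, q, h=10.0):
    """Unnormalized NLM-style weight from the mean squared patch difference."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
    return math.exp(-d2 / (h * h))


# Flattened 3x3 window in the current frame and the same structure under a
# hypothetical flashlight (gain 1.5, offset 30).
current = [50, 52, 51, 80, 82, 81, 50, 51, 53]
flash = [int(1.5 * v + 30) for v in current]

before = weight(current, flash)                         # raw intensities
after = weight(current, match_by_rank(flash, current))  # after matching
```

In this noise-free idealization, the weight rises from essentially zero to one once the flash-lit window is matched to the current window, so the structurally similar patch becomes usable again.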

Experimental results

Fig. 5. PSNR gains of the proposed method (Hist-NLM) and the multi-frame NLM-based method over the single-frame NLM-based method on the Crew sequence, frame by frame (σe=10).

Synthetic noise

Panels: Original, Noisy (σe=10), Non-Local Means, Hist-NLM.

Visual quality comparisons under different noise levels (σe = 5, 10, 15, 20, 25). For each noise level, three frames are shown together with their neighboring frames: the 89th (no flashlight; 88th to 90th), the 100th (with flashlight; 99th to 101st), and the 109th (with flashlight; 108th to 110th). For each frame, the panels are the original frame, the noisy frame, the NLM denoised result, and the Hist-NLM denoised result. PSNR of the denoised frames (dB):

σe    Frame                      NLM        Hist-NLM
5     89th (no flashlight)       38.1452    38.4282
5     100th (with flashlight)    37.3436    37.6314
5     109th (with flashlight)    37.1847    37.5285
10    89th (no flashlight)       34.5351    34.8459
10    100th (with flashlight)    33.7015    33.9227
10    109th (with flashlight)    33.7479    33.9682
15    89th (no flashlight)       30.9084    31.4788
15    100th (with flashlight)    30.8418    31.3479
15    109th (with flashlight)    30.831     31.3753
20    89th (no flashlight)       29.6137    30.2102
20    100th (with flashlight)    29.4825    29.9174
20    109th (with flashlight)    29.3324    29.9425
25    89th (no flashlight)       28.4888    29.1919
25    100th (with flashlight)    28.1683    28.6627
25    109th (with flashlight)    27.9787    28.6938
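For readers who want to reproduce this kind of measurement, the sketch below shows one common way to add synthetic Gaussian noise and compute PSNR for 8-bit data. It is an assumption that the values reported here follow this convention (peak value 255, noise clipped and rounded to 8 bits); the function names and the fixed random seed are illustrative only.

```python
import math
import random


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise, then round and clip to the 8-bit range."""
    rng = random.Random(seed)
    return [min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in pixels]


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Mid-gray test signal with noise level 10.
clean = [128] * 10000
noisy = add_gaussian_noise(clean, sigma=10.0)
noisy_psnr = psnr(clean, noisy)
```

For unclipped Gaussian noise with standard deviation 10, the expected PSNR of the noisy signal is about 20·log10(255/10) ≈ 28.1 dB, which gives a baseline against which the denoised PSNR values can be read.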

Real noisy video "Indoor", captured by a Canon PowerShot-A570 under low-light conditions with the flashlight effect.

Panels: real noisy frame, NLM denoised, Hist-NLM denoised.

References

[1] A. Buades, B. Coll, and J. M. Morel, “Denoising image sequences does not require motion estimation,” in AVSS 2005. IEEE Conference on Advanced Video and Signal Based Surveillance, Sept. 2005, pp. 70-74.
