- Jie Ren
renjie@pku.edu.cn
- Jiaying Liu
liujiaying@pku.edu.cn
- Zongming Guo
guozongming@pku.edu.cn
Publication BibTeX
@Article{RLG+2013,
author = {Jie Ren and Jiaying Liu and Zongming Guo},
title = {Context-Aware Sparse Decomposition for Image Denoising and Super-Resolution},
journal = {IEEE Transactions on Image Processing},
year = {2013} }
Published in TIP (April 2013) and ICPR (November 2012).
Resources
Introduction
Image restoration (IR) aims at recovering an original image $x$ (i.e., the high-resolution or clean image) from its degraded counterpart $y$. The degradation model for the IR problem is typically formalized as $$\begin{equation}\label{eqn:IR} y = Hx + v, \end{equation}$$ where $H$ is a degradation matrix and $v$ is assumed to be zero-mean Gaussian noise $\mathcal{N}(0,\sigma^2)$. When $H$ is an identity matrix, the IR problem in (\ref{eqn:IR}) reduces to a denoising problem; when $H$ is a downsampling matrix, it corresponds to an interpolation or super-resolution problem.
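The two settings of $H$ can be illustrated with a minimal numpy sketch (sizes, noise level, and downsampling factor are arbitrary choices, not values from the paper):

```python
import numpy as np

# Toy illustration of the degradation model y = H x + v.
# x is a vectorized image; H is identity (denoising) or a
# row-subsampling matrix (super-resolution); v ~ N(0, sigma^2).
rng = np.random.default_rng(0)

n = 16                      # length of the vectorized "image" x
x = rng.random(n)
sigma = 0.05                # noise standard deviation

# Denoising case: H = I, so y is simply a noisy copy of x.
H_denoise = np.eye(n)
y_denoise = H_denoise @ x + sigma * rng.standard_normal(n)

# Super-resolution case: H keeps every 2nd sample (downsampling by 2).
H_down = np.eye(n)[::2]     # shape (8, 16)
y_down = H_down @ x + sigma * rng.standard_normal(n // 2)
```

Both cases are instances of the same linear model; only the shape and structure of $H$ change.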
From a statistical point of view, image restoration is an inverse problem: estimating $x$ given $y$ by maximizing the posterior probability $\Pr(x|y)$. By Bayes' rule, the maximum a posteriori (MAP) estimator is obtained as $$\begin{equation} \hat{x} = \arg\max_{x} \Pr(x|y) = \arg\max_{x} \Pr(y|x)\Pr(x), \end{equation}$$ where $\Pr(y|x)$ is the data penalization or likelihood term, while $\Pr(x)$ models the prior information about the target original image $x$. Due to the ill-posedness of the IR problem, one of the most important issues is to model a priori knowledge of natural images to regularize the inverse problem.
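For intuition, the MAP estimator has a closed form when both the likelihood and the prior are Gaussian; the sketch below uses a zero-mean Gaussian prior as a deliberately simple stand-in for the sparsity priors developed later (all sizes and parameters are arbitrary):

```python
import numpy as np

# MAP under y = Hx + v, v ~ N(0, sigma^2 I), and prior x ~ N(0, tau^2 I):
#   x_hat = argmax Pr(y|x)Pr(x) = (H^T H + (sigma^2/tau^2) I)^{-1} H^T y.
rng = np.random.default_rng(1)
n = 16
x = rng.standard_normal(n)
sigma, tau = 0.1, 1.0

H = np.eye(n)[::2]                     # downsampling by 2 -> SR setting
y = H @ x + sigma * rng.standard_normal(n // 2)

lam = (sigma / tau) ** 2               # regularization weight from the prior
x_map = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)
```

With a Gaussian prior the regularizer is quadratic; the paper's point is precisely that richer, structured priors on sparsity patterns regularize the inverse problem better than such generic choices.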
In this paper, we propose a context-aware sparsity prior to explicitly model the structural correlations of neighboring patches. In contrast to previous work that dealt with a similar issue in the specific wavelet transform domain [8], we analyze and address this problem in the more general and flexible framework of sparse representations. The main contributions of our work are summarized as follows:
- A context-aware sparsity prior is proposed to model the correlations between sparsity patterns of neighboring patches. The proposed prior explicitly models the contextual information of local image structure, thereby improving the capability of sparsity-based regularization for the IR problem.
- An MRF-based global restoration framework is proposed to turn the local sparsity prior into a global one, in order to deal with natural images of arbitrary size.
- An iterative numerical solution to the MRF-based global optimization problem is presented to approximate the exact estimation, which is computationally intractable in practice. The proposed algorithm adopts an adaptive signal recovery scheme that alternately updates the model parameters and recovers the sparsity patterns.
CONTEXT-AWARE SPARSE DECOMPOSITION
A. Context Correlation Analysis of Sparse Representations
Sparse coding of image patches extracted from the whole image can be viewed as filtering the image with a set of filters. As in the wavelet transform, casual observation indicates that the significant coefficients (those with large amplitudes) of the patches are sparsely distributed throughout the image and often tend to occur in clusters (e.g., at edges and within textures, as illustrated in Fig. 1). Meanwhile, clusters of different coefficient distributions do not appear to be fully independent of each other, such as the distribution clusters of coefficient indices $5$ and $10$, $15$ and $29$, and $53$ and $142$. This implies that different structural filters (or atoms of a dictionary) are highly correlated in the spatial neighborhood.
Fig. 1. Illustration of the spatial distributions of dictionary atom coefficients on image Boats. (a) Boats image. (b) Learned dictionary atoms (16 selected out of 256). (c) Spatial distributions of corresponding atom coefficients.
The main reason for the above phenomena is attributed to the regularity of natural image structures.
In general, local structures of natural images have the properties of visual continuity and consistency. We find that these properties may not necessarily appear in the image domain, but commonly do in the principal-components domain, i.e., the domain of dictionary atoms. Therefore, it is necessary to decompose the image into a few dictionary atoms and analyze the image regularity in the dictionary-atom domain.
We proceed with the following experiments to motivate the necessity of modeling the spatially probabilistic dependencies between dictionary atoms. The dictionary is adaptively learned from the data itself by the K-SVD algorithm [27]. We extract a set of patches of size 8-by-8 from noise-free, high-resolution natural images; patches at adjacent locations overlap by a few pixels. For each patch, sparse representations over the adaptively learned dictionary of size 64-by-256 are obtained using the OMP algorithm. Specifically, the model error is set to $\sigma=1$, so that the OMP algorithm stops when the residual error falls below $\epsilon = 1.15\cdot\sqrt{64}\cdot\sigma$. Then the sparsity pattern of each patch is obtained from the computed representations. Let $S$ and $S_{\diamond {t}}$ ($t=1,\ldots,T$) represent the sparsity patterns of the center-position patch and its neighbor in the $t$-th orientation, respectively (the spatial configuration is shown in Fig. 2). We take all pairs of sparsity patterns as samples for $S$ and $S_{\diamond {t}}$. The $i$-th element of $S$ and the $j$-th element of $S_{\diamond {t}}$ are denoted $S_i$ and $S_{j\diamond {t}}$, $i,j = 1,\ldots,m$, respectively. Then, the empirical marginal distributions for each $S_i$, $\Pr(S_i = 1)$, and for all pairs of atoms at neighboring spatial positions with different orientations, $\Pr(S_i=1,S_{j\diamond {t}}=1)$, are computed. The empirical conditional probability $\Pr(S_i=1|S_{j\diamond {t}}=1)$ is computed by $\displaystyle \Pr(S_i=1|S_{j\diamond {t}}=1) = \frac{\Pr(S_i=1,S_{j\diamond {t}}=1)}{\Pr(S_{j\diamond {t}}=1)}$.
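The pattern-extraction step above can be sketched as follows. This is a minimal numpy implementation in which a random unit-norm dictionary and random patches stand in for the K-SVD dictionary and the real image patches; only the shapes ($64\times 256$, 8-by-8 patches) and the stopping rule $\epsilon = 1.15\cdot\sqrt{64}\cdot\sigma$ follow the text.

```python
import numpy as np

def omp_pattern(D, y, eps):
    """Greedy Orthogonal Matching Pursuit: add atoms until the residual
    norm drops below eps; return the binary sparsity pattern of y."""
    m = D.shape[1]
    support, r = [], y.copy()
    while np.linalg.norm(r) > eps and len(support) < m:
        k = int(np.argmax(np.abs(D.T @ r)))   # most correlated atom
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef          # orthogonalized residual
    S = np.zeros(m, dtype=int)
    S[support] = 1
    return S

rng = np.random.default_rng(0)
n, m = 64, 256                           # 8x8 patches, 256 atoms (as in the text)
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms (random stand-in for K-SVD)

sigma = 1.0
eps = 1.15 * np.sqrt(n) * sigma          # stopping threshold from the text

patches = 5.0 * rng.standard_normal((50, n))   # stand-ins for extracted patches
patterns = np.array([omp_pattern(D, p, eps) for p in patches])

p_marg = patterns.mean(axis=0)           # empirical marginals Pr(S_i = 1)
```

Collecting the patterns of spatially adjacent patches in the same way yields the sample pairs for $S$ and $S_{\diamond t}$, from which the joint and conditional probabilities are estimated.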
Fig. 2. The local neighborhood system of patch $x^k$. In this work, we set $T=8$ for spatial configuration with eight different orientations.
To examine the dependencies between pairs of atoms at neighboring spatial positions, we compute the entries of the following three-dimensional dependency matrix: $$\begin{equation}\label{eqn:Q} Q^{(t)}_{i,j} = \left|\log_{10}\left(\frac{\Pr(S_i=1|S_{j\diamond {t}}=1)}{\Pr(S_i=1)}+\delta\right)\right|. \end{equation}$$ If $S_i$ and $S_{j\diamond {t}}$ are independent, then $\Pr(S_i=1|S_{j\diamond {t}}=1)$ is approximately equal to $\Pr(S_i=1)$, and the value of (\ref{eqn:Q}) is near zero. Otherwise, $\Pr(S_i=1|S_{j\diamond {t}}=1)$ is near zero or one, and the value of (\ref{eqn:Q}) approaches $1$. In this experiment, we set $\delta=0.1$, so that for $\Pr(S_i=1|S_{j\diamond {t}}=1)=0$ the value of $Q^{(t)}_{i,j}$ is $1$.
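The dependency measure $Q^{(t)}_{i,j}$ can be computed directly from such binary patterns. Below is a sketch for a single orientation $t$, using synthetic random patterns (with one artificial dependency injected at atom index $0$) instead of real OMP outputs; the dictionary size and sample count are toy values.

```python
import numpy as np

# Empirical dependency measure Q for one orientation t, computed from
# binary sparsity-pattern samples (synthetic stand-ins here).
rng = np.random.default_rng(2)
N, m = 2000, 32                                     # sample pairs, atoms (toy)
S = (rng.random((N, m)) < 0.1).astype(float)        # center patterns
S_nb = (rng.random((N, m)) < 0.1).astype(float)     # neighbor patterns
S_nb[:, 0] = S[:, 0]                 # inject a strong dependency at atom 0

delta = 0.1
p_i = S.mean(axis=0)                                # Pr(S_i = 1)
p_j = S_nb.mean(axis=0)                             # Pr(S_j<>t = 1)
p_joint = (S.T @ S_nb) / N                          # Pr(S_i = 1, S_j<>t = 1)
p_cond = p_joint / np.maximum(p_j, 1e-12)           # Pr(S_i = 1 | S_j<>t = 1)
Q = np.abs(np.log10(p_cond / np.maximum(p_i, 1e-12) + delta))
```

Dependent pairs (here the injected pair $(0,0)$) produce large $Q$ values, while independent pairs yield values near $|\log_{10}(1+\delta)| \approx 0.04$, matching the mostly black slices in Fig. 3.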
The simulation results on the image ``Boats'' are shown in Fig. 3. For better visualization, the three-dimensional matrix $Q$ is split into eight orientations, and each slice of $Q$ in the corresponding orientation is displayed. In these figures, the black color corresponding to near-zero values is dominant, which implies that most atom pairs at neighboring positions are nearly independent. However, some pairs exhibit significant dependencies, particularly near the main diagonal. The correlations between sparsity patterns of neighboring patches in different orientations reveal another regularity of natural images.
Fig. 3. Visualization of the computed dependency matrix $Q$ on image ``Boats''. The three-dimension matrix $Q$ is split into eight orientations, and each slice of $Q$ corresponds to one orientation.
B. Context-Aware Sparsity Prior
The significant dependencies exhibited among atom pairs can be utilized as another level of image regularity beyond the sparsity prior. An important issue is to model the spatial correlation between atoms adaptively, i.e., to decide which atom pairs should have strong correlations while keeping the others weak or uncorrelated.
Motivated by the BM model, we propose a new prior for the sparsity patterns of sparse representations. The proposed prior explicitly models the atom interactions between neighboring patches. It contains two components, a context-aware part and a sparsity part, so we call the combined model the context-aware sparsity prior. More specifically, given the oriented neighboring sparsity patterns $\{S_{\diamond {t}}\}_{t=1}^T$, we define the context-aware energy $E_c(S)$ by $$\begin{equation} E_c(S) = -\sum_{t=1}^T{S^TW_{\diamond {t}}S_{\diamond {t}}}, \end{equation}$$ where the matrix $W_{\diamond {t}}$ captures the interaction strength between dictionary atoms in the $t$-th orientation. The interaction strength of each element in $W_{\diamond {t}}$ relies on the co-occurrence probability between dictionary atoms in the corresponding orientation, and should be adaptive to the image content as well. In Section A, we present a maximum likelihood estimator to obtain the optimal parameters $W_{\diamond {t}}$ in an adaptive scheme. Meanwhile, we also include the sparsity penalty energy $E_s(S)$, $$\begin{equation} E_s(S) = -S^Tb. \end{equation}$$ The total energy $E_{total}(S)$ for each sparsity pattern is the sum of the two parts, i.e., $E_{total} = E_c(S) + E_s(S)$. Then, the prior probability is formalized using $E_{total}$, $$\begin{equation}\label{eqn:ContextPrior} \begin{aligned} \Pr(S) \propto& \exp\left({-E_{total}}\right) \\ \propto& \exp\left(S^T \left(\sum_{t=1}^T{W_{\diamond {t}}S_{\diamond {t}}}+b \right)\right). \end{aligned} \end{equation}$$
Let $\tilde{W} = \left[W_{\diamond1},\dots,W_{\diamond T}\right]$ and $\displaystyle\tilde{S} = \left[(S_{\diamond1})^T,\dots,(S_{\diamond T})^T\right]^T$; then (\ref{eqn:ContextPrior}) can be expressed more compactly as $$\begin{equation}\label{eqn:ContextPrior2} \Pr(S) = \frac{1}{Z(\tilde{W},b)}\exp\left(S^T\left({\tilde{W}\tilde{S}}+b\right)\right), \end{equation}$$ where $\tilde{W}$, $b$ are model parameters, and $Z(\tilde{W},b)$ is the partition function for normalization. Compared to the BM model, the proposed prior model places more emphasis on the dependencies of atoms in the spatial context. In fact, the proposed model can be combined with the BM model to further improve its modeling capability, which is left for future work.
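As a concrete check of the energy decomposition, the following sketch evaluates the unnormalized prior $\exp\left(S^T(\tilde{W}\tilde{S}+b)\right)$ and verifies that it agrees with $\exp(-E_{total})$; the interaction matrices, bias, and patterns are random toy stand-ins, not learned values.

```python
import numpy as np

# Unnormalized context-aware sparsity prior: Pr(S) ~ exp(S^T (W~ S~ + b)),
# where W~ stacks the per-orientation interaction matrices W_t.
rng = np.random.default_rng(3)
m, T = 16, 8                                     # atoms, neighborhood orientations

W = 0.1 * rng.standard_normal((T, m, m))         # interaction matrices W_t (toy)
b = -1.0 * np.ones(m)                            # sparsity bias (negative favors zeros)

S = (rng.random(m) < 0.2).astype(float)          # center sparsity pattern
S_nb = (rng.random((T, m)) < 0.2).astype(float)  # neighbors in T orientations

E_c = -sum(S @ W[t] @ S_nb[t] for t in range(T)) # context-aware energy
E_s = -S @ b                                     # sparsity energy
E_total = E_c + E_s
unnorm_prob = np.exp(-E_total)                   # proportional to Pr(S)
```

Note that the partition function $Z(\tilde{W},b)$ is not computed here; it sums over all $2^m$ patterns and is exactly why the paper resorts to approximate inference.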
C. MRF-Based Global Restoration
The prior model proposed in the previous subsection is defined in a patch-wise scheme: it is enforced over the local neighborhood of each patch. In practice, the neighboring sparsity patterns $\tilde{S}$ are unknown when recovering the sparsity pattern of a single patch. Moreover, when dealing with an image of arbitrary size, it is necessary to extend the local prior to a global one as in [2], [6].
MRF theory provides a convenient and consistent way to model context-dependent entities by characterizing mutual influences among entities using conditional MRF distributions. It has been proved to be a robust and accurate model for natural images [36]. Therefore, it is natural to incorporate the context-aware sparsity prior into the MRF framework.
For an input degraded image $Y$ of arbitrary size, we first break it into overlapped small patches $\{y^k\}_{k=1}^{K}$. Each patch $y^k$ has a corresponding high-quality patch $x^k$, and the ``true'' sparsity pattern of $x^k$ is denoted as $S^k$. $\mathcal{S} = \{S^k\}_{k=1}^K$ represents the whole set of sparsity patterns. We introduce an 8-connected MRF to model the relationships among the degraded patches and their corresponding high-quality patches, as illustrated in Fig. 4. Based on the MRF model, we define three types of potential functions corresponding to the likelihood term $\phi(S^k,y^k)$, sparsity term $\eta(S^k)$ and context-aware term $\psi(S^k,S^p)$, $$\begin{equation*} \begin{aligned} \phi(S^k,y^k) &\propto \Pr(y^k|S^k),\\ \eta(S^k) &\propto \exp\left((S^k)^Tb\right),\\ \psi(S^k,S^p) &\propto \exp\left((S^k)^TW_{\diamond {t}}S^{p}\right), \end{aligned} \end{equation*}$$ where patch $x^p$ is adjacent to $x^k$ in the $t$-th orientation, so that $S^p = S^k_{\diamond t}$. Once the potential functions are determined, the MRF with homogeneous potentials can be written as $$\begin{equation} \Pr(\mathcal{S},Y) \propto \prod_k\eta(S^k)\phi(S^k,y^k)\prod_{k,p}\psi(S^k,S^p). \end{equation}$$
Fig. 4. Eight-connected MRFs. Nodes y and S are low-resolution patch and “true” sparsity pattern, respectively.
Note that $\eta(S^k)$ and $\psi(S^k,S^p)$ with 8 orientations can be reassembled into the prior probability $\Pr(S^k)$ in (\ref{eqn:ContextPrior2}), and $\phi(S^k,y^k)$ corresponds to the likelihood probability $\Pr(y^k|S^k)$. Therefore, the probability for the whole MRF is written as $$\begin{equation} \Pr(\mathcal{S},Y) \propto \prod_k\Pr(y^k|S^k)\Pr(S^k). \end{equation}$$ The complete set of sparsity patterns $\mathcal{S}$ can be optimally estimated by maximizing the joint probability of the MRF, $$\begin{equation}\label{eqn:MRFframework} \hat{\mathcal{S}} = \arg\max_{\mathcal{S}}\Pr(\mathcal{S},Y) = \arg\max_{\mathcal{S}}\sum_{k=1}^K\left(\ln\Pr\left(y^k|S^k\right)+\ln\Pr\left(S^k\right)\right). \end{equation}$$
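One simple way to approach this kind of global maximization is coordinate-wise (ICM-style) updating: each pattern $S^k$ is revised while its neighbors are held fixed, so the global objective never decreases. The sketch below does this on a 1-D chain of patches with a single shared interaction matrix and a fixed random stand-in for the log-likelihood field; it illustrates only the alternating structure, not the authors' actual algorithm.

```python
import numpy as np

# ICM-style coordinate ascent on a toy chain MRF over binary patterns:
# each S[k] is updated bit by bit, neighbors fixed, accepting a flip
# only if the local log-posterior score increases.
rng = np.random.default_rng(4)
K, m = 12, 8                            # patches on a 1-D chain, atoms
W = 0.2 * rng.standard_normal((m, m))   # one shared interaction matrix (toy)
b = -0.5 * np.ones(m)                   # sparsity bias
L = rng.standard_normal((K, m))         # stand-in log-likelihood field ln Pr(y^k|S^k)

def local_score(S, k):
    """Local log-posterior of patch k given its fixed neighbors."""
    s = S[k] @ (L[k] + b)
    for nb in (k - 1, k + 1):
        if 0 <= nb < K:
            s += S[k] @ W @ S[nb]
    return s

S = (rng.random((K, m)) < 0.5).astype(float)
for _ in range(5):                       # a few ICM sweeps over the chain
    for k in range(K):
        for i in range(m):               # try flipping each bit of S[k]
            cur = local_score(S, k)
            S[k, i] = 1.0 - S[k, i]
            if local_score(S, k) <= cur:
                S[k, i] = 1.0 - S[k, i]  # revert: the flip did not help
```

Exact maximization over all $2^{mK}$ configurations is intractable, which is why the paper's iterative numerical solution only approximates the estimate in (\ref{eqn:MRFframework}).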
Experimental Results
The proposed method is evaluated on two applications: image denoising and super-resolution. We collect a set of high-quality natural images (downloaded from the Web) as the training dataset for dictionary learning. To learn a pair of low- and high-resolution dictionaries, the high-resolution images are downsampled by a factor of 3 to simulate the low-resolution images. We test the proposed method on several standard test images commonly used in previous work on denoising and super-resolution. Four test images are shown in Fig. 5.
Fig. 5. Four examples of test images. (a) Barbara. (b) Boats. (c) House. (d) Lena.
A. Image Denoising
Table I. Summary of denoising results on different kinds of dictionaries (PSNR: dB). The best result for each pair of comparing methods for different noise levels is highlighted.
Table II. Summary of denoising results on different dictionary sizes $m = r\cdot 8^2, r=3,4,5$ (PSNR: dB). The best result for each pair of comparing methods is highlighted.
Table III. Summary of denoising PSNR results of four methods (dB). In each cell, top left: FoE [6]; top right: BM3D [41]; bottom left: K-SVD [2]; bottom right: the proposed method.
Fig. 9. Improvement in the denoising results after each iteration of the proposed numerical algorithm ($\sigma=20$).
Fig. 10. The visual comparisons of denoised results with state-of-the-art methods. From left to right: Original image, Noisy image ($\sigma = 15$), FoE [6], BM3D [41], K-SVD [2] and the proposed method.
B. Image Super-Resolution
Table VI. Comparisons of methods on 3× super-resolution in terms of PSNR (dB).
Fig. 11. Visual comparisons of super-resolution results of different methods. From left to right: original image, low-resolution image, ScSR [4], and the proposed method.
References
[2] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[4] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
[6] S. Roth and M. Black, “Fields of experts: A framework for learning image priors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2. Jun. 2005, pp. 860–867.
[8] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of gaussians in the wavelet domain,” IEEE Trans. Image Process., vol. 12, no. 11, pp. 1338–1351, Nov. 2003.
[27] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[36] S. Z. Li, Markov Random Field Modeling in Image Analysis, 3rd ed. London: Springer-Verlag, 2009.
[41] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
Back to Projects Page