Joint Sub-Band Based Neighbor Embedding for Image Super Resolution

Fig.1. The framework of joint sub-band based neighbor embedding for image super-resolution. (a) Different frequency components obtained from steerable pyramid transform. (b) References for similar patches considering global and local features. (c) Nearest neighbors from external library. (d) Reconstruction based on neighborhood regression for each frequency component. (e) Super-resolved image through inverse steerable pyramid transform.

Abstract

In this paper, we propose a novel neighbor embedding method based on joint sub-bands for image super-resolution. Rather than directly reconstructing the total spatial variations of the input image, we restore each frequency component separately. The input LR image is decomposed into sub-bands defined by steerable filters to capture structural details in different directional frequency components, and the neighbor embedding principle is then employed to reconstruct each band. Moreover, taking the diverse characteristics of each band into account, we adopt adaptive similarity criteria for searching nearest neighbors. Finally, we recombine the generated HR sub-bands through the inverse sub-band decomposition to obtain the final super-resolved result. Experimental results demonstrate the effectiveness of our method in both objective and subjective quality compared with other state-of-the-art methods.

Implementation

Image Decomposition with Steerable Filters

The self-inverting, multi-orientation steerable pyramid transform at one scale is first employed to extract different frequency components from the input LR image. By computing the responses of a set of steerable filters, we obtain direction-selective sub-bands with N orientations, as well as a high-pass image and the residual low-pass information. Our motivations for decomposing the input image in the frequency domain are twofold. (i) Structural patterns such as edges are usually more prominent in one directional sub-band; decomposing the image makes this property explicit, so we can recover more detail in that sub-band and obtain sharper edges in the final result. (ii) Richer textures can be synthesized because each frequency component is recovered independently, and their combination can produce textures that do not exist in the training set. An example of the decomposition is shown in Fig. 2 below.
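
To make the decomposition concrete, the following minimal Python sketch produces a high-pass image, N oriented sub-bands, and a low-pass residue from a grayscale image. It uses a simplified first-order steerable basis (Gaussian derivatives) rather than the exact self-inverting steerable pyramid of the paper; the function name and the sigma values are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def steerable_subbands(img, n_orient=2, sigma=1.5):
    """Simplified one-scale steerable-style decomposition (illustrative only;
    the paper uses the self-inverting steerable pyramid transform)."""
    blur = ndimage.gaussian_filter(img, sigma)
    low = ndimage.gaussian_filter(img, 2.0 * sigma)   # low-pass residue
    high = img - blur                                 # high-pass image
    band = blur - low                                 # band-pass content to be oriented

    # First-order steerable basis: x- and y-derivatives of the band-pass content.
    gy = ndimage.gaussian_filter(band, sigma, order=(1, 0))  # vertical derivative
    gx = ndimage.gaussian_filter(band, sigma, order=(0, 1))  # horizontal derivative

    # Steering: an oriented response is a linear combination of the basis responses.
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    subbands = [np.cos(t) * gx + np.sin(t) * gy for t in thetas]
    return high, subbands, low
```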


Fig.2. Representations of the steerable pyramid transform with one scale and two orientations. From left to right: high-pass image, two directional sub-bands (vertical and horizontal, respectively), and low-pass residue.

Similarity Metrics with Global and Local Features

Each frequency band of the input LR image is reconstructed independently to generate the corresponding HR band. However, the quality of the reconstructed patches relies heavily on their nearest neighbors, so it is important to formulate good similarity criteria for the retrieval algorithm.
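
For concreteness, the neighbor-embedding step for a single patch of one band can be sketched as below, following the classical locally-linear-embedding formulation; the function name, the neighborhood size k = 5, and the ridge term eps are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def ne_reconstruct(lr_patch, lr_dict, hr_dict, k=5, eps=1e-6):
    """Neighbor embedding for one LR patch feature vector.
    lr_dict, hr_dict: paired (n_atoms, dim) LR/HR patch libraries."""
    # 1. Find the k nearest LR neighbors.
    dists = np.linalg.norm(lr_dict - lr_patch, axis=1)
    idx = np.argsort(dists)[:k]

    # 2. Solve for weights w minimizing ||p - w^T N||^2 subject to sum(w) = 1.
    D = lr_dict[idx] - lr_patch            # neighbors centered on the query
    G = D @ D.T                            # local Gram matrix (k, k)
    G += eps * np.trace(G) * np.eye(k)     # ridge term for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                           # enforce the sum-to-one constraint

    # 3. Apply the same weights to the paired HR patches.
    return w @ hr_dict[idx]
```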

For the high-pass image and the directional sub-bands, we consider not only local features but also global structural information from the bicubic-interpolated image. Taking both the global and local features into consideration, we develop a patch distance function for the retrieval algorithm that combines the two.

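A plausible form of this distance is sketched below, assuming it linearly combines the two terms; here $F_l(\cdot)$ extracts local features from the band patch, $F_g(\cdot)$ extracts global structural features from the co-located patch of the bicubic-interpolated image, and $\lambda$ is an assumed balancing weight (not necessarily the paper's exact formulation):

$$d(p, q) = \left\| F_l(p) - F_l(q) \right\|_2^2 + \lambda \left\| F_g(p) - F_g(q) \right\|_2^2$$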

The comparison below illustrates the effectiveness of this similarity metric for the high-pass image and sub-bands.


Fig.3. (a) Sub-band of the upscaled image. (b) Corresponding sub-band of the ground truth. (c) Result using only local features for similarity. (d) Result using both global and local features for similarity.

The low-pass residue is smoother owing to its low-frequency content, which makes it difficult to extract gradient features. It is, however, coherent and contains sufficient structural information. For a patch from the low-pass image, we therefore jointly use local features from the corresponding patches in the other bands to measure similarity.

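A hedged sketch of such a joint metric, assuming a weighted sum of feature distances over the co-located patches in the other bands (the per-band weights $w_b$ are an assumption):

$$d\!\left(p^{L}, q^{L}\right) = \sum_{b \in \mathcal{B}} w_b \left\| F\!\left(p^{b}\right) - F\!\left(q^{b}\right) \right\|_2^2,$$

where $\mathcal{B}$ indexes the high-pass image and the directional sub-bands, and $p^{b}$ denotes the patch co-located with $p^{L}$ in band $b$.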

In the end, the HR bands are combined through the inverse steerable pyramid decomposition to generate the super-resolved result. In addition, nonlocal redundancy in the generated HR image is exploited to enhance the final result: for each patch, we search for its similar patches and constrain the prediction error to be minimal.
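
A minimal sketch of such a nonlocal-redundancy refinement is given below, in the style of nonlocal means: each pixel is re-estimated from the centers of its k most similar patches within a local search window. The window size, patch size, and bandwidth h are assumptions for illustration.

```python
import numpy as np

def nonlocal_refine(img, patch=5, search=10, k=8, h=10.0):
    """Blend each pixel with the centers of its k most similar patches
    inside a local search window (illustrative sketch)."""
    half = patch // 2
    out = img.copy()
    H, W = img.shape
    for i in range(half, H - half):
        for j in range(half, W - half):
            ref = img[i-half:i+half+1, j-half:j+half+1]
            centers, dists = [], []
            # Compare against every valid patch in the search window.
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ii, jj = i + di, j + dj
                    if half <= ii < H - half and half <= jj < W - half:
                        cand = img[ii-half:ii+half+1, jj-half:jj+half+1]
                        centers.append(cand[half, half])
                        dists.append(np.sum((ref - cand) ** 2))
            dists = np.asarray(dists)
            order = np.argsort(dists)[:k]          # k most similar patches
            w = np.exp(-dists[order] / (h ** 2))   # Gaussian similarity weights
            w /= w.sum()
            out[i, j] = np.dot(w, np.asarray(centers)[order])
    return out
```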

Experimental Results

To evaluate the effectiveness of the proposed method, we conduct ×2 upscaling experiments on several test sets (Set5, Set14, and B100) widely used in the literature. For a fair comparison, we adopt the training set of 91 images from ScSR [1]. The LR input images are generated from the original HR images by bicubic downsampling with the corresponding scaling factor. In our experiments, we decompose the image into the high-pass image, four directional sub-bands, and the low-pass residue. The following tables show that our proposed method outperforms other state-of-the-art methods.
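
For reproducibility, the degradation and evaluation protocol can be sketched as follows. Cropping the HR image so its dimensions divide evenly by the scale is an assumption made to keep the bicubic downsampling exact, and PSNR is computed here on raw intensity arrays (many SR papers evaluate on the luminance channel instead).

```python
import numpy as np
from PIL import Image

def make_lr(hr, scale=2):
    """Generate the LR input from an HR PIL image by bicubic downsampling."""
    w, h = hr.size
    w, h = w - w % scale, h - h % scale        # crop so dimensions divide evenly
    hr = hr.crop((0, 0, w, h))
    lr = hr.resize((w // scale, h // scale), Image.BICUBIC)
    return hr, lr

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two uint8 arrays of equal shape."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```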

Table 1. Average PSNR(dB) results of different super-resolution methods on test sets.

Test Set     Bicubic   ScSR [1]   ANR [2]   BPJDL [3]   SRCNN [4]   NE      Proposed
Set5         33.68     36.00      35.84     36.20       36.34       35.84   36.59
Set14        30.23     31.93      31.80     32.02       32.17       31.79   32.33
B100         29.56     30.92      30.82     31.00       31.14       30.76   31.22

Table 2. PSNR(dB) results of different super-resolution methods on Set5.

Set5         Bicubic   ScSR [1]   ANR [2]   BPJDL [3]   SRCNN [4]   NE      Proposed
baby         37.09     38.37      38.44     38.63       38.30       38.12   38.56
bird         36.83     40.21      40.04     40.66       40.64       40.34   41.07
butterfly    27.44     30.99      30.48     30.96       32.13       30.75   32.27
head         34.88     35.68      35.66     35.76       35.64       35.49   35.72
woman        32.15     34.72      34.55     34.96       34.94       34.48   35.31
average      33.68     36.00      35.84     36.20       36.34       35.84   36.59

Table 3. PSNR(dB) results of different super-resolution methods on Set14.

Set14        Bicubic   ScSR [1]   ANR [2]   BPJDL [3]   SRCNN [4]   NE      Proposed
baboon       24.89     25.55      25.54     25.60       25.61       25.39   25.67
barbara      27.99     28.67      28.59     28.58       28.59       28.50   28.69
bridge       26.57     27.64      27.54     27.70       27.69       27.41   27.80
coastguard   29.12     30.62      30.44     30.61       30.49       30.26   30.63
comic        26.01     27.90      27.77     28.05       28.27       27.55   28.36
face         34.83     35.66      35.63     35.74       35.62       35.47   35.69
flowers      30.37     32.50      32.29     32.60       33.03       32.17   33.07
foreman      34.14     36.29      36.40     36.41       36.20       36.55   37.00
lena         34.70     36.36      36.40     36.54       36.50       36.21   36.64
man          29.24     30.57      30.47     30.67       30.82       30.44   30.89
monarch      32.93     36.11      35.71     36.21       37.17       35.89   37.26
pepper       34.96     36.50      36.39     36.55       36.75       36.77   37.04
ppt3         26.87     29.28      28.97     29.47       30.39       29.39   30.13
zebra        30.63     33.30      33.07     33.49       33.28       33.08   33.79
average      30.23     31.93      31.80     32.02       32.17       31.79   32.33

Fig.4. Visual comparison with PSNR (dB) for ×2 upscaling on the butterfly and comic images: (a) Bicubic, (b) ScSR [1], (c) ANR [2], (d) BPJDL [3], (e) SRCNN [4], (f) NE, (g) Proposed. The red box in each image is magnified in the bottom-left corner to show reconstruction details.

References

[1] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.

[2] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in Proc. IEEE Int’l Conf. Computer Vision (ICCV), pp. 1920–1927, 2013.

[3] L. He, H. Qi, and R. Zaretzki, “Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition (CVPR), pp. 345–352, 2013.

[4] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
