- Wei Bai
janelle@pku.edu.cn
- Jiaying Liu
liujiaying@pku.edu.cn
- Jie Ren
renjie@pku.edu.cn
- Zongming Guo
guozongming@pku.edu.cn
Published in ISCAS, May 2012.
Overview
Frame rate up-conversion (FRUC) is the technique of generating a higher-frame-rate video from a lower-frame-rate one by producing new frames and inserting them between the original frames. The simplest approaches are frame repetition and temporal averaging, but both fail on sequences with high motion. Motion-compensated interpolation (MCI), which uses unidirectional or bidirectional motion estimation (ME) and compensation, has therefore been widely adopted for FRUC.
To find the true motion for the missing frames, Choi et al. [1] performed bidirectional ME for each interpolated block with an additional spatial smoothness constraint. However, the current frame usually contains occlusions relative to the previous frame, i.e., areas that appear in the current frame but were not visible in the previous one. Consequently, it becomes difficult for the ME methods mentioned above to find the correct corresponding blocks in the previous and current frames. Occlusion also introduces ambiguity when predicting the middle block: it is hard to decide whether the predicted block should be more similar to the block in the previous frame or to the one in the current frame.
In this work, we propose a novel frame rate up-conversion algorithm based on joint motion vector refinement and visual-weighted motion-compensated interpolation (MCI). It uses a hierarchical motion vector refinement, composed of a global level and a local level, to correct inaccurate motion vectors (MVs). At the global level, distinctly inaccurate MVs are detected by global controlling and then corrected using neighborhood information. Afterwards, the local level performs local controlling to pick out local outliers and re-estimates them with the maximum likelihood method. Finally, plausible weights for each block in the interpolated frame, computed from the structural similarity index (SSIM) [2], are applied for visual compensation. Experimental results demonstrate that, compared with the conventional algorithm EBME [3], the proposed algorithm improves the average PSNR by up to 2.7 dB, with a remarkable improvement in visual quality as well.
Implementation
Bidirectional ME
The to-be-interpolated frame is divided into non-overlapping blocks of equal size, and block-based motion estimation is then performed. Various ME methods can be adopted; in this paper, we use bidirectional ME [1] to acquire the initial motion vector field. The bidirectional ME process is depicted in Eq.(1). For each block in the to-be-interpolated frame f_n, we find its aligned blocks in the previous frame f_{n-1} and the current frame f_{n+1} by integer-pel ME.
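The search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the (dy, dx) vector convention, and the symmetric displacement of the block by -v in the previous frame and +v in the current frame are assumptions made for the example, and frames are plain 2-D lists of luminance values.

```python
def block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def bidirectional_me(prev, curr, top, left, size, search):
    """Full-search, integer-pel bidirectional ME for one interpolated block.

    Returns the MV (dy, dx) that minimises the SAD between the block
    displaced by -v in the previous frame and by +v in the current frame,
    together with that SAD.
    """
    height, width = len(prev), len(prev[0])
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            pt, pl = top - dy, left - dx  # block position in the previous frame
            ct, cl = top + dy, left + dx  # block position in the current frame
            # Skip candidates whose blocks fall outside either frame.
            if min(pt, pl, ct, cl) < 0:
                continue
            if max(pt, ct) + size > height or max(pl, cl) + size > width:
                continue
            cost = sad(block(prev, pt, pl, size), block(curr, ct, cl, size))
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dy, dx), cost
    return best_mv, best_sad
```

With the settings reported in the experiments, a call would look like `bidirectional_me(prev, curr, top, left, 16, 12)` for each block position of the interpolated frame.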

Hierarchical motion vector refinement
The bidirectionally estimated motion field is not accurate enough, so post-processing is needed to refine the incorrect motion vectors. This refinement consists of a two-level adjustment and, in particular, a frame-border MV correction.
A. Global outlier detection and correction
Generally, a motion vector can be regarded as an inlier only if it has good matching properties together with spatial coherence with the vectors assigned to the neighboring blocks. Conversely, an outlier differs markedly from its neighboring blocks, either in content or in the motion vector field (MVF). The sum of absolute differences (SAD) is widely used to measure the difference between two blocks, so MVs with a large SAD can be considered outliers.
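As a rough sketch of SAD-based detection, the snippet below flags blocks whose matching SAD is large relative to the rest of the frame. The text does not state the threshold, so the rule used here (a multiple `scale` of the frame-wide mean SAD) and the function name are assumptions chosen purely for illustration.

```python
def detect_global_outliers(sad_map, scale=2.0):
    """Flag blocks whose SAD exceeds `scale` times the frame-wide mean SAD.

    `sad_map` is a 2-D list holding the matching SAD of every block.
    The threshold rule is an assumption made for this example only.
    """
    values = [s for row in sad_map for s in row]
    threshold = scale * sum(values) / len(values)
    return [[s > threshold for s in row] for row in sad_map]
```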
After the outliers are detected, we take three steps to correct them:
Step 1: If the MV is an outlier, we search its 8-connected neighboring MVs, as in Fig.1(a), for the MV with the minimum SAD. The MV found in this way is considered reliable.


Fig.1. Global outlier detection and correction. (a) Find the MV with the minimum SAD around the outlier. (b) Enlarge the outlier’s block size to re-search in adjacent frames.
Step 2: Because the outlier block is adjacent to the minimum-SAD block, the MV of that block is probably close to the true MV of the outlier. Hence, we set the outlier’s MV to the MV found in Step 1.
Step 3: With this initial MV, the outlier block is enlarged by half the block size and re-searched in the previous and current frames for a new MV. The enlarged block contains more information, which reduces mismatch errors, especially in smooth regions. In this way, the motion vector field is updated with fewer erroneous MVs.
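Steps 1 and 2 can be sketched as follows. The function name and the data layout (a 2-D list of MVs and a matching 2-D list of SADs, one entry per block) are assumptions for illustration. Step 3, the re-search with an enlarged block, is not shown because it depends on the ME routine in use.

```python
def init_mv_from_neighbors(mv_field, sad_map, row, col):
    """Return the MV of the 8-connected neighbour with the smallest SAD.

    The returned MV serves as the initial MV of the outlier at (row, col).
    """
    rows, cols = len(sad_map), len(sad_map[0])
    best = None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            # Skip the outlier itself and positions outside the block grid.
            if (dr, dc) == (0, 0) or not (0 <= r < rows and 0 <= c < cols):
                continue
            if best is None or sad_map[r][c] < sad_map[best[0]][best[1]]:
                best = (r, c)
    if best is None:
        # A single-block grid has no neighbours; keep the current MV.
        return mv_field[row][col]
    return mv_field[best[0]][best[1]]
```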
B. Local outlier detection and correction
At the local level, we slide a window W across the whole image and apply a detection method similar to that of the global level (Sec. A) to find the local outliers. The detected outliers are then corrected by selecting the most appropriate vector from a candidate set according to the maximum likelihood criterion, as described below.
Centered at the outlier block, as in Fig.2, we divide its surrounding 8-connected blocks into 16 blocks with a half-block step, because a partition into 8 blocks is not adequate to reflect the trend of the local area. The newly built blocks are then re-searched in the previous and current frames, initialized with the neighboring MVs, and an optimal MV is chosen by trying the different initial values. In this way, we obtain enough coherent references.
Fig.2. Local outlier detection and correction. The outlier’s neighborhood is divided into overlapped blocks and their MVs contribute to the refinement of the outlier.
As stated before, whatever kind of motion the outlier belongs to, we can use its neighbors to predict its MV. According to ML estimation, under the assumption of a Gaussian stationary local MVF, the outlier’s MV is predicted by the mean of the MVs of the neighboring blocks, as depicted in Eq.(3).
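The reason the ML estimate reduces to a mean can be seen in a short derivation. The notation here is our own and is not taken from the paper: we assume the N neighboring MVs v_i are independent Gaussian observations with common variance σ² centered on the unknown MV v of the outlier.

```latex
\begin{aligned}
\ln L(\mathbf{v}) &= -\frac{1}{2\sigma^{2}} \sum_{i=1}^{N} \lVert \mathbf{v}_i - \mathbf{v} \rVert^{2} + \text{const}, \\
\frac{\partial \ln L}{\partial \mathbf{v}} &= \frac{1}{\sigma^{2}} \sum_{i=1}^{N} \left( \mathbf{v}_i - \mathbf{v} \right) = \mathbf{0}
\;\Longrightarrow\;
\hat{\mathbf{v}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{v}_i .
\end{aligned}
```

Under these assumptions, the estimate is simply the component-wise average of the neighboring MVs.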

Similarity index-based weighted MCI
Although we interpolate the frame midway between the previous and current frames, this does not mean that a moving object lies at the midpoint of its motion trajectory. In general, a block in the middle frame is more similar to its aligned block in either the previous or the current frame. SAD is not suitable for deciding which; we need to judge the similarity from another aspect, one that represents structural information, as illustrated in Fig.3. We therefore use SSIM to measure the similarity.
Fig.3. The ambiguity during the interpolation of the middle block. (a) and (c) are successive frames of Crew sequence. (b) is the reference for the intermediate frame we want to generate. (d) and (f) zoom into the areas highlighted in (a) and (c). (e) to-be-interpolated area.
S_{n-1} denotes the similarity between the previous block and the predicted block, and S_{n+1} denotes the similarity between the current block and the predicted block. The weights of the two temporally adjacent blocks are then computed as follows:

Here, one weight is assigned to the previous block and the other to the current block.
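One plausible way to turn the two similarities into blending weights is sketched below. Several things here are assumptions rather than the paper's method: the weights are taken to be the similarities normalised to sum to one, negative similarities are clamped to zero, and SSIM is computed over the whole block in a single window with population statistics, which simplifies the windowed formulation in [2]. The constants follow the usual choice K1 = 0.01 and K2 = 0.03 for 8-bit data.

```python
def ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized blocks (simplified)."""
    x = [p for row in a for p in row]
    y = [q for row in b for q in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def weighted_mci(prev_block, curr_block, predicted):
    """Blend the two aligned blocks with SSIM-derived weights.

    `predicted` is an initial estimate of the middle block (for example,
    the plain average of the two aligned blocks); this is an assumption.
    Returns the blended block and the pair (w_prev, w_curr).
    """
    # Clamp at zero because SSIM can be negative for anti-correlated blocks.
    s_prev = max(ssim(prev_block, predicted), 0.0)
    s_curr = max(ssim(curr_block, predicted), 0.0)
    total = s_prev + s_curr
    w_prev = 0.5 if total == 0 else s_prev / total
    w_curr = 1.0 - w_prev
    blended = [
        [w_prev * p + w_curr * c for p, c in zip(rp, rc)]
        for rp, rc in zip(prev_block, curr_block)
    ]
    return blended, (w_prev, w_curr)
```

When the two similarities are equal, this reduces to the plain temporal average of the two aligned blocks.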
Experimental results
The performance of the proposed MRS-MCI algorithm has been evaluated both objectively and subjectively. In the objective evaluation, we compare the PSNR values of 50 interpolated frames. In the subjective evaluation, we assess the image quality of interpolated frames constructed using the proposed and existing algorithms. For the experiments, we set the block size to 16×16 and the search range to ±12. Flower, Foreman, and Football in CIF (352×288) format are used as test sequences.
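For reference, the PSNR between an interpolated frame and the original frame it replaces can be computed as in the sketch below. It assumes 8-bit luminance frames stored as 2-D lists, so the peak value is 255; the function name is ours.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_errors = [
        (r - t) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    # Identical frames have zero error, hence unbounded PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```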
Table 1 summarizes the average PSNR for the test video sequences obtained by the proposed algorithm, MWCI [4], and EBME [3]. The table indicates that the proposed algorithm provides a PSNR improvement of up to 6.9 dB over MWCI and 2.7 dB over EBME. This improvement comes from the use of motion vector refinement and the visual-weighted MCI. The table also shows that Football sees no apparent PSNR improvement from the proposed algorithm, because the sequence contains abundant artifacts caused by high motion, which makes it unsuitable for block-based ME.
Table 1. Average PSNR (dB) comparison of test sequences.

Fig.4. PSNR as a function of the frame number for the Flower sequence, with the two proposed refinement levels applied separately.
Further experiments were conducted to verify the effectiveness of each step in the proposed algorithm. The steps include the global MV refinement, the local MV refinement, and the SSIM-based weighted MCI. For simulation, we implemented the algorithm step by step and output the results after each step. Fig.4 shows that the PSNR gain increases as each step is added. Note that for most frames the local-level refinement does correct some outliers missed by the global level, which demonstrates the necessity of this step.
Fig.5 illustrates the interpolated frames of the test sequence Flower. The areas highlighted by circles should appear according to the original image; Fig.5(b) and Fig.5(c) indicate that the proposed method works more effectively than the conventional method. Fig.6 demonstrates the effect of visual-weighted MCI on one frame of Foreman. Fig.6(c) and Fig.6(d) are the two aligned blocks in the adjacent frames, and the proposed algorithm considers the predicted block more “structurally” similar to the previous block, Fig.6(c). The result interpolated by the proposed algorithm for the highlighted patch, shown in Fig.6(f), is closer to the original in visual quality.
Fig.5. Comparison of the performance of conventional FRUC and the proposed MRS-MCI. (a) An original frame of Flower. Interpolated frame obtained using (b) the conventional MCI (PSNR: 28.12dB), (c) the proposed motion vector refinement (PSNR: 29.75dB).
Fig.6. Zoom of interpolation comparison between conventional method [1] and MRS-MCI. (a) An original frame of Foreman, with a patch selected for comparison. (b) Zoom-in of the patch. (c) and (d) are two aligned blocks in adjacent frames. (e) Interpolated by the conventional method. (f) Interpolated by the proposed MRS-MCI algorithm.
References
[1] B.-D. Choi, S.-H. Lee, and S.-J. Ko, New Frame Rate Up-conversion Using Bi-directional Motion Estimation. IEEE Trans. on Consumer Electronics, vol. 46, pp. 603-609, Aug. 2000.
[2] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.
[3] S.-J. Kang, K.-R. Cho, and Y. H. Kim, Motion Compensated Frame Rate Up-conversion Using Extended Bidirectional Motion Estimation. IEEE Trans. on Consumer Electronics, vol. 53, pp. 1759-1767, Nov. 2007.
[4] T. Ha, S. Lee, and J. Kim, Motion Compensated Frame Interpolation by New Block-based Motion Estimation Algorithm. IEEE Trans. on Consumer Electronics, vol. 50, pp. 752-759, May 2004.