Optimization of Fast Search Block Matching Motion Estimation Algorithms and their VLSI Implementation.

By Rajesh T.N. Rajaram

Abstract

Motion estimation plays a vital role in the various video coding standards, namely MPEG-1, MPEG-2, H.261, H.263, HDTV and CMTT. It helps achieve high compression of image and motion picture sequences. Several block matching algorithms have been devised for motion estimation. The main issue with all of these algorithms is their high computational complexity, which becomes the bottleneck in real-time operation of video encoders. Although several fast search block matching algorithms have been developed to reduce this complexity, the amount of arithmetic computation involved is still quite large in practice. In the present thesis an attempt has been made to optimize fast search block matching algorithms by reducing their computational complexity while keeping their performance the same. The optimization has been achieved by exploiting the correlation between the motion vectors of adjacent macroblocks in a frame.
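To make the computational burden concrete, the following is a minimal sketch of exhaustive (full search) block matching. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify. The names sad and full_search, the frame layout (lists of rows of integers), the block size n and the search range p are all illustrative assumptions and are not taken from the thesis.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] (assumed search window).

    Returns (best motion vector, its SAD, number of candidates tested)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost, tested = (0, 0), None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            tested += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, tested
```

For a window of +/-p pixels this evaluates up to (2p + 1)^2 candidates per block, each costing n^2 absolute differences. Fast search methods aim to cut the number of candidates.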

Several fast block matching techniques have been put forth that reduce the number of arithmetic operations without considering their overall performance in a VLSI implementation. The main objective of the present research was to design simple and efficient hardware architectures for some fast search block matching algorithms. Towards this end, a modification to the One Dimensional Full Search (1DFS) algorithm has been proposed whose performance is comparable to that of the more popular Three Step Search (TSS) method, while being much simpler to implement as a dedicated hardware architecture. A 17 processing element architecture has been designed and implemented on Altera's EPLD device to demonstrate its feasibility and real-time operation capability.

A novel VLSI architecture for the One-at-a-Time Search (OTS) technique has also been proposed by introducing some amount of parallelism and pipelining. A three processing element architecture has been designed for the OTS. Its features include low hardware complexity, owing to simplified on-chip memory, and high efficiency. This architecture has been implemented on Altera's EPLD device and real-time operation has been achieved.
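For readers unfamiliar with OTS, the following is a minimal software sketch of the search order as it is commonly described: the candidate moves one pixel at a time along the horizontal axis while the matching error decreases, then does the same along the vertical axis. It is not the proposed architecture and models none of its parallelism or pipelining. SAD is assumed as the matching criterion, and the names one_at_a_time_search, n and p are illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def one_at_a_time_search(cur, ref, bx, by, n, p):
    """Horizontal pass, then vertical pass, one pixel per step (sketch).

    Returns (motion vector, SAD at that vector)."""
    h, w = len(ref), len(ref[0])

    def cost(dx, dy):
        # Candidates outside the search window or the frame are never chosen.
        if abs(dx) > p or abs(dy) > p:
            return float("inf")
        if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
            return float("inf")
        return sad(cur, ref, bx, by, dx, dy, n)

    mv = (0, 0)
    best = cost(0, 0)
    for axis in ((1, 0), (0, 1)):  # horizontal axis first, then vertical
        moved = True
        while moved:
            moved = False
            for sign in (-1, 1):
                cand = (mv[0] + sign * axis[0], mv[1] + sign * axis[1])
                c = cost(cand[0], cand[1])
                if c < best:
                    # Step to the better neighbour and keep going on this axis.
                    mv, best, moved = cand, c, True
                    break
    return mv, best
```

Because only neighbouring positions along one axis are compared at each step, far fewer candidates are evaluated than in an exhaustive search, at the cost of possibly stopping at a local minimum of the matching error.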