Although digital signal processors (DSPs) are well suited to filtering and other multiply-and-accumulate (MAC) intensive operations, they are poorly suited to video processing applications because their architecture offers limited pipelining and parallelism. FPGA/ASIC implementations, on the other hand, exploit massively parallel and deeply pipelined architectures, resulting in high-speed performance that DSPs cannot match. However, whereas DSPs handle data transfers with external peripherals very efficiently, ASIC/FPGA-based hardware logic takes a long time to build and provides little flexibility for future adaptation. The current practice in designing systems for specialized applications such as image processing is to integrate several application-specific chips on a circuit board, with each device assigned a specific role such as processing or data transfer. This approach reduces processing speed and limits the flexibility to add new features or upgrade existing systems. This project proposes a design methodology that combines the features mentioned above on a single core, offering the high performance of an ASIC and the flexibility of a DSP. The methodology is demonstrated by integrating a synthesizable, general-purpose I/O processor with a direct memory access (DMA) feature and a discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) core as an image-processing co-processor for a high-performance programmable DSP. Issues such as speed, image size, and image quality have been analyzed to validate the use of this design for image processing.
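As an illustration of the function the DCT/IDCT co-processor performs, the following is a minimal software sketch of a two-dimensional DCT and its inverse. It is not the hardware design described here. The 8x8 block size, the orthonormal DCT-II scaling, and all function names (`dct_matrix`, `matmul`, `dct2`, `idct2`) are assumptions made for illustration only. The inner loop of `matmul` consists entirely of multiply-and-accumulate steps, which is the kind of regular arithmetic a dedicated core can parallelize and pipeline.

```python
import math

N = 8  # assumed block size (common in block-based image coding)


def dct_matrix(n=N):
    """Return the n x n orthonormal DCT-II basis matrix C."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n)])
    return c


def matmul(a, b):
    """Matrix product written as explicit multiply-accumulate (MAC) steps."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one MAC operation
            out[i][j] = acc
    return out


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2-D DCT of a square block: Y = C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT of a square block: X = C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

Because the basis matrix in this sketch is orthonormal, applying `idct2` to the output of `dct2` recovers the original block up to floating-point rounding. A fixed-point hardware core would generally introduce additional quantization error, which is one reason image quality has to be evaluated alongside speed.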