Concerto comes with the Viterbi, complex math, and CRC unit, also called the VCU. This is the same VCU that is used on the Piccolo F2806x series. The VCU enhances the digital signal processing capabilities of the C28x core by adding three highly optimized computational units.

The CRC operations help to reduce overhead in communication algorithms. They can also be used to perform background memory checks using a standard CRC-32 polynomial. Two standard CRC-16 polynomials and a CRC-8 polynomial are also supported in this unit.

The second unit is the Viterbi unit. Viterbi decoding is commonly used in baseband communication applications. Without the VCU, it takes about 15 cycles for the C28x DSP to perform two Viterbi butterfly calculations. F28x MCUs with the VCU option can perform two simultaneous butterflies in two cycles. On top of this, Viterbi traceback is implemented in the VCU. In software, traceback generally takes about 22 cycles per stage, compared to just three cycles per stage with the VCU.

The arithmetic unit, with its complex math capability, enables the VCU to perform 16-bit FFT butterflies in five cycles and 16-bit complex filters in a single cycle per tap.

Note that the VCU benefits from its own independent register space. These registers function as source and destination registers for VCU instructions. Concerto comes with the standard C28x register set, plus the well-known FPU registers and additional independent register space dedicated to the VCU. The additional VCU registers are VR0 through VR8, which are used by the Viterbi and complex math modules. The two traceback registers, VT0 and VT1, are reserved specifically for the Viterbi module. There is a configuration and status register, VSTATUS, which is accessible from the Viterbi and complex math modules only. There is also a CRC result register (VCRC), which is used exclusively by the CRC unit, so it is not a shared register.
There is also a repeat block register (RB), which is shared with the FPU. All of this provides an enhanced math engine that adds 75 more math instructions to accelerate complex communication algorithms, for a processing increase of about seven-fold.

So while the C28x FPU is optimized for 32-bit data operations, which can be either fixed or floating point, the VCU itself is optimized for 16-bit data operations, which again are common in communication. An application issues instructions to the VCU in the same way that it issues instructions to the FPU, using dedicated register pointers to data and result buffers. This allows applications to use VCU-accelerated calculations at any time.

Remember, floating-point and VCU operations are not pipeline protected. Some instructions require delay slots for the operation to complete. Insert NOPs or non-conflicting instructions between operations to fill the delay slots, exactly as with the FPU.
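To make the CRC workload mentioned earlier more concrete, here is a minimal software sketch of a bitwise CRC in Python. It is only a reference model of the kind of computation a CRC unit offloads, not the VCU's implementation. The function name `crc_msb_first`, the MSB-first bit order, the zero initial value, and the polynomial values in the usage note are assumptions chosen for illustration; the exact polynomials and bit ordering used by the VCU should be checked against the device documentation.

```python
def crc_msb_first(data: bytes, width: int, poly: int, init: int = 0) -> int:
    """Bitwise CRC, processed MSB first, with no reflection and no final XOR.

    width: CRC size in bits (8, 16 or 32 in this sketch).
    poly:  generator polynomial without its top bit (e.g. 0x07 for a CRC-8).
    init:  starting value of the CRC register (an assumption, default 0).
    """
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init & mask
    for byte in data:
        # Bring the next byte into the top of the CRC register.
        crc ^= byte << (width - 8)
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            if crc & top_bit:
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    return crc
```

For example, `crc_msb_first(b"123456789", 16, 0x1021)` gives `0x31C3`, the widely published check value for this polynomial with a zero initial value. The inner loop runs eight shift-and-XOR steps for every byte, which is the per-byte overhead that a dedicated CRC instruction is meant to remove.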

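The Viterbi butterfly mentioned earlier is an add-compare-select step. The sketch below shows one butterfly in plain Python, to illustrate what is being counted when the transcript compares cycle counts. The function name `acs_butterfly`, the convention that a larger path metric is better, the sign pattern of the branch metric, and the tie-breaking rule are all assumptions made for this illustration; they are not taken from the VCU instruction set.

```python
def acs_butterfly(pm_a: int, pm_b: int, bm: int):
    """One add-compare-select butterfly for two predecessor states.

    pm_a, pm_b: path metrics of the two old states feeding this butterfly.
    bm:         branch metric for this stage (sign pattern is an assumption).

    Returns ((metric0, decision0), (metric1, decision1)), where a decision
    of 0 means the survivor came from state a and 1 means from state b.
    """
    # Add: form both candidate metrics for each of the two new states.
    cand0_a, cand0_b = pm_a + bm, pm_b - bm
    cand1_a, cand1_b = pm_a - bm, pm_b + bm

    # Compare and select: keep the larger metric; ties go to state a.
    new0 = (cand0_a, 0) if cand0_a >= cand0_b else (cand0_b, 1)
    new1 = (cand1_a, 0) if cand1_a >= cand1_b else (cand1_b, 1)
    return new0, new1


if __name__ == "__main__":
    print(acs_butterfly(10, 6, 3))
```

The decision bits are what a traceback pass later walks through, one stage at a time, to recover the most likely path. A decoder repeats this butterfly for every pair of states at every stage, which is why the per-butterfly and per-stage cycle counts matter so much.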
