Now take a look at the C28x pipeline. While the C28x pipeline is protected, neither the FPU nor the VCU pipeline is. The pipeline flow for C28x instructions on Concerto is identical to that of the standard C28x devices. Going from left to right, first is fetch one (F1), which drives the program address. Second is F2, which fetches from program memory and loads the instruction queue. Next is decode one (D1), which looks at the instruction and asks "Is this valid or not, and how big is it?" and, if it passes, hands it on to D2. D2 takes the requested instruction from the queue and generates the addresses for memory reads and writes; this includes FPU and VCU instructions. SP and ARP are also updated, if required, in the decode two (D2) stage.

At this point it is determined whether an instruction is a C28x instruction, which continues on the main pipeline, or a floating-point unit or VCU instruction, which is delegated to the correct unit from here. From this point on, each unit has its own pipeline. An additional decode phase is required to initiate FPU or VCU instructions. The C28x pipeline continues with R1, which drives any required read addresses, followed by R2, which reads the data; the data read applies to FPU instructions as well. A single read stage is enough for the FPU and VCU, since they execute operations on registers. Next is the execute (E) stage, which is where modification occurs for read-modify-write operations. Last is the write (W) stage, which is where the write address is generated and the proper strobes and data are written.

Normal C28x pipeline stalls also stall any FPU or VCU instruction. For example, a memory wait state in D2 will stall the C28x, FPU, and VCU instructions alike. This keeps the floating-point unit and the VCU aligned with the C28x pipeline, so there is no need to change the code based on the wait states of a memory block.
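To make the stage ordering and the stall behavior concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: one instruction enters per cycle, every instruction walks the eight C28x stages in order, and a stall freezes the whole pipeline for that cycle. The function name `occupancy` and the `stall_cycles` parameter are made up for illustration and do not correspond to any TI tool.

```python
# Stage names as listed in the walkthrough above.
STAGES = ["F1", "F2", "D1", "D2", "R1", "R2", "E", "W"]

def occupancy(num_instructions, stall_cycles=()):
    """Return {cycle: {instruction_index: stage}} for a toy in-order pipeline.

    Assumption: a stalled cycle freezes every instruction in flight,
    so the spacing between instructions never changes.
    """
    stalls = set(stall_cycles)
    timeline = {}
    progress = 0  # number of non-stalled cycles that have elapsed
    cycle = 0
    # The last instruction leaves W after num_instructions + len(STAGES) - 1 steps.
    while progress < num_instructions + len(STAGES) - 1:
        row = {}
        for i in range(num_instructions):
            stage_index = progress - i  # instruction i enters F1 at step i
            if 0 <= stage_index < len(STAGES):
                row[i] = STAGES[stage_index]
        timeline[cycle] = row
        if cycle not in stalls:
            progress += 1  # only advance when the pipeline is not stalled
        cycle += 1
    return timeline

if __name__ == "__main__":
    for cycle, row in occupancy(3, stall_cycles=[3]).items():
        print(cycle, row)
```

In this model a stall at cycle 3 simply repeats that row once, and every later row is shifted by one cycle. That is the point made above: because all units stall together, the relative alignment of the instructions does not depend on memory wait states.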
Most of the C28x FPU and VCU instructions are single cycle and will complete in the FPU E1 or W stage, which aligns with the C28x pipeline. Some instructions take an additional execute cycle, an E2 cycle. For these instructions, one must wait a cycle for the result of the instruction to become available. The assembly tools for the C28x plus FPU and VCU will issue an error if a delay slot has not been handled correctly. Pipeline alignment is achieved by inserting NOPs or, to keep processing performance up, by filling the delay slot with a useful instruction. Any VCU, FPU, or fixed-point instruction can be used in a VCU instruction delay slot, as long as source and destination register conflicts are avoided.
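The delay-slot rule can be sketched as a small checker, written here in Python. This is an illustration of the idea only, not how the TI assembler is implemented. The `DELAY_SLOTS` table, the tuple format for instructions, and the function name `check_delay_slots` are all assumptions made for this example; real latencies should be taken from the instruction set documentation.

```python
# Illustrative table (an assumption for this sketch): number of delay
# slots that must follow each instruction before its result can be used.
DELAY_SLOTS = {"MPYF32": 1, "ADDF32": 1, "MOV32": 0, "NOP": 0}

def check_delay_slots(program):
    """Flag instructions that touch a register before its result is ready.

    program: list of (mnemonic, dest_register_or_None, source_registers).
    Returns a list of error strings; an empty list means no conflict found.
    """
    ready_at = {}  # register -> first slot index at which it may be used
    errors = []
    for slot, (op, dest, srcs) in enumerate(program):
        # Both sources and destination must avoid a result still in flight.
        for reg in (*srcs, dest):
            if reg is not None and ready_at.get(reg, 0) > slot:
                errors.append(f"slot {slot}: {op} uses {reg} before it is ready")
        if dest is not None:
            ready_at[dest] = slot + 1 + DELAY_SLOTS[op]
    return errors

# Result of the multiply is consumed immediately: conflict.
bad = [
    ("MPYF32", "R0H", ("R1H", "R2H")),
    ("ADDF32", "R3H", ("R0H", "R4H")),
]

# Same code with a NOP in the delay slot.
with_nop = [bad[0], ("NOP", None, ()), bad[1]]

# Same code with the delay slot filled by an unrelated, useful instruction.
with_fill = [bad[0], ("MOV32", "R5H", ("R6H",)), bad[1]]

if __name__ == "__main__":
    print(check_delay_slots(bad))
    print(check_delay_slots(with_nop))
    print(check_delay_slots(with_fill))
```

The third program shows the preferred approach from the text: the slot after the multiply does real work on registers that do not conflict, so no cycle is wasted on a NOP.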

