dsPIC33A CPU

Last modified by Microchip on 2025/02/05 14:24

Overview

The dsPIC33A CPU is a 32-bit modified Harvard architecture with a 5-stage instruction pipeline and a single-phase clock design. Instruction words are 32 bits wide with a variable-length opcode field, and the CPU also supports some instructions that are available only in a 16-bit format. The Program Counter (PC) is 24 bits wide, giving access to a 16 MB (24-bit address) unified linear address map.
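The relationship between the 24-bit PC and the 16 MB address map is simple arithmetic, and can be checked with a minimal sketch such as the one below. The function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
static uint64_t addressable_bytes(unsigned address_bits)
{
    assert(address_bits < 64);          /* keep the shift well defined */
    return (uint64_t)1 << address_bits;
}

/* Convert a byte count to mebibytes (1 MiB = 2^20 bytes). */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

For example, addressable_bytes(24) evaluates to 16,777,216 bytes, and bytes_to_mib() of that value is 16.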

Key Features

32-bit instruction width and 32-bit data path
72-bit DSP accumulators
32-bit working registers W0-W15
- Seven additional register contexts (one per IPL) for W0-W7, AccA, AccB, RCOUNT and CORCON
Mixed-size 16/32-bit instruction set
16 MB (24-bit address) unified memory map
5-stage instruction pipeline
Conditional branching with speculative execution
Single/double precision Floating Point Unit (FPU) co-processor with dedicated register set and execution pipeline

Block Diagram

Figure: CPU Block Diagram

CPU Registers

The following is a high-level summarized version of the programmer's model for the CPU:

Figure: CPU Registers

Note that several register sets have multiple contexts:

Working Registers (W0-W7)
Accumulators (AccA, AccB)
Repeat Loop Counter (RCOUNT)
CPU Core Control Register (CORCON)

There are seven contexts available, which are mapped to interrupt priority levels one through seven. This reduces exception latency because the CPU automatically switches to the new context, with no need to save and restore register state.
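The effect of per-priority register contexts can be pictured with a small software model. This is only a conceptual sketch, not a description of the hardware or of any compiler API: the type and function names are hypothetical, the accumulators are truncated to 64 bits, and the model assumes a base context (index 0) in addition to the seven contexts for IPL 1-7.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CONTEXTS 8u  /* assumption: base context 0 plus one per IPL 1-7 */

/* The registers the text lists as having multiple contexts. */
typedef struct {
    uint32_t w[8];    /* W0-W7 */
    uint64_t acc_a;   /* AccA (72 bits in hardware, truncated in this model) */
    uint64_t acc_b;   /* AccB (72 bits in hardware, truncated in this model) */
    uint32_t rcount;  /* RCOUNT */
    uint32_t corcon;  /* CORCON */
} reg_context_t;

typedef struct {
    reg_context_t bank[NUM_CONTEXTS];
    unsigned active;  /* index of the context currently visible to code */
} cpu_model_t;

/* Return the register context that is currently visible. */
static reg_context_t *current_context(cpu_model_t *cpu)
{
    return &cpu->bank[cpu->active];
}

/* Switch to the context for an interrupt priority level.
 * No register is copied; only the active index changes.
 * Returns the previously active index so it can be restored. */
static unsigned enter_ipl(cpu_model_t *cpu, unsigned ipl)
{
    assert(ipl >= 1 && ipl <= 7);
    unsigned previous = cpu->active;
    cpu->active = ipl;
    return previous;
}

/* Return to the context that was active before the interrupt. */
static void leave_ipl(cpu_model_t *cpu, unsigned previous)
{
    assert(previous < NUM_CONTEXTS);
    cpu->active = previous;
}
```

In this model, a value written to W0 before enter_ipl() is still intact after leave_ipl(), even if the interrupt handler wrote its own W0, because the two writes land in different banks.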

5-Stage Instruction Pipeline with Speculative Branch Prediction

The dsPIC33A CPU contains a 5-stage instruction execution pipeline, consisting of the following stages:

Fetch
Address
Read
Execute
Write
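The timing of the five stages listed above can be sketched for the ideal case. The model below assumes one instruction enters Fetch per cycle and that there are no stalls, flushes or cache misses, which real code will not always achieve. The function names are hypothetical.

```c
#include <assert.h>

#define NUM_STAGES 5u

static const char *const stage_names[NUM_STAGES] = {
    "Fetch", "Address", "Read", "Execute", "Write"
};

/* Name of a pipeline stage, 0 = Fetch ... 4 = Write. */
static const char *stage_name(unsigned stage)
{
    assert(stage < NUM_STAGES);
    return stage_names[stage];
}

/* Cycle (0-based) in which instruction `index` occupies `stage`,
 * assuming one instruction enters Fetch per cycle and nothing stalls. */
static unsigned stage_cycle(unsigned index, unsigned stage)
{
    assert(stage < NUM_STAGES);
    return index + stage;
}

/* Cycles needed to complete n instructions in the stall-free case:
 * (NUM_STAGES - 1) cycles to fill the pipeline, then one per instruction. */
static unsigned ideal_total_cycles(unsigned n)
{
    assert(n > 0);
    return n + NUM_STAGES - 1u;
}
```

Under these assumptions the first instruction reaches the Write stage in cycle 4, and ideal_total_cycles(10) is 14. Any hazard or flush adds cycles on top of this figure.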

Control hazards can arise during any program flow changes and can be mitigated by handling the branch as early as possible. To speed execution, the pipeline implements speculative branch prediction logic in the Fetch stage whenever a branch instruction is fetched:

Figure: 5-Stage Pipeline

If the branch prediction is correct, there is no penalty in the instruction pipeline, and the code executes as if it were sequential, as seen in the top graphic.

If the branch prediction fails, the pipeline is flushed, and there are further delays while it is refilled from cache or Flash memory.

Branch Prediction Example

Let’s look at an example of the effects of both correct and incorrect branch target prediction using the BCC instruction, which can represent any conditional branch instruction.

Figure: Branch Prediction Example

If the correct branch target is predicted and there is a cache hit, no additional cycles are incurred. If the correct target is predicted but there is a cache miss, the branch takes 4-7 additional cycles, for a total of 5-8.
If the incorrect branch target is predicted but the correct target is still in the cache, there is a 2-cycle delay to flush the incorrect prediction.
Finally, an incorrect branch target prediction combined with a cache miss produces the longest cycle time.

Branch prediction makes instruction timing less deterministic, but the performance benefits outweigh this. Prediction is fairly successful for backward branches, which are usually code loops. For branches such as if-else, the prediction success rate is in the 75-80% range.
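The cycle counts above can be combined into a rough estimate of the average extra cycles per conditional branch. The sketch below is an illustration only. It assumes prediction outcome and cache hit are independent, the type and function names are hypothetical, and the cost of a wrong prediction combined with a cache miss is not quantified in the text, so it is left as a caller-supplied value.

```c
#include <assert.h>

/* Extra cycles for each non-ideal case. */
typedef struct {
    double correct_miss_extra; /* text: 4-7 extra cycles */
    double wrong_hit_extra;    /* text: 2-cycle flush */
    double wrong_miss_extra;   /* not given in the text; caller's assumption */
} branch_costs_t;

/* Expected extra cycles per branch.
 * p_correct: probability the target is predicted correctly (0..1)
 * p_hit:     probability the needed target is in the cache (0..1)
 * A correct prediction with a cache hit costs nothing extra. */
static double expected_branch_extra_cycles(double p_correct, double p_hit,
                                           const branch_costs_t *costs)
{
    assert(p_correct >= 0.0 && p_correct <= 1.0);
    assert(p_hit >= 0.0 && p_hit <= 1.0);
    assert(costs != 0);

    double p_wrong = 1.0 - p_correct;
    double p_miss = 1.0 - p_hit;

    return p_correct * p_miss * costs->correct_miss_extra
         + p_wrong * p_hit * costs->wrong_hit_extra
         + p_wrong * p_miss * costs->wrong_miss_extra;
}
```

With every target in the cache (p_hit of 1.0), a 75% prediction success rate gives 0.25 x 2 = 0.5 extra cycles per branch on average, and an 80% rate gives about 0.4.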

Feeding the I-Pipeline: The Prefetch Branch Unit (PBU)

Figure: Prefetch Branch Unit

The Prefetch Branch Unit (PBU) in the dsPIC33A core devices accelerates the interface between the dsPIC33A program Flash memory and the CPU instruction bus. The PBU can predictively prefetch the next sequential address and cache fetched program data that are the target of a CPU instruction fetch.

The PBU in dsPIC33A core devices supports the following functions:

An Instruction Stream Buffer accelerates the execution of linear program code flow.
An Instruction Cache accelerates the execution of non-linear program flow changes (branches).

The PBU block diagram shows data paths to and from the PBU in the dsPIC33A environment. The PBU provides data when the CPU fetches program data from Flash memory. It provides the program data from an internal buffer when possible, and fetches it from Flash only if the requested data is not available internally. Fetch operations are therefore accelerated whenever data is sourced from the internal PBU buffers.
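The fetch decision described above can be illustrated with a toy software model. This is not the PBU's actual organization: the one-entry stream buffer, the 4-line direct-mapped cache, the 4-byte fetch granularity and all names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 4u  /* hypothetical cache size */
#define WORD_BYTES  4u  /* assumed fetch granularity: one 32-bit word */

typedef enum { SRC_STREAM_BUFFER, SRC_CACHE, SRC_FLASH } fetch_source_t;

typedef struct {
    bool     stream_valid;
    uint32_t stream_addr;              /* address held by the stream buffer */
    bool     cache_valid[CACHE_LINES];
    uint32_t cache_addr[CACHE_LINES];  /* address cached in each line */
} pbu_model_t;

/* Report where a fetch of `addr` would be served from in this toy model,
 * then update the model state as a side effect. */
static fetch_source_t pbu_fetch(pbu_model_t *pbu, uint32_t addr)
{
    assert(addr % WORD_BYTES == 0);

    unsigned line = (addr / WORD_BYTES) % CACHE_LINES;
    fetch_source_t source;

    if (pbu->stream_valid && pbu->stream_addr == addr) {
        source = SRC_STREAM_BUFFER;    /* linear code flow */
    } else if (pbu->cache_valid[line] && pbu->cache_addr[line] == addr) {
        source = SRC_CACHE;            /* previously seen flow-change target */
    } else {
        source = SRC_FLASH;            /* not buffered: go to Flash */
        pbu->cache_valid[line] = true; /* remember this target */
        pbu->cache_addr[line] = addr;
    }

    /* Predictively prefetch the next sequential address. */
    pbu->stream_valid = true;
    pbu->stream_addr = addr + WORD_BYTES;
    return source;
}
```

In this model, a short loop pays the Flash access once for its entry point. Sequential fetches inside the loop are then served from the stream buffer, and the backward branch to the loop start is served from the cache.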

Performance Monitor Unit (PMU)

In the dsPIC33A family of devices, the architecture does not have a fixed relationship between the CPU clock speed in MHz and the throughput of the CPU in Million Instructions per Second (MIPS). The throughput of the CPU is dependent on extra cycles incurred from the following:

CPU pipeline data dependency
Branches or program flow changes
Cache misses
Slow memory or SFR accesses
Arbitration between bus masters
A bus that is slower than the CPU

Figure: Performance Monitor Unit

The Performance Monitor Unit (PMU) is connected to the Prefetch Branch Unit and other core hardware, and counts events that cause extra cycles to be inserted into the program flow. Using this information, the Cycles-per-Instruction (CPI) can be calculated and the causes of poor code execution latency can be determined.
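Once cycle and instruction counts are available, CPI and the resulting throughput follow from two divisions. The sketch below shows the arithmetic only. It does not read any PMU register, the function names are hypothetical, and the counts are assumed to have been obtained elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles-per-Instruction over a measured interval. */
static double cycles_per_instruction(uint64_t cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* Effective throughput in MIPS for a CPU clock in MHz and a measured CPI. */
static double effective_mips(double cpu_clock_mhz, double cpi)
{
    assert(cpi > 0.0);
    return cpu_clock_mhz / cpi;
}
```

For example, 1500 cycles spent on 1000 instructions gives a CPI of 1.5. With an arbitrary example clock of 200 MHz and a CPI of 1.25, the effective throughput is 160 MIPS, which shows why the clock speed alone does not determine MIPS.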
