[Tech] Lagarto KA: The High Performance Core for DRAC

One of the main goals of processor designers is to achieve a high level of performance. Over the years, both academia and industry have put forward many novel proposals toward this goal, taking advantage of the technological evolution described by Moore's Law. Processor design nevertheless remains a challenging task that requires balancing complexity against performance.

In this context, super-scalar processors, which can execute multiple instructions per clock cycle, have proven effective at exploiting instruction-level parallelism. These designs implement dynamic scheduling techniques, combined with efficient memory management units and specialized accelerators, to achieve high hardware resource utilization and a high rate of instructions executed per clock cycle.

The primary goal of the DRAC project is the design, verification, and fabrication of a high-performance processor that integrates several accelerators and other features in the same system-on-chip (SoC). To achieve this goal, the BSC and the CIC-IPN have jointly proposed the design and implementation of the Lagarto Ka, a 2-way super-scalar 64-bit processor based on the RISC-V instruction set, implemented in a 12-stage microarchitecture that supports out-of-order execution.


Figure 1. Basic block diagram of the Lagarto Ka processor microarchitecture.

As shown in Figure 1, the Lagarto Ka processor microarchitecture is organized into two fundamental processing blocks: the Front-End and the Back-End.

The Front-End is responsible for keeping a constant stream of instructions flowing into the pipeline. Instructions are fetched from the instruction cache, decoded, and renamed, and then dispatched to the data path that corresponds to each one so that it executes correctly. Register renaming is essential at this stage because it removes the false dependencies between instructions and keeps only the true ones, which lets independent instructions execute out of the original program order.
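The effect of register renaming can be illustrated with a minimal Python sketch. It is not the Lagarto Ka implementation: the `Renamer` class, the table sizes, and the `x`/`p` register names are assumptions chosen for illustration, and the sketch omits the release of physical registers and the special handling of the RISC-V zero register.

```python
from collections import deque


class Renamer:
    """Toy register renamer (hypothetical): architectural -> physical registers."""

    def __init__(self, num_arch=32, num_phys=64):
        # Initially architectural register xi lives in physical register pi.
        self.map_table = {f"x{i}": f"p{i}" for i in range(num_arch)}
        # The remaining physical registers are free for new destinations.
        self.free_list = deque(f"p{i}" for i in range(num_arch, num_phys))

    def rename(self, dest, srcs):
        # Sources read the current mapping, which preserves true (RAW) dependencies.
        phys_srcs = [self.map_table[s] for s in srcs]
        # Every destination gets a fresh physical register, which removes
        # false (WAR and WAW) dependencies on the architectural name.
        phys_dest = self.free_list.popleft()
        self.map_table[dest] = phys_dest
        return phys_dest, phys_srcs


renamer = Renamer()
program = [
    ("x1", ["x2", "x3"]),  # x1 = x2 op x3
    ("x4", ["x1", "x5"]),  # x4 = x1 op x5  (true dependency on x1)
    ("x1", ["x6", "x7"]),  # x1 = x6 op x7  (reuses the name x1: false dependency)
]
renamed = [renamer.rename(dest, srcs) for dest, srcs in program]
for (dest, srcs), (pdest, psrcs) in zip(program, renamed):
    print(f"{dest} <- {srcs}  becomes  {pdest} <- {psrcs}")
```

After renaming, the second instruction still reads the physical register written by the first, while the third writes a different physical register and can therefore execute independently of the other two.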

Additionally, the Front-End integrates a branch predictor that speeds up the fetching of new instructions by detecting jumps in the program without decoding them. Finally, a recovery mechanism can restore the processor context to a previous state when any failure occurs, which allows instructions to be executed speculatively and effectively.
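The text does not state which prediction scheme Lagarto Ka uses. As a generic illustration only, the following Python sketch shows one classic approach: a table indexed by the program counter that holds 2-bit saturating counters and the last observed target, so that the next fetch address can be guessed from the address alone. The `BranchPredictor` class, the table size, and the fixed 4-byte instruction width are all assumptions.

```python
class BranchPredictor:
    """Toy predictor (hypothetical): PC-indexed 2-bit counters plus targets."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries    # 0-1 predict not taken, 2-3 predict taken
        self.targets = [None] * entries  # last target seen for each entry

    def _index(self, pc):
        # Instructions are assumed to be 4 bytes wide, so drop the two low bits.
        # There are no tags, so different branches may alias to the same entry.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        i = self._index(pc)
        taken = self.counters[i] >= 2 and self.targets[i] is not None
        # Fall through to the next sequential instruction when not taken.
        return self.targets[i] if taken else pc + 4

    def update(self, pc, taken, target):
        # Called once the branch outcome is known.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.targets[i] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BranchPredictor()
first = bp.predict(0x100)      # nothing learned yet: predicts fall-through
bp.update(0x100, True, 0x80)   # the branch at 0x100 actually jumped to 0x80
second = bp.predict(0x100)     # now predicts the learned target
print(hex(first), hex(second))
```

When a prediction turns out to be wrong, the instructions fetched along the predicted path must be discarded, which is where a recovery mechanism such as the one described above comes in.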

The second processing block, the Back-End, is designed to exploit the instruction-level parallelism exposed by register renaming: instructions execute as soon as their source operands are ready, regardless of the order in which they entered the pipeline. This process is known as out-of-order execution.
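A small Python simulation can make the idea concrete. It is a sketch under stated assumptions, not a model of the actual Back-End: the `issue_order` function, the instruction tuples, the latencies, and the issue width of two are hypothetical, and the registers are assumed to be already renamed so that each destination is written only once.

```python
def issue_order(instrs, ready, width=2, max_cycles=100):
    """Return, cycle by cycle, the names of the instructions that issue.

    instrs: list of (name, dest, srcs, latency) in program order.
    ready:  registers whose values are available at cycle 0.
    """
    ready_at = {r: 0 for r in ready}  # register -> cycle its value is available
    waiting = list(instrs)
    schedule = []
    for cycle in range(max_cycles):
        if not waiting:
            break
        issued = []
        for ins in waiting:
            _, _, srcs, _ = ins
            # An instruction may issue once every source operand is available.
            operands_ready = all(ready_at.get(s, float("inf")) <= cycle for s in srcs)
            if operands_ready and len(issued) < width:
                issued.append(ins)
        for ins in issued:
            _, dest, _, latency = ins
            ready_at[dest] = cycle + latency  # result becomes visible later
            waiting.remove(ins)
        schedule.append([ins[0] for ins in issued])
    return schedule


program = [
    ("load", "p10", ["p1"], 3),         # long-latency instruction
    ("add",  "p11", ["p10", "p2"], 1),  # must wait for the load
    ("sub",  "p12", ["p3", "p4"], 1),   # independent of the load
    ("mul",  "p13", ["p12", "p5"], 2),  # depends only on sub
]
schedule = issue_order(program, ready={"p1", "p2", "p3", "p4", "p5"})
print(schedule)  # [['load', 'sub'], ['mul'], [], ['add']]
```

In this example `sub` and `mul` overtake `add`, which stays in the queue until the value produced by `load` is available.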

Out-of-order execution within the Lagarto Ka processor is driven by two main structures: the out-of-order issue queues, in charge of sending as many instructions as possible to the execution units each clock cycle, and the reorder buffer (ROB), which tracks the order of the active instructions in the Back-End and determines when the hardware resources should be released once the instructions leave the pipeline.
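The role of the reorder buffer can be sketched in Python as a queue that accepts instructions in program order and retires them in that same order, even when they finish out of order. The `ReorderBuffer` class, its size, and the commit width of two are assumptions made for illustration; the real structure also tracks the resources to release, which this sketch leaves out.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer (hypothetical): in-order allocate, in-order commit."""

    def __init__(self, size=8):
        self.size = size
        self.entries = deque()  # oldest instruction on the left

    def allocate(self, name):
        if len(self.entries) >= self.size:
            return False  # buffer full: the Front-End would have to stall
        self.entries.append({"name": name, "done": False})
        return True

    def mark_done(self, name):
        # Execution units may report completion in any order.
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True

    def commit(self, width=2):
        # Only the oldest instructions may leave, and only once they are done.
        committed = []
        while self.entries and self.entries[0]["done"] and len(committed) < width:
            committed.append(self.entries.popleft()["name"])
        return committed


rob = ReorderBuffer()
for name in ["load", "add", "sub", "mul"]:
    rob.allocate(name)

rob.mark_done("sub")
c1 = rob.commit()   # [] because the older load has not finished
rob.mark_done("load")
c2 = rob.commit()   # ['load']; add is still pending, so sub must wait
rob.mark_done("add")
rob.mark_done("mul")
c3 = rob.commit()   # ['add', 'sub'], limited by the commit width
c4 = rob.commit()   # ['mul']
print(c1, c2, c3, c4)
```

Because instructions leave strictly from the head of the queue, the program-visible state is always updated in the original order, whatever order the instructions completed in.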

Since power consumption is critical for embedded systems, the Lagarto Ka processor integrates low-power techniques to reduce it.

Lagarto Ka is the first high-performance design in the Lagarto processor family. The name of this processor, like those of the first designs fabricated in DRAC, is inspired by the Mayan numbering system. The first of these processors is the Lagarto Hun, a scalar in-order processor that fetches one instruction per clock cycle, whereas the Lagarto Ka has been designed to fetch two instructions each cycle.

Figure 2. Mayan numbering system.


Lagarto Ka is the core of the chips that integrate the different accelerators developed in DRAC. The project aims to open the way for new proposals from academia in the field of Computer Architecture, a movement initially triggered by open-source initiatives such as RISC-V, and seeks to bring these proposals to specialized or domestic products in the not too distant future.