One of the objectives of the DRAC project is to promote the adoption of post-quantum cryptography (PQC) by accelerating PQC schemes and deploying them in open-source RISC-V environments. Improving the efficiency of PQC schemes will encourage technology providers to adopt them, and to build technologies that are safer against attacks from quantum computers. To this end, newly presented DRAC research proposes an HLS-based HW/SW co-design acceleration [1] of a major PQC candidate scheme called Classic McEliece (CM).
In recent years, quantum technologies have been developing at a fast pace. While quantum computing can benefit many computational disciplines, it poses a serious threat to the current cryptographic infrastructure, because known quantum algorithms such as Shor's algorithm could be used to break the security of popular public-key cryptosystems. Fortunately, a swift adoption of PQC would help prevent this future threat. With this concern in mind, in 2017 the National Institute of Standards and Technology (NIST) launched an open contest to choose the future PQC standards [2]. This process is currently in its last evaluation round, and the CM scheme is one of the seven remaining finalist cryptosystems.
Classic McEliece is a cryptographic scheme aimed at secure key exchange. It has the longest-standing security among the candidates in the NIST PQC process, as well as fast encryption/decryption and very small ciphertexts. It is especially appealing for applications such as VPNs, where its large key size is not an issue. However, it lacks performance evaluation studies, especially on heterogeneous platforms. The research in DRAC bridges this gap by accelerating the CM scheme on platforms containing CPUs and FPGAs.
The proposed hardware acceleration targets the most time-consuming part of each CM algorithm, as identified by a preliminary profiling. It uses hardware/software (HW/SW) co-design and High-Level Synthesis (HLS) coding. HLS consists of generating hardware from a description written in a software language, which yields increased performance at a lower design effort than register-transfer level (RTL) implementations. The C code of the original NIST submission was used for this purpose, and the HW/SW co-design methodology was applied to exploit the idle time of hardware resources, to unroll and pipeline the most computationally intensive loops, to use array partitioning on large data, and to parallelize computations across different accelerators. In this process, the trade-offs between security, speedup and hardware overhead were carefully balanced. The resulting accelerators have been deployed on the Xilinx ZCU102 platform, which comprises a quad-core ARM Cortex-A53 running at 1.1GHz and a Zynq UltraScale+ FPGA, with 4GB and 512MB DRAMs attached.
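To make the loop pipelining, unrolling and array partitioning mentioned above more concrete, the following is a minimal sketch of how such HLS directives can be written in C. It is not taken from the accelerators in [1]: the function name gf2_matvec, the toy dimensions ROWS and WORDS, and the choice of a binary matrix-vector product as the kernel are all assumptions made for illustration. The directives use the Xilinx HLS pragma syntax; a regular C compiler ignores them, so the same code also runs in software.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy dimensions, far smaller than real CM parameters. */
#define ROWS  8   /* one output bit per matrix row                 */
#define WORDS 4   /* each row is bit-packed into WORDS 64-bit words */

static_assert(ROWS <= 8, "the result must fit in a uint8_t");

/* Parity (XOR of all 64 bits) of a word, computed by folding. */
static uint64_t parity64(uint64_t x)
{
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/*
 * Illustrative kernel (an assumption, not the code from [1]):
 * product of a ROWS x (64*WORDS) binary matrix with a binary vector.
 * The matrix is stored row-major; bit r of the result is the inner
 * product of row r with vec, taken modulo 2.
 */
uint8_t gf2_matvec(const uint64_t mat[ROWS * WORDS],
                   const uint64_t vec[WORDS])
{
/* Cyclic partitioning with factor 4 (= WORDS) places the four words of
 * a row in four separate memories, so they can be read in one cycle. */
#pragma HLS ARRAY_PARTITION variable=mat cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=vec complete dim=1
    uint8_t out = 0;

    for (int r = 0; r < ROWS; r++) {
/* Ask the tool to start a new row every clock cycle. */
#pragma HLS PIPELINE II=1
        uint64_t acc = 0;
        for (int w = 0; w < WORDS; w++) {
/* Fully unroll: all WORDS AND/XOR operations run in parallel. */
#pragma HLS UNROLL
            acc ^= mat[r * WORDS + w] & vec[w];
        }
        out |= (uint8_t)(parity64(acc) << r);
    }
    return out;
}
```

In software, calling gf2_matvec on a matrix and a vector returns the same bits with or without the pragmas; the directives only guide how an HLS tool schedules the loops and lays out the arrays in hardware. Whether the requested initiation interval is actually met depends on the tool and the target device.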
The acceleration results are significant, as they outperform baseline implementations for all CM algorithms. Performance is not sacrificed to obtain security, since the higher security levels of CM achieve larger speedups. The results show speedups of up to 55.2x, 3.3x, and 8.7x for the key generation, encapsulation, and decapsulation CM algorithms, respectively, with respect to a SW-only scalar implementation. This approach provides a considerable speedup even when compared to manually vectorized code, with a 2.3x, 0.5x, and 0.1x acceleration for each of the CM algorithms, respectively. The design is highly portable, and can be implemented on current heterogeneous CPU+FPGA platforms.
References:
[1] Vatistas Kostalampros, Jordi Ribes-González, Oriol Farràs, Miquel Moretó and Carles Hernández. HLS-Based HW/SW Co-Design of the Post-Quantum Classic McEliece Cryptosystem. FPL 2021. https://cfaed.tu-dresden.de/fpl2021/program.
[2] National Institute of Standards and Technology, Information Technology Laboratory, Computer Security Resource Center. Post-Quantum Cryptography, Round 3 Submissions. 2020. https://csrc.nist.gov/projects/post-quantum-cryptography/round-3-submissions.