Tools and Experimental Setup for Efficient Hardware Benchmarking of Candidates in Cryptographic Contests
Authors
Farahmand, Farnoud
Abstract
Hardware benchmarking of candidates competing in cryptographic contests, such as SHA-3 and CAESAR, is essential for ranking their suitability for standardization. As the number of candidates grows, a large amount of time is needed to design the datapath and controller of each candidate and to convert them to hardware description language (HDL) code. Developing an HDL testbench for verification is a further difficulty. High-Level Synthesis (HLS), based on the newly developed Xilinx Vivado HLS tool, offers a potential solution to both problems. Therefore, in the first part of this thesis we investigate the following hypothesis: the ranking of candidate algorithms in cryptographic contests, in terms of their performance in modern FPGAs and All-Programmable SoCs, will remain the same regardless of whether the HDL implementations are developed manually or generated automatically using HLS tools. To test this hypothesis, four Round 2 SHA-3 candidates are implemented using Vivado HLS and compared with existing RTL implementations. Our results indicate that the ranking of the evaluated candidates in terms of four major performance metrics (frequency, throughput, area, and throughput-to-area ratio) remains unchanged for all tested candidates. One of the most essential performance metrics is throughput, which depends strongly on the algorithm, the hardware implementation architecture, the coding style, and the tool options. The maximum throughput is calculated from the maximum clock frequency supported by each algorithm. A common way of determining the maximum clock frequency is the static timing analysis provided by CAD toolsets such as Xilinx ISE, Xilinx Vivado, and Altera Quartus Prime. Finding the actual maximum clock frequency using static timing analysis is not a trivial task, especially in the Xilinx Vivado environment; it is extremely time-consuming and tedious.
As a result, in the second part of this thesis, we describe Minerva, an automated hardware benchmarking tool that finds the maximum frequency based on static timing analysis. It can be configured to target either throughput or throughput/area as the optimization criterion and to search through a specified number of optimization strategies. The tool determines the best requested clock frequency, that is, the one leading to the maximum value of the optimization target. We evaluated 20 Round 2 CAESAR candidates in terms of frequency and frequency-to-area ratio. Compared with binary search, the Minerva frequency search improved the throughput-to-area ratio by up to 37% and the throughput by up to 24%. In the third part of the thesis, we develop a universal testbed capable of measuring the maximum clock frequency experimentally using a prototyping board. We target cryptographic hardware cores, such as implementations of SHA-3 candidates. The testbed is built on a Zynq platform and takes advantage of software/hardware co-design and the Advanced eXtensible Interface (AXI). We measured the maximum clock frequency and the execution time of 12 Round 2 SHA-3 candidates experimentally on the ZedBoard and compared the results with the frequencies reported by Xilinx Vivado. Our results indicate that, depending on the characteristics of each algorithm, the experimental frequency is either much higher than or the same as the frequency reported by the tools using static timing analysis.
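To make the binary-search baseline mentioned in the abstract concrete, the following is a minimal Python sketch of a search for the highest requested clock frequency that still meets timing, together with the usual long-message throughput formula (block size times frequency divided by cycles per block). It is not Minerva's algorithm and is not taken from the thesis. The names `meets_timing`, `binary_search_max_freq`, and `throughput_mbps` are hypothetical; `meets_timing` stands in for a full tool run followed by a static timing check. The sketch also assumes that timing closure is monotonic in the requested frequency, which real tool runs need not satisfy.

```python
from typing import Callable


def throughput_mbps(block_bits: int, cycles_per_block: int, freq_mhz: float) -> float:
    """Long-message throughput in Mbit/s: block_bits * f / cycles_per_block."""
    return block_bits * freq_mhz / cycles_per_block


def binary_search_max_freq(
    meets_timing: Callable[[float], bool],  # hypothetical: one tool run + timing check
    lo_mhz: float,
    hi_mhz: float,
    tol_mhz: float = 1.0,
) -> float:
    """Return the highest tested frequency that met timing, to within tol_mhz.

    Assumption (for illustration only): if a frequency meets timing,
    every lower frequency does too.
    """
    if not meets_timing(lo_mhz):
        raise ValueError("lower bound does not meet timing")
    if meets_timing(hi_mhz):
        return hi_mhz
    # Invariant: lo_mhz meets timing, hi_mhz does not.
    while hi_mhz - lo_mhz > tol_mhz:
        mid = (lo_mhz + hi_mhz) / 2
        if meets_timing(mid):
            lo_mhz = mid
        else:
            hi_mhz = mid
    return lo_mhz


if __name__ == "__main__":
    # Stand-in for the CAD tool: pretend the design closes timing up to 237.3 MHz.
    fake_tool = lambda f: f <= 237.3
    f_max = binary_search_max_freq(fake_tool, 100.0, 400.0)
    # Illustrative parameters only: 512-bit block, 64 cycles per block.
    print(f_max, throughput_mbps(512, 64, f_max))
```

Each call to `meets_timing` corresponds to a complete implementation run in practice, so the number of iterations dominates the cost; the tolerance `tol_mhz` trades run time against the resolution of the reported frequency.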
Keywords
High level synthesis, Cryptography, Benchmarking, SHA-3, Software/hardware codesign, Zynq