Reconfigurable FET Approximate Computing-based Accelerator for Deep Learning Applications
Abstract
Artificial Intelligence (AI) has surged in the last few years, enabling state-of-the-art solutions in healthcare, banking, data and business analytics, transportation, retail, and more. The tremendous increase in the data needed to deliver AI solutions has created a need for machine learning (ML) acceleration that provides improved performance, efficient real-time processing, scalability, and energy and cost efficiency. In recent years, ML acceleration using FPGAs, GPUs, and ASICs has been an active research area. ASIC-based ML accelerators offer superior performance, reduced latency, and better energy and cost efficiency than their counterparts. However, traditional CMOS-based ASIC accelerators lack flexibility, which leads to reconfigurability overheads. Hardware reconfigurability enables multiple functionalities per computational unit with less resource consumption. Emerging transistor technologies such as FinFETs, RFETs, and memristors are being adopted in accelerator design to facilitate reconfigurability at the transistor level. Furthermore, some of these devices, such as memristors, also support storage alongside computation. Among these emerging devices, reconfigurable nanotechnologies such as Silicon Nanowire Reconfigurable Field Effect Transistors (SiNW RFETs) are a promising technology that not only lowers power consumption but also supports multiple functionalities per computational unit through reconfigurability. These features motivate us to design a novel, energy-efficient hardware accelerator for memory-intensive applications, including convolutional neural networks (CNNs) and deep neural networks (DNNs). To accelerate the computations, we design Multiply and Accumulate (MAC) units built from SiNW RFETs.
The use of RFETs leads to nearly 70% power reduction compared to the traditional CMOS implementation, as well as reduced computation latency. Further, to optimize the overheads and improve memory efficiency, we introduce a novel approximation technique for RFETs. The RFET-based approximate adders reduce power, area, and delay while having minimal impact on the accuracy of the DNN/CNN. In addition, we carry out a detailed study of various combinations of architectures involving CMOS, RFETs, accurate adders, and approximate adders to demonstrate the benefits of the proposed RFET-based approximate accelerator. The proposed RFET-based accelerator achieves an accuracy of 94% on the MNIST dataset with 93% and 73% reduction in the area, power and delay metrics, respectively, compared to state-of-the-art hardware accelerator architectures.
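To make the idea of a MAC unit with an approximate adder concrete, the following is a minimal behavioral sketch in Python. It does not reproduce the RFET-based adder described in the abstract, whose design is not given here. Instead it assumes a generic, well-known approximation, the lower-part OR adder, in which the low `approx_bits` bits are combined with a bitwise OR and only the upper bits are added exactly. The names `loa_add`, `mac`, `width`, and `approx_bits` are hypothetical, and operands are assumed to be non-negative integers.

```python
from typing import Callable, Sequence


def loa_add(a: int, b: int, width: int = 16, approx_bits: int = 4) -> int:
    """Lower-part OR adder (illustrative assumption, not the paper's adder).

    The low `approx_bits` bits are approximated by a bitwise OR; the upper
    bits are added exactly, with a carry-in taken from the AND of the most
    significant bits of the two low parts.
    """
    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask
    if approx_bits > 0:
        carry_in = (a >> (approx_bits - 1)) & (b >> (approx_bits - 1)) & 1
    else:
        carry_in = 0  # approx_bits == 0 degenerates to an exact adder
    high = (a >> approx_bits) + (b >> approx_bits) + carry_in
    # Truncate to the accumulator width, as fixed-width hardware would.
    return ((high << approx_bits) | low) & ((1 << width) - 1)


def mac(weights: Sequence[int], activations: Sequence[int],
        add: Callable[[int, int], int]) -> int:
    """Multiply-accumulate: sum of w * x, using the supplied adder."""
    acc = 0
    for w, x in zip(weights, activations):
        acc = add(acc, w * x)
    return acc


if __name__ == "__main__":
    w, x = [1, 2, 3], [4, 5, 6]
    exact = mac(w, x, lambda p, q: loa_add(p, q, approx_bits=0))
    approx = mac(w, x, lambda p, q: loa_add(p, q, approx_bits=4))
    print("exact:", exact, "approximate:", approx)
```

Raising `approx_bits` in such a model trades arithmetic accuracy for a shorter carry chain, which is the general reason approximate adders can reduce area, power, and delay. The figures reported in the abstract come from the authors' RFET design, not from this sketch.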