[C51] DCR: Decomposition-Aware Column Re-Mapping for Stuck-At-Fault Tolerance in ReRAM Arrays

Abstract

The ReRAM-based neuromorphic computing system (NCS) has been widely used as an energy-efficient platform for deep neural network (DNN) acceleration. However, ReRAM suffers from a common fault, the stuck-at fault (SAF), which causes a permanent failure of the affected ReRAM device. SAF tolerance is therefore essential to the reliability of a ReRAM-based NCS: it minimizes the degradation in DNN inference accuracy. Hardware-based solutions incur additional area overhead and power consumption, while re-training on the hardware shortens the lifespan of ReRAM. It is thus necessary to seek a solution that can be executed offline to mitigate the impact of SAFs. In this work, we propose a decomposition-aware column re-mapping (DCR) method for SAF tolerance in analog ReRAM arrays. DCR combines a column re-mapping technique with fault-aware weight decomposition and an advanced sensitivity metric, and generates a final weight map optimized for a given fault map. We demonstrate the effectiveness of the proposed method on the VGG16 and ResNet18 networks. DCR incurs only about 1% loss of inference accuracy on CIFAR-10 and CIFAR-100 for analog ReRAM arrays with SAF rates of 2% and 1%, respectively, without any hardware-based solution or re-training.
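To make the idea of offline column re-mapping concrete, the following is a minimal Python sketch. It is not the DCR algorithm from the paper: it leaves out the fault-aware weight decomposition and the sensitivity metric, and uses a simple greedy assignment in their place. The fault-map encoding (None for a healthy cell, otherwise the value the cell is stuck at, expressed in the weight domain) and the function names column_cost, greedy_remap, and total_error are assumptions made for illustration only.

```python
from typing import List, Optional

# Assumed encoding: None = healthy cell, a float = value the cell is stuck at.
FaultMap = List[List[Optional[float]]]
Weights = List[List[float]]


def column_cost(weights: Weights, faults: FaultMap, logical: int, physical: int) -> float:
    """Absolute weight error if logical column `logical` is stored in
    physical column `physical`. Only faulty cells contribute."""
    return sum(
        abs(weights[r][logical] - faults[r][physical])
        for r in range(len(weights))
        if faults[r][physical] is not None
    )


def greedy_remap(weights: Weights, faults: FaultMap) -> List[int]:
    """Return `mapping`, where mapping[physical] is the logical column placed there.

    Greedy heuristic (an illustrative stand-in, not the paper's method):
    visit physical columns from most to fewest faults and give each one the
    still-unassigned logical column that suffers the smallest error there.
    """
    n_cols = len(weights[0])
    order = sorted(
        range(n_cols),
        key=lambda p: -sum(row[p] is not None for row in faults),
    )
    free = set(range(n_cols))
    mapping = [-1] * n_cols
    for p in order:
        # sorted(free) makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(free), key=lambda j: column_cost(weights, faults, j, p))
        mapping[p] = best
        free.remove(best)
    return mapping


def total_error(weights: Weights, faults: FaultMap, mapping: List[int]) -> float:
    """Sum of fault-induced weight errors under a given column mapping."""
    return sum(column_cost(weights, faults, mapping[p], p) for p in range(len(mapping)))
```

Calling greedy_remap(weights, faults) yields a permutation that can be compared against the identity mapping with total_error. Because the columns are permuted, the corresponding array outputs would have to be read back in the permuted order; in this sketch that bookkeeping is assumed to happen offline alongside the mapping itself.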

Publication
International Conference on Computer Design (ICCD) 2023
Kang Eun Jeon (전강은)
Post-doctoral researcher
Johnny Rhe (이존이)
Combined MS-PhD student