[C69] Low-Rank Compression for IMC Arrays

Abstract

In this study, we address the challenge of low-rank model compression in the context of in-memory computing (IMC) architectures. Traditional pruning approaches, while effective at reducing model size, require additional peripheral circuitry to manage complex dataflows and mitigate dislocation issues. This leads to increased area and energy overheads, especially when model sparsity does not reach a certain threshold. To circumvent these drawbacks, we propose leveraging low-rank compression techniques, which, unlike pruning, streamline the dataflow and integrate seamlessly with IMC architectures. However, low-rank compression presents its own set of challenges, notably suboptimal IMC array utilization and lower accuracy than traditional pruning methods. To address these issues, we introduce a novel approach that combines the shift and duplicate kernel (SDK) mapping technique, which exploits idle IMC columns for parallel processing, with group low-rank convolution, which mitigates the information imbalance in the decomposed matrices. Our experimental results, using ResNet-20 and Wide ResNet16-4 networks on the CIFAR-10 and CIFAR-100 datasets, demonstrate that our proposed method not only matches the performance of existing pruning techniques on ResNet-20 but also achieves up to a 2.5× speedup and a +20.9% accuracy boost on Wide ResNet16-4.
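The arithmetic behind the abstract's utilization argument can be illustrated with a small sketch. It is not the authors' implementation. It assumes a conventional im2col mapping (unrolled kernel inputs on array rows, output channels on array columns), a rank-r factorization into a k×k convolution to r channels followed by a 1×1 convolution, hypothetical 256×256 arrays, and a simplified square parallel window as a stand-in for SDK mapping. All helper names are hypothetical. The point is that a small rank r leaves most columns of the first factor's array idle, and duplicating shifted kernels into those columns lets one array compute several output pixels at once.

```python
import math


def low_rank_shapes(c_in: int, c_out: int, k: int, rank: int):
    """im2col weight shapes (rows, cols) of a k x k conv and of an assumed
    rank-r factorization: k x k conv (c_in -> rank), then 1 x 1 conv (rank -> c_out)."""
    full = (c_in * k * k, c_out)
    first = (c_in * k * k, rank)
    second = (rank, c_out)
    return full, first, second


def param_count(shapes) -> int:
    """Total number of weights across a list of (rows, cols) matrices."""
    return sum(r * c for r, c in shapes)


def tiles_needed(rows: int, cols: int, array_rows: int, array_cols: int) -> int:
    """Number of array_rows x array_cols IMC arrays needed to hold a rows x cols matrix."""
    return math.ceil(rows / array_rows) * math.ceil(cols / array_cols)


def column_utilization(cols: int, array_cols: int) -> float:
    """Fraction of allocated bitline columns that actually hold weights."""
    return cols / (math.ceil(cols / array_cols) * array_cols)


def sdk_parallel_outputs(c_in: int, k: int, rank: int,
                         array_rows: int, array_cols: int) -> int:
    """Simplified SDK-style mapping (hypothetical model): find the largest square
    window of p x p outputs whose shifted, duplicated kernels fit in ONE array.
    The input window needs (k + p - 1)^2 * c_in rows and p^2 * rank columns.
    Returns 1 (no duplication) if even a single window does not fit."""
    best, p = 1, 1
    while True:
        rows = (k + p - 1) ** 2 * c_in
        cols = p * p * rank
        if rows > array_rows or cols > array_cols:
            return best
        best = p * p
        p += 1


if __name__ == "__main__":
    # Hypothetical layer: 16 -> 16 channels, 3 x 3 kernel, rank 4, 256 x 256 arrays.
    full, first, second = low_rank_shapes(c_in=16, c_out=16, k=3, rank=4)
    print(param_count([full]), param_count([first, second]))  # 2304 640
    print(column_utilization(first[1], 256))                  # 0.015625
    print(sdk_parallel_outputs(16, 3, 4, 256, 256))           # 4
```

In this toy setting the rank-4 first factor occupies only 4 of 256 columns; under the simplified window model, a 2×2 parallel window raises that to 16 columns and produces four output pixels per array access. Real SDK mapping and the group low-rank convolution described above involve details this sketch does not model.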

Publication
Design, Automation & Test in Europe Conference & Exhibition (DATE) 2025
Kang Eun Jeon (전강은)
Post-doctoral researcher
Johnny Rhe (이존이)
Combined MS-PhD student