The ReRAM crossbar array, as a highly parallel, fast, and energy-efficient structure, has attracted much attention, especially for accelerating Deep Neural Network (DNN) inference on a single task. However, due to the high energy cost of weight re-programming and the low endurance of ReRAM cells, adapting the crossbar array to multiple tasks has not been well explored. In this paper, we propose XMA, the first crossbar-aware shift-based mask learning method for multi-task adaption in the ReRAM crossbar DNN accelerator. XMA leverages the benefits of popular mask-based learning algorithms to mitigate catastrophic forgetting, and learns a task-specific, crossbar column-wise, shift-based multi-level mask for each new task on top of a frozen backbone model, rather than the commonly used element-wise binary mask. With our crossbar-aware design innovation, the masking operation required to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, without power-hungry cell re-programming, unlike prior works. Extensive experimental results show that, compared with the state-of-the-art multi-task adaption method Piggyback [1], XMA achieves 3.19% higher accuracy on average while saving 96.6% memory overhead. Moreover, by eliminating cell re-programming, XMA achieves 4.3x higher energy efficiency than Piggyback.
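As a concrete illustration of the column-wise, shift-based multi-level mask, here is a minimal NumPy sketch: each crossbar column's partial sum is either zeroed out or scaled by a power of two (realizable as a digital shift), so no cell re-programming is needed. The tile size, the number of shift levels, and all names are illustrative assumptions, not XMA's exact configuration.

```python
import numpy as np

# Illustrative sketch of a crossbar column-wise, shift-based multi-level mask.
# Tile size and shift levels are our assumptions, not the paper's configuration.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))      # frozen backbone weights, one crossbar tile
x = rng.standard_normal(64)            # input activation vector

shift = rng.integers(0, 3, size=16)    # per-column shift amount (scale = 2**shift)
active = rng.integers(0, 2, size=16)   # per-column on/off bit of the learned mask

col_sums = x @ W                                            # crossbar VMM partial sums
masked = np.where(active == 1, col_sums * 2.0**shift, 0.0)  # shift-based masking
print(masked[:4])
```

Because the scaling factors are powers of two, applying the mask reduces to bit-shifts on the digitized column outputs rather than full multiplications.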
GLSVLSI
MnM: A Fast and Efficient Min/Max Searching in MRAM
Amitesh Sridharan, Fan Zhang, and Deliang Fan
In Proceedings of the Great Lakes Symposium on VLSI 2022, Irvine, CA, USA, 2022
In-Memory Computing (IMC) technology has been considered a promising approach to solving the well-known memory-wall challenge in data-intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture/circuit designs for fast and efficient Min/Max searching in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM-based in-memory logic circuits are specially optimized to perform the parallel, one-cycle XNOR logic that is heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit incurs an overhead of just two transistors per row, compared to most prior methodologies, which typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other peripheral circuits required to implement the complete Min/Max searching-in-MRAM computation. Our cross-layer, comprehensive experiments on Dijkstra's algorithm and other sorting algorithms with real-world datasets show that MnM achieves significant performance improvement over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM.
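For intuition, the sketch below captures the bit-serial Max-searching algorithm that the parallel in-memory XNOR accelerates: scanning from MSB to LSB, the rows whose current bit XNORs to 1 against the reference bit '1' remain candidates (Min search works the same way with reference '0'). The Python loop and names are our functional abstraction of the in-MRAM flow, assuming unsigned values stored bit-sliced across memory columns.

```python
import numpy as np

def max_search(values, nbits=8):
    """Bit-serial max search over unsigned values (functional sketch)."""
    candidates = np.ones(len(values), dtype=bool)   # every row starts as a candidate
    for k in range(nbits - 1, -1, -1):              # scan from MSB to LSB
        bits = (values >> k) & 1                    # one bit-column read
        match = candidates & (bits == 1)            # parallel XNOR with reference '1'
        if match.any():                             # some candidate has a 1 here
            candidates = match                      # keep only the matching rows
    return np.flatnonzero(candidates)               # row indices of the maximum

vals = np.array([23, 200, 7, 200, 145], dtype=np.uint8)
idx = max_search(vals)
print(idx, vals[idx])    # -> [1 3] [200 200]
```

In MnM, the per-bit `match` step is a single-cycle, all-row XNOR performed inside the SOT-MRAM array, so each bit position costs one memory cycle regardless of the number of rows.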
IEEE Asilomar
Efficient Multi-task Adaption for Crossbar-based In-Memory Computing
Fan Zhang, Li Yang, and Deliang Fan
In 2022 56th Asilomar Conference on Signals, Systems, and Computers, 2022
Recently, ReRAM crossbar-based deep neural network (DNN) accelerators have been widely investigated. However, most prior works focus on single-task inference due to the high energy consumption of weight reprogramming and the low endurance of ReRAM cells. Adapting the ReRAM crossbar-based DNN accelerator to multiple tasks has not been fully explored. In this study, we propose XMA2, a novel crossbar-aware learning method with a 2-tier masking technique to efficiently adapt a DNN backbone model deployed in the ReRAM crossbar for new task learning. During XMA2-based multi-task adaption (MTA), the tier-1 ReRAM crossbar-based processing-element (PE)-wise mask is first learned to identify the most critical PEs to be reprogrammed for essential new features of the new task. Subsequently, the tier-2 crossbar column-wise mask is applied within the remaining weight-frozen PEs to learn a hardware-friendly, column-wise scaling factor for the new task without modifying the weight values. With these crossbar-aware design innovations, the masking operation required to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead. Extensive experimental results show that, compared with other state-of-the-art multi-task adaption methods, XMA2 achieves the highest accuracy on all popular multi-task learning datasets.
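A schematic sketch of the 2-tier masking flow is given below: a tier-1 PE-wise mask picks the PEs whose weights get reprogrammed, while every remaining weight-frozen PE receives tier-2 column-wise scaling factors. The importance score, top-1 selection, update rule, and all shapes are illustrative assumptions, not the learned procedure from the paper.

```python
import numpy as np

# Schematic 2-tier masking sketch; scores, shapes, and updates are assumptions.
rng = np.random.default_rng(1)
n_pe, rows, cols = 4, 32, 8
W = rng.standard_normal((n_pe, rows, cols))    # backbone weights, one tile per PE
x = rng.standard_normal(rows)                  # shared input vector

# Tier 1: PE-wise mask -- reprogram only the most "critical" PE (top-1 here).
score = np.abs(W).mean(axis=(1, 2))            # stand-in importance score
reprog = score == score.max()                  # boolean PE-wise mask
W_new = np.where(reprog[:, None, None],
                 W + 0.1 * rng.standard_normal(W.shape),  # stand-in weight update
                 W)                                       # frozen elsewhere

# Tier 2: column-wise scaling factors learned only for the frozen PEs.
scale = rng.uniform(0.5, 1.5, size=(n_pe, cols))
scale[reprog] = 1.0                            # reprogrammed PEs use raw outputs

out = np.einsum('r,prc->pc', x, W_new) * scale # per-PE masked column outputs
print(out.shape)                               # (4, 8)
```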
The electrocardiogram (ECG), a recording of the electrical activity of the heart, is commonly used for cardiac analysis, but the lack of abnormal ECG signal data restricts the development of high-quality automatic auxiliary diagnosis. In this paper, we introduce an LSTM- and GAN-based ECG abnormal signal generator to alleviate this issue. By training on a small set of real abnormal signals, the proposed generator learns to produce high-quality synthetic abnormal signals. These synthetic signals are then combined with real signals to train abnormal-ECG classifiers. We show that our method significantly improves the classifiers' ability to recognize uncommon cases that constitute a low proportion of the database.
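A minimal PyTorch sketch of such an LSTM-based GAN pair follows; the layer widths, sequence length, and the last-step discriminator readout are our assumptions, not the paper's exact architecture or training recipe.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise sequence to a synthetic 1-D ECG waveform (sketch)."""
    def __init__(self, z_dim=16, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(z_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)        # one ECG sample per time step

    def forward(self, z):                      # z: (batch, seq_len, z_dim)
        h, _ = self.lstm(z)
        return self.out(h)                     # (batch, seq_len, 1)

class Discriminator(nn.Module):
    """Scores a waveform as real or fake from the last LSTM state (sketch)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, seq_len, 1)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h[:, -1]))

z = torch.randn(8, 256, 16)                    # 8 noise sequences, 256 steps
fake = Generator()(z)                          # synthetic ECG segments
print(Discriminator()(fake).shape)             # torch.Size([8, 1])
```

The two networks would be trained adversarially on the small set of real abnormal beats, after which the generator's outputs augment the training data for the abnormal-ECG classifier.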
2020
JETC
Mitigate Parasitic Resistance in Resistive Crossbar-Based Convolutional Neural Networks
Traditional computing hardware often encounters an on-chip memory bottleneck in large-scale Convolutional Neural Network (CNN) applications. With its unique in-memory computing capability, resistive crossbar-based computing has attracted researchers' attention as a promising solution to the memory bottleneck of von Neumann architectures. However, parasitic resistances in the crossbar deviate its behavior from the ideal weighted-summation operation. In large-scale implementations, the impact of parasitic resistances must be carefully considered and mitigated to ensure the circuits' functionality. In this work, we implemented and simulated CNNs on resistive crossbar circuits with consideration of parasitic resistances. Moreover, we developed a new mapping scheme for high utilization of crossbar arrays in convolution, and a mitigation algorithm that accounts for parasitic resistances as well as the data/kernel patterns of each layer to minimize the computing error in crossbar-based convolutions of CNNs. We demonstrated the proposed methods with implementations of a 4-layer CNN on MNIST and residual neural networks (ResNet-20, -32, and -56) on CIFAR-10. Simulation results show that the proposed methods effectively mitigate the parasitic resistances in crossbars. With our methods, modern CNNs on crossbars can preserve ideal (software-level) classification accuracy with 6-bit ADC and DAC implementations.
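To make the crossbar mapping concrete, here is a sketch of the standard im2col scheme that unrolls a convolution into crossbar vector-matrix multiplications, with each flattened kernel occupying one crossbar column; this is the textbook baseline that a utilization-oriented mapping refines, and all sizes are illustrative.

```python
import numpy as np

def im2col(x, k):
    """Unroll kxk patches (stride 1) of a 2-D input into rows (sketch)."""
    H, W = x.shape
    patches = [x[i:i + k, j:j + k].ravel()
               for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(patches)                   # (num_output_pixels, k*k)

rng = np.random.default_rng(2)
x = rng.standard_normal((6, 6))                # one input feature map
kernels = rng.standard_normal((4, 3, 3))       # 4 output channels, 3x3 kernels

Wcb = kernels.reshape(4, -1).T                 # crossbar tile: 9 rows x 4 columns
y = im2col(x, 3) @ Wcb                         # one crossbar VMM per output pixel
print(y.reshape(4, 4, 4).shape)                # (H_out, W_out, channels)
```

On real crossbars, each column current deviates from this ideal product because of wire parasitics; that deviation is the error the mitigation algorithm compensates for using each layer's data/kernel patterns.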
Resistive crossbar arrays are known for their unique ability to implement analog in-memory vector-matrix multiplication (VMM). However, general-purpose circuit simulators, such as HSPICE and HSIM, are too slow for large-scale crossbar array simulations that account for circuit parasitics. Although some simulators have been designed specifically for crossbar arrays, they mainly focus on area/power/delay estimation rather than accurate SPICE-level simulation, and thus cannot model the arrays' functionality for analog in-memory computing. In this paper, we first present a SPICE-level model of the resistive crossbar array, with consideration of circuit parasitics, in MATLAB. We also propose efficient model-simplification methods to further speed up simulation. Last but not least, ResNet-20 on CIFAR-10 is used to demonstrate the work. With the proposed model-simplification methods, simulation speed is improved by 31X with tolerable errors, and a more than 5X speedup is achieved on ResNet-20 with an accuracy drop of 6%.
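The following is a minimal Python port of this kind of SPICE-level DC model: every wordline/bitline wire segment and memory cell is stamped into a nodal-analysis conductance matrix, which is then solved for all internal node voltages. The grid size, wire conductances, and one-sided driving/sensing scheme are our simplifying assumptions.

```python
import numpy as np

R, C = 4, 4                              # crossbar rows x columns (illustrative)
rng = np.random.default_rng(3)
Gc = rng.uniform(1e-5, 1e-4, (R, C))     # cell conductances (S)
gw = gb = 0.5                            # wordline/bitline segment conductance (S)
Vin = rng.uniform(0.0, 0.2, R)           # row input voltages (V)

N = 2 * R * C                            # unknowns: top-plate + bottom-plate nodes
t = lambda i, j: i * C + j               # top-plate (wordline) node index
b = lambda i, j: R * C + i * C + j       # bottom-plate (bitline) node index
A, rhs = np.zeros((N, N)), np.zeros(N)

def stamp(n1, n2, g):                    # conductance between two internal nodes
    A[n1, n1] += g; A[n2, n2] += g
    A[n1, n2] -= g; A[n2, n1] -= g

def stamp_src(n, g, v):                  # conductance from a node to a fixed voltage
    A[n, n] += g; rhs[n] += g * v

for i in range(R):
    stamp_src(t(i, 0), gw, Vin[i])       # driver feeds the first wordline segment
    for j in range(C):
        stamp(t(i, j), b(i, j), Gc[i, j])                # memory cell
        if j + 1 < C: stamp(t(i, j), t(i, j + 1), gw)    # wordline wire segment
        if i + 1 < R: stamp(b(i, j), b(i + 1, j), gb)    # bitline wire segment
for j in range(C):
    stamp_src(b(R - 1, j), gb, 0.0)      # column sensed at virtual ground

v = np.linalg.solve(A, rhs)              # DC node voltages
I_col = gb * v[[b(R - 1, j) for j in range(C)]]          # sensed column currents
print(I_col)                             # parasitic-aware column currents
print(Vin @ Gc)                          # ideal VMM result for comparison
```

Reducing the size of linear systems like this one is a plausible route to the reported speedups, though the paper's specific simplification methods may differ.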