Oral Presentation Papers
1. 13:30 – 13:43 (SB11) Low Accuracy Loss and Hardware-Friendly Model Compression Technique for DNNs
Ya-Chu Chang and Juinn-Dar Huang, National Chiao Tung University
Abstract: Deep neural networks (DNNs) are broadly utilized in numerous machine learning applications nowadays. In a large DNN with several hidden layers, the number of weights (or coefficients) required to realize an entire multilayer perceptron is enormous. This excessive number of weights not only requires a large amount of memory but also creates significant memory access traffic, which is a heavy burden especially for small-scale embedded systems and edge devices. Therefore, several model compression techniques have been proposed over the past few years. In this paper, we present a new hardware-friendly model compression technique. It achieves a compression ratio of 20X–30X while keeping the accuracy loss below 1%.
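The abstract does not disclose which compression method is used, but magnitude pruning followed by uniform quantization is one common combination that reaches ratios in this range. The sketch below is purely illustrative (the function names and the keep_ratio/bits parameters are my own, not from the paper):

```python
def compress(weights, keep_ratio=0.1, bits=8):
    """Illustrative sketch: prune small-magnitude weights, then uniformly
    quantize the survivors to `bits`-bit integers."""
    n_keep = max(1, int(len(weights) * keep_ratio))
    # Keep only the largest-magnitude weights (magnitude pruning).
    threshold = sorted(abs(w) for w in weights)[-n_keep]
    pruned = [w if abs(w) >= threshold else 0.0 for w in weights]
    # Uniform symmetric quantization of the surviving weights.
    scale = max(abs(w) for w in pruned) / (2 ** (bits - 1) - 1)
    quantized = [round(w / scale) for w in pruned]
    return quantized, scale

def decompress(quantized, scale):
    """Recover approximate float weights for inference."""
    return [q * scale for q in quantized]

weights = [0.8, -0.02, 0.01, -0.9, 0.03, 0.75, -0.01, 0.05]
q, s = compress(weights, keep_ratio=0.5, bits=8)
restored = decompress(q, s)
```

Storing only the surviving 8-bit values instead of dense 32-bit floats is what drives the compression ratio; the remaining error on the kept weights stays small.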
2. 13:43 – 13:56 (SB12) A Simulator for Evaluating the Fault-Tolerance Capability of DNNs
Yung-Yu Tsai and Jin-Fu Li, National Central University
Abstract: Deep neural networks (DNNs) are considered an effective technique for artificial intelligence applications. A DNN is composed of a large number of neurons arranged in multiple layers. A DNN typically has overprovisioned neurons and therefore exhibits a degree of fault tolerance [1][2]. However, how to evaluate the fault-tolerance capability of a DNN remains an important issue. In this paper, we propose a simulator that estimates the inference accuracy loss caused by faults in a DNN model or hardware accelerator. The simulator is implemented on top of Keras and TensorFlow and can evaluate the fault-tolerance capability of a DNN at both the model and hardware levels. At the model level, it estimates the accuracy loss caused by a faulty neuron, a faulty link, or a faulty input; fault injection is performed through bitwise operations on TensorFlow parameters. At the hardware level, it estimates the inference accuracy loss of a DNN accelerator caused by faulty buffers. Simulations of accuracy under different fault rates are reported for the LeNet and 4C2F models.
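The bitwise fault injection the abstract describes amounts to flipping individual bits in a parameter's binary encoding. A minimal stand-in, using the standard library instead of TensorFlow (the helper name is mine), illustrates why the injected bit position matters so much for accuracy:

```python
import struct

def flip_bit(value, bit):
    """Flip one bit in the IEEE-754 single-precision encoding of `value`,
    a software stand-in for bit-level fault injection on a DNN parameter."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (faulty,) = struct.unpack("<f", struct.pack("<I", as_int))
    return faulty

w = 0.5
# A low-order mantissa bit barely perturbs the weight...
low = flip_bit(w, 0)
# ...while flipping an exponent bit (bits 23-30) changes it drastically.
high = flip_bit(w, 27)
```

In a real Keras model, the same flip would be applied to values read via `get_weights()` and written back with `set_weights()`, then accuracy would be re-measured to estimate the loss.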
3. 13:56 – 14:09 (SB13) Approximate Systolic Array-based Processor for AI Computation
Wei-Kai Tseng, Huan-Jan Chou, Ning-Chi Huang, and Kai-Chiang Wu, National Chiao Tung University
Abstract: Approximate computing is an emerging strategy that trades computational accuracy for computational cost in terms of performance, energy, and/or area. We propose a novel sensor-based approximate adder, the Carry Truncate Adder (CTA), for high-performance, energy-efficient arithmetic that respects the accuracy requirements of error-tolerant applications. Compared with a fully optimized ripple-carry adder, our adder improves performance by 2.17X. When applied to error-tolerant applications such as image processing and handwritten digit recognition, the approximate adder delivers quality of results very close to that obtained with an accurate adder. Systolic arrays are widely used as matrix-multiplication accelerators for DNNs. To improve the performance and energy efficiency of a systolic array, we apply timing speculation based on the proposed CTA. Using in-situ sensors for approximate multiply-accumulate (MAC) computation, a computation that needs a longer propagation time drops the next multiplication result and occupies two adjacent MACs to complete the current multiply-accumulate operation. In our experiments, compared with the original systolic array (without any approximation), the proposed approximate systolic array reduces the clock period from 8.57 ns to 5.39 ns with only a 1% accuracy loss on the MNIST dataset.
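The core idea of carry truncation can be modeled in a few lines: the adder is split into fixed-width segments that add independently, and the carry out of each segment is dropped rather than propagated, which is what shortens the critical path. This behavioral sketch ignores the paper's in-situ sensors (which detect when truncation would err); the function and its parameters are illustrative only:

```python
def cta_add(a, b, width=16, segment=4):
    """Behavioral model of a carry-truncating approximate adder:
    each `segment`-bit slice adds independently, and the carry out
    of a slice is dropped instead of rippling into the next slice."""
    mask = (1 << segment) - 1
    result = 0
    for lo in range(0, width, segment):
        sa = (a >> lo) & mask
        sb = (b >> lo) & mask
        # Truncation point: (sa + sb) may overflow the slice, and that
        # overflow carry is simply discarded.
        result |= ((sa + sb) & mask) << lo
    return result

exact = 0x00FF + 0x0001           # = 0x0100
approx = cta_add(0x00FF, 0x0001)  # carry out of the low slice is dropped
```

Errors occur only when a carry crosses a segment boundary, which is rare for typical operand distributions; that rarity is what makes the accuracy loss small in error-tolerant workloads.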
4. 14:09 – 14:22 (SB14) Approximate Logic Circuit Design for AI Applications
Wei-Hung Lin, Hsu-Yu Kao, and Shih-Hsu Huang, Chung-Yuan Christian University
Abstract: To reduce the power consumption of embedded systems, approximate logic circuits are a promising solution for many error-resilient applications. In this paper, we introduce approximate logic circuit design for AI applications. We have developed a library of approximate logic circuits for AI applications, and we have also built a neural network (NN) design framework that lets users develop AI applications with this circuit library. Using the framework, we have implemented ICNet, a well-known semantic segmentation NN. Experimental results show that, even when all multiplications and activation functions are replaced by our approximate logic circuits, the accuracy loss relative to the original ICNet model is only 0.1%. Our future work is to provide more approximate logic circuits in the NN design framework for accuracy–cost trade-off exploration, and to develop more AI applications on top of the framework.
5. 14:22 – 14:35 (SB15) Dataflow Exploration Framework for Data Reuse of CNN Computation
Xiang-Yi Liu, Yuan-Chih Lo, Tsai-Yu Tsai, and Wei-Kai Cheng, Chung-Yuan Christian University
Abstract: In edge intelligence systems, limited hardware resources make memory accesses the bottleneck of DNN hardware accelerators. Unlike CPU or GPU architectures, dataflow processing is an effective way to improve the efficiency of data movement in a DNN hardware accelerator. However, memory accesses still consume a high percentage of the energy in this type of DNN architecture. In this paper, we propose a modified dataflow approach based on Eyeriss to reduce the volume of external memory accesses. Experimental results show that our dataflow approach reduces the movement of kernels and input feature maps between external DRAM and the internal buffer.
6. 14:35 – 14:48 (S0168) AIP: Saving the DRAM Access Energy of CNNs Using Approximate Inner Products
Cheng-Hsuan Cheng and Ren-Shuo Liu, National Tsing Hua University
Abstract: In this work, we propose AIP (Approximate Inner Product), which approximates the inner products of CNNs' fully-connected (FC) layers using only a small fraction (e.g., one-sixteenth) of the parameters. We observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
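The core arithmetic idea, computing an inner product from only a fraction of the weights, can be sketched as follows. How AIP selects that fraction is not stated in the abstract, so this sketch simply keeps the largest-magnitude weights (the function name and selection rule are assumptions):

```python
def aip(x, w, fraction=4):
    """Approximate the inner product x.w using only the largest-magnitude
    1/`fraction` of the weights, so only that fraction of FC-layer
    parameters would need to be fetched from DRAM."""
    n_keep = max(1, len(w) // fraction)
    kept = sorted(range(len(w)), key=lambda i: -abs(w[i]))[:n_keep]
    return sum(x[i] * w[i] for i in kept)

x = [1.0, 2.0, 3.0, 4.0]
w = [0.9, 0.01, -0.02, -1.1]
full = sum(xi * wi for xi, wi in zip(x, w))  # exact inner product
approx = aip(x, w, fraction=2)               # touches only 2 of 4 weights
```

Because FC outputs typically pass through a ReLU and only the top-n scores matter for classification, modest per-neuron errors of this kind often leave the final prediction unchanged, which is the property the paper exploits.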
7. 14:48 – 15:01 (S0030) Filter Pruning based on Dynamic Convolutional Neural Network for Surveillance Video
Chun-Ya Tsai, De-Qin Gao, and Shanq-Jang Ruan, National Taiwan University of Science and Technology
Abstract: Large-scale surveillance video analysis is becoming important with the development of intelligent cities; however, the heavy computational resources required by state-of-the-art deep learning models make real-time processing difficult. Exploiting the high scene similarity that generally exists in surveillance videos, we propose an effective compression architecture called dynamic convolution, which reuses previous feature maps to reduce the amount of computation, and we combine it with filter pruning to further improve performance.
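The reuse idea can be illustrated with a hypothetical 1-D reduction of the problem (the paper operates on 2-D feature maps; the functions below are mine): an output is recomputed only when its receptive field changed since the previous frame, and the cached result is reused otherwise.

```python
def conv1d(signal, kernel):
    """Plain valid 1-D convolution used as the baseline."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def dynamic_conv1d(frame, prev_frame, prev_out, kernel):
    """Hypothetical 1-D sketch of dynamic convolution: recompute an
    output only if its input window differs from the previous frame,
    otherwise reuse the cached result."""
    k = len(kernel)
    out = []
    for i in range(len(frame) - k + 1):
        changed = any(frame[i + j] != prev_frame[i + j] for j in range(k))
        if changed:
            out.append(sum(frame[i + j] * kernel[j] for j in range(k)))
        else:
            out.append(prev_out[i])  # scene is locally unchanged: reuse
    return out

kernel = [1, 0, -1]
prev = [5, 5, 5, 5, 5, 5]
curr = [5, 5, 5, 9, 5, 5]  # only one sample changed between frames
base = conv1d(prev, kernel)
fast = dynamic_conv1d(curr, prev, base, kernel)
assert fast == conv1d(curr, kernel)  # identical output, less recomputation
```

With high scene similarity, most windows are unchanged, so most multiply-accumulates are skipped while the output stays exact; real deployments would relax the equality test to a threshold.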
Advising Organization:
Department of Information and Technology Education, Ministry of Education
Organizer:
Taiwan Integrated Circuit Design Society
Host Organizations:
Department of Electrical Engineering, National Chung Cheng University
Department of Computer Science and Information Engineering, National Chung Cheng University
Co-organizers:
Engineering Technology Promotion Center, Department of Engineering and Technologies, Ministry of Science and Technology
Taiwan Semiconductor Research Institute, National Applied Research Laboratories
The Chinese Institute of Electrical Engineering
Intelligent IoT Integration and Promotion Alliance Center
Institute of Information Science, Academia Sinica
Sponsors:
MediaTek Inc.
Advanced Semiconductor Engineering, Inc. (ASE)
VIA Labs, Inc.
Himax Technologies, Inc.
Cadence Design Systems, Taiwan Branch
Synopsys Taiwan Co., Ltd.
Industrial Technology Research Institute (ITRI)
Realtek Semiconductor Corp.
Novatek Microelectronics Corp.
TeraSoft Inc.
Keysight Technologies Taiwan
和澄科技股份有限公司
Mentor Graphics, Taiwan Branch
一元素科技股份有限公司
eMemory Technology Inc.
博鑫醫電股份有限公司
Phison Electronics Corp.
Andes Technology Corporation