Abstract:
The rapid pace and successful application of machine learning research and development have led to widespread deployment of deep convolutional neural networks (CNNs). Alongside these algorithmic efforts, the compute- and memory-intensive nature of CNNs has stimulated a large body of work on hardware acceleration for these networks. In this paper, we profile the memory requirements of CNNs in terms of both on-chip memory size and off-chip memory bandwidth, in order to understand the impact of the memory system on accelerator design. We show that there are fundamental tradeoffs between performance, bandwidth, and on-chip memory. Further, this paper explores how the wide variety of CNNs for different application domains each have fundamentally different characteristics. We show that bandwidth and memory requirements for different networks, and occasionally for different layers within a network, can each vary by multiple orders of magnitude. This makes designing fast and efficient hardware for all CNN applications difficult. To remedy this, we outline heuristic design points that attempt to optimize for select dataflow scenarios.
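As a rough illustration of the kind of per-layer profiling the abstract describes (a minimal sketch, not the authors' actual methodology), the following estimates weight and activation footprints and bytes moved per multiply-accumulate for a few convolutional layers. The layer shapes are hypothetical examples, and the sketch assumes 8-bit values, stride 1, and 'same' padding:

```python
# Back-of-the-envelope estimate of per-layer CNN memory footprint and
# off-chip traffic. Layer shapes are hypothetical, not from the paper.

def conv_layer_profile(h, w, c_in, c_out, k, bytes_per_value=1):
    """Return (weight_bytes, activation_bytes, macs) for one conv layer."""
    weight_bytes = k * k * c_in * c_out * bytes_per_value
    # Output activations (assumes 'same' padding, stride 1).
    activation_bytes = h * w * c_out * bytes_per_value
    macs = h * w * c_out * k * k * c_in  # multiply-accumulates
    return weight_bytes, activation_bytes, macs

# Hypothetical layers: (name, height, width, in channels, out channels, kernel)
layers = [
    ("early conv", 224, 224,   3,  64, 3),
    ("mid conv",    56,  56, 256, 256, 3),
    ("late conv",    7,   7, 512, 512, 3),
]

for name, h, w, c_in, c_out, k in layers:
    wb, ab, macs = conv_layer_profile(h, w, c_in, c_out, k)
    # If weights + activations exceed on-chip memory, data must be
    # (re)streamed from DRAM, so bytes-per-MAC approximates the
    # bandwidth pressure of the layer.
    print(f"{name:10s} weights={wb/1e6:7.2f} MB  acts={ab/1e6:7.2f} MB  "
          f"bytes/MAC={(wb + ab)/macs:.5f}")
```

Running this shows early layers dominated by activations and late layers dominated by weights, with footprints differing by orders of magnitude across layers, consistent with the variation the abstract reports.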
Date of Conference: 30 September 2018 - 02 October 2018
Date Added to IEEE Xplore: 13 December 2018