I recently caught up with Vivienne Sze, Associate Professor of Electrical Engineering and Computer Science at MIT, to discuss the launch of a new professional education course titled “Designing Efficient Deep Learning Systems.” The two-day class will run on March 28-29, 2018 at the Samsung Campus in Mountain View, CA, and will explore the latest breakthroughs in efficient algorithms and hardware that optimize power, memory and data processing resources in deep learning systems.
Vivienne Sze joined the EECS Department as an Assistant Professor in August 2013. She received the B.A.Sc. degree in Electrical Engineering from the University of Toronto in 2004, and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from MIT in 2006 and 2010, respectively. Prof. Sze’s research focuses on the joint design of algorithms, architectures and circuits to build energy-efficient and high-performance systems.
Q. Why did you decide to launch a course specifically focused on deep learning hardware?
Ans. Deep learning is a popular approach to artificial intelligence (AI) that is becoming ubiquitous in our daily lives, because it allows computers to accurately extract meaningful information from the large amount of data collected every day, including videos, images and speech.
While deep learning is highly accurate, the main challenge is that the algorithms are complex and require more computation than other approaches. The analysis of massive data sets can lead to high power and heat dissipation in data centers, which limits processing speeds; always-on applications can quickly drain power and memory resources in portable devices, such as smartphones and wearables. As a result, in addition to algorithm design, it is also important to consider the hardware, because it will impact your speed, your energy, your power consumption and ultimately your costs.
While there has been a lot of focus on the design of deep learning algorithms, we felt there was a strong need for a course that specifically focuses on designing efficient deep learning systems, to help address the important real-world challenges that one faces when deploying deep learning solutions.
Q. What factors do you need to consider in order to have a usable and feasible deep learning system?
Ans. There are five key metrics to consider when evaluating or designing deep learning systems, which are outlined below. The goal is to have an efficient trade-off between all these different metrics.
Accuracy – you want to ensure the system you are using will produce sufficient accuracy based on the data set you are using for each particular task.
Programmability – make sure the systems are flexible enough to support multiple applications and different weights.
Energy/power – embedded devices have limited battery capacity, and in the data center you don’t want to generate too much heat.
Throughput/Latency – For real-time applications, you want your hardware to operate at a reasonable speed; for instance, with video you want to reach a frame rate of 30 frames per second. For interactive applications such as navigation, you also need to consider the minimum reaction time.
Cost – one primary cost is the cost of the chip itself. Chips that are larger with a higher number of cores and more memory are more expensive.
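As a rough illustration of the throughput/latency point above, the 30 frames per second target translates into a fixed time budget per frame. The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, the example latencies are made-up inputs, and it assumes frames are processed one at a time with no pipelining or batching.

```python
def frame_budget_ms(fps):
    """Time available to process one frame at the given frame rate, in milliseconds."""
    return 1000.0 / fps


def meets_real_time(latency_ms, fps=30):
    """Return True if a per-frame processing time fits within the frame budget.

    Assumption: one frame is processed at a time (no pipelining or batching),
    so per-frame latency alone determines whether the frame rate is reached.
    """
    return latency_ms <= frame_budget_ms(fps)


# Illustrative inputs only; these latencies are not measurements.
budget = frame_budget_ms(30)
print(f"Budget at 30 fps: {budget:.1f} ms per frame")
print("25 ms per frame fits:", meets_real_time(25.0))
print("40 ms per frame fits:", meets_real_time(40.0))
```

In a pipelined design, throughput and latency diverge: the system can still deliver 30 frames per second even if each individual frame takes longer than one frame period to pass through, which is why the two are listed together but need to be checked separately.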
Q. What can you do to address all these factors simultaneously and achieve efficient trade-offs?
Ans. That’s exactly what we aim to cover in the course. There are basically three main areas that can help you achieve this. One way is to look at how to map deep learning algorithms onto different platforms to efficiently exploit the processing units and the storage hierarchy. But you can also develop more custom hardware that efficiently supports deep learning. In this case, we should focus on minimizing data movement, which dominates energy consumption.
Finally, we explore how to jointly design the custom hardware and the algorithms to be more efficient. There are various things you can do to tweak the algorithms and optimize them for custom hardware. For example, you can change the bit width of the values you use (rather than using 32 bits, you go down to 8 bits or even a single bit) or remove parts of the network that have little effect on the accuracy. You can also change the number of layers or the shapes of the filters to reduce memory access.
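The two algorithm tweaks mentioned above, reducing bit width and removing parts of the network that matter little, can be sketched in a few lines. This is a minimal illustration rather than the method taught in the course: it assumes symmetric uniform quantization and simple magnitude-based pruning, and the function names, sample weights and threshold are all hypothetical.

```python
def quantize_uniform(weights, bits=8):
    """Symmetric uniform quantization of floats to signed integers of width `bits`.

    Assumption: one scale factor for the whole list. Single-bit (binary)
    weights need a different scheme, so bits must be at least 2 here.
    """
    if bits < 2:
        raise ValueError("this sketch supports bit widths of 2 or more")
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer levels back to approximate floating-point values."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, threshold):
    """Set weights with magnitude below `threshold` to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


# Made-up weights for illustration only.
weights = [0.50, -0.02, 0.31, 0.004, -1.00, 0.12]

quantized, scale = quantize_uniform(weights, bits=8)
restored = dequantize(quantized, scale)
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print("8-bit levels:", quantized)
print("largest rounding error:", max_error)

pruned = prune_by_magnitude(weights, threshold=0.05)
print("pruned weights:", pruned)
```

The rounding error of each weight is bounded by half a quantization step, and the pruned weights become zeros that hardware can skip or store compactly. In practice, both changes are usually followed by a check (and often retraining) to confirm the accuracy is still sufficient.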
My goal in offering this course is to show how we can essentially have our cake and eat it too: achieve high accuracy while also meeting energy and speed requirements. I want participants to walk away with a better understanding of the key design considerations for deep learning systems as well as the trade-offs between various hardware architectures and platforms.
Deep learning is poised to shape the direction of technology for the next decade. But a key hurdle stands in the way of those looking to unleash the full power of AI.
Designing efficient systems to support deep learning processing is a crucial step towards enabling wider deployment, particularly for embedded applications like mobile technologies, the Internet of Things (IoT), and drones.