Adaptive and Resilient Model-Distributed Inference in Edge Computing Systems


ABSTRACT The traditional approach to distributed deep neural network (DNN) inference in edge computing systems is data-distributed inference. In this paradigm, each worker holds a pre-trained DNN model and uses it to process the data offloaded to it. Data-distributed inference has a high communication cost, especially when the data is large, and it is memory-inefficient because every worker must store and compute the whole model. Model-distributed inference, in which a DNN model is distributed across workers, is emerging as a promising solution. Although there is a large body of work on model-distributed training, the benefit of model distribution for inference is not well understood. In this paper, we analyze the potential of model-distributed inference in edge computing systems. We then develop an Adaptive and Resilient Model-Distributed Inference (AR-MDI) algorithm based on our optimal model allocation formulation. AR-MDI performs model allocation in a lightweight and decentralized way, and it is resilient against delayed workers and failures. We implement AR-MDI in a real testbed consisting of NVIDIA Jetson TX2s and show that AR-MDI significantly improves inference time compared to baselines when the data is large, as with ImageNet.

I. INTRODUCTION

Modern edge devices such as drones, autonomous robots, sensors, and self-driving cars generate data at tremendous rates. Many applications that run on these devices are delay-sensitive, meaning that the data they generate should be processed as quickly as possible. Transmitting the generated data to a remote cloud may therefore be unacceptable because of transmission delays, and data should instead be processed near its place of origin, i.e., on or near the edge. One complication in this context is that edge devices are typically limited in computation power, energy, and/or memory. Hence, the design of high-performance distributed data processing methods is crucial.

The traditional approach to distributed deep neural network (DNN) inference is data-distributed inference, which partitions the data and distributes it to workers as illustrated in Figure 1. The workers comprise edge servers, end users, and/or a remote cloud if available. An end user that would like to classify input data offloads the data to workers for classification. The end user itself can act as one of the workers by processing some of its own data. Each worker keeps a pre-trained DNN model, processes the offloaded data, and sends the output back to the end user. This approach, although straightforward, has two disadvantages: (i) the communication cost is high, especially when the input data is large (e.g., high-resolution data); and (ii) each worker must store the whole model, which puts a strain on end user devices in particular.

Model-distributed inference (also called model parallelism) is emerging as a promising solution, in which a DNN model is distributed across workers (Figure 2). The end user, which holds the input data, may process the first few layers of the DNN model and then transmits the activation vector of its last layer to a neighboring node. The neighboring node receives the activation vector and performs the computations of the layers assigned to it. Finally, the worker that computes the last layers of the DNN model obtains the output and sends it back to the end user. We note that the workers process data in parallel in this setup through pipelining, as further explained in Section III.
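The flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy layers (`scale`, `shift`, `relu`) and the helper names `partition_layers`, `run_worker`, and `model_distributed_inference` are hypothetical, and the hand-off between workers is simulated by a plain function call rather than a network transfer. The sketch processes one input sequentially; in a pipelined deployment, each worker would start on the next input as soon as it has passed its activation vector on.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function mapping an activation vector to a new one.
Layer = Callable[[List[float]], List[float]]


def scale(factor: float) -> Layer:
    """Toy layer: multiply every activation by a constant."""
    return lambda x: [factor * v for v in x]


def shift(bias: float) -> Layer:
    """Toy layer: add a constant bias to every activation."""
    return lambda x: [v + bias for v in x]


def relu(x: List[float]) -> List[float]:
    """Toy layer: element-wise ReLU."""
    return [max(0.0, v) for v in x]


def partition_layers(layers: Sequence[Layer], sizes: Sequence[int]) -> List[List[Layer]]:
    """Split the model into consecutive layer groups, one group per worker."""
    if sum(sizes) != len(layers):
        raise ValueError("group sizes must add up to the number of layers")
    groups, start = [], 0
    for size in sizes:
        groups.append(list(layers[start:start + size]))
        start += size
    return groups


def run_worker(assigned: Sequence[Layer], activation: List[float]) -> List[float]:
    """One worker applies only the layers assigned to it."""
    for layer in assigned:
        activation = layer(activation)
    return activation


def model_distributed_inference(groups: Sequence[Sequence[Layer]], data: List[float]) -> List[float]:
    """Pass the activation vector from worker to worker; the last one yields the output."""
    activation = data
    for assigned in groups:  # each iteration stands in for one network hop
        activation = run_worker(assigned, activation)
    return activation
```

For example, with `layers = [scale(2.0), shift(-1.0), relu, scale(3.0)]`, calling `model_distributed_inference(partition_layers(layers, [1, 2, 1]), [1.0, 0.25])` gives the same output as running all four layers on a single worker, since the partition only changes where each layer is computed.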

Although there is a large body of work on model-distributed training, the benefit of model distribution for inference is not well understood. The potential of model distribution for training is clear. Data-distributed training must exchange the whole model between the workers and a model aggregator (parameter server) for every batch of data, which incurs a very large communication cost. In contrast, because the DNN model itself is distributed, model-distributed training only needs to exchange activation vectors among workers, not the whole model. Thus, model-distributed training reduces the communication cost compared to data-distributed training.

The potential of model-distributed inference for reducing the communication cost is less obvious. While data-distributed inference requires exchanging the actual data (Figure 1), model-distributed inference requires exchanging activation vectors. We observe that when the data is large, exchanging the actual data incurs a higher communication cost, which makes model-distributed inference a plausible alternative. Building on this observation, we analyze the potential of model-distributed inference compared to data-distributed inference in a homogeneous setup, where all workers have the same amount of computing power.
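A back-of-the-envelope comparison may help make this observation concrete. The sketch below is not the paper's inference time analysis: it only compares per-input transfer time, and it ignores computation time, the return of the output, and any overlap gained from pipelining. The function names, the 4-byte value size, the link bandwidth, and the activation sizes are all assumptions chosen for illustration; only the input sizes (3 × 224 × 224 values for a typical ImageNet crop, 3 × 32 × 32 for CIFAR-10) are standard.

```python
from typing import Sequence, Tuple


def transfer_seconds(num_values: int, bytes_per_value: int, bandwidth_bps: float) -> float:
    """Time to send num_values over a link with the given bandwidth in bits per second."""
    return num_values * bytes_per_value * 8 / bandwidth_bps


def compare_transfer(
    input_values: int,
    cut_activation_values: Sequence[int],
    bytes_per_value: int = 4,     # assumed 32-bit floats
    bandwidth_bps: float = 8e6,   # assumed link speed (hypothetical)
) -> Tuple[float, float]:
    """Return (data-distributed, model-distributed) per-input transfer time.

    Data-distributed: the raw input is sent to one worker.
    Model-distributed: one activation vector is sent at every cut between workers.
    """
    data_distributed = transfer_seconds(input_values, bytes_per_value, bandwidth_bps)
    model_distributed = sum(
        transfer_seconds(values, bytes_per_value, bandwidth_bps)
        for values in cut_activation_values
    )
    return data_distributed, model_distributed


# Large input, one cut with a hypothetical 25,088-value activation vector.
large_input = compare_transfer(3 * 224 * 224, [25088])

# Small input, one cut with a hypothetical 16,384-value activation vector.
small_input = compare_transfer(3 * 32 * 32, [16384])
```

Under these assumed numbers, the model-distributed transfer is cheaper for the large input and more expensive for the small one, which matches the intuition that the benefit depends on the input size relative to the activation vectors at the chosen cuts.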

It is crucial to exploit the potential of model-distributed inference in a heterogeneous and dynamic setup, where the computing power of workers may differ and change over time. A model partitioning mechanism based on dynamic programming has been proposed for this purpose. However, that approach incurs too much computing cost to determine the optimal model allocation, and it does not adapt to time-varying resources. Instead, we design a lightweight, adaptive, and decentralized model allocation mechanism, which we name Adaptive and Resilient Model-Distributed Inference (AR-MDI), based on the solution to our optimal model allocation formulation.
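To illustrate what model allocation means in a heterogeneous setup, the sketch below assigns consecutive layers to workers in proportion to their computing speeds. This is a simple heuristic chosen for illustration only; it is neither the AR-MDI algorithm nor the dynamic programming approach mentioned above. The names `allocate_layers` and `stage_times`, the per-layer costs, and the worker speeds are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def allocate_layers(layer_costs: Sequence[float], worker_speeds: Sequence[float]) -> List[List[int]]:
    """Assign consecutive layer indices to workers, roughly proportional to speed.

    Each worker gets a share of the total cost equal to its share of the total
    speed; a layer goes to the worker whose share covers the layer's midpoint.
    A worker may end up with no layers if its share is smaller than one layer.
    """
    total_cost = sum(layer_costs)
    total_speed = sum(worker_speeds)
    # Cumulative cost that should be completed by the end of each worker's share.
    boundaries = [total_cost * s / total_speed for s in accumulate(worker_speeds)]
    allocation: List[List[int]] = [[] for _ in worker_speeds]
    done = 0.0
    for index, cost in enumerate(layer_costs):
        midpoint = done + cost / 2
        worker = min(bisect_right(boundaries, midpoint), len(worker_speeds) - 1)
        allocation[worker].append(index)
        done += cost
    return allocation


def stage_times(
    layer_costs: Sequence[float],
    worker_speeds: Sequence[float],
    allocation: Sequence[Sequence[int]],
) -> List[float]:
    """Compute time per worker: assigned cost divided by that worker's speed."""
    return [
        sum(layer_costs[i] for i in layers) / speed
        for layers, speed in zip(allocation, worker_speeds)
    ]
```

With six equal-cost layers and two workers where the first is twice as fast, the first worker receives four layers and the second receives two, so both stages take the same time. Because the function is cheap to evaluate, it could be re-run whenever measured speeds change, which is the kind of adaptivity the paragraph above calls for.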

One of the weaknesses of model distribution compared to data distribution is its vulnerability to failing workers. For example, if one of the workers in Figure 2 fails, the whole system fails. Thus, we design a recovery mechanism as part of our AR-MDI algorithm. The recovery mechanism of AR-MDI is inspired by the peer management mechanism of Chord-like P2P systems, as further detailed in Section V.

The following are the key contributions of this work:

We provide an inference time analysis for both model-distributed and data-distributed inference in a homogeneous setup, and show that model-distributed inference achieves a smaller inference time when the input data is large.

We formulate a model allocation problem across workers for model-distributed inference in a heterogeneous setup. Building on the solution to this optimization problem, we design a lightweight, adaptive, and decentralized model allocation algorithm, which we name the Adaptive and Resilient Model-Distributed Inference (AR-MDI) algorithm.

We fortify our AR-MDI algorithm with a recovery mechanism against delayed and failing workers.

We implement AR-MDI, as well as two baselines (EdgePipe and data-distributed inference), in a heterogeneous testbed of NVIDIA Jetson TX2 cards. Our experiments, which cover the CIFAR-10 and ImageNet datasets and the VGG16 and MobileNetV2 DNN models, show that AR-MDI significantly reduces the inference time compared to the baselines.

The rest of this paper is structured as follows. Section II presents the related work. Section III introduces our system model and provides preliminaries on model-distributed inference. Section IV analyzes the potential of model-distributed inference for the case of homogeneous transmission delays and worker computing powers. In Section V, we formulate an optimal model allocation problem for the heterogeneous setup and design our Adaptive and Resilient Model-Distributed Inference (AR-MDI) algorithm based on the structure of the optimal solution. In Section VI, we provide experimental results on a real-life testbed. Section VII concludes the paper.

II. RELATED WORK

III. MODEL AND PRELIMINARIES

IV. POTENTIAL OF MODEL-DISTRIBUTED INFERENCE

V. AR-MDI: ADAPTIVE AND RESILIENT MDI

VI. EXPERIMENTAL RESULTS

A. Datasets, DNN Models and Testbed Description

B. Results

VII. CONCLUSION

Overview

The document analyzes the potential of model-distributed inference in edge computing, presenting an Adaptive and Resilient Model-Distributed Inference (AR-MDI) algorithm to optimize model allocation. AR-MDI is implemented in a testbed, demonstrating significant reductions in inference time compared to traditional methods.

Key Points

1. Model-distributed inference can reduce communication costs compared to data-distributed approaches
2. The AR-MDI algorithm offers a lightweight and decentralized model allocation solution
3. Resilience against delayed workers and failures is a key feature of AR-MDI
4. Real testbed experiments show that AR-MDI significantly improves inference time with large datasets

Details

Authors: PENGZHEN LI, ERDEM KOYUNCU, HULYA SEFEROGLU
Category: Technology and Engineering
