sensors Horizontally Distributed Inference of Deep Neural Networks for AI-Enabled IoT
Abstract: Motivated by the pervasiveness of artificial intelligence and the Internet of Things in the current "smart everything" scenario, this article provides a comprehensive overview of the most recent research at the intersection of both domains, focusing on the design and development of specific mechanisms for enabling a collaborative inference across edge devices towards the in situ execution of highly complex state-of-the-art deep neural networks, despite the resource-constrained nature of such infrastructures. In particular, the review discusses the most salient approaches conceived along those lines, elaborating on the specificities of the partitioning schemes and the parallelism paradigms explored, providing an organized and schematic discussion of the underlying workflows and associated communication patterns, as well as the architectural aspects of the deep neural networks that have driven the design of such techniques, while also highlighting both the primary challenges encountered at the design and operational levels and the specific adjustments or enhancements explored in response to them.
One. Introduction
Driven by a sustained and still ongoing synergy, the Internet of Things and artificial intelligence have progressed almost in parallel over the past two decades, leading to the so-called AI-enabled Internet of Things and thus realizing the vision of a pervasive intelligence. Methods and technologies developed under the Internet of Things paradigm facilitate the connection of the different devices that comprise such intelligent environments and the exchange of data between them, enabling the creation and proper exploitation of new network architectures consisting of connected ambient sensing instruments and resource-constrained, energy-efficient end devices (i.e., embedded and mobile devices), commonly referred to as user equipment in the related literature. Efforts in this regard have been primarily aimed at designing and deploying faster and more efficient network infrastructures, as well as developing more accurate sensing platforms. This has dramatically increased the capability of those systems to sense data from the physical world, thus enabling the collection and storage of large volumes of data and consequently supporting increasingly sophisticated artificial intelligence techniques, from traditional machine learning methods to more recent deep learning approaches, ultimately creating enormous opportunities for a "smart" life.
Deep neural networks have driven the evolution and subsequent consolidation of "intelligent" computing systems among the general public beyond research forums, proving their undeniable power and achieving great success in a number of application domains, such as smart transportation, smart farming, smart manufacturing, and smart healthcare. Exploiting the sensors embedded in these systems has enabled massive data collection, nurturing deep learning algorithms and thus contributing to new levels of accuracy in the delivered results. However, this accuracy has come at the cost of increased computational and memory resource consumption, both for training the networks and for performing inference. Moreover, inference entails more than numerous complex computations and the memory demands of the deep neural network model being processed, which are already relevant challenges for Internet of Things devices: in most of the use cases found in the typical application domains of Internet of Things systems mentioned above, time is a critical factor, and real-time prediction performance is expected.
Even though hardware solutions designed specifically for artificial intelligence acceleration in embedded and mobile devices exist today and could be considered, to some extent, a possible answer to the above needs, they have proven insufficient to efficiently execute today's more sophisticated deep learning models. As a result, cloud-based configurations powered by graphics processing unit clusters remain the standard infrastructure supporting deep learning research. Nonetheless, relying on remote data centers (i.e., those located geographically far from the users or the data sources) for deep neural network execution may incur prohibitive latency, thereby failing to meet the latency goal pursued. In this context, new computing paradigms, namely Mobile Cloud Computing and Mobile Edge Computing, have emerged over the last few years as alternatives to classic cloud computing. These have also impacted the artificial intelligence domain, enabling a progressive shift away from the fully cloud-delegated processing model towards a vertically distributed computation across the user-edge-cloud continuum, bringing part of the deep neural network inference to computing tiers closer to end users, and thereby resulting in the so-called collaborative inference.
Specifically, this collaborative intelligence has materialized in practice essentially as a pipeline for executing deep neural networks on multiple entities distributed across the different computing tiers considered, subject to the structural properties of the models used. This approach succeeds in generating segments of reduced size and complexity by partitioning deep networks, allowing devices with limited capabilities to take on some of the load and offload the heavier subtasks to nodes at higher layers of the hierarchy. Its effectiveness is nonetheless constrained by several remaining issues, such as the significant distance between nodes, the adoption of the layer as the minimum partitioning unit, and the inter-layer data dependency inherent to deep neural network models. As a consequence, it yields end-to-end latencies that, dominated by data communication times, fail to achieve the real-time goal; it may produce partitions that, even at their smallest, impose a memory footprint and computational load excessive for Internet of Things devices; and, last but not least, it further penalizes the overall co-inference performance by preventing the deep neural network partitions from being processed concurrently.
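To make the trade-off behind layer-wise vertical partitioning more tangible, the following minimal sketch models end-to-end latency as the sum of device compute time, transfer time of the intermediate data, and server compute time, and then searches for the best split point. It is an illustrative toy model rather than a method taken from any of the works reviewed: the `Layer` fields, the function names, and all throughput and bandwidth parameters are assumptions introduced here, and the cost of returning the result to the device is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical per-layer profile (all values are illustrative)."""
    name: str
    flops: float      # work required to compute the layer
    out_bytes: float  # size of the layer's output activation


def end_to_end_latency(layers: List[Layer], split: int, input_bytes: float,
                       dev_flops_s: float, srv_flops_s: float,
                       link_bytes_s: float) -> float:
    """Latency when layers[:split] run on the device and layers[split:] remotely."""
    device_time = sum(l.flops for l in layers[:split]) / dev_flops_s
    server_time = sum(l.flops for l in layers[split:]) / srv_flops_s
    if split == len(layers):
        transfer_time = 0.0  # fully local execution, nothing crosses the link
    else:
        # The raw input is sent if nothing runs locally; otherwise the
        # activation produced by the last local layer is sent.
        sent = input_bytes if split == 0 else layers[split - 1].out_bytes
        transfer_time = sent / link_bytes_s
    return device_time + transfer_time + server_time


def best_split(layers: List[Layer], input_bytes: float, dev_flops_s: float,
               srv_flops_s: float, link_bytes_s: float) -> int:
    """Exhaustively pick the layer boundary with the lowest modelled latency."""
    return min(
        range(len(layers) + 1),
        key=lambda s: end_to_end_latency(layers, s, input_bytes, dev_flops_s,
                                         srv_flops_s, link_bytes_s),
    )


# Toy example with made-up numbers.
toy = [Layer("conv1", 10, 100), Layer("conv2", 10, 10), Layer("fc", 100, 1)]
k = best_split(toy, input_bytes=50, dev_flops_s=10, srv_flops_s=100, link_bytes_s=10)
print("best split after layer index:", k)
```

Because the layer is the smallest unit in this model, the candidate splits are limited to layer boundaries, and the two sides execute one after the other; both properties mirror the limitations discussed above.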
Given these limitations, recent studies have explored other alternatives of computation at the edge, introducing novel methods and strategies aimed at better leveraging the distributed resources within the same computing tier towards a future vision of extremely interoperable and flexible artificial intelligence-capable Internet of Things systems. Specifically, in this context, such research has pursued the design of cooperative inference mechanisms that, unlike those mentioned above, demonstrate the capability to speed up the execution of deep learning tasks by partitioning the workload and distributing the resulting deep neural network segments horizontally across the devices within an edge cluster, whether this is referred to as a mini-cloud (i.e., a cluster of computers within the same local area network), a micro-cloud (i.e., an infrastructure-independent and easily portable assembly of small computers), an ad hoc cloudlet (a cluster consisting of mobile devices interconnected via short-range radio communication technologies), or a fog network (perhaps the most representative term, referring to an architecture consisting of end-user clients or near-user edge devices that can alternatively cooperate and support machine-to-machine-based service provisioning in a distributed manner).
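As a rough illustration of what distributing a workload horizontally can look like, the sketch below divides the rows of a single feature map among the devices of an edge cluster in proportion to a capability score, so that all devices can work on the same layer at once. This is only one conceivable scheme, chosen here for simplicity: the function name, the capability scores, and the row-wise split are assumptions, and the overlapping border rows that convolutional layers would need are deliberately ignored.

```python
from typing import List, Sequence, Tuple


def proportional_row_split(height: int,
                           capabilities: Sequence[float]) -> List[Tuple[int, int]]:
    """Assign contiguous row ranges [start, end) to devices by relative capability.

    `capabilities` holds hypothetical, positive per-device scores
    (for example, measured throughput); larger means more rows.
    """
    total = sum(capabilities)
    exact = [height * c / total for c in capabilities]
    rows = [int(e) for e in exact]  # floor of each ideal share

    # Hand the rows lost to rounding to the largest fractional remainders.
    leftover = height - sum(rows)
    by_remainder = sorted(range(len(rows)),
                          key=lambda i: exact[i] - rows[i], reverse=True)
    for i in by_remainder[:leftover]:
        rows[i] += 1

    ranges = []
    start = 0
    for r in rows:
        ranges.append((start, start + r))
        start += r
    return ranges


# Toy cluster: one stronger node and three weaker ones (made-up scores).
for device, (lo, hi) in enumerate(proportional_row_split(224, [4, 2, 1, 1])):
    print(f"device {device}: rows {lo}..{hi - 1}")
```

Unlike a layer-wise pipeline, every device here processes a slice of the same layer, which is what allows the partitions to be computed concurrently within a single tier.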
In such a context, determining how to efficiently partition, distribute, and schedule deep neural network inference continues to pose major challenges, given both the significant heterogeneity of devices in terms of capability (including edge servers equipped with graphics processing units, low-power single-board computers such as the Raspberry Pi, and smartphones with multi-purpose Systems on a Chip) and the dynamic network conditions. These challenges make efficient edge-based artificial intelligence service provisioning an open and still relevant research problem. For this reason, in the present work, we conduct an in-depth study of the most relevant aspects of this edge intelligence. This research, far from being an exhaustive or systematic review of the related literature, aims to be a comprehensive and gentle introductory guide to those techniques and methods that have proven to be highly successful in exploiting the aggregated computational and memory resources of in-cluster Internet of Things nodes to address highly complex deep neural network tasks in a timely manner and to ultimately deliver a desirable quality of service despite the acute resource limitations of the interconnected devices.
It should be noted, however, that, while computation offloading at the edge and the distribution and deployment of deep learning solutions on such computing environments are still emerging topics that have gained momentum over the past five years, both have already given rise to a vast corpus of scientific articles, leading to a considerable number of surveys, as illustrated in Table one. Specifically, we found sixteen papers focused on edge intelligence (EI) that provide an extensive overview of the current state of the art in the topic space. They guide the reader through a comprehensive collection of methods and technologies designed to better leverage edge infrastructures for deep neural network (DNN) training and, primarily, for the execution of such deep learning (DL) models.
Regarding inference, a widespread trend among the authors is to provide an overview of architectures and workflows for enabling DNN processing at the edge, giving particular attention to two families of approaches: techniques that make DL models suitable for direct deployment and local execution on resource-constrained edge devices, either by creating lightweight architectures from scratch (i.e., naturally suitable for edge environments) or by adjusting existing DNNs to reduce their complexity and size; and strategies that pursue offloading-based collaborative inference across multiple devices located either at the same tier or at different computing levels. Moreover, most authors extend these by introducing core concepts and providing extensive background on other EI-specific matters such as the most representative application scenarios, the software and hardware infrastructure for facilitating EI, and the most relevant challenges that need to be faced for its realization, i.e., model partitioning, communication, edge coordination, and more AI-related challenges.
Although the studies cited cover a wide range of relevant topics at the intersection of edge computing and AI, and may serve as a good starting point for gaining an understanding of the distributed execution of DL algorithmic solutions and for establishing a foundation for the knowledge that will be progressively solidified throughout the rest of this paper, they go beyond the scope of the present work, providing a higher-level overview of edge-based AI and, as a result, differing from it in thematic core and level of detail. Overall, their narrative style adeptly guides the reader through the various approaches conceived, introducing the pertinent concepts in each case without delving too deeply; however, it omits a significant number of the underlying design considerations and thus falls short when it comes to discussing the techniques and methods presented. To fill this gap, our work supplements the existing body of literature and provides a comprehensive and in-depth review of the most salient published studies that have led to the emergence of cooperative intelligence solutions in IoT environments, discussing the specific approaches and strategies conceived for partitioning and parallelizing DNNs, providing an in-depth treatment of the decision-making process required, and, finally, giving details on both the different challenges or issues that have emerged in this domain and the specific solutions conceived in response.
neural networks, split machine learning design, and hybrid DNN computation. The list of one hundred thirty-two entries obtained from a first cursory reading of the abstract and conclusions of each paper was then refined through an in-depth reading and thorough analysis of the research efforts of potential interest, assessing their quality and excluding those considered outside the domain of study. Papers were excluded either because they had no place in the IoT paradigm, belonging instead to other closely related fields such as the already mentioned MCC, MEC, or CC, or because, despite falling under the IoT umbrella and relating to the distributed computation model, they were oriented instead to locally distributed solutions relying on modern SoCs for parallelization.
The methodology for examining and selecting the literature just outlined resulted in a body of work that, albeit small, made it possible to clearly outline the most relevant aspects of cooperative intelligence in IoT contexts. More specifically, all these aspects are developed throughout this document according to the following structure. The second section puts the study in context by briefly presenting some of the most relevant milestones in the evolution of both AI and computing platforms towards deep learning at the edge. Specifically, and with regard to this last point, Section three examines the various research efforts on the distribution of DL workloads within IoT clusters, establishing a taxonomy of the primary parallelism strategies and partitioning schemes proposed, and elaborating on their most pertinent aspects at the practical level, the pillars of the decision making required for their proper configuration, as well as the specific challenges that arise when addressing the setup and the effective exploitation of such mechanisms. Section four shares the observations drawn from the state of the art and highlights the still open research challenges to be addressed in future work, while Section five brings the survey to a close by presenting the conclusions.