A Serverless Quantization-as-a-Service Model to Run Compression Jobs for Edge Intelligence
Conference proceedings contribution
Publication date:
2025
Abstract:
The rise of edge computing demands efficient compression strategies for deploying Machine Learning (ML) models on resource-constrained devices. As Artificial Intelligence (AI) shifts from the cloud to the edge, optimizing models across heterogeneous layers is crucial. Quantization reduces numerical precision, shrinking model size and improving inference speed and energy efficiency, all of which are key for edge deployments. However, the complexity of quantization workflows limits their accessibility. To address this, we propose Quantization-as-a-Service (QaaS), a serverless framework that automates model quantization for both cloud and edge environments. Built on OpenFaaS and Kubernetes, QaaS enables on-demand execution with dynamic resource orchestration, implementing Layer 5 of Edge Intelligence (EI). Our evaluation compares quantization performance on edge devices, in terms of CPU usage and execution time, when performed as a service versus locally. Results demonstrate that deploying quantization workflows via Function-as-a-Service (FaaS) not only maintains computational efficiency but also reduces CPU consumption compared to standalone execution, showcasing the potential of serverless solutions in EI.
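The precision reduction the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation, only a generic symmetric int8 post-training quantization of float32 weights, the kind of transformation a QaaS-style compression job would apply; all function names here are hypothetical.

```python
# Illustrative sketch only: symmetric affine int8 quantization,
# the basic precision-reduction step behind model compression.

def quantize_int8(weights):
    """Map float weights to int8 values in [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid division by zero
    q = [round(w / scale) for w in weights]    # 8-bit integer codes
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Storing the int8 codes plus a single float scale is what yields the roughly 4x size reduction over float32, at the cost of a bounded rounding error per weight (at most half the scale).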
CRIS type:
14.d.3 Full papers in conference proceedings
Keywords:
Edge Intelligence; FaaS; Quantization; Serverless
Authors:
De Novi, Danny; Dell'Acqua, Pierluigi; Carnevale, Lorenzo; Fazio, Maria; Villari, Massimo
Link to full record:
Book title:
2025 IEEE Symposium on Computers and Communications (ISCC)