A Serverless Quantization-as-a-Service Model to Run Compression Jobs for Edge Intelligence
Conference proceedings contribution
Publication date:
2025
Abstract:
The rise of edge computing demands efficient compression strategies for deploying Machine Learning (ML) models on resource-constrained devices. As Artificial Intelligence (AI) shifts from the cloud to the edge, optimizing models across heterogeneous layers is crucial. Quantization reduces numerical precision, shrinking model size and improving inference speed and energy efficiency, all of which are key for edge deployments. However, the complexity of quantization workflows limits their accessibility. To address this, we propose Quantization-as-a-Service (QaaS), a serverless framework that automates model quantization for both cloud and edge environments. Built on OpenFaaS and Kubernetes, QaaS enables on-demand execution with dynamic resource orchestration, implementing Layer 5 of Edge Intelligence (EI). Our evaluation compares quantization performance on edge devices, in terms of CPU usage and execution time, when performed as a service versus locally. Results demonstrate that deploying quantization workflows via Function-as-a-Service (FaaS) not only maintains computational efficiency but also reduces CPU consumption compared to standalone execution, showcasing the potential of serverless solutions in EI.
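The precision reduction the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation, only a generic symmetric int8 post-training quantization of float32 weights, the kind of transformation a QaaS-style compression job would apply; all function names here are hypothetical.

```python
# Illustrative sketch only: symmetric affine int8 quantization,
# the basic precision-reduction step behind model compression.

def quantize_int8(weights):
    """Map float weights to int8 values in [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid division by zero
    q = [round(w / scale) for w in weights]    # 8-bit integer codes
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Storing the int8 codes plus a single float scale is what yields the roughly 4x size reduction over float32, at the cost of a bounded rounding error per weight (at most half the scale).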
CRIS type:
14.d.3 Full papers in conference proceedings
Keywords:
Edge Intelligence; FaaS; Quantization; Serverless
Authors:
De Novi, Danny; Dell'Acqua, Pierluigi; Carnevale, Lorenzo; Fazio, Maria; Villari, Massimo
Link to full record:
Book title:
2025 IEEE Symposium on Computers and Communications (ISCC)