UNIFIND - Competenze e Professionalità

unime.it
Self-Supervised Hypergraph Learning for Enhanced Multimodal Representation

Article
Publication Date:
2024
Abstract:
Hypergraph neural networks have gained substantial popularity in capturing complex correlations between data items in multimodal datasets. In this study, we propose a novel approach called the self-supervised hypergraph learning (SHL) framework that focuses on extracting hypergraph features to improve multimodal representation. Our method utilizes a dual embedding strategy and leverages SHL to improve the accuracy and robustness of the model. To achieve this, we employ a hypergraph learning framework to extract global context effectively by capturing rich inter-modal dependencies. Additionally, we introduce a novel self-supervised learning (SSL) component that utilizes the interaction graph data, thereby strengthening the robustness of the model. By jointly optimizing hypergraph feature extraction and SSL, SHL significantly improves the performance of multimodal representation tasks. To validate the effectiveness of our approach, we construct two comprehensive multimodal micro-video recommendation datasets using publicly available data (TikTok and MovieLens-10M). Prior to dataset creation, we meticulously handle invalid entries and outliers and complete missing modality information using external auxiliary sources, such as YouTube. These datasets are made publicly available to the research community for evaluation purposes. Experimental results on the above recommendation datasets demonstrate that the proposed SHL approach outperforms state-of-the-art baselines, highlighting its superior performance in multimodal representation tasks.
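The abstract names two ingredients: hypergraph feature extraction and a contrastive self-supervised objective. The sketch below is purely illustrative and is not the authors' implementation; the function names, shapes, and the specific normalization (the standard hypergraph convolution X' = Dv⁻¹ H De⁻¹ Hᵀ X Θ with an InfoNCE-style loss between two embedding views) are assumptions chosen to make the idea concrete.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution step: X' = Dv^-1 H De^-1 H^T X Theta.

    X:     (n_nodes, d_in) node features
    H:     (n_nodes, n_edges) incidence matrix (H[i, e] = 1 if node i is in hyperedge e)
    Theta: (d_in, d_out) learnable projection
    """
    Dv = H.sum(axis=1)            # node degrees
    De = H.sum(axis=0)            # hyperedge degrees
    msg = H.T @ X                 # gather node features into each hyperedge
    msg = msg / De[:, None]       # average within each hyperedge
    out = H @ msg                 # scatter hyperedge summaries back to nodes
    out = out / Dv[:, None]       # average over a node's incident hyperedges
    return out @ Theta

def contrastive_ssl_loss(z1, z2, tau=0.5):
    """InfoNCE-style loss: row i of z1 should match row i of z2
    (two views of the same node) against all other rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In a joint training loop of the kind the abstract describes, the convolution output would provide one embedding view, a second view would come from the interaction graph, and the task loss and the contrastive loss would be optimized together.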
CRIS Type:
14.a.1 Journal article
Keywords:
hypergraph neural networks; micro-video; multimodal; self-supervised learning
Author list:
Shu, H.; Meng, C.; De Meo, P.; Wang, Q.; Zhu, J.
University authors:
DE MEO Pasquale
Link to full record:
https://iris.unime.it/handle/11570/3291491
Published in:
IEEE ACCESS (Journal)