[Project] Multimodal Movie Recommendation Systems
January 2021 - Present
Affiliation: Independent Research Initiative
Target Audience: Streaming Platforms, Media Labs, & Scientific Research Community
Project Ecosystem: GitHub (Popcorn) | GitHub (RAG-VisualRec) | GitHub (ViLLA-MMBench) | Paper (Popcorn) | Paper (RAG-VisualRec) | Paper (ViLLA-MMBench)
- Multimodal Video Analytics: a content-based personalization ecosystem that parses and models rich multimedia inputs (visual, audio, and textual data).
- Deep Multimodal Feature Extraction: processes diverse visual, textual, and audio representations extracted directly from full-length movies, trailers, and individual cinematic sequences to build comprehensive item profiles.
- Granular Shot-Level Video Analytics: uses LLM and deep neural network pipelines to analyze video properties down to individual movie shots, capturing rapid pacing changes, color semantics, and aesthetic features.
- High-Fidelity Personalization Engine: constructs deep representation vectors for multimedia content, matching complex structural movie dynamics to latent user preferences for high-accuracy recommendations.
- 💡 Stack: Python, PyTorch, HuggingFace Transformers
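The pipeline outlined in the bullets above (shot-level features pooled into item profiles, modalities fused, items matched to user preferences) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`mean_pool`, `fuse`, `recommend`), the two-dimensional toy vectors, and the choice of mean pooling, concatenation fusion, and cosine ranking are hypothetical and are not taken from the Popcorn, RAG-VisualRec, or ViLLA-MMBench code. Plain Python is used so the snippet runs without any dependencies.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def mean_pool(vectors):
    """Average equally sized vectors, e.g. shot-level features of one movie."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fuse(modalities):
    """Normalize each modality, concatenate in a fixed (sorted) key order,
    and normalize the result. This is only one possible fusion strategy."""
    fused = []
    for name in sorted(modalities):
        fused.extend(l2_normalize(modalities[name]))
    return l2_normalize(fused)

def recommend(history, items, k=2):
    """Rank unseen items by cosine similarity to the mean of the history."""
    profile = l2_normalize(mean_pool([items[i] for i in history]))
    # Item vectors and the profile are unit length, so the dot product
    # equals the cosine similarity.
    scores = {
        item_id: sum(p * v for p, v in zip(profile, vec))
        for item_id, vec in items.items()
        if item_id not in history
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): "action_1" has two shot-level visual vectors
# that are pooled into a single movie-level visual vector.
items = {
    "action_1": fuse({
        "visual": mean_pool([[0.9, 0.1], [0.7, 0.3]]),
        "audio": [1.0, 0.0],
        "text": [0.8, 0.2],
    }),
    "action_2": fuse({"visual": [0.8, 0.2], "audio": [0.9, 0.1], "text": [0.9, 0.1]}),
    "drama_1": fuse({"visual": [0.1, 0.9], "audio": [0.0, 1.0], "text": [0.2, 0.8]}),
}

ranked = recommend(["action_1"], items, k=2)
print(ranked)
```

In a realistic setting the toy vectors would presumably be replaced by embeddings from pretrained visual, audio, and text encoders, and the cosine ranking by a trained recommender; the sketch only shows how the pieces could fit together.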
A. Tourani, F. Nazary, Y. Deldjoo, and T. Di Noia,
"Popcorn: A Configurable Benchmark for Visual Evidence in Multimodal Movie Recommendation,"
35th ACM International Conference on Information and Knowledge Management (CIKM 2026),
Under Review, 2026.
DOI: 10.48550/arXiv.2606.09595
A. Tourani, F. Nazary, and Y. Deldjoo,
"RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation,"
ACM Transactions on Recommender Systems (TORS), 2026.
DOI: 10.48550/arXiv.2506.20817
F. Nazary, A. Tourani, Y. Deldjoo, and T. Di Noia,
"ViLLA-MMBench: A Unified Benchmark Suite for LLM-Augmented Multimodal Movie Recommendation,"
arXiv Preprint, 2025.
DOI: 10.48550/arXiv.2508.04206