[Project] Multimodal Movie Recommendation Systems

January 2021 - Present

Affiliation: Independent Research Initiative

Target Audience: Streaming Platforms, Media Labs, & Scientific Research Community

Project Ecosystem: GitHub (Popcorn) | GitHub (RAG-VisualRec) | GitHub (ViLLA-MMBench) | Paper (Popcorn) | Paper (RAG-VisualRec) | Paper (ViLLA-MMBench)

  • Multimodal Video Analytics: content-based personalization framework that parses and models rich multimedia inputs (visual, audio, and textual data).
  • Deep Multimodal Feature Extraction: extracts visual, textual, and audio representations directly from full-length movies, trailers, and individual cinematic sequences to build comprehensive item profiles.
  • Granular Shot-Level Video Analytics: uses LLM and deep neural network pipelines to analyze video properties down to individual movie shots, capturing rapid pacing changes, color semantics, and aesthetic features.
  • High-Fidelity Personalization Engine: builds deep representation vectors for multimedia content and matches each movie's structural dynamics against latent user preferences to rank recommendations.
  • 💡 Stack: Python, PyTorch, HuggingFace Transformers
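
Illustrative sketch (not the project's actual code): the matching step described above can be pictured as fusing per-modality embeddings into one item vector, averaging a user's rated items into a profile, and ranking unseen titles by similarity. The function names, the toy two-dimensional embeddings, and the choice of concatenation-based fusion with a rating-weighted profile are all assumptions made for illustration. The sketch uses only the Python standard library, whereas the real pipelines rely on PyTorch and HuggingFace Transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_modalities(visual, audio, text):
    """Hypothetical fusion: normalize each modality, concatenate, renormalize."""
    fused = l2_normalize(visual) + l2_normalize(audio) + l2_normalize(text)
    return l2_normalize(fused)


def build_user_profile(item_vectors, ratings):
    """Rating-weighted sum of the fused vectors of the items a user rated."""
    dim = len(next(iter(item_vectors.values())))
    profile = [0.0] * dim
    for item_id, rating in ratings.items():
        for i, x in enumerate(item_vectors[item_id]):
            profile[i] += rating * x
    return l2_normalize(profile)


def recommend(profile, item_vectors, seen, k=2):
    """Rank unseen items by cosine similarity (dot product of unit vectors)."""
    scores = {
        item_id: sum(p * x for p, x in zip(profile, vec))
        for item_id, vec in item_vectors.items()
        if item_id not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy embeddings standing in for real visual/audio/text encoder outputs.
items = {
    "movie_a": fuse_modalities([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    "movie_b": fuse_modalities([0.9, 0.1], [1.0, 0.0], [0.8, 0.2]),
    "movie_c": fuse_modalities([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
}
ratings = {"movie_a": 5.0}  # the user has rated a single title

profile = build_user_profile(items, ratings)
top = recommend(profile, items, seen=set(ratings), k=2)
print(top)
```

In this toy setup, movie_b is ranked ahead of movie_c because its embeddings lie close to the rated movie_a in every modality.
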
A. Tourani, F. Nazary, Y. Deldjoo, and T. Di Noia, "Popcorn: A Configurable Benchmark for Visual Evidence in Multimodal Movie Recommendation," 35th ACM International Conference on Information and Knowledge Management (CIKM 2026), Under Review, 2026.
DOI: 10.48550/arXiv.2606.09595
A. Tourani, F. Nazary, and Y. Deldjoo, "RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation," ACM Transactions on Recommender Systems (TORS), 2026.
DOI: 10.48550/arXiv.2506.20817
F. Nazary, A. Tourani, Y. Deldjoo, and T. Di Noia, "ViLLA-MMBench: A Unified Benchmark Suite for LLM-Augmented Multimodal Movie Recommendation," arXiv Preprint, 2025.
DOI: 10.48550/arXiv.2508.04206