
[Project] Multimodal Movie Recommendation Systems

January 2021 - Present

Affiliation: Independent Research Initiative

Target Audience: Streaming Platforms, Media Labs, & Scientific Research Community

Multimodal Video Analytics: a content-based personalization system designed to parse and model rich multimedia inputs (visual, audio, and textual data).
Deep Multimodal Feature Extraction: processes visual, textual, and audio representations extracted directly from full-length movies, trailers, and individual cinematic sequences to build comprehensive item profiles.
Granular Shot-Level Video Analytics: applies LLM and deep neural network pipelines to analyze video properties down to individual movie shots, capturing rapid pacing changes, color semantics, and aesthetic features.
High-Fidelity Personalization Engine: constructs deep representation vectors for multimedia content and matches the structural dynamics of each movie to latent user preferences when ranking recommendations.
💡 Stack: Python, PyTorch, HuggingFace Transformers
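
The flow described above (shot-level features pooled into per-modality vectors, fused into an item profile, then matched against a user preference vector) can be illustrated with a small sketch. This is not the project's actual implementation: the function names, the mean pooling, the concatenation-based fusion, and the toy two-dimensional vectors are all assumptions chosen for illustration, and plain Python lists stand in for the PyTorch tensors a real pipeline would likely use.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def item_profile(shot_visual, audio, text):
    """Hypothetical item profile: pool shot-level visual features,
    normalize each modality, concatenate, and normalize the result."""
    visual = mean_pool(shot_visual)
    fused = []
    for modality in (visual, audio, text):
        fused.extend(l2_normalize(modality))
    return l2_normalize(fused)


def user_profile(liked_profiles):
    """Hypothetical user vector: normalized mean of liked item profiles."""
    return l2_normalize(mean_pool(liked_profiles))


def recommend(user_vec, catalog, k=2):
    """Rank items by cosine similarity (dot product of unit vectors)."""
    scores = {
        title: sum(u * v for u, v in zip(user_vec, vec))
        for title, vec in catalog.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy catalog with made-up features: (shot-level visual, audio, text).
catalog = {
    "movie_a": item_profile([[1.0, 0.0], [0.8, 0.2]], [1.0, 0.0], [1.0, 0.0]),
    "movie_b": item_profile([[0.9, 0.1]], [0.9, 0.1], [0.8, 0.2]),
    "movie_c": item_profile([[0.0, 1.0], [0.1, 0.9]], [0.0, 1.0], [0.0, 1.0]),
}

# A user who liked movie_a; rank the remaining items.
user_vec = user_profile([catalog["movie_a"]])
unseen = {t: v for t, v in catalog.items() if t != "movie_a"}
ranking = recommend(user_vec, unseen, k=2)
print(ranking)
```

In this toy setup, items whose fused vectors point in a similar direction to the user's liked items rank first. Concatenation is only one possible fusion strategy; weighted sums or learned fusion layers are common alternatives.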

A. Tourani, F. Nazary, Y. Deldjoo, and T. Di Noia, "Popcorn: A Configurable Benchmark for Visual Evidence in Multimodal Movie Recommendation," 35th ACM International Conference on Information and Knowledge Management (CIKM 2026), Under Review, 2026.
DOI: 10.48550/arXiv.2606.09595

A. Tourani, F. Nazary, and Y. Deldjoo, "RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation," ACM Transactions on Recommender Systems (TORS), 2026.
DOI: 10.1145/3818681

F. Nazary, A. Tourani, Y. Deldjoo, and T. Di Noia, "ViLLA-MMBench: A Unified Benchmark Suite for LLM-Augmented Multimodal Movie Recommendation," arXiv Preprint, 2025.
DOI: 10.48550/arXiv.2508.04206