
Video Recommendations in Industry
26-12-2025 | 38 Min.
In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algotorial" curation, where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial role of homepage curation, and how human curators help by contextualizing content, cleaning data, and identifying positive feedback loops that algorithms might miss. The conversation covers practical challenges like measuring "surprise and delight," the content deluge created by democratized creation tools, and why trust in tech companies is essential for better personalization. Cory emphasizes that discovery is "a good type of friction" and explains how the CODE framework (Capture, Organize, Distill, Express, plus Analysis) guides professional curation work. Looking to the future, they discuss the need for systems thinking that creates narrative connections between content, the potential for conversational AI to help users articulate preferences, and why diverse perspectives beyond engineering are crucial for building effective discovery systems. Resources mentioned include the newsletter "Top Information Retrieval Papers of the Week" and NotebookLM for synthesizing research.

Eye Tracking in Recommender Systems
18-12-2025 | 52 Min.
In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University of Technology, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix's. Through collaboration between psychologists and AI researchers, Santiago's work demonstrates how eye tracking can uncover insights about positional bias and user engagement that traditional click data misses. Beyond the technical aspects, Santiago addresses the ethical considerations surrounding eye tracking data, particularly concerning pupil data and privacy. He emphasizes the importance of questioning assumptions in recommender systems and shares practical advice for improving recommendation algorithms by understanding actual user behavior rather than relying solely on click patterns. Looking forward, Santiago discusses exciting future directions, including simulating user behavior using eye tracking data, addressing the cold start problem, and translating these findings to e-commerce applications. This conversation challenges researchers and practitioners to think more deeply about debiasing clicks and leveraging eye tracking as a powerful tool to enhance user experience in recommendation systems.
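
As a rough illustration of the kind of preprocessing described in the episode, the sketch below segments raw gaze samples into fixations and saccades using a simple velocity threshold (the classic I-VT approach). The threshold value, data layout, and function name are illustrative assumptions, not the RecGaze preprocessing pipeline.

    # Minimal velocity-threshold (I-VT) sketch: gaze samples moving faster than
    # a threshold are labeled saccades, the rest fixations. Threshold and data
    # layout are assumptions for illustration, not the RecGaze pipeline.
    import numpy as np

    def classify_gaze(x, y, t, velocity_threshold=30.0):
        """x, y: gaze coordinates (degrees of visual angle); t: timestamps in seconds.
        Returns a list of (label, start_index, end_index) segments."""
        vx = np.diff(x) / np.diff(t)
        vy = np.diff(y) / np.diff(t)
        speed = np.hypot(vx, vy)                      # angular velocity between samples
        if speed.size == 0:
            return []
        is_saccade = speed > velocity_threshold
        segments, start = [], 0
        for i in range(1, len(is_saccade)):
            if is_saccade[i] != is_saccade[i - 1]:    # label change closes a segment
                segments.append(("saccade" if is_saccade[i - 1] else "fixation", start, i))
                start = i
        segments.append(("saccade" if is_saccade[-1] else "fixation", start, len(is_saccade)))
        return segments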

Cracking the Cold Start Problem
08-12-2025 | 39 Min.
In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.
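
For readers curious what the bandit side of such a hybrid might look like in code, here is a minimal LinUCB-style sketch in which item embeddings (for example, learned via collaborative filtering) are scored with optimism under uncertainty, and a demographic-informed prior mean warms up brand-new users. The class name, prior construction, and parameters are assumptions for illustration, not Boya Xu's exact method.

    # Minimal LinUCB-style sketch: item embeddings scored with optimism under
    # uncertainty; a demographic-informed prior mean warms up cold-start users.
    # All names and the prior construction are illustrative assumptions.
    import numpy as np

    class ColdStartLinUCB:
        def __init__(self, item_embeddings, prior_mean, alpha=1.0, prior_strength=1.0):
            self.items = item_embeddings            # (n_items, d), e.g. from collaborative filtering
            d = item_embeddings.shape[1]
            self.alpha = alpha                      # exploration width
            self.A = prior_strength * np.eye(d)     # ridge/precision matrix
            self.b = prior_strength * np.asarray(prior_mean)  # demographic prior on preferences

        def recommend(self):
            theta = np.linalg.solve(self.A, self.b)           # current preference estimate
            A_inv = np.linalg.inv(self.A)
            scores = self.items @ theta
            widths = np.sqrt(np.einsum("id,dk,ik->i", self.items, A_inv, self.items))
            return int(np.argmax(scores + self.alpha * widths))  # explore where uncertain

        def update(self, item_idx, reward):
            x = self.items[item_idx]
            self.A += np.outer(x, x)                # each observation sharpens the posterior
            self.b += reward * x

One plausible choice for prior_mean is the average learned preference vector of existing users in the new user's demographic segment, which is one way demographic information can yield an informative prior that accelerates learning.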

Designing Recommender Systems for Digital Humanities
23-11-2025 | 36 Min.
In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challenges of building a recommender system for cultural heritage materials, including sparse user-item interaction matrices, the cold start problem, and the need for multi-modal similarity approaches that can handle text, images, metadata, and historical context. The platform leverages various embedding techniques and gives users control over weighting the different modalities, whether they are searching based on textual similarity, visual imagery, or diplomatic features such as issuers and receivers. A key insight from Florian's research is the need to balance serendipity with utility, to represent collections fairly in order to prevent bias, and to keep the system explainable without sacrificing effectiveness. The discussion also touches on the unique evaluation challenges of non-commercial recommendation contexts, including Florian's "research funnel" framework, which considers discovery, interaction, integration, and impact stages. Looking ahead, Florian envisions recommendation systems becoming standard tools for exploration across digital archives and cultural heritage repositories throughout Europe, potentially transforming how researchers discover and engage with historical materials. The new version of Monasterium.net, set to launch with enhanced semantic search and recommendation features, represents an important step toward making cultural heritage more accessible and discoverable for everyone.
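
To make the modality-weighting idea concrete, here is a small sketch that combines per-modality cosine similarities using user-chosen weights. The modality names, weights, and function names are illustrative assumptions, not Monasterium.net's actual implementation.

    # Sketch of user-weighted multi-modal similarity over document embeddings.
    # Modality names and weights are illustrative, not Monasterium.net's code.
    import numpy as np

    def _normalize(M):
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    def recommend_similar(query_idx, embeddings, weights, k=10):
        """embeddings: dict of modality -> (n_docs, d_m) arrays, e.g. 'text',
        'image', 'metadata'; weights: dict of modality -> user-chosen weight."""
        n_docs = next(iter(embeddings.values())).shape[0]
        total = np.zeros(n_docs)
        for modality, M in embeddings.items():
            U = _normalize(M)
            total += weights.get(modality, 0.0) * (U @ U[query_idx])  # cosine similarity
        total[query_idx] = -np.inf                 # exclude the query document itself
        return np.argsort(-total)[:k]              # indices of the k most similar documents

A text-focused historian might put most of the weight on 'text', while an art-historical query could emphasize 'image', which is the kind of user control the episode describes.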

DataRec: A Library for Reproducibility in Recommender Systems
13-11-2025 | 32 Min.
In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoctoral researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research, from version control issues to preprocessing inconsistencies, and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommenders, privacy considerations, and recommendation novelty. He explains why small modifications to datasets can significantly impact research outcomes, the importance of offline evaluation, and DataRec's vision as a lightweight library that integrates with existing frameworks rather than replacing them. Whether you're benchmarking new algorithms or exploring recommendation techniques, this episode offers practical insights into one of the most critical yet overlooked aspects of reproducible ML research.
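
As a generic illustration of two of the reproducibility steps mentioned, checksum verification and standardized filtering, the sketch below verifies a downloaded file against an expected hash and applies an iterative k-core filter to an interaction table. The function names and column names are illustrative assumptions, not DataRec's actual API.

    # Generic sketch of checksum verification and iterative k-core filtering.
    # Function and column names are illustrative, not DataRec's actual API.
    import hashlib
    import pandas as pd

    def verify_checksum(path, expected_sha256):
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != expected_sha256:
            raise ValueError(f"Checksum mismatch for {path}: got {digest}")

    def k_core_filter(interactions: pd.DataFrame, k: int = 5) -> pd.DataFrame:
        """Repeatedly drop users and items with fewer than k interactions
        from a table with 'user' and 'item' columns until it is stable."""
        while True:
            user_counts = interactions["user"].value_counts()
            item_counts = interactions["item"].value_counts()
            kept = interactions[
                interactions["user"].map(user_counts).ge(k)
                & interactions["item"].map(item_counts).ge(k)
            ]
            if len(kept) == len(interactions):
                return kept
            interactions = kept

Pinning down choices like these, along with the exact dataset bytes, is what makes results comparable across papers, which is the gap the episode says DataRec aims to close.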


