KubeFM

Latest episode

93 episodes

  • KubeFM

    GPU Containers as a Service, with Landon Clipp

    24-03-2026
    Running GPU workloads on Kubernetes sounds straightforward until you need to isolate multiple tenants on the same server. The moment you virtualize GPUs for security, you lose access to NVIDIA kernel drivers — and almost every tool in the ecosystem assumes those drivers exist.
    Landon Clipp built a GPU-based Containers as a Service platform from scratch, solving each isolation layer — from kernel separation with Kata Containers + QEMU to NVLink fabric partitioning to network policies with Cilium/eBPF — and shares exactly what broke along the way.
    In this interview:
    Why standard NVIDIA tooling (GPU Operator) fails in multi-tenant setups, and how to use CDI with PCI topology scanning to make GPUs visible to Kubernetes without kernel drivers

    How to partition the NVLink fabric between tenants using a trusted service VM running Fabric Manager, and why the physical PCIe wiring differs between Supermicro HGX and NVIDIA DGX systems

    Why gVisor doesn't work for GPU workloads — NVIDIA's unstable ioctl ABI means Google has to update gVisor for every driver release, and they only support a handful of GPUs

    What caused 8-GPU VMs to take 30+ minutes to boot, and the specific fixes (IOMMUFD, cold plugging, kernel upgrades) that brought it down to minutes

    How Cilium network policies enforce tenant isolation at the Kubernetes identity level instead of fragile IP-based rules

    Where Containers as a Service fits best: inference workloads where AI teams want to ship an OCI image without managing infrastructure or signing multi-million dollar cluster contracts.
    Sponsor
    This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
    More info
    Find all the links and info for this episode here: https://ku.bz/jjK_yJTDz

    Interested in sponsoring an episode? Learn more.
  • KubeFM

    How We Cut Build Debugging Time by 75% with AI, with Ron Matsliah

    17-03-2026 | 20 Min.
    Build failures in Kubernetes CI/CD pipelines are a silent productivity killer. Developers spend 45+ minutes scrolling through cryptic logs, often just hitting rerun and hoping for the best.
    Ron Matsliah, DevOps engineer at Next Insurance, built an AI-powered assistant that cut build debugging time by 75% — not as a dashboard, but delivered directly in Slack where developers already work.
    In this episode:
    Why combining deterministic rules with AI produces better results than letting an LLM guess alone

    How correlating Kubernetes events with build logs catches spot instance terminations that produce misleading errors

    Why integrating into existing workflows and building feedback loops from day one drove adoption

    The prompt engineering lessons learned from testing with real production data instead of synthetic examples

    The takeaway: simple rules plus rich context consistently outperform complex AI queries on their own.
    More info
    Find all the links and info for this episode here: https://ku.bz/PDdYfC00w

  • KubeFM

    Migrating Kubernetes Off Big Cloud, with Fernando Duran

    10-03-2026 | 25 Min.
    Managed Kubernetes on a major cloud provider can cost hundreds or even thousands of dollars a month — and much of that spending hides behind defaults, minimum resource ratios, and auxiliary services you didn't ask for.
    Fernando Duran, founder of SadServers, shares how his GKE Autopilot proof of concept ran close to $1,000/month while using only a fraction of the CPU of the actual workload, and how he cut that to roughly $30/month by moving to Hetzner with Edka as a managed control plane.
    In this interview:
    Why Kubernetes hasn't delivered on its original promise of cost savings through bin packing — and what it actually provides instead

    A real cost comparison: $1,000/month on GKE vs. $30/month on Hetzner with Edka for the same nominal capacity

    What you need to bring with you (observability, logging, dashboards) when leaving a fully managed cloud provider

    The decision comes down to how tightly coupled you are to cloud-specific services and whether your team can spare the cycles to manage the gaps.
    More info
    Find all the links and info for this episode here: https://ku.bz/6nSDbz9m4

  • KubeFM

    Migrating to Karpenter: Fun Stories, with Adhi Sutandi

    03-03-2026 | 1 hr 1 Min.
    Running multiple Kubernetes clusters on AWS with the cluster autoscaler? Every four months, you face the same grind: upgrading Kubernetes versions, recreating auto scaling groups, and hoping instance type changes stick.
    Adhi Sutandi, DevOps Engineer at Beekeeper by LumApps, shares how his team migrated from the cluster autoscaler to Karpenter across eight EKS clusters — and the hard lessons they learned along the way.
    In this episode:
    Why AWS auto scaling groups are immutable and how that creates upgrade bottlenecks at scale

    How the latest AMI tag accidentally turned less critical clusters into chaos engineering environments, dropping SLOs before anyone realized Karpenter was the cause

    Why pre-stop sleep hooks solved pod restartability problems that Quarkus's built-in graceful shutdown couldn't

    The case for pod disruption budgets over Karpenter annotations when protecting critical workloads during node rotations

    How Karpenter's implicit 10% disruption budget caught the team off guard — and the explicit configuration that fixed it

    More info
    Find all the links and info for this episode here: https://ku.bz/XyVfsSQPr

  • KubeFM

    From ECS to Kubernetes: A Real Migration Story, with Radosław Miernik

    24-02-2026 | 38 Min.
    Migrating from ECS to Kubernetes sounds straightforward — until you hit spot capacity failures, firewall rules silently dropping traffic, and memory metrics that lie to your autoscaler.
    Radosław Miernik, Head of Engineering at aleno, walks through a real production migration: what broke, what they missed, and the fixes that made it work.
    In this interview:
    Running Flux and Argo CD together — Flux for the infra team, Argo CD's UI for developers who don't want to touch YAML

    How the wrong memory metric caused OOM errors, and why switching to jemalloc cut memory usage by 20%

    Splitting WebSocket and API containers into separate deployments with independent autoscaling

    Four months of migration, over 100 configuration changes in the first month, and a concrete breakdown of what platform work looks like when you can't afford downtime.
    More info
    Find all the links and info for this episode here: https://ku.bz/x6wFMhVsx



About KubeFM

Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.
