KubeFM podcast | Listen for free online

101 episodes

1 Million Tokens Per Second on Kubernetes, with Federico Iezzi
28-07-2026 | 47 Min.
GPU inference throughput depends on more than accelerator generation or count.
Memory bandwidth, model parallelism, cache configuration, and the load generator itself all influence measured throughput.
Federico Iezzi, Customer Engineer at Google Cloud, explains how his team achieved 1 million output tokens per second using Qwen 3.5 27B, vLLM, GKE Autopilot, and NVIDIA B200 GPUs.
The discussion covers:
Why memory bandwidth limits decode performance

How Federico chose between tensor and data parallelism

What changed after enabling multi-token prediction and reducing the KV cache footprint with FP8 quantization

Sponsor
This episode is sponsored by LearnKube. Download the free book, The Technical Guide to Kubernetes Rightsizing, to understand what Prometheus and Grafana cannot tell you about safely reducing requests and limits.
More info
Find all the links and info for this episode here: https://ku.bz/1xD9Md0mb

Interested in sponsoring an episode? Learn more.
The Hidden Cost of Slow Autoscaling, with John Ford
19-05-2026 | 21 Min.
Forced platform migrations are usually treated as something to survive. At Scout24, a mandatory OS migration became an opportunity to rethink Kubernetes autoscaling, node provisioning, and infrastructure efficiency.
John Ford explains how Scout24 moved its EKS-based Infinity platform from a polling autoscaler and over-provisioned capacity to Karpenter and Bottlerocket. The result was faster node startup, a safer migration path, and about a 30% infrastructure reduction without major downtime.
In this interview:
Why two-minute node provisioning forced a 25% capacity buffer

How Karpenter made the Bottlerocket migration safer

What broke around EC2 metadata, AWS SDKs, and cgroups

How the new foundation enables Spot, ARM, and GPU workloads

Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/DdmVC2_7v

The Namespaces Scaling Trap, with Brian Stack
12-05-2026 | 36 Min.
Most teams scale Kubernetes by thinking about pods and nodes. At Render, Brian Stack ran into a different dimension: hundreds of thousands of namespaces per cluster, multiplied across DaemonSets that list-watch every namespace.
Brian explains how Render traced the issue through Calico and Vector, worked with upstream maintainers, and turned memory profiling into operational wins: lower node costs, lighter API-server load, and faster rollouts.
In this interview:
Why namespaces can become a hidden scaling bottleneck

How DaemonSets multiply memory and control-plane pressure

How profiling, staging clusters, and upstream collaboration freed 7 TiB

Why pushing from an 80% fix to a complete fix can make teams faster

Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/0mrvCsXrV

AI Agents Running Kubernetes, with Mike Solomon
05-05-2026 | 38 Min.
What happens when an AI agent stops generating Kubernetes YAML and starts operating the cluster directly?
Mike Solomon, software engineer at AIATELLA, explains how his team moved from a sprawling Helm setup to Markdown-driven infrastructure specs that Claude Code can execute, test, and refine.
You will learn:
Why Helm became hard to maintain for a fast-moving medical infrastructure repo

How Claude debugged Argo, TLS conflicts, kubectl patches, and private registry credentials

How runbooks and agent memory files capture failures so that deployments become reproducible

It is a practical look at where Kubernetes automation may be heading: less hand-written YAML, more precise intent, and a sharper definition of when the human must stay in the loop.
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/y70mLvWNs

SaaS with Kubernetes Operators and Garbage Collection, with Alexander Held
28-04-2026 | 35 Min.
A single Kubernetes CRD for every service request turns small changes into full-platform reconciliations.
Alexander Held, former platform engineer at Mercedes-Benz Tech Innovation, describes a production refactor from a 2,000-line CRD to purpose-built resources and controllers. He shows how teams can model business workflows as Kubernetes APIs and then use owner references, finalizers, and events to keep platform operations predictable.
You will learn:
Why monolithic CRDs create performance and troubleshooting problems

How controllers turn database provisioning and backups into reconciliation loops

How finalizers clean up external resources such as S3 backups

Why Kubernetes events make platform workflows easier to debug

Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/TGy4Qn7Qs


About KubeFM

Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.