The Podcasts Flow: From Audio to Insights
The Podcast Problem
You subscribe to amazing podcasts. Smart people sharing incredible insights. But here’s the reality:
- 2-3 hour episodes are hard to fit into your schedule
- Taking notes while listening breaks your flow
- Remembering key points days later is nearly impossible
- Searching for that one thing someone said requires re-listening to hours of audio
What if you could get structured, searchable summaries of every episode automatically?
The Korben Solution: Podcasts Flow
The podcasts flow is a complete automation pipeline that:
- Downloads new episodes from your configured RSS feeds
- Transcribes the audio to text using OpenAI Whisper
- Extracts wisdom using AI to pull out key insights
- Emails you a formatted HTML summary
All with one command:
pdm run python3 ./korben.py --flow podcasts
How It Works
Step 1: Download from RSS Feeds
Configure your podcasts in config/podcasts.yml:
days_back: 7
podcasts:
  all_in: "https://feeds.megaphone.fm/all-in-with-chamath-jason-sacks-friedberg"
  lennys_podcast: "https://feeds.simplecast.com/OKC4rK5v"
  odd_lots: "https://feeds.bloomberg.com/odd-lots"
Korben downloads new episodes from the last 7 days (or whatever you configure). It uses CSV tracking to skip episodes you’ve already processed:
podcast_name,episode_date,downloaded,transcribed,wisdom_extracted
all_in,2025-10-15,true,true,true
lennys_podcast,2025-10-20,true,false,false
If the flow is interrupted, just run it again - it picks up where it left off.
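To make the download step concrete, here is a minimal sketch of the two checks involved: keeping only feed items inside the days_back window, and reading the tracker to see what is already downloaded. It is not Korben's actual code. The function names are hypothetical, and it assumes a standard RSS 2.0 feed whose items carry a pubDate with a timezone and an enclosure holding the audio URL.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def recent_episodes(rss_xml: str, days_back: int, now: datetime) -> list[dict]:
    """Return RSS items published within the last `days_back` days.

    Hypothetical helper: `now` must be timezone-aware so it can be
    compared with the parsed pubDate values.
    """
    cutoff = now - timedelta(days=days_back)
    episodes = []
    for item in ET.fromstring(rss_xml).iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        enclosure = item.find("enclosure")  # holds the audio URL in podcast feeds
        if published >= cutoff and enclosure is not None:
            episodes.append({
                "title": item.findtext("title"),
                "episode_date": published.date().isoformat(),
                "url": enclosure.get("url"),
            })
    return episodes


def already_downloaded(tracker_csv: str, podcast_name: str) -> set[str]:
    """Collect the episode dates the tracker marks as downloaded for one podcast."""
    rows = csv.DictReader(io.StringIO(tracker_csv))
    return {
        row["episode_date"]
        for row in rows
        if row["podcast_name"] == podcast_name and row["downloaded"] == "true"
    }
```

An episode then needs downloading only if it is in recent_episodes(...) and its episode_date is not in already_downloaded(...).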
Step 2: Transcribe with Whisper
Each MP3 gets transcribed using OpenAI’s Whisper model:
transcribe_podcasts.run()
Whisper is state-of-the-art for speech recognition:
- Transcribes multi-speaker conversations (it does not label who is speaking)
- Works with various audio quality
- Accurate with technical terms and proper nouns
- Runs locally on your machine
Transcripts are saved to data/podcasts/transcripts/ as plain text files.
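For a sense of what a local transcription step can look like, here is a rough sketch using the open-source openai-whisper package (which also needs ffmpeg installed). The helper names, the "base" model size, and the skip-if-exists check are assumptions for illustration, not the internals of transcribe_podcasts.

```python
from pathlib import Path


def transcript_path_for(mp3_path: Path, out_dir: Path) -> Path:
    """Map episode.mp3 to <out_dir>/episode.txt (hypothetical naming scheme)."""
    return out_dir / (mp3_path.stem + ".txt")


def transcribe_file(mp3_path: Path, out_dir: Path, model_name: str = "base") -> Path:
    """Transcribe one MP3 locally and save the text as a plain .txt file."""
    target = transcript_path_for(mp3_path, out_dir)
    if target.exists():  # already transcribed on an earlier run
        return target

    import whisper  # imported lazily so the skip path works without the package

    model = whisper.load_model(model_name)     # downloads weights on first use
    result = model.transcribe(str(mp3_path))   # dict with the full text under "text"
    out_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(result["text"], encoding="utf-8")
    return target
```

Larger model sizes are generally more accurate but slower, so the model name is worth tuning to your hardware.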
Step 3: Extract Wisdom
This is where it gets interesting. For each transcript, Korben extracts:
- Summary (25 words) - The core message
- Ideas (20-50 items) - Key concepts and arguments
- Insights (10-20 items) - Refined takeaways
- Quotes (15-30 items) - Memorable statements with attribution
- Habits (15-30 items) - Practical behaviors you can adopt
- Facts (15-30 items) - Surprising or important data points
- References - Books, tools, and projects mentioned
- One-sentence takeaway - The single most important point
- Recommendations (15-30 items) - Actionable next steps
The extraction prompt asks for consistent formatting (such as “exactly 16 words” per idea), which keeps the output concise and scannable.
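Since a model will not always follow length rules perfectly, it can help to check the output yourself. The sketch below assumes the wisdom comes back as Markdown with "# SECTION" headings and "- " bullets. That layout and both function names are assumptions for illustration, not something Korben is confirmed to do.

```python
import re


def parse_sections(markdown: str) -> dict[str, list[str]]:
    """Group '- ' bullets under the most recent '# HEADING' line."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"#+\s+(.*)", line)
        if heading:
            current = heading.group(1).strip().upper()
            sections[current] = []
        elif current is not None and line.startswith("- "):
            sections[current].append(line[2:].strip())
    return sections


def bullets_with_wrong_length(bullets: list[str], words: int = 16) -> list[str]:
    """Return the bullets whose word count differs from the target."""
    return [b for b in bullets if len(b.split()) != words]
```

Running bullets_with_wrong_length(parse_sections(wisdom).get("IDEAS", [])) gives you the ideas that missed the 16-word target, which you could log or send back for a retry.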
Inspired by: The extract_wisdom step draws heavily on Fabric, a project by Daniel Miessler focused on structured wisdom extraction and personal knowledge management. Korben adapts and extends these ideas for automation workflows.
Step 4: Format and Email
The wisdom is converted from Markdown to beautiful HTML and emailed to you:
html = markdown_to_html.run(text=wisdom)
send_email.run(
    recipient=your_email,
    subject=f"Wisdom: {episode_title}",
    content=html
)
You get a formatted email with all the insights, ready to read and reference.
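Korben's setup uses a POSTMARK_API_KEY, so the email step presumably goes through Postmark. As a rough illustration of what such a call involves, here is a standard-library sketch that builds a request for Postmark's send-email endpoint. The function name and the sender parameter are hypothetical, and send_email may be implemented quite differently.

```python
import json
import urllib.request


def build_postmark_request(
    api_key: str, sender: str, recipient: str, subject: str, html: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Postmark's /email endpoint."""
    payload = {
        "From": sender,       # assumed to be a sender address verified in Postmark
        "To": recipient,
        "Subject": subject,
        "HtmlBody": html,
    }
    return urllib.request.Request(
        "https://api.postmarkapp.com/email",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "X-Postmark-Server-Token": api_key,
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen(...) would actually send the message. Keeping the build step separate makes it easy to inspect the payload without emailing anyone.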
Real Example: 3-Hour Podcast → 5-Minute Read
Let’s say “All-In Podcast” drops a 3-hour episode on AI regulation. Here’s what Korben does:
- Downloads the 180-minute MP3 (automatic via RSS)
- Transcribes to ~50,000 words of text (takes ~30 minutes locally)
- Extracts structured insights (takes ~2 minutes)
- Emails you a summary you can read in 5 minutes
Now you have:
- 50 key ideas from the discussion
- 20 insights you can apply
- 30 memorable quotes with attribution
- 25 actionable recommendations
- References to books, papers, and tools mentioned
All searchable, all in your inbox, zero manual effort.
The Composable Architecture
What makes this powerful is that each step is independent and reusable:
@cf.flow
def podcast_workflow():
    # Domain-specific podcast tasks
    download_podcasts.run(days_back=7)
    transcribe_podcasts.run()
    # Generic tasks composed together
    for transcript in get_transcripts():
        text = read_file.run(file_path=transcript)
        wisdom = extract_wisdom.run(text=text)
        write_file.run(file_path=wisdom_path, content=wisdom)
        html = markdown_to_html.run(text=wisdom)
        send_email.run(content=html)
The generic tasks (read_file, extract_wisdom, markdown_to_html, send_email) can be used for any workflow - not just podcasts.
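To illustrate the idea of reuse, here is a toy sketch in plain Python. The Task class and the two stand-in implementations are hypothetical placeholders, not Korben's real tasks, but they show how the same generic steps could be chained in a non-podcast workflow.

```python
import html
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """Tiny stand-in for a reusable task exposing a .run(**kwargs) method."""
    name: str
    fn: Callable[..., str]

    def run(self, **kwargs) -> str:
        return self.fn(**kwargs)


# Placeholder implementations; the real tasks would call an LLM and a Markdown renderer.
extract_wisdom = Task(
    "extract_wisdom",
    lambda text: "# SUMMARY\n" + text.split(".")[0] + ".",
)
markdown_to_html = Task(
    "markdown_to_html",
    lambda text: "<pre>" + html.escape(text) + "</pre>",
)


def article_workflow(article_text: str) -> str:
    """Reuse the generic tasks for a source that is not a podcast."""
    wisdom = extract_wisdom.run(text=article_text)
    return markdown_to_html.run(text=wisdom)
```

Only the source-specific steps (downloading, transcribing) change between workflows. Everything after "I have some text" stays the same.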
State Tracking: Never Lose Progress
The CSV tracker is crucial for long-running workflows:
- Interrupted? Just run again, it resumes where it stopped
- New episodes? Only processes what’s new since last run
- Debugging? Check the CSV to see what succeeded/failed
- No database? The CSV is version-control friendly and human readable
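Here is one way the "record progress" half of such a tracker could work, as a minimal sketch. It reuses the column names from the CSV shown earlier, but the function name and the write-then-rename approach are assumptions for illustration rather than Korben's actual implementation.

```python
import csv
import os
import tempfile
from pathlib import Path

FIELDS = ["podcast_name", "episode_date", "downloaded", "transcribed", "wisdom_extracted"]


def mark_stage_done(tracker: Path, podcast_name: str, episode_date: str, stage: str) -> None:
    """Set one stage flag to 'true', adding a row if the episode is new."""
    if stage not in FIELDS[2:]:
        raise ValueError(f"unknown stage: {stage}")
    rows = []
    if tracker.exists():
        with tracker.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
    for row in rows:
        if row["podcast_name"] == podcast_name and row["episode_date"] == episode_date:
            row[stage] = "true"
            break
    else:  # no matching row: start a new one with every stage pending
        new_row = dict.fromkeys(FIELDS[2:], "false")
        new_row.update(podcast_name=podcast_name, episode_date=episode_date)
        new_row[stage] = "true"
        rows.append(new_row)
    # Write to a temp file and rename it, so an interrupted run
    # never leaves a half-written tracker behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(tracker.parent), suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_name, tracker)
```

Calling mark_stage_done right after each stage succeeds means a crash loses at most the step that was in flight.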
Getting Started
1. Install and Configure
# Clone and install
git clone <repo-url>
cd korben
pdm install
# Configure
cp config/podcasts.yml.example config/podcasts.yml
vim config/podcasts.yml # Add your podcast RSS feeds
# Set environment variables
export PERSONAL_EMAIL="you@example.com"
export POSTMARK_API_KEY="your-postmark-key"
2. Run the Flow
pdm run python3 ./korben.py --flow podcasts
3. Schedule It (Optional)
Deploy to Prefect Cloud for automatic daily runs:
prefect cloud login
pdm run python3 deployments/prefect_cloud.py
prefect worker start --pool default-pool
Now your podcasts are processed automatically every day at 6 AM.
Use Cases Beyond Podcasts
The same pattern works for:
- YouTube videos - Download transcript, extract insights
- Meeting recordings - Transcribe, summarize, email attendees
- Audiobooks - Process chapters, build a personal knowledge base
- Conference talks - Archive insights from recorded sessions
Anything with audio → text → insights can use this flow.
Try It Yourself
# Process last 7 days of configured podcasts
pdm run python3 ./korben.py --flow podcasts
# Or run individual steps
pdm run python3 ./korben.py --task download_podcasts
pdm run python3 ./korben.py --task transcribe_podcasts
# Extract wisdom from any text file
cat transcript.txt | pdm run python3 ./korben.py --task extract_wisdom
What’s Next
Future improvements:
- Multi-episode summaries - Synthesize insights across a podcast season
- Topic extraction - Automatically tag and categorize episodes
- Speaker identification - Attribute quotes to specific speakers
- Personal knowledge graph - Connect insights across different sources
Want to contribute? Check out the GitHub repo!
Conclusion
The podcasts flow turns passive listening into active learning. You don’t need to take notes, remember details, or search through hours of audio. Korben handles the mechanical work while you focus on applying the insights.
That’s agentic automation - AI doing the tedious parts so you can focus on what matters.
