You have a finished book — a PDF or EPUB file sitting on your computer. You want it to become an audiobook. Not a robotic read-aloud, but a proper audiobook with character voices, music, and sound effects.
This tutorial walks you through the conversion process step by step: extracting clean text from your file, uploading it to an AI production platform, and generating a finished audiobook export.
What You'll Need
- Your book as a PDF or EPUB file
- A Midsummerr account (free to create)
- 30–60 minutes for setup (production runs automatically after that)
One thing to confirm first: you need audiobook rights to the work. If you're self-published, you typically own them. If you're traditionally published, check your contract — audio rights are often signed to the publisher separately from print and ebook rights, and you may need their sign-off to produce an audiobook.
Step 1: Extract Clean Text From Your File
AI audiobook production works with clean text — not formatted layouts or images. The first step is getting your book's text out of its container.
From EPUB
EPUB files are already text-based, making extraction straightforward.
Option A: Open in a word processor. If your word processor can open EPUB files, open the book and export it to DOCX or TXT. Support varies by application and version (many can export EPUB but not import it), so if the file won't open, use Option B.
Option B: Use Calibre. The free ebook management tool Calibre converts between formats reliably. Open your EPUB, convert to DOCX or TXT, and save.
Option C: Copy from an ereader. Open the EPUB in an ereader app, select all text, and paste into a word processor. Clean up formatting artifacts.
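If you'd rather script the extraction, here is a minimal sketch using only Python's standard library. It relies on the fact that an EPUB is a ZIP archive of XHTML files. The function names are illustrative, and sorting files by name is an assumption that only approximates reading order (the true order lives in the EPUB's package file), so check the result against your book.

```python
import re
import zipfile
from html.parser import HTMLParser

# Closing one of these tags ends a paragraph-level block in the output.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}
# Content inside these tags is never part of the readable text.
SKIP_TAGS = {"head", "style", "script"}


class _TextCollector(HTMLParser):
    """Collects visible text and inserts newlines at block boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse source-file line wrapping and indentation to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def xhtml_to_text(markup: str) -> str:
    """Convert one XHTML document to plain text, one blank line between blocks."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    lines = (line.strip() for line in "".join(collector.parts).split("\n"))
    return "\n\n".join(line for line in lines if line)


def epub_to_text(path) -> str:
    """Extract text from every XHTML/HTML file in the EPUB, in file-name order."""
    with zipfile.ZipFile(path) as archive:
        names = sorted(
            name for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )
        chapters = [
            xhtml_to_text(archive.read(name).decode("utf-8", errors="replace"))
            for name in names
        ]
    return "\n\n".join(chapter for chapter in chapters if chapter)
```

Call `epub_to_text("my-book.epub")` and save the result as a TXT file. Navigation and cover pages are ordinary XHTML files too, so expect to delete a table of contents or similar residue during cleanup.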
From PDF
PDFs are trickier because they store visual layout, not document structure. Extraction quality depends on how the PDF was created.
Text-based PDFs (most ebook PDFs): These contain actual text data. Copy-paste usually works, or use a PDF-to-DOCX converter.
Scanned PDFs (physical book scans): These are images of pages, not text. You'll need OCR (optical character recognition) to extract text. Adobe Acrobat, Google Docs (upload and open), or free OCR tools can handle this.
Formatting cleanup: After extraction, review the text for:
- Page numbers embedded in text
- Running headers or footers
- Hyphenation artifacts (words split across lines)
- Missing or merged paragraphs
- Special characters that didn't convert properly
Spend time on cleanup. Clean input produces clean audio.
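Some of the checks above can be partly automated. The sketch below is one possible approach, not a complete cleaner: it drops lines that contain only a page number, removes running headers you name explicitly, and rejoins words split by a hyphen at a line break. The function names and patterns are assumptions chosen for illustration, and each rule can produce false positives, so review the output.

```python
import re
from collections import Counter


def find_repeated_lines(text: str, min_repeats: int = 3, max_length: int = 60) -> list:
    """List short lines that recur often: candidates for running headers or footers."""
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    return sorted(
        line for line, n in counts.items()
        if n >= min_repeats and len(line) <= max_length
    )


def clean_extracted_text(text: str, running_headers=()) -> str:
    """Apply conservative cleanup rules to text extracted from a PDF or EPUB."""
    # Lines holding only a page number, e.g. "42", "- 42 -" or "Page 42".
    # Caveat: a chapter heading that is just a number would match too.
    page_number = re.compile(
        r"^\s*(?:page\s+)?[-\u2013]?\s*\d{1,4}\s*[-\u2013]?\s*$", re.IGNORECASE
    )
    headers = {header.strip() for header in running_headers}

    kept = [
        line for line in text.splitlines()
        if not page_number.match(line) and line.strip() not in headers
    ]
    cleaned = "\n".join(kept)

    # Rejoin words split across a line break: "implemen-\ntation" -> "implementation".
    # Caveat: a real compound broken at its hyphen ("well-\nknown") loses the hyphen.
    cleaned = re.sub(r"(\w)-\n(\w)", r"\1\2", cleaned)

    # Collapse runs of blank lines into a single paragraph break.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()
```

Running headers are passed in explicitly rather than removed automatically because a short repeated line can also be legitimate text, such as a scene-break marker. Use `find_repeated_lines` to see the candidates, then pass only the ones you recognize as headers or footers.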
Step 2: Prepare the Manuscript
With clean text extracted, prepare it for production.
Chapter structure. Make sure chapters are clearly separated. Pick one heading style (for example "Chapter 1" or "Chapter One") and use it for every chapter. The production platform auto-detects chapter breaks, and consistent headings should make that detection more reliable.
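To check heading consistency before you upload, a short script can list every line that looks like a chapter heading and report which styles appear. This is a sketch under stated assumptions: the two patterns below cover only "Chapter 1" and "Chapter One" style headings, the names are illustrative, and the platform's own detection rules are not documented here and may differ.

```python
import re

NUMBER_WORD = (
    r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|"
    r"thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|"
    r"twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)"
)

# Hypothetical style names mapped to the pattern that recognizes each one.
CHAPTER_STYLES = {
    "digits": re.compile(r"^chapter\s+\d+\b", re.IGNORECASE),
    "words": re.compile(
        rf"^chapter\s+{NUMBER_WORD}(?:[- ]{NUMBER_WORD})?\b", re.IGNORECASE
    ),
}


def find_chapter_headings(text: str) -> list:
    """Return (line_number, style, heading) for each line that looks like a heading."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        for style, pattern in CHAPTER_STYLES.items():
            if pattern.match(stripped):
                found.append((number, style, stripped))
                break
    return found


def heading_styles(headings: list) -> set:
    """Return the set of heading styles in use; more than one means mixed styles."""
    return {style for _, style, _ in headings}
```

If `heading_styles` returns more than one style, the manuscript mixes heading formats. Also compare the number of headings found against the number of chapters you expect, since a missing heading usually means a merged chapter.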
Dialogue formatting. Verify that dialogue is properly punctuated with quotation marks and attribution tags. The AI uses these cues to assign lines to character voices.
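One quick way to find dialogue punctuation problems is to flag paragraphs whose quotation marks don't pair up. The sketch below is a heuristic with an illustrative function name, and it assumes paragraphs are separated by blank lines. A flagged paragraph is only a candidate for review: a speech that continues across several paragraphs conventionally omits the closing quote until the final one.

```python
def unbalanced_quote_paragraphs(text: str) -> list:
    """Return (paragraph_number, paragraph) for paragraphs with unpaired quotes."""
    flagged = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs, start=1):
        straight = paragraph.count('"')     # straight quotes should come in pairs
        opening = paragraph.count("\u201c")  # left curly double quote
        closing = paragraph.count("\u201d")  # right curly double quote
        if straight % 2 != 0 or opening != closing:
            flagged.append((index, paragraph.strip()))
    return flagged
```

The paragraph numbers count non-empty paragraphs from the top of the file, which makes it easier to find each flagged passage in your word processor.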
Front and back matter. Decide what to include in the audiobook. Audiobooks typically include the title page, dedication, author's note, and epilogue, and typically skip the table of contents, index, bibliography, and page references.
Save as DOCX or TXT. Midsummerr accepts DOCX and plain text files. If you're working with a DOCX, keep formatting minimal — bold and italic are fine, but complex layouts can cause issues.
Step 3: Upload to Midsummerr
Log in to Midsummerr and create a new project.
Upload your file. Drag your DOCX or TXT file into the upload area. The platform processes the text and organizes it into chapters.
Review chapter structure. Check that chapters were detected correctly. Combine, split, or rename chapters as needed. This is also where you remove any sections you don't want in the audiobook.
Step 4: Set Up Character Voices
The platform scans your text, identifies characters, and suggests voices.
Review character detection. Check that all characters were identified. Add any that were missed or remove minor characters that don't need distinct voices.
Preview and select voices. Listen to voice samples for each character. Choose voices that match your creative vision — age, tone, energy. The narrator gets their own distinct voice.
Adjust as needed. You can fine-tune voice characteristics to get the right feel for each character. Don't overthink this step — you can swap voices later during editing.
Step 5: Configure Sound Design
Sound design is what makes the output an audiobook rather than a narration track.
Music style. Choose the overall musical tone: orchestral, ambient, minimal, genre-specific. The platform generates original music that matches your book's mood.
Sound effects. Configure how environmental audio works. Action scenes get effects; quiet dialogue stays clean. You control the intensity.
Preview. Generate a short sample to check the overall sound balance before committing to the full book.
Step 6: Generate Your Audiobook
Click generate and let the platform work. A full-length novel typically processes in a few hours.
What happens during generation:
- Each chapter is processed with the assigned voices
- Music is composed and placed to match scene moods
- Sound effects are positioned based on text context
- Audio is mixed and balanced across all elements
You'll be notified when generation is complete.
Step 7: Review and Edit
Listen through your audiobook. Focus on:
- Voice assignments. Are character lines going to the right voices?
- Pronunciation. Are names and unusual words handled correctly?
- Pacing. Does the narration flow naturally?
- Sound balance. Is music too loud or too quiet? Are effects natural or distracting?
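For the pronunciation check, it helps to have a list of names to listen for. The sketch below pulls capitalized words that appear mid-sentence out of your manuscript text and ranks them by frequency. It is a rough heuristic with an illustrative function name: it also picks up ordinary proper nouns such as weekdays, and it misses names that only ever appear at the start of a sentence.

```python
import re
from collections import Counter

# A capitalized word preceded by a lowercase letter, comma, or semicolon plus a
# space is unlikely to be capitalized merely because it starts a sentence.
MID_SENTENCE_CAPITALIZED = re.compile(r"(?<=[a-z,;] )[A-Z]\w+")


def pronunciation_candidates(text: str, min_count: int = 2) -> list:
    """Return (word, count) pairs for likely names, most frequent first."""
    counts = Counter(MID_SENTENCE_CAPITALIZED.findall(text))
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Work through the list while you listen, and fix the names that recur most often first, since each one affects many lines.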
Make edits. Re-generate specific lines, fix pronunciation, adjust sound levels, or swap character voices. Midsummerr includes unlimited editing on all tiers. Iterate until you're satisfied. For more on the editing process, see our complete production guide.
Step 8: Export and Distribute
Download your finished audiobook files. The export is built for audiobook distribution workflows.
From here, you can:
- Distribute widely through Voices by INaudio (formerly Findaway Voices) — 40+ retailers and libraries including Apple Books, Google Play, Kobo, Everand, Storytel, and OverDrive. Audible is not included.
- Upload to Spotify directly via Spotify for Authors (a separate path since Findaway and Spotify split in August 2025)
- Sell directly through BookFunnel, Payhip, or your own website
- Reach Apple Books, Google Play, and Kobo directly via their author portals if you'd rather skip the aggregator
A note on Audible: As of 2026, ACX (the path to Audible for indie authors) does not accept AI-narrated audiobooks — it requires human narration. Audible has a separate AI-narration program, but it is invitation-only for traditional publishers — indie authors cannot opt in. Policies evolve, so verify the current rules with ACX directly before you commit to a distribution plan. If your audiobook was produced with AI, plan distribution around the platforms above. See ACX alternatives for indie authors for the full breakdown.
For the full distribution guide, see our self-publishing audiobook guide.
Cost Breakdown
| Book Length | Word Count | Self-Serve ($5 per 1,000 words) | Director-Led ($10 per 1,000 words) |
|---|---|---|---|
| Short novel | 50,000 words | $250 | $500 |
| Standard novel | 80,000 words | $400 | $800 |
| Long novel | 100,000 words | $500 | $1,000 |
| Epic fantasy | 150,000 words | $750 | $1,500 |
All tiers include full-cast character voices, background music, sound effects, and unlimited editing. See full pricing details.
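Compare this to traditional audiobook production at $200–$400 per finished hour. At a commonly cited pace of roughly 9,300 words per finished hour, an 80,000-word novel runs about 8.6 finished hours, or roughly $1,700–$3,400, versus $400 at the self-serve rate. For a detailed cost comparison, read our audiobook production cost breakdown.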
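If you want to estimate costs for your own word count, the arithmetic behind the table can be sketched in a few lines of Python. The per-1,000-word rates and the $200–$400 per finished hour range come from this article. The 9,300 words per finished hour pace is an assumption (a commonly cited narration average), and the function names are illustrative.

```python
SELF_SERVE_PER_1K = 5.00      # dollars per 1,000 words (from the table above)
DIRECTOR_LED_PER_1K = 10.00   # dollars per 1,000 words (from the table above)
TRADITIONAL_PER_FINISHED_HOUR = (200.0, 400.0)  # dollars, low and high
ASSUMED_WORDS_PER_FINISHED_HOUR = 9_300  # assumption: typical narration pace


def platform_cost(word_count: int, rate_per_1k: float) -> float:
    """Cost in dollars at a flat per-1,000-word rate."""
    return word_count / 1000 * rate_per_1k


def traditional_cost_range(
    word_count: int, words_per_hour: int = ASSUMED_WORDS_PER_FINISHED_HOUR
) -> tuple:
    """Estimated (low, high) cost in dollars for per-finished-hour production."""
    finished_hours = word_count / words_per_hour
    low_rate, high_rate = TRADITIONAL_PER_FINISHED_HOUR
    return finished_hours * low_rate, finished_hours * high_rate


if __name__ == "__main__":
    words = 80_000
    low, high = traditional_cost_range(words)
    print(f"Self-serve:   ${platform_cost(words, SELF_SERVE_PER_1K):,.0f}")
    print(f"Director-led: ${platform_cost(words, DIRECTOR_LED_PER_1K):,.0f}")
    print(f"Traditional:  ${low:,.0f} to ${high:,.0f} (estimate)")
```

The traditional estimate moves with the narration pace, so adjust `words_per_hour` if you have a quote based on a different figure.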
Tips for Best Results
Clean text wins. The quality of your audiobook correlates directly with the quality of your source text. Spend extra time on text extraction and cleanup.
Genre matters. Fiction with dialogue and atmosphere benefits most from full-cast production. Non-fiction works well with single narrator and subtle music.
Listen critically. Don't just spot-check. Listen to at least the first few chapters carefully and sample later chapters. Character voice consistency across a full book is important.
Use the editing tools. First-generation output is a starting point. The editing phase is where you dial in quality. Take advantage of unlimited edits.
FAQ
Can I convert any PDF to an audiobook? Any PDF with extractable text works. Scanned PDFs (images of pages) need OCR processing first. The text quality after extraction is what matters — clean text produces better audio.
Does the EPUB/PDF formatting affect the audiobook? Only the text content is used for production. Visual formatting (fonts, colors, layout) is ignored. Chapter structure and dialogue formatting matter most.
How long does conversion take? Text extraction: 15–30 minutes depending on cleanup needed. Production setup: 15–30 minutes. Generation: a few hours for a full novel. Total active time is typically 30–60 minutes; the rest runs automatically.
Can I convert a textbook or non-fiction PDF? Yes. Non-fiction works well with single-narrator production. Tables, charts, and images won't convert to audio — you'll want to handle those sections as descriptive text or skip them.
What about DRM-protected files? Don't try to strip DRM from a retailer file — circumventing DRM is a separate legal issue from owning the rights, and we don't recommend it. Use your own unprotected source instead: the original manuscript file from your writing tool (Word, Scrivener, Google Docs export). If you're traditionally published and only have a retailer copy, ask your publisher for the unprotected source.
Start Converting
You already have the book. The audiobook is a few steps away.
Create your Midsummerr account and upload your first project. Or listen to samples to hear what the finished product sounds like before you start.
