Pricing overview

AWS Polly is priced primarily by the number of characters converted into speech. The service is pay-as-you-go: users are charged only for what they synthesize, with no upfront commitments or minimum fees. The per-character rate depends on the voice type selected (Standard, Neural (NTTS), or Long-form (LF-NTTS)), with Neural and Long-form voices costing more because of their more advanced synthesis.

Charges are calculated monthly from the total number of characters sent to the Polly API for synthesis. High-volume users may qualify for volume discounts that lower the effective per-character rate, which benefits applications with heavy text-to-speech demands; confirm the current terms before budgeting around them. Storing generated audio files in Amazon S3 and transferring data out of AWS are billed separately under those services' own pricing. For a comprehensive breakdown, refer to the official AWS Polly pricing page.

Plans and tiers

AWS Polly categorizes its text-to-speech services into different voice types, each with a distinct pricing tier. The primary distinction is between Standard voices, Neural (NTTS) voices, and Long-form Neural (LF-NTTS) voices. Standard voices utilize concatenative synthesis, offering a broad range of languages and voices at a lower cost. Neural voices employ deep learning technologies to produce more natural-sounding speech, closely mimicking human intonation and rhythm, which comes at a higher per-character price. Long-form Neural voices are optimized for extended audio content, such as audiobooks and articles, providing enhanced naturalness over longer durations.

Each voice type suits different application requirements and budgets. For instance, applications that need highly natural, expressive speech for customer-facing interactions might choose Neural voices, while internal tools or large-scale content generation that only need functional speech might find Standard voices more cost-effective. The rates below are base rates; they are subject to regional variation and to any volume discounts that apply to monthly character usage.

Voice Type | Base Price (per 1M characters) | Key Characteristics | Best For
Standard Voices | $4.00 | Concatenative synthesis, broad language support, functional speech. | High-volume content, internal applications, basic audio prompts.
Neural Voices (NTTS) | $16.00 | Deep learning-based, highly natural and expressive speech, enhanced prosody. | Customer service, voice assistants, premium content, dynamic narrations.
Long-form Neural Voices (LF-NTTS) | $100.00 | Optimized for extended content, superior naturalness over long durations, supports SSML. | Audiobooks, articles, podcasts, long-form educational content.

It is important to note that these prices are for the synthesis service itself. Any storage of generated audio files in Amazon S3 or data transfer out of AWS will incur additional charges based on the respective Amazon S3 pricing and AWS data transfer rates.
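The per-character arithmetic behind these rates can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name `estimate_synthesis_cost` and the voice-type keys are hypothetical, and the rates are simply the base prices from the table above, which may differ by region or change over time.

```python
from decimal import Decimal

# Base rates in USD per 1 million characters, copied from the table above.
# Assumption: these are illustrative; real rates can vary by region.
RATE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
    "long-form": Decimal("100.00"),
}


def estimate_synthesis_cost(characters: int, voice_type: str) -> Decimal:
    """Return the estimated synthesis charge in USD, rounded to cents.

    Covers synthesis only; S3 storage and data transfer are not included.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    rate = RATE_PER_MILLION[voice_type]  # KeyError for unknown voice types
    cost = rate * Decimal(characters) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_synthesis_cost(400_000, "neural"))  # 6.40
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters when small per-request amounts are summed over a month.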

Free tier and limits

AWS offers a free tier that lets new customers try AWS Polly at no charge. It is available for the first 12 months after signing up for an AWS account and provides a monthly character allowance for both Standard and Neural voices, enough for developers to test text-to-speech and integrate it into their applications.

  • Standard Voices: 5 million characters per month.
  • Neural Voices: 1 million characters per month.

The allowance resets each month, so development and experimentation can continue within the limits. Standard pay-as-you-go rates apply to any usage above the monthly allowance, and to all usage once the 12-month period ends. The free tier is intended for initial development and small-scale deployments, giving teams room to evaluate the service's performance and cost-effectiveness before committing to paid usage. For details on eligibility and usage tracking, consult the AWS Free Tier page.
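As a rough sketch of how the free tier affects a monthly bill, the snippet below subtracts the allowance before applying the per-character rate. The function name and the `account_age_months` parameter are hypothetical, and the allowances and rates are the figures quoted in this article rather than values read from any AWS API.

```python
from decimal import Decimal

# Monthly free-tier allowances (characters) and base rates (USD per 1M
# characters) as stated in this article; verify against current AWS pricing.
FREE_ALLOWANCE = {"standard": 5_000_000, "neural": 1_000_000}
RATE_PER_MILLION = {"standard": Decimal("4.00"), "neural": Decimal("16.00")}


def monthly_cost_with_free_tier(characters, voice_type, account_age_months):
    """Estimate one month's charge after subtracting the free allowance.

    account_age_months is a hypothetical input: 0 means the first month
    after sign-up. The allowance applies only during the first 12 months.
    """
    in_free_period = account_age_months < 12
    allowance = FREE_ALLOWANCE[voice_type] if in_free_period else 0
    billable = max(0, characters - allowance)
    cost = RATE_PER_MILLION[voice_type] * Decimal(billable) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 3M Neural characters in month 3: only 2M characters are billable.
    print(monthly_cost_with_free_tier(3_000_000, "neural", 3))   # 32.00
    # The same usage after the 12-month period is billed in full.
    print(monthly_cost_with_free_tier(3_000_000, "neural", 12))  # 48.00
```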

Real-world cost examples

Understanding real-world costs for AWS Polly involves calculating character usage based on typical application scenarios. Here are a few examples illustrating potential monthly expenses:

Example 1: Small Blog with Audio Articles

  • Scenario: A small blog publishes 10 articles per month, each averaging 5,000 characters. All articles use Standard voices.
  • Monthly Characters: 10 articles * 5,000 characters/article = 50,000 characters.
  • Calculation: (50,000 characters / 1,000,000 characters) * $4.00 = $0.20.
  • Estimated Monthly Cost: $0.20. (This would likely be covered by the free tier for new users).

Example 2: Medium-sized E-learning Platform

  • Scenario: An e-learning platform generates audio for 20 new lessons per month, each lesson containing 20,000 characters. The platform uses Neural voices for enhanced learning experience.
  • Monthly Characters: 20 lessons * 20,000 characters/lesson = 400,000 characters.
  • Calculation: (400,000 characters / 1,000,000 characters) * $16.00 = $6.40.
  • Estimated Monthly Cost: $6.40.

Example 3: Large-scale Audiobook Publisher

  • Scenario: An audiobook publisher converts 5 full-length novels into audio per month, each novel averaging 1,000,000 characters. They utilize Long-form Neural voices.
  • Monthly Characters: 5 novels * 1,000,000 characters/novel = 5,000,000 characters.
  • Calculation: (5,000,000 characters / 1,000,000 characters) * $100.00 = $500.00.
  • Estimated Monthly Cost: $500.00.

Example 4: Interactive Voice Assistant

  • Scenario: An interactive voice assistant synthesizes approximately 1,000,000 characters per day across customer interactions, using Neural voices.
  • Monthly Characters: 1,000,000 characters/day * 30 days/month = 30,000,000 characters.
  • Calculation: (30,000,000 characters / 1,000,000 characters) * $16.00 = $480.00. (This calculation does not account for potential volume discounts at higher tiers, which could further reduce the effective per-character cost).
  • Estimated Monthly Cost: $480.00.

These examples illustrate how character volume and voice type significantly influence the overall cost. For precise cost projections, users should leverage the AWS Pricing Calculator, which allows for detailed scenario planning and accounts for regional pricing differences and volume discounts.
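To see how much the voice type alone changes the bill, the same monthly volume can be priced under each rate. The sketch below is illustrative only: `compare_voice_types` is a hypothetical helper, and it uses the base rates quoted in this article without regional adjustments or volume discounts.

```python
from decimal import Decimal

# Base rates in USD per 1 million characters, as quoted in this article.
RATE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
    "long-form": Decimal("100.00"),
}


def compare_voice_types(characters):
    """Return {voice_type: estimated monthly cost in USD} for one volume."""
    millions = Decimal(characters) / Decimal(1_000_000)
    return {
        voice: (rate * millions).quantize(Decimal("0.01"))
        for voice, rate in RATE_PER_MILLION.items()
    }


# Volume from Example 4: 1,000,000 characters/day over a 30-day month.
for voice, cost in compare_voice_types(30_000_000).items():
    print(f"{voice:<10} ${cost}")
```

For the 30,000,000-character workload in Example 4, this gives $120.00 with Standard voices, $480.00 with Neural voices, and $3000.00 with Long-form voices.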

How the pricing compares

When evaluating AWS Polly's pricing, it is useful to compare it with other text-to-speech services, such as Google Cloud Text-to-Speech, Microsoft Azure Text to Speech, and ElevenLabs. The cloud providers all offer pay-as-you-go models based on character count, while ElevenLabs is subscription-based; specific rates, free tier allowances, and advanced feature pricing differ across all of them.

Google Cloud Text-to-Speech

Google Cloud Text-to-Speech also offers standard and WaveNet (neural equivalent) voices. Their pricing for standard voices is often comparable to AWS Polly's, with WaveNet voices typically priced higher than standard voices. Google's free tier provides 1 million characters per month for standard voices and 500,000 characters per month for WaveNet voices, which is less than AWS Polly's free tier for new customers over 12 months. For detailed information, consult the Google Cloud Text-to-Speech pricing documentation.

Microsoft Azure Text to Speech

Azure's Text to Speech service provides standard, neural, and custom neural voices. Their pricing structure includes a free tier that offers 500,000 characters per month for standard voices and 50,000 characters per month for neural voices. Paid tiers for neural voices are generally competitive with AWS Polly and Google Cloud. Azure also offers specialized pricing for custom neural voices, which involves additional costs for training and hosting. Detailed pricing can be found on the Azure Text to Speech pricing page.

ElevenLabs

ElevenLabs focuses heavily on highly realistic and customizable neural voices, including features like voice cloning and advanced speech synthesis. Their pricing model differs significantly, often starting with a subscription-based approach that includes a certain number of characters per month, with additional characters available at a per-character rate. ElevenLabs' free tier is typically more limited in character count compared to the cloud providers but allows access to their advanced voice features. Their target audience and feature set often cater to creators and developers requiring highly nuanced and unique voice outputs. For specifics, refer to the ElevenLabs pricing plans.

In summary, AWS Polly offers a competitive free tier for new users and a clear, character-based pricing model. Its Standard voice pricing is often competitive, while its Neural and Long-form rates reflect the more advanced technology and naturalness those voices provide. Developers should consider not only the per-character cost but also the specific features, voice quality requirements, and overall ecosystem integration when choosing a text-to-speech provider. For example, developers already deeply integrated into the AWS ecosystem might find Polly the most convenient and cost-effective option because of existing infrastructure and familiarity, although that convenience should be weighed against the vendor lock-in considerations described in Mozilla's explanation of the topic.