🗣️ Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, powered by TTS models, that converts text into spoken audio. Typical uses include:

📝 Blog article narration

🌍 Multilingual audio generation

🎵 Real-time audio streaming

Important Notice: You must clearly disclose to end users that the voice they hear is AI-generated, not human.

Basic Usage

Example
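The following is a minimal sketch of a speech request, not an official client. It assumes an OpenAI-compatible endpoint at `{base URL}/audio/speech` that accepts a JSON body with `model`, `input`, `voice`, and `response_format` and returns the audio bytes. The default base URL, the `TTS_BASE_URL` environment variable, and the helper names `build_speech_request` and `synthesize` are assumptions made for illustration.

```python
import json
import os
import urllib.request

# Assumption: an OpenAI-compatible service; override with your provider's URL.
BASE_URL = os.environ.get("TTS_BASE_URL", "https://api.openai.com/v1")


def build_speech_request(text, model="tts-1", voice="alloy", response_format="mp3"):
    """Return the JSON payload for one speech request."""
    if not text:
        raise ValueError("text must not be empty")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


def synthesize(text, out_path, api_key, **options):
    """POST the payload to the speech endpoint and save the returned audio."""
    payload = build_speech_request(text, **options)
    request = urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The response body is the encoded audio; write it out unchanged.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With a valid key, a call such as `synthesize("Hello from the speech endpoint.", "speech.mp3", api_key)` should write an MP3 file. Keyword arguments such as `voice="nova"` are passed through to the payload.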

Features

Audio Quality Options

tts-1: Lower latency at somewhat lower quality, suitable for real-time applications

tts-1-hd: Higher quality at higher latency; output tends to contain fewer audible artifacts than tts-1

Available Voices

alloy

echo

fable

nova

shimmer

onyx
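Because the model and voice names are plain strings, a typo only surfaces when the request fails. The sketch below validates both against the values listed in this guide before any request is sent. `SpeechOptions` and `options_for` are hypothetical helpers, not part of the API.

```python
from dataclasses import dataclass

# Values taken from the lists in this guide.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "nova", "shimmer", "onyx")


@dataclass(frozen=True)
class SpeechOptions:
    model: str = "tts-1"
    voice: str = "alloy"

    def __post_init__(self):
        # Reject anything that is not in the documented lists.
        if self.model not in MODELS:
            raise ValueError(f"unknown model {self.model!r}; expected one of {MODELS}")
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice {self.voice!r}; expected one of {VOICES}")


def options_for(realtime: bool, voice: str = "alloy") -> SpeechOptions:
    """Pick tts-1 for latency-sensitive use and tts-1-hd otherwise."""
    return SpeechOptions(model="tts-1" if realtime else "tts-1-hd", voice=voice)
```

For example, `options_for(realtime=False, voice="onyx")` selects `tts-1-hd`, while a misspelled voice raises `ValueError` immediately.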

Supported Output Formats

Format	Characteristics	Use Case
MP3	Default format	General use
Opus	Low latency	Streaming and communications
AAC	Efficient compression	Mobile playback
FLAC	Lossless compression	Audio archiving
WAV	Uncompressed	Low-latency applications (no decoding overhead)
PCM	Raw samples, 24 kHz, 16-bit signed, no header	Direct sample processing
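Raw PCM output has no header, so most players cannot open it directly. As a sketch, the samples can be wrapped in a WAV container with Python's standard `wave` module. The sample rate and bit depth come from the table; mono, little-endian output is an assumption, and `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, per the table
SAMPLE_WIDTH_BYTES = 2    # 16-bit signed samples
CHANNELS = 1              # assumption: the service returns mono audio


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV header without changing the samples."""
    if len(pcm) % (SAMPLE_WIDTH_BYTES * CHANNELS) != 0:
        raise ValueError("PCM data ends in the middle of a sample")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

At these settings one second of audio is 48,000 bytes of PCM (24,000 samples of 2 bytes each).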

Real-Time Streaming Example
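The sketch below reads the response in chunks so that audio can be forwarded before the full file has arrived. It assumes an OpenAI-compatible endpoint that sends the audio with chunked transfer. The default base URL, the chunk size, and the helper names are assumptions for illustration. The sink can be any object with a `write` method, such as a file or a player's input pipe.

```python
import json
import urllib.request

CHUNK_SIZE = 4096  # bytes per read; an arbitrary choice


def open_speech_stream(text, api_key, base_url="https://api.openai.com/v1",
                       model="tts-1", voice="alloy", response_format="opus"):
    """Open the HTTP response without reading its body (base_url is an assumption)."""
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(request)


def iter_audio_chunks(response, chunk_size=CHUNK_SIZE):
    """Yield chunks from a file-like response until it is exhausted."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_to_sink(response, sink, chunk_size=CHUNK_SIZE):
    """Forward each chunk to the sink as it arrives; return the byte count."""
    total = 0
    for chunk in iter_audio_chunks(response, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total
```

A typical call would be `with open_speech_stream(text, api_key) as response: stream_to_sink(response, sink)`. The `tts-1` model is the natural default here because it is the lower-latency option.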

Supported Languages

The API supports multiple languages, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Others: Arabic, Hindi, etc.

Note: Current voices are mainly optimized for English.

FAQ

Q: How can I control the emotional tone of the generated audio?

A: Currently, there is no direct control. Capitalization or punctuation may influence the result, but effects are not guaranteed.

Q: Can I create custom voices?

A: Custom voices are not supported.

Q: Who owns the generated audio?

A: The person who generated the audio owns it, but end users must still be told that the voice is AI-generated.
