Text-to-Speech (TTS)
Text-to-speech, or TTS, is a form of speech synthesis that converts written text into spoken audio.[1] Modern systems use machine learning to produce natural-sounding voices with appropriate rhythm and intonation, and they can read arbitrary text aloud on demand.
In media and product work, TTS is used for accessibility, voice interfaces, rapid prototyping of dialogue, and content that must be generated or updated at scale. Designing a good TTS experience involves choosing voices, tuning pronunciation, and integrating the system so that its output fits naturally into the surrounding audio.[2]
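As a rough illustration of voice selection and pronunciation tuning, many TTS engines accept input marked up in SSML (Speech Synthesis Markup Language, a W3C standard), where a `<sub>` element gives the spoken form of a written term and a `<voice>` element requests a particular voice. The sketch below is a minimal example under stated assumptions: the helper `to_ssml`, its parameters, and the sample voice name and aliases are hypothetical, and which SSML elements and voice names are honored varies by engine.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML 1.1 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def to_ssml(text, aliases=None, voice_name=None, lang="en-US"):
    """Wrap plain text in SSML (hypothetical helper, for illustration only).

    aliases maps a written term to the form that should be spoken,
    for example {"SQL": "sequel"}. voice_name is an engine-specific
    voice identifier and is an assumption here.
    """
    aliases = aliases or {}
    if aliases:
        # Try longer terms first so "TTS API" wins over "TTS".
        terms = sorted(aliases, key=len, reverse=True)
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
        )
        # With one capturing group, re.split alternates between
        # ordinary text (even indices) and matched terms (odd indices).
        pieces = pattern.split(text)
    else:
        pieces = [text]

    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # Matched term: keep the written form, supply the spoken alias.
            alias = quoteattr(aliases[piece])
            out.append(f"<sub alias={alias}>{escape(piece)}</sub>")
        else:
            # Ordinary text: escape &, <, > so it cannot break the markup.
            out.append(escape(piece))
    body = "".join(out)

    if voice_name:
        body = f"<voice name={quoteattr(voice_name)}>{body}</voice>"

    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(lang)}>{body}</speak>"
    )


if __name__ == "__main__":
    print(to_ssml("Query the SQL server.", {"SQL": "sequel"}, voice_name="narrator"))
```

The resulting string would then be passed to whatever synthesis API is in use. Keeping the alias table separate from the text makes it possible to correct a mispronounced term once and regenerate all affected audio.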