Speech Synthesis


Speech synthesis is the artificial production of human speech by a computer system.[1] Early approaches assembled speech from recorded fragments or modeled the vocal tract directly, while contemporary systems use deep neural networks to generate waveforms that closely resemble natural human speech.

The quality of synthesized speech is judged by how intelligible and how natural it sounds; naturalness includes appropriate stress, timing, and emotional expression. Speech synthesis underpins text-to-speech, voice assistants, and many accessibility tools, and it is the foundation on which more specialized techniques such as voice cloning are built.[2]