SSML
The Speech Synthesis Markup Language (SSML) is a W3C Recommendation supported by a range of commercial text-to-speech (TTS) systems, and it is the most established of the representation formats described in this section. Its main purpose is to tell a TTS system how to speak a given text. For example, an author can place <emphasis> on certain words, provide pronunciation hints via a <say-as> tag, select the <voice> to be used for speaking the text, or request a <break> at a certain point in the text. SSML also allows markers to be set in the text via the <mark> tag. The following example shows an SSML document that could be used as input to a TTS engine. It requests a female US English voice, emphasis on the word “wanted”, and a pause after “then”.
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    And then <break/> I <emphasis>wanted</emphasis> to go.
  </voice>
</speak>
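To make the structure of the example concrete, the following minimal Python sketch parses the document above with the standard library and reports the requested language and voice gender, the emphasised words, the number of breaks, and the plain text that remains once the markup is removed. The function name summarise_ssml and the shape of the returned dictionary are assumptions made for illustration only; they are not part of SSML or of any TTS engine's API.

```python
import xml.etree.ElementTree as ET

# Namespace of SSML 1.0 elements, and the predefined XML namespace (for xml:lang).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The example document from the text above.
SSML_DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    And then <break/> I <emphasis>wanted</emphasis> to go.
  </voice>
</speak>"""


def summarise_ssml(document: str) -> dict:
    """Hypothetical helper: collect the speaking hints found in an SSML string."""
    root = ET.fromstring(document)
    voice = root.find("s:voice", {"s": SSML_NS})
    return {
        # Language declared on the root <speak> element.
        "lang": root.get(f"{{{XML_NS}}}lang"),
        # Gender requested on the first <voice> child, if there is one.
        "gender": voice.get("gender") if voice is not None else None,
        # Text content of every <emphasis> element.
        "emphasised": [e.text for e in root.iter(f"{{{SSML_NS}}}emphasis")],
        # Number of requested pauses.
        "breaks": len(list(root.iter(f"{{{SSML_NS}}}break"))),
        # All text with markup removed and whitespace normalised.
        "plain_text": " ".join("".join(root.itertext()).split()),
    }


if __name__ == "__main__":
    for key, value in summarise_ssml(SSML_DOC).items():
        print(f"{key}: {value}")
```

A real TTS engine does considerably more with these elements than this inspection; the sketch only illustrates that the speaking hints are ordinary XML elements and attributes layered over the text to be spoken.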