zh-CN voices: mstts:express-as styles and paralinguistic tags produce identical output regardless of value

Ming-Li Lin 0 Reputation points
2026-05-05T19:20:38.75+00:00
curl --location --request POST "https://${SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1" \
--header "Ocp-Apim-Subscription-Key: ${SPEECH_KEY}" \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
--header 'User-Agent: curl' \
--data-raw '
<speak version="1.0" xml:lang="en-US" xmlns:mstts="http://www.w3.org/2001/mstts">
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[laughter] 拜託別說出去
</voice>
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[coughing] 拜託別說出去
</voice>
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[throat_clearing] 拜託別說出去
</voice>
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[breathing] 拜託別說出去
</voice>
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[sighing] 拜託別說出去
</voice>
<voice name="zh-CN-Xiaochen:DragonHDLatestNeural">
[yawning] 拜託別說出去
</voice>
</speak>
' > styleParalingZh.mp3

The paralinguistic tags are not working at all.

According to the wiki page, it should work: "Styles are available on all English content for all voices. Style results are strongly relevant to the input content: the model adapts style application based on the semantic meaning of the text. Paralinguistics are available on all voices with all languages."

I also tried fetching the voice details and using the styles listed in the styleList. Still no difference.

Switching the root element to <speak version="1.0" xml:lang="zh-CN" xmlns:mstts="http://www.w3.org/2001/mstts"> makes no difference for all 3 sentences either.

Azure Speech in Foundry Tools

1 answer

  1. Q&A Assist
    2026-05-05T19:20:51.16+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Paralinguistic tags and speaking styles are only honored by voices that explicitly support them. For Chinese (zh-CN), support is voice- and style-specific, and not all Dragon HD / Dragon HD Flash voices implement paralinguistics or the same style set.

    From the language support tables:

    • zh-CN-Xiaochen:DragonHDFlashLatestNeural lists supported styles (cheerful, debating, empathetic, live-commercial, poetry-reading, sad, sorry) and explicitly notes Roles: Not supported.
    • Other zh-CN voices list different style sets, and only some support roles (for example, zh-CN-XiaomoNeural and zh-CN-YunyeNeural support roles like Boy, Girl, YoungAdultFemale, etc.).
    • The documentation for SSML styles and roles shows that paralinguistics and roles are controlled via mstts:express-as and role attributes, not by inserting bracketed tokens like [laughter] in the plain text.

    In the provided SSML, the text uses literal tokens like [laughter] instead of SSML elements. For Azure Speech, paralinguistic effects are not triggered by bracketed words; they require supported SSML constructs (for example, mstts:express-as with a supported style value, or role where the voice supports roles). Since zh-CN-Xiaochen:DragonHDLatestNeural is not listed with paralinguistic support in the tables and the text does not use SSML paralinguistic elements, the synthesized audio will be identical regardless of the bracketed markers.

    To get audible differences:

    1. Use a zh-CN voice that lists the desired styles or roles in the language support table (for example, zh-CN-XiaomoNeural, zh-CN-YunyeNeural, or other zh-CN voices with rich style sets).
    2. Apply styles via mstts:express-as with supported style values (such as cheerful, sad, serious, etc.) and, where available, role attributes as documented in the SSML customization page.
    3. Do not rely on bracketed tokens like [laughter] in the text; instead, use the documented SSML mechanisms for styles, paralinguistics, and roles.
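    Steps 1 and 2 above could combine into an SSML payload like the following sketch. This is an assumption-laden illustration, not a verified fix: zh-CN-XiaomoNeural and the cheerful style are examples of a voice/style pairing from the language support tables, and the xmlns and xmlns:mstts namespace URIs follow the SSML documentation rather than the question's payload.

    ```xml
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
      <!-- zh-CN-XiaomoNeural lists styles such as cheerful in its styleList -->
      <voice name="zh-CN-XiaomoNeural">
        <!-- styledegree scales the style intensity (0.01 to 2) -->
        <mstts:express-as style="cheerful" styledegree="2">
          拜託別說出去
        </mstts:express-as>
      </voice>
    </speak>
    ```

    Posting this body with the same curl command from the question (same endpoint, headers, and output format) should produce audibly different output when the style value is changed to another entry from the voice's styleList.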

    If a specific Dragon HD voice does not list paralinguistic or role support in the language support tables, style and paralinguistic tags will not change the output for that voice.
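    To check what a specific voice actually supports before synthesizing, the voices/list REST endpoint returns per-voice metadata including StyleList and RolePlayList. A sketch, assuming jq is installed and SPEECH_KEY/SPEECH_REGION are set as in the question:

    ```shell
    # Fetch the region's voice inventory (requires a valid subscription key),
    # then show the style and role support for one voice of interest.
    curl --silent \
      --header "Ocp-Apim-Subscription-Key: ${SPEECH_KEY}" \
      "https://${SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/voices/list" \
    | jq '.[] | select(.ShortName == "zh-CN-XiaomoNeural")
              | {ShortName, StyleList, RolePlayList}'
    ```

    If StyleList is missing or empty for a voice, mstts:express-as styles will not change that voice's output.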
