Significant Regression in Code-Switching (EN-ZH) recognition for zh-HK after April 2026 Engine Update

Question

Significant Regression in Code-Switching (EN-ZH) recognition for zh-HK after April 2026 Engine Update

Peter Ng 0

Problem Description

We are reporting a critical regression in the Cantonese (zh-HK) Speech-to-Text service that started occurring after the March 31, 2026 API retirement and the subsequent rollout of the latest engine (MAI-Transcribe-1).

The Issue: Common English terms (e.g., "Wifi", "Meeting", "MTR") mixed within Cantonese sentences are being systematically filtered out or misrecognized as Cantonese homophones, even when using the latest Speech SDK (v1.49.1) and high Phrase List weights (2.0).

Technical Details

Region: East Asia (Hong Kong) / Southeast Asia

Locale: zh-HK (Cantonese Traditional)

SDK Version: JavaScript SDK 1.49.1 (Updated from 1.36.0)

Deployment Type: Base Model (and previously Custom Model)

Expected Behavior: "邊度有 Wifi 機？" (Where is the Wifi machine?)

Actual Behavior: "邊度有機？" (The word "Wifi" is completely omitted by the AI Post-Processor).

Evidence & Logs (Telemetry)

The following log shows that the Lexical (raw) output already fails to capture the English token, suggesting the issue is within the new decoder's language-purity constraint:

Result ID: [請貼上你 log 裡的 Result ID] Lexical (raw): 邊度有機 ITN: 邊度有機 Display: 邊度有機？ Note: The speaker clearly said "Wifi 機", but "Wifi" was treated as disfluency and removed.

Steps we have already taken to troubleshoot:

SDK Upgrade: Upgraded to 1.49.1 to ensure compatibility with new service properties.

Phrase List: Added English terms to PhraseListGrammar with phraseListWeight set to 2.0 (via setServiceProperty).

Post-Processing: Toggled PostProcessingOption between "None" and "TrueText", but the English words remain missing.

Custom Model Removal: The issue persists even on the Base Model, indicating a global regression in the zh-HK engine's ability to handle code-switching.

Requested Action

Please investigate if the MAI-Transcribe-1 model for zh-HK has an over-aggressive language filter for English tokens.

Clarify if the property phraseListWeight is being correctly honored by the new engine for zh-HK.

Provide a way to disable the mandatory "Semantic Segmentation" or "Disfluency Removal" that seems to be filtering out short English terms in Cantonese contexts.
Problem Description
We are reporting a critical regression in the Cantonese (zh-HK) Speech-to-Text service that started occurring after the March 31, 2026 API retirement and the subsequent rollout of the latest engine (MAI-Transcribe-1). The Issue: Common English terms (e.g., "Wifi", "Meeting", "MTR") mixed within Cantonese sentences are being systematically filtered out or misrecognized as Cantonese homophones, even when using the latest Speech SDK (v1.49.1) and high Phrase List weights (2.0).
Technical Details
- Region: East Asia (Hong Kong) / Southeast Asia
- Locale: zh-HK (Cantonese Traditional)
- SDK Version: JavaScript SDK 1.49.1 (Updated from 1.36.0)
- Deployment Type: Base Model (and previously Custom Model)
- Expected Behavior: "邊度有 Wifi 機？" (Where is the Wifi machine?)
- Actual Behavior: "邊度有機？" (The word "Wifi" is completely omitted by the AI Post-Processor).
Evidence & Logs (Telemetry)
The following log shows that the Lexical (raw) output already fails to capture the English token, suggesting the issue is within the new decoder's language-purity constraint:

Lexical (raw): 邊度有機 ITN: 邊度有機 Display: 邊度有機？ Note: The speaker clearly said "Wifi 機", but "Wifi" was treated as disfluency and removed.

Steps we have already taken to troubleshoot:
1. SDK Upgrade: Upgraded to 1.49.1 to ensure compatibility with new service properties.
2. Phrase List: Added English terms to PhraseListGrammar with phraseListWeight set to 2.0 (via setServiceProperty).
3. Post-Processing: Toggled PostProcessingOption between "None" and "TrueText", but the English words remain missing.
4. Custom Model Removal: The issue persists even on the Base Model, indicating a global regression in the zh-HK engine's ability to handle code-switching.
Requested Action
1. Please investigate if the MAI-Transcribe-1 model for zh-HK has an over-aggressive language filter for English tokens.
2. Clarify if the property phraseListWeight is being correctly honored by the new engine for zh-HK.
3. Provide a way to disable the mandatory "Semantic Segmentation" or "Disfluency Removal" that seems to be filtering out short English terms in Cantonese contexts.

Karnam Venkata Rajeswari 2,395 Reputation points Microsoft External Staff Moderator

2026-04-22T14:17:26.89+00:00
Hello @Peter Ng ,

Welcome to Microsoft Q&A .Thank you for reaching out to us.

The observed behavior originates during the decoding stage of the zh-HK speech recognition model, where English tokens in Cantonese-English code-switching scenarios may be filtered or not generated.

This behavior aligns with known limitations in speech recognition systems when handling code-switching within a single language setting.

The current engine appears to favor single‑locale consistency (zh‑HK) when processing speech.

In mixed-language (Cantonese + English) scenarios:

Short English terms may be treated as low-confidence tokens

They may be mapped to phonetic Cantonese equivalents or omitted

Regarding phraseListWeight ,

Phrase list biasing is supported and functions correctly.

It improves the likelihood of recognizing specified words when they are already considered by the model.

However, it does not force recognition of words that are absent in the decoding stage.

For post-processing and filtering

Settings such as PostProcessingOption (“None”, “TrueText”) affect only the formatting stage after recognition.

They do not influence whether a word is captured during decoding.

There is currently no exposed configuration to disable or control token filtering or normalization occurring at the decoding level.

Please check if the following suggestions and workarounds help -

Please test the alternate transcription pipelines to

Evaluate behavior using batch transcription APIs

This helps to confirm whether behavior is consistent across processing paths

Providing reproducible sample

Short audio sample demonstrating the issue

Exact phrase list configuration or SDK setup

This enables precise validation and internal reproduction

Comparing with alternate configurations

Test with different locales (for example, English-based locales)

This helps in confirming whether behavior is locale-specific

Evaluating multilingual scenarios

Consider using speech translation or multilingual features in scenarios involving mixed-language input

These modes may handle language switching more effectively

To conclude , at present, there are no configuration options available to disable or control this filtering behavior. Overall, this pattern indicates a model-level limitation or behavioral change in handling mixed-language input.

The following references might be helpful , please check them out

Improve recognition accuracy with phrase list - Foundry Tools | Microsoft Learn

Use the LLM-speech API - Speech service - Foundry Tools | Microsoft Learn

Display text formatting with speech to text - Speech service - Foundry Tools | Microsoft Learn

We have reached out to you on private message for further assistance.

Thank you
Karnam Venkata Rajeswari 2,395 Reputation points Microsoft External Staff Moderator

2026-04-24T18:47:20.9566667+00:00

Hello @Peter Ng ,

Checking in to see if you had any chance to review the above response.

Thank you
Karnam Venkata Rajeswari 2,395 Reputation points Microsoft External Staff Moderator

2026-04-27T17:31:57.9066667+00:00

Hello @Peter Ng ,

Just checking in to see if you have got a chance to see my response to your question in resolving the issue.

Thank you!
Karnam Venkata Rajeswari 2,395 Reputation points Microsoft External Staff Moderator

2026-04-28T17:09:38.6833333+00:00

Hello @Peter Ng ,

Since I’ve converted my earlier comment into an answer, could you please take a moment to mark it as Accepted? This helps others in the community with the same question find the solution more easily.

Thank you!
Manas Mohanty 16,670 Reputation points Microsoft External Staff Moderator

2026-05-05T20:23:19.0966667+00:00

Hey @Peter Ng,

Thank you for your insights regarding aggressive disfluency removal "zh-HK" speech to text services

I am replicating this at my side with SDK, could you please share you connect details/minimal replicable code in private message to help us replicate and escalate to SME here through a support ticket

Thank you