Trying to use gpt-4o-transcribe-diarize through an HTTP request for transcription results in a server error

Question

Paul Nesch 0

I'm building an automation in Copilot Studio/Power Automate that includes transcribing an audio file. I use an HTTP request to connect to my Azure AI Foundry models. I first tried it with Whisper, which worked perfectly, and also with gpt-4o-mini-transcribe. But these are being retired soon in Azure AI Foundry, so I looked for another model. I tried the same request with gpt-4o-transcribe-diarize, and I always get this server error back: "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through an Azure support request at: ... if you keep seeing this error."

I would prefer to just use Whisper, but once it is retired I have no choice, because gpt-4o-transcribe-diarize is the only model available until at least 2027.

I found an existing post about this topic, but the tips didn't really help, and it's been over a month since that post.

Manas Mohanty 16,670 Reputation points Microsoft External Staff Moderator

2026-02-25T00:05:39.2533333+00:00

Hi Paul Nesch

I am trying to replicate the scenario in Power Automate on my side. I wanted to point out that the HTTP request schema that worked for the non-diarized models (Whisper speech-to-text) will not be compatible with the diarization model (speech-to-text plus diarization).

Could you double-check the schema you are using? It is most likely the cause of the server errors here.

Thank you.
Manas Mohanty 16,670 Reputation points Microsoft External Staff Moderator

2026-02-25T16:59:33.7533333+00:00

Hi Paul Nesch

Could you share the Power Automate workflow in a private message?

Thank you.

Answer 1

Hello Paul Nesch,

Thanks for raising this in the Q&A forum!

I understand you're trying to use gpt-4o-transcribe-diarize through the HTTP/REST API in Azure OpenAI and are encountering issues.

The gpt-4o-transcribe-diarize model is an automatic speech recognition (ASR) model with built-in speaker diarization, designed to transcribe audio and identify the different speakers. It supports 100+ languages, has a context window of up to 16,000 tokens, and can transcribe 10 minutes of audio in approximately 15 seconds.

API Endpoint and Usage

Azure OpenAI Endpoint Structure: The correct endpoint format for Azure OpenAI is:

```text
POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-name}/audio/transcriptions?api-version=2024-08-01-preview
```
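A minimal Python sketch that assembles this URL, handy for checking the value pasted into the Power Automate HTTP action. The resource name contoso-ai and deployment name diarize-prod are hypothetical placeholders, and the default api_version simply mirrors the value above; if the service rejects it, a newer preview version may be needed.

```python
from urllib.parse import quote, urlencode


def transcription_url(resource_name: str, deployment_name: str,
                      api_version: str = "2024-08-01-preview") -> str:
    """Build the Azure OpenAI audio transcription URL for one deployment."""
    # The deployment name chosen in Azure AI Foundry goes in the path,
    # URL-encoded in case it contains characters such as spaces.
    deployment = quote(deployment_name, safe="")
    path = f"/openai/deployments/{deployment}/audio/transcriptions"
    # api-version is passed as a query-string parameter.
    query = urlencode({"api-version": api_version})
    return f"https://{resource_name}.openai.azure.com{path}?{query}"


if __name__ == "__main__":
    # Hypothetical resource and deployment names.
    print(transcription_url("contoso-ai", "diarize-prod"))
```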

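Request Format: the request is a multipart/form-data POST to the endpoint above. The standard-library Python sketch below builds such a body, which is also what the Power Automate HTTP action has to reproduce. Assumptions, all to be checked against the current Azure OpenAI reference: key-based auth through the api-key header; no model field, since the deployment in the URL selects the model; and chunking_strategy="auto", included because OpenAI's API documentation describes it as required for gpt-4o-transcribe-diarize inputs longer than 30 seconds (whether Azure behaves the same is an assumption). build_multipart and DIARIZE_FIELDS are hypothetical names.

```python
import uuid
from typing import Dict, Optional, Tuple

# Form fields for the diarization deployment. response_format comes from the
# answer; chunking_strategy is an assumption (see the lead-in).
DIARIZE_FIELDS: Dict[str, str] = {
    "response_format": "diarized_json",
    "chunking_strategy": "auto",
}


def build_multipart(audio: bytes, filename: str, fields: Dict[str, str],
                    boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Encode text fields plus one audio file as multipart/form-data.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # One part per plain text field.
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio itself goes in a part named "file"; keep the real extension
    # (for example .wav or .mp3) in the filename.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + audio + b"\r\n")
    # Closing delimiter marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


if __name__ == "__main__":
    content_type, body = build_multipart(b"<audio bytes>", "meeting.wav", DIARIZE_FIELDS)
    headers = {"api-key": "<your-key>", "Content-Type": content_type}
    print(headers["Content-Type"], len(body), "bytes")
```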

Response Format: When using response_format="diarized_json", speakers are labeled "A:", "B:", and so on, unless you provide speaker references in your request.
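A Python sketch of turning a diarized_json response into "A: text" lines, one per speaker turn. The response shape is an assumption here: a top-level segments array whose items carry speaker and text keys, so check it against a real response before relying on it. format_speaker_turns is a hypothetical helper; consecutive segments from the same speaker are merged into one turn.

```python
import json
from typing import List, Tuple


def format_speaker_turns(response_body: str) -> str:
    """Render a diarized_json transcription as one line per speaker turn.

    Assumes the payload has a "segments" list whose items carry
    "speaker" and "text" keys; adjust if the real schema differs.
    """
    data = json.loads(response_body)
    turns: List[Tuple[str, str]] = []
    for segment in data.get("segments", []):
        speaker = segment.get("speaker", "?")
        text = segment.get("text", "").strip()
        if not text:
            continue
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend that turn.
            turns[-1] = (speaker, f"{turns[-1][1]} {text}")
        else:
            turns.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


if __name__ == "__main__":
    sample = json.dumps({"segments": [
        {"speaker": "A", "text": "Hello there."},
        {"speaker": "A", "text": "How are you?"},
        {"speaker": "B", "text": "Fine, thanks."},
    ]})
    print(format_speaker_turns(sample))
```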

Common Issues and Troubleshooting

Deployment Required: Unlike OpenAI's direct API, Azure OpenAI requires you to first deploy the gpt-4o-transcribe-diarize model in Azure AI Foundry:

Go to the Azure AI Foundry portal

Navigate to Deployments

Create a new deployment with the gpt-4o-transcribe-diarize model

Use your deployment name (not "gpt-4o-transcribe-diarize") in API calls

Audio Format Requirements:

Sample rate: 24,000 Hz recommended

Format: PCM 16-bit

Maximum duration: 1,400 seconds (23.3 minutes) per chunk

For WebSocket/Realtime API: encode to base64
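A standard-library Python sketch that reads a WAV file's header and compares it with the figures listed above. It assumes WAV input and that these figures apply to your request path; inspect_wav and make_silent_wav are hypothetical helper names. Since 1,400 seconds is about 23.3 minutes, a one-hour recording would need three chunks.

```python
import io
import math
import wave
from typing import BinaryIO, Dict, List, Union

# Figures from the list above.
RECOMMENDED_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
MAX_CHUNK_SECONDS = 1_400


def inspect_wav(source: Union[str, BinaryIO]) -> Dict[str, object]:
    """Compare a WAV file's header with the recommended format and chunk limit."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration_s = wav.getnframes() / rate
    issues: List[str] = []
    if rate != RECOMMENDED_RATE_HZ:
        issues.append(f"sample rate is {rate} Hz, not {RECOMMENDED_RATE_HZ} Hz")
    if width != SAMPLE_WIDTH_BYTES:
        issues.append(f"sample width is {8 * width}-bit, not 16-bit")
    return {
        "duration_s": duration_s,
        # Number of pieces of at most 1,400 s the audio would be split into.
        "chunks_needed": max(1, math.ceil(duration_s / MAX_CHUNK_SECONDS)),
        "issues": issues,
    }


def make_silent_wav(rate: int, seconds: float = 1.0) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for trying inspect_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(inspect_wav(make_silent_wav(16_000)))
```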

API Version: Ensure you're using an API version that supports this model:

```text
api-version=2024-08-01-preview
```

Or later versions if available.

Model Limitations:

Not available over the Realtime API

Prompting (e.g., to establish abbreviations or prior chunks) is not supported

Speaker attribution doesn't learn speaker identities automatically; it labels speakers generically unless references are provided

May produce hallucinations and omissions, especially with less structured speech

Fallback to Whisper: If gpt-4o-transcribe-diarize has intermittent failures, temporarily switch to the whisper-1 model to verify your setup works, then switch back.
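Switching to Whisper also means switching the request fields, since (as noted in the comments above) the schema that suits one model does not suit the other. A Python sketch that keeps each deployment paired with its own fields: whisper-1 does not offer diarized_json, so verbose_json (a format Whisper does support) is used for it; diarized_json comes from this answer; chunking_strategy="auto" is an assumption to verify. The deployment names and the helpers form_fields and request_plan are hypothetical.

```python
from typing import Dict

# Hypothetical deployment names; use the names from your own Azure AI Foundry project.
DEPLOYMENTS: Dict[str, str] = {
    "diarize": "diarize-prod",
    "whisper": "whisper-test",
}


def form_fields(model_family: str) -> Dict[str, str]:
    """Return the non-file form fields for a transcription request."""
    if model_family == "whisper":
        # Whisper has no diarized output; verbose_json adds segment timestamps.
        return {"response_format": "verbose_json"}
    if model_family == "diarize":
        return {
            "response_format": "diarized_json",
            # Assumption: needed by the diarization model for longer audio.
            "chunking_strategy": "auto",
        }
    raise ValueError(f"unknown model family: {model_family!r}")


def request_plan(model_family: str) -> Dict[str, object]:
    """Pair a deployment with the fields it expects, so a fallback switches both."""
    return {"deployment": DEPLOYMENTS[model_family], "fields": form_fields(model_family)}


if __name__ == "__main__":
    for family in ("diarize", "whisper"):
        print(family, request_plan(family))
```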

Check Quotas and Logs:

Verify your deployment status in the Azure portal

Check API usage and rate limits

Review error logs for specific failure messages

Retry after a few minutes if seeing transient errors
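The retry advice can be sketched as exponential backoff in Python. send is a hypothetical zero-argument callable that performs the HTTP request and returns (status_code, body); treating 429 and 5xx as transient is an assumption, not a service guarantee. The sleep parameter is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Tuple

# Status codes treated as transient here (an assumption).
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


def send_with_retry(send: Callable[[], Tuple[int, bytes]],
                    attempts: int = 4,
                    base_delay_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call send() until it returns a non-transient status or attempts run out.

    Between tries it waits base_delay_s, then 2x, 4x, ... (exponential backoff).
    The last response is returned either way, so the caller can inspect it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = send()
    for attempt in range(1, attempts):
        if status not in TRANSIENT_STATUSES:
            break
        sleep(base_delay_s * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

A 500 that persists across every attempt, as described in the question, is unlikely to be transient, so it is worth checking the request itself rather than retrying for longer.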

Update SDK: Ensure you're using the latest OpenAI Python SDK (the openai package, which includes the Azure OpenAI client):

```bash
pip install --upgrade openai
```
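To confirm which version the upgrade actually left in the active Python environment, a small standard-library check (importlib.metadata, Python 3.8+). installed_version is a hypothetical helper name. This only matters for Python scripts; the Power Automate HTTP action sends raw HTTP and does not use the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "openai") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"openai {found}" if found else "openai is not installed")
```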


If this helps, kindly accept the answer.

Best Regards,

Jerald Felix

Paul Nesch 0 Reputation points

2026-02-09T06:59:42.72+00:00

Thank you for your detailed response. I checked everything and made sure everything is up to date, but it still doesn't work. I actually prefer whisper-1 and would like to keep using it, but I saw that the model is being retired in June, which is why I am looking for a replacement. Or will there be a follow-up model for Whisper once whisper-1 is retired?
