
Using gpt-4o-transcribe-diarize through an HTTP request for transcription results in a server error

Paul Nesch 0 Reputation points
2026-02-06T09:35:33.87+00:00

I'm building an automation in Copilot Studio/Power Automate that includes transcribing an audio file. I use an HTTP request to connect to my Azure AI Foundry models. I first tried it with Whisper, which worked perfectly, as did gpt-4o-mini-transcribe. But these are being retired soon in Azure AI Foundry, so I looked for another model. I tried the same request with gpt-4o-transcribe-diarize, and I always get this server error message back: "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through an Azure support request at: ... if you keep seeing this error."

I would prefer to just use Whisper, but if it gets retired I have no choice, because gpt-4o-transcribe-diarize is the only model available at least until 2027.

I found an existing post about this topic, but the tips didn't really help, and it has been over a month since that post.

Azure Translator in Foundry Tools

1 answer

  1. Jerald Felix 11,550 Reputation points Volunteer Moderator
    2026-02-09T02:18:40.12+00:00

    Hello Paul Nesch,

    Thanks for raising it in the Q&A forum!

    I understand you're trying to use gpt-4o-transcribe-diarize through the HTTP/REST API in Azure OpenAI and are encountering a server error.

    The gpt-4o-transcribe-diarize model is an automatic speech recognition (ASR) model with built-in speaker diarization: it transcribes audio and identifies the different speakers. It supports 100+ languages, has a context window of up to 16,000 tokens, and can convert 10 minutes of audio in approximately 15 seconds.

    API Endpoint and Usage

    Azure OpenAI Endpoint Structure: The correct endpoint format for Azure OpenAI is:

    text
    POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-name}/audio/transcriptions?api-version=2024-08-01-preview
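
As a minimal sketch of the endpoint format above, the URL can be assembled from its parts so that the deployment name and API version are never mistyped by hand. The function name `build_transcription_url` and the example resource and deployment names are hypothetical; the default API version is simply the one quoted in this answer and may need to be newer for your deployment.

```python
from urllib.parse import quote, urlencode


def build_transcription_url(resource_name: str, deployment_name: str,
                            api_version: str = "2024-08-01-preview") -> str:
    """Build the Azure OpenAI audio transcription URL for one deployment."""
    base = f"https://{resource_name}.openai.azure.com"
    # The path segment is the *deployment* name, not the model name.
    deployment = quote(deployment_name, safe="")
    path = f"/openai/deployments/{deployment}/audio/transcriptions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Hypothetical names, for illustration only.
url = build_transcription_url("my-resource", "my-diarize-deployment")
print(url)
```

The resulting string is what would go into the URI field of the HTTP action, with the method set to POST.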
    

    Request Format:

    python
    from
    

    Response Format: When using response_format="diarized_json", speakers are labeled as "A:", "B:", etc., unless you provide speaker references in your request.
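
To illustrate how a diarized response might be turned into readable text, here is a small sketch. It assumes, without confirmation from this answer, that the JSON body contains a `segments` list whose items carry `speaker` and `text` keys; check the actual response from your deployment before relying on these names. The sample payload is invented for illustration.

```python
def format_diarized_transcript(response: dict) -> str:
    """Render a diarized_json-style response as 'Speaker: text' lines.

    Assumption: the response has a "segments" list whose items have
    "speaker" and "text" keys. Verify this against a real response.
    """
    lines = []
    previous_speaker = None
    for segment in response.get("segments", []):
        speaker = segment.get("speaker", "?")
        text = segment.get("text", "").strip()
        if not text:
            continue  # skip empty segments
        if lines and speaker == previous_speaker:
            # Merge consecutive segments from the same speaker into one line.
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            previous_speaker = speaker
    return "\n".join(lines)


# Hypothetical sample payload, for illustration only.
sample = {
    "segments": [
        {"speaker": "A", "text": "Hello.", "start": 0.0, "end": 0.8},
        {"speaker": "A", "text": "Can you hear me?", "start": 0.9, "end": 2.0},
        {"speaker": "B", "text": "Yes, loud and clear.", "start": 2.1, "end": 3.5},
    ]
}
print(format_diarized_transcript(sample))
```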

    Common Issues and Troubleshooting

    Deployment Required: Unlike OpenAI's direct API, Azure OpenAI requires you to first deploy the gpt-4o-transcribe-diarize model in Azure AI Foundry:

    Go to Azure AI Foundry portal

    Navigate to Deployments

    Create a new deployment with the gpt-4o-transcribe-diarize model

    Use your deployment name (not "gpt-4o-transcribe-diarize") in API calls

    Audio Format Requirements:

    Sample rate: 24,000 Hz recommended

    Format: PCM 16-bit

    Maximum duration: 1,400 seconds (23.3 minutes) per chunk

    For WebSocket/Realtime API: encode to base64
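
If a recording is longer than the per-chunk limit listed above, it has to be split before upload. The sketch below only computes the time ranges to cut at; the 1,400-second value is taken from this answer, the helper name `chunk_ranges` is hypothetical, and the actual cutting would be done with an audio tool of your choice.

```python
MAX_CHUNK_SECONDS = 1400  # per-chunk limit quoted in the answer above


def chunk_ranges(total_seconds, max_chunk=MAX_CHUNK_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or max_chunk <= 0:
        raise ValueError("total_seconds must be >= 0 and max_chunk must be > 0")
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


# A one-hour recording needs three chunks under the assumed limit.
print(chunk_ranges(3600))
```

Cutting at fixed offsets can split a word or a speaker turn, so in practice you may prefer to cut at a nearby pause instead.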

    API Version: Ensure you're using an API version that supports this model:

    text
    api-version=2024-08-01-preview
    

    Or later versions if available.

    Model Limitations:

    Not available over the Realtime API

    Prompting (e.g., to establish abbreviations or prior chunks) is not supported

    Speaker attribution doesn't learn automatically; it labels speakers generically unless references are provided

    May produce hallucinations and omissions, especially with less structured speech

    Fallback to Whisper: If gpt-4o-transcribe-diarize fails intermittently, temporarily switch to your Whisper deployment to verify that your setup works, then switch back.

    Check Quotas and Logs:

    Verify your deployment status in Azure Portal

    Check API usage and rate limits

    Review error logs for specific failure messages

    Retry after a few minutes if seeing transient errors
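
For transient server errors, the retry advice above can be sketched as exponential backoff around whatever function sends the request. Everything here is illustrative: `TransientServerError` is a hypothetical marker exception that your own sending code would raise on an HTTP 5xx response, and the attempt count and delays are arbitrary starting values.

```python
import time


class TransientServerError(Exception):
    """Hypothetical marker for a retryable failure (for example an HTTP 5xx)."""


def call_with_retries(send, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... seconds between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a fake sender that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_send():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientServerError("server error")
    return {"text": "ok"}


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_send, sleep=delays.append)
print(result, delays)
```

If the error persists after every attempt, it is probably not transient, and the deployment, API version, and request body are the next things to check.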

    Update SDK: If you call the service from Python, ensure you're using the latest OpenAI Python SDK (the openai package, which includes the Azure OpenAI client):

    bash
    pip install --upgrade openai
    

    C# Sample (if needed):

    csharp
    using
    

    If this helps, kindly accept the answer.

    Best Regards,

    Jerald Felix

