Hello Paul Nesch,
Thanks for raising it in the Q&A forum!
I understand you're trying to call the gpt-4o-transcribe-diarize model through the HTTP/REST API in Azure OpenAI and are running into issues.
The gpt-4o-transcribe-diarize model is an automatic speech recognition (ASR) model with built-in speaker diarization: it transcribes audio and attributes each segment to a speaker. It supports 100+ languages, has a context window of up to 16,000 tokens, and can transcribe 10 minutes of audio in approximately 15 seconds.
API Endpoint and Usage
Azure OpenAI Endpoint Structure: The correct endpoint format for Azure OpenAI is:
text
POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-name}/audio/transcriptions?api-version=2024-08-01-preview
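As a rough illustration of how the pieces of that URL fit together, here is a small helper that assembles it. The function name and the sample resource and deployment names are made up for this sketch; only the URL shape and the api-version value come from the endpoint shown above.

```python
from urllib.parse import quote, urlencode


def build_transcription_url(resource_name: str, deployment_name: str,
                            api_version: str = "2024-08-01-preview") -> str:
    """Assemble the Azure OpenAI audio transcription endpoint URL."""
    base = f"https://{resource_name}.openai.azure.com"
    # The path takes the *deployment* name, which may differ from the model name.
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/audio/transcriptions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Hypothetical resource and deployment names, for illustration only.
url = build_transcription_url("my-resource", "my-diarize-deployment")
print(url)
```

Keeping the API version as a parameter makes it easy to try a newer version without touching the rest of the code.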
Request Format:
python
from
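A minimal sketch of such a request, using only the Python standard library so the raw HTTP shape is visible. It builds the multipart/form-data POST but does not send it. Treat the following as assumptions rather than confirmed API details: the helper name, the sample URL and file name, and the `chunking_strategy` form field. The `api-key` header is the usual key-based authentication header for Azure OpenAI.

```python
import uuid
import urllib.request


def build_transcription_request(url: str, api_key: str, filename: str,
                                audio_bytes: bytes,
                                fields: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields such as response_format.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio itself goes in a form field named "file".
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {
        "api-key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")


request = build_transcription_request(
    # Hypothetical resource and deployment names.
    url="https://my-resource.openai.azure.com/openai/deployments/my-diarize-deployment/audio/transcriptions?api-version=2024-08-01-preview",
    api_key="<your-api-key>",
    filename="meeting.wav",
    audio_bytes=b"RIFF",  # placeholder bytes; read your real audio file here
    fields={"response_format": "diarized_json", "chunking_strategy": "auto"},  # chunking_strategy is an assumption
)
# To send it: urllib.request.urlopen(request), then JSON-decode the response body.
```

Building the request separately from sending it makes it easy to log the headers and form fields when the service rejects a call.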
Response Format: When using response_format="diarized_json", speakers are labeled as "A:", "B:", etc., unless you provide speaker references in your request.
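To show how a diarized response might be consumed, here is a sketch that turns the JSON body into one line per speaker turn. The payload shape (a `segments` list whose items carry `speaker`, `start`, `end`, and `text`) is an assumption for this example, and the sample payload is hand-written rather than captured from the service, so check the field names against a real response.

```python
import json


def format_diarized_transcript(response_body: str) -> list[str]:
    """Turn a diarized_json payload into 'Speaker [start-end]: text' lines."""
    payload = json.loads(response_body)
    lines = []
    for segment in payload.get("segments", []):
        speaker = segment.get("speaker", "?")
        start = float(segment.get("start", 0.0))
        end = float(segment.get("end", 0.0))
        text = segment.get("text", "").strip()
        lines.append(f"{speaker} [{start:.1f}s-{end:.1f}s]: {text}")
    return lines


# Hand-written sample payload for illustration; field names are assumed.
sample = json.dumps({
    "segments": [
        {"speaker": "A", "start": 0.0, "end": 2.4, "text": " Good morning, everyone."},
        {"speaker": "B", "start": 2.4, "end": 4.1, "text": " Morning. Shall we start?"},
    ]
})
transcript_lines = format_diarized_transcript(sample)
print("\n".join(transcript_lines))
```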
Common Issues and Troubleshooting
Deployment Required: Unlike OpenAI's direct API, Azure OpenAI requires you to first deploy the gpt-4o-transcribe-diarize model in Azure AI Foundry:
- Go to the Azure AI Foundry portal
- Navigate to Deployments
- Create a new deployment with the gpt-4o-transcribe-diarize model
- Use your deployment name (not the model name "gpt-4o-transcribe-diarize") in API calls
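Audio Format Requirements:
- Sample rate: 24,000 Hz recommended
- Format: PCM 16-bit
- Maximum duration: 1,400 seconds (23.3 minutes) per chunk
- Base64 encoding applies only to audio sent over the WebSocket/Realtime API, which this model does not support (see Model Limitations); REST requests upload the audio file as multipart form data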
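For recordings longer than the per-chunk limit, the audio has to be split before upload. The sketch below works out byte ranges for raw PCM using the sample rate, bit depth, and maximum duration listed above. The helper name and the mono default are assumptions made for this example. It cuts at fixed offsets, so a real pipeline would probably prefer to cut at silences to avoid splitting words.

```python
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2          # 16-bit PCM
MAX_CHUNK_SECONDS = 1_400


def plan_chunks(total_bytes: int, channels: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets, each covering at most MAX_CHUNK_SECONDS."""
    frame_bytes = BYTES_PER_SAMPLE * channels
    # A whole number of frames, so no sample is split across two chunks.
    max_chunk_bytes = MAX_CHUNK_SECONDS * SAMPLE_RATE_HZ * frame_bytes
    return [
        (start, min(start + max_chunk_bytes, total_bytes))
        for start in range(0, total_bytes, max_chunk_bytes)
    ]


# One hour of mono audio: 3,600 s * 24,000 samples/s * 2 bytes = 172,800,000 bytes.
one_hour = 3_600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
chunks = plan_chunks(one_hour)
for start, end in chunks:
    print(start, end, (end - start) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE), "seconds")
```

For the one-hour example this yields two full 1,400-second chunks and a final 800-second chunk.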
API Version: Ensure you're using an API version that supports this model:
text
api-version=2024-08-01-preview
Or a later version, if one is available.
Model Limitations:
- Not available over the Realtime API
- Prompting (e.g., to establish abbreviations or prior chunks) is not supported
- Speaker attribution doesn't learn automatically; it labels speakers generically unless references are provided
- May produce hallucinations and omissions, especially with less structured speech
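Fallback to Whisper: If gpt-4o-transcribe-diarize fails intermittently, temporarily point your request at a Whisper deployment (whisper-1 on OpenAI's direct API) to verify that your setup works, then switch back. Keep in mind that Whisper does not return speaker labels, so this only checks connectivity, authentication, and audio handling.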
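One way to wire up that check is a small wrapper that tries the diarize deployment first and falls back to Whisper on failure. This is a sketch: the two transcriber callables are stand-ins for whatever code actually calls each deployment, and the stubs below only simulate a failure and a success.

```python
from typing import Callable


def transcribe_with_fallback(audio_path: str,
                             primary: Callable[[str], str],
                             fallback: Callable[[str], str]) -> tuple[str, str]:
    """Try the primary transcriber; on any exception use the fallback.

    Returns (text, which), where which is "primary" or "fallback".
    """
    try:
        return primary(audio_path), "primary"
    except Exception as error:  # broad on purpose: this is a diagnostic aid
        print(f"primary transcriber failed: {error!r}; trying fallback")
        return fallback(audio_path), "fallback"


# Stubs that simulate the two deployments; replace with real calls.
def failing_diarize(path: str) -> str:
    raise RuntimeError("simulated 500 from diarize deployment")


def working_whisper(path: str) -> str:
    return f"transcript of {path}"


text, used = transcribe_with_fallback("meeting.wav", failing_diarize, working_whisper)
```

If the fallback succeeds while the primary keeps failing, the problem is more likely with the diarize deployment or its parameters than with your credentials or audio.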
Check Quotas and Logs:
- Verify your deployment status in the Azure portal
- Check API usage and rate limits
- Review error logs for specific failure messages
- Retry after a few minutes if you see transient errors
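Retrying transient errors can be automated with exponential backoff. The sketch below is generic: `TransientError` is a hypothetical marker exception that your own request code would raise for retryable failures (for example HTTP 429 or 5xx), and the delays are in seconds, so raise `base_delay` if the service needs minutes to recover.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. HTTP 429 or 5xx)."""


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")


# Demo with a call that fails twice, then succeeds; delays are recorded, not slept.
attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated 429")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.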
Update SDK: Ensure you're using the latest version of the openai Python package, which includes the Azure OpenAI client:
bash
pip install --upgrade openai
C# Sample (if needed):
csharp
using
If this helps, kindly accept the answer.
Best Regards,
Jerald Felix