atsauth disabled on nodes
Azure Local cluster (CO-HCI) running solution version 10.2503.0.13 is failing to update to 11.2504.1001.21. The update consistently fails at 88% at the "Update Arc infrastructure components" → "Determine Deploy Or Update" step. Investigation shows atsauth is disabled on all three nodes (CO-HCI1, CO-HCI02, CO-HCI03) and is being controlled from the Azure cloud side via GetDisabledFeaturesReq. Arc agent version is 1.46 on all nodes. Cannot manually upgrade Arc agent on HCI nodes. Need assistance re-enabling atsauth and resolving the Arc infrastructure component update failure.
Azure Local
-
Himanshu Shekhar • 5,740 Reputation points • Microsoft External Staff • Moderator
2026-04-22T20:37:26.35+00:00 The Azure Connected Machine agent (azcmagent) is responsible for maintaining connectivity and configuration between the node and Azure Arc. It operates under Azure control and retrieves its configuration/state from the Azure control plane. - https://learn.microsoft.com/en-us/azure/azure-arc/servers/azcmagent
This confirms that certain feature states exposed by the agent (such as feature enablement/disablement) are not purely local configurations but are influenced by Azure Arc service communication.
Arc infrastructure dependency during HCI updates
Azure Local (Azure Stack HCI) updates include steps that deploy or validate Arc infrastructure management components, and failures during this stage typically require deeper investigation. Since your failure occurs exactly at this step, it strongly indicates a dependency on Arc infrastructure state rather than a pure OS or cluster issue.
Connectivity is critical for Arc state synchronization - https://learn.microsoft.com/en-us/azure/azure-arc/network-requirements-consolidated?tabs=azure-cloud
Azure Arc agents depend on outbound HTTPS connectivity (port 443) to Azure endpoints (e.g., management.azure.com, *.his.arc.azure.com) to retrieve configuration and maintain state.
Any inconsistency or delay in this communication can result in stale or incomplete agent state being used during update orchestration.
What This Means for Your Scenario
- The consistent atsauth disabled state across all nodes indicates this is not a node-level misconfiguration.
- The failure point aligns with Arc infrastructure orchestration, which depends on synchronized control-plane state.
- Therefore, the issue is most likely related to: Arc service-side state / orchestration dependency or a sync/communication gap between cluster and Azure control plane
Recommended Actions
1. Trigger Arc synchronization
Run on the cluster (PowerShell):
Sync-AzureStackHCI
This forces the cluster to sync its current state with Azure (commonly used to refresh Arc/HCI metadata).
2. Validate Arc agent health on all nodes
Run (PowerShell):
azcmagent show
azcmagent check
Confirm:
- Agent status = Connected
- No connectivity errors
- Recent heartbeat
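The per-node checks above can be scripted in one pass. A minimal sketch, assuming PowerShell remoting is enabled between the nodes (node names are the ones reported in this thread):

```powershell
# Sketch: check Arc agent status on every node (assumes PS remoting works).
$nodes = 'CO-HCI1', 'CO-HCI02', 'CO-HCI03'
foreach ($node in $nodes) {
    Invoke-Command -ComputerName $node -ScriptBlock {
        # 'azcmagent show' prints agent metadata, including the status line
        $status = azcmagent show | Select-String 'Agent Status'
        Write-Output ("{0}: {1}" -f $env:COMPUTERNAME, $status)
        # 'azcmagent check' runs the agent's built-in connectivity checks
        azcmagent check
    }
}
```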
3. Verify outbound connectivity
Ensure required Arc endpoints are reachable over HTTPS (443), including:
- management.azure.com
- *.his.arc.azure.com
- Identity and configuration endpoints [learn.microsoft.com]
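These endpoint checks can be done with Test-NetConnection. A sketch (the wildcard *.his.arc.azure.com must be tested against a concrete regional hostname, so the one below is only illustrative):

```powershell
# Sketch: verify outbound TCP 443 reachability to Arc endpoints.
# 'eus.his.arc.azure.com' is an illustrative regional hostname; substitute
# the region your cluster actually uses.
$endpoints = 'management.azure.com', 'login.microsoftonline.com', 'eus.his.arc.azure.com'
foreach ($ep in $endpoints) {
    $r = Test-NetConnection -ComputerName $ep -Port 443
    "{0,-30} TcpTestSucceeded: {1}" -f $ep, $r.TcpTestSucceeded
}
```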
4. Review update logs
Please check:
- HCI deployment/update logs (C:\Windows\AzureStackHCI\Logs\...)
- Update action plan (Get-AzureStackHCIUpdateRun)
This helps confirm whether the failure is consistently tied to Arc orchestration.
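To inspect the action plan from PowerShell, something like the following can be used (cmdlet name as given above; output properties vary by solution version, so dump everything first):

```powershell
# Sketch: list recent update runs with their full property set.
# Property names differ across solution versions - inspect before filtering.
Get-AzureStackHCIUpdateRun | Format-List *
# The failing step ("Determine Deploy Or Update") should be visible in the
# run's progress / action-plan details.
```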
Important Support Guidance (Do Not Perform)
- Do not manually reinstall or forcibly disconnect the Arc agent unless guided by Microsoft Support
- Avoid making unsupported changes to agent configuration
- Do not attempt manual overrides for feature states exposed via the agent
(These operations may break Arc integration and complicate recovery)
-
Andrew D • 0 Reputation points
2026-04-22T20:46:27.8566667+00:00 Thank you for the response. I ran Sync-AzureStackHCI (no output returned), then verified with azcmagent show and azcmagent check.
Results:
All Arc connectivity endpoints are reachable (TLS 1.3, no proxy issues)
Agent Status: Connected with recent heartbeat
Arc Proxy: now running
Disabled Features: atsauth is still showing as disabled even after the sync
The sync did not clear the atsauth disabled state. Can you advise how to re-enable atsauth at the Azure control plane level? We did not deliberately disable this feature — it was in this state when we discovered it during our first-ever update attempt. Our setup company may have disabled it during initial configuration.
-
Andrew D • 0 Reputation points
2026-04-23T14:25:40.0933333+00:00 Found that connectivityProperties.enabled was set to False on the arcSettings/default resource. Patched it to true successfully (confirmed via API response). However atsauth is still showing as disabled in azcmagent show even after restarting the himds service and running Sync-AzureStackHCI. Is there a specific property that controls atsauth in the Azure control plane?
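For reference, the kind of PATCH described here can be issued with the Azure CLI's generic REST client. A hedged sketch (subscription, resource group, and api-version are placeholders/illustrative; confirm the current api-version against the Microsoft.AzureStackHCI arcSettings REST reference):

```powershell
# Sketch: set connectivityProperties.enabled on arcSettings/default.
# <sub> and <rg> are placeholders; the api-version is illustrative only.
az rest --method patch `
  --url "https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.AzureStackHCI/clusters/CO-HCI/arcSettings/default?api-version=2024-01-01" `
  --body '{"properties": {"connectivityProperties": {"enabled": true}}}'
```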
-
Andrew D • 0 Reputation points
2026-04-23T14:27:46.41+00:00 Found the root cause. The himds.log shows: Sending flighting features: ["atsauth"] — this confirms atsauth is being disabled via Azure Feature Flighting, not via any customer-controlled configuration. This is a Microsoft-side feature flag. We need Microsoft to enable the atsauth flight for our subscription/tenant [TENANT ID REDACTED]. Can this be escalated to the Azure Local product team?
-
Himanshu Shekhar • 5,740 Reputation points • Microsoft External Staff • Moderator
2026-04-29T00:07:52.8566667+00:00 Andrew D - The understanding that “atsauth is blocking the update” is incorrect. The actual blocker is the Azure Arc Resource Bridge (ARB) being Offline, which is a critical dependency for Azure Local updates.
Reason:
- Azure Arc Resource Bridge acts as the control plane bridge between Azure and on‑prem infrastructure. [learn.microsoft.com]
- If ARB is offline, update workflows involving ARB and extensions cannot proceed, causing the update to get stuck.
Why the update is stuck (root cause)
When ARB is Offline or unreachable:
- The update stage "Update ARB and Extensions" cannot complete
- The system cannot validate or communicate with the management Kubernetes cluster hosted inside the ARB VM [learn.microsoft.com]
This matches a known issue pattern:
- "Update stuck/failure when ARB is offline"
- ARB is a required infrastructure component, not optional
Recommended checks:
1. Validate ARB (Resource Bridge) state
Confirm:
- The ARB VM (control-plane VM) is running
- It is not in a stopped/failed state
This is the primary check when ARB shows Offline.
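On a node, the ARB VM and its cluster role can be checked with the Hyper-V and failover-clustering cmdlets. A sketch (the name filters follow the naming seen in this thread and may differ in other deployments):

```powershell
# Sketch: confirm the ARB appliance VM and MOC cluster roles are healthy.
# Name filters are based on this thread's resource names - adjust as needed.
Get-ClusterGroup | Where-Object Name -like 'ca-*' |
    Format-Table Name, State, OwnerNode          # MOC cloud agent cluster role
Get-VM | Where-Object Name -like '*arcbridge*' |
    Format-Table Name, State, Status, Uptime     # ARB appliance VM
```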
2. Check connectivity to the ARB VM
- Identify the ARB appliance IP (via MOC config commands, per the TSG)
- Validate the network path from the management node (e.g., ping the ARB IP)
If the IP is not reachable, check:
- The network adapter attached to the ARB VM
- The vSwitch / VLAN mapping
3. Validate communication from the management node
Ensure the management machine can reach the ARB VM over the required ports (e.g., SSH TCP 22 for log collection) [learn.microsoft.com]
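A quick reachability sketch from the management node (substitute the actual appliance IP; port 6443 is the usual Kubernetes API port and is listed here as an assumption, not a documented requirement):

```powershell
# Sketch: test ICMP and key TCP ports against the ARB control-plane IP.
$arbIp = '<ARB control-plane IP>'             # e.g. obtained via MOC config commands
Test-Connection -ComputerName $arbIp -Count 2           # basic ICMP reachability
Test-NetConnection -ComputerName $arbIp -Port 22        # SSH (used for log collection)
Test-NetConnection -ComputerName $arbIp -Port 6443      # Kubernetes API (assumption)
```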
4. Collect logs for deeper validation
From the management machine, run:
az arcappliance logs
Use the logs to identify whether the ARB failure is due to:
- Networking
- Credentials mismatch
- Internal ARB service failure [learn.microsoft.com]
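For an HCI-hosted appliance, the log-collection command takes the control-plane IP; exact parameters vary by Azure CLI extension version, so verify with --help first:

```powershell
# Sketch: collect ARB logs from the management machine.
# Parameter names can differ by 'arcappliance' extension version; verify with:
#   az arcappliance logs hci --help
az arcappliance logs hci --ip <ARB control-plane IP> --out-dir C:\arb-logs
```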
5. Check credentials / certificate health
If ARB is reachable but still failing:
- Validate the credentials stored in the ARB appliance
- Expired credentials or certificates can break communication
ARB requires periodic maintenance (credential/certificate refresh) [learn.microsoft.com]
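If the stored credentials have expired, one rotation path for an HCI appliance is az arcappliance update-infracredentials, run from the management machine against the appliance kubeconfig. A hedged sketch (verify the exact syntax for your CLI extension version):

```powershell
# Sketch: refresh the credentials the ARB appliance uses on-premises.
# Requires the appliance kubeconfig saved during deployment; verify syntax
# with: az arcappliance update-infracredentials hci --help
az arcappliance update-infracredentials hci --kubeconfig .\kubeconfig
```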
6. Attempt ARB recovery
If ARB is still offline:
- Restart the ARB VM
- Validate the services inside ARB
If ARB is irrecoverable:
- Re-deploy / recover ARB using the same configuration
- Reconnect it to the existing Azure resources
This is recommended when the VM is corrupted or communication is permanently broken [learn.microsoft.com]
Important Guidance
- ARB is a critical component and should not be deleted/recreated without guidance [learn.microsoft.com]
- Escalation should be triggered if:
- The ARB VM is running but still shows Offline in Azure
- Network and connectivity checks pass, but the issue persists
- Logs show internal ARB service failures or MOC/lifecycle errors
- The update remains stuck after remediation attempts
-
Andrew D • 0 Reputation points
2026-04-29T16:49:29.33+00:00 Regarding the stuck update on cluster CO-HCI, I have completed the requested investigation into the Azure Arc Resource Bridge (ARB). Here are the technical findings:
- ARB Resource Status (Azure Portal)
Resource Name: CO-HCI-arcbridge.
Current Status: Offline.
Provisioning State: Succeeded (but disconnected).
Version: 1.4.0.
- Local Infrastructure Status (On-Premises)
Control Plane IP: 10.100.42.105.
Connectivity: The IP is pingable from the management node (co-hci02) with <1ms latency.
Port Check: TCP Port 22 (SSH) is open and responding (TcpTestSucceeded: True).
VM State: The underlying VM (ID: 15b89813...) is confirmed Running in Failover Cluster Manager.
- Actions Already Taken
Service Restart: I have performed a clean shutdown and restart of the ARB Virtual Machine.
Role Restart: The MOC Cloud Agent Cluster Role (ca-a50a9dd9...) has been stopped and restarted.
Persistence: Despite these restarts and confirmed local connectivity, the bridge remains Offline in the Azure Portal.
- Observed Deployment Errors
The deployment CO-HCI is currently in a terminal 'Failed' state.
Error Code: ResourceDeploymentFailure.
Message: Terminal provisioning state 'Failed' for microsoft.azurestackhci/clusters/CO-HCI/deploymentSettings/default.
It appears that while the ARB appliance is physically running, the management agents are failing to authenticate or connect back to Azure, which is blocking the "Update ARB and Extensions" workflow.
-
Andrew D • 0 Reputation points
2026-04-29T18:59:47.6166667+00:00 We also found and renewed the expired MOC Admin identity tokens (expired April 28, 2026, rotation was overdue since April 2, 2026).
Restarted wssdcloudagent and wssdagent services.
We then re-ran the update, however the update still fails at "Determine Deploy Or Update" with the same error. Application event log shows: "Failed to fetch the resource id of the EdgeMachine from the Configuration, so cannot create an EdgeMachine."