
How to fix an HCI host that fails to drain even though there are no roles on it

John Rodger 0 Reputation points
2026-04-14T15:36:26.2933333+00:00

How do I fix an HCI host that fails to drain even though there are no roles on it, and even after a reboot?

Azure Local

2 answers

  1. Alex Burlachenko 20,665 Reputation points MVP Volunteer Moderator
    2026-04-15T09:47:07.5066667+00:00

    Hi John Rodger,

    tl;dr: something is still owned by or running on the node, even if you don't see it.

    This is almost never about roles; it's stuck cluster state. The node still "owns" something even if the UI shows nothing, usually storage jobs, CSV ownership, or hidden cluster resources. Try these in order:

    - Check Get-ClusterGroup and Get-ClusterResource to see whether anything is still tied to that node.
    - Check for storage jobs with Get-StorageJob; if anything is running, the drain will fail.
    - Check CSV ownership with Get-ClusterSharedVolume and move any CSVs off the node.
    - Sometimes the node is stuck in a draining/paused state, so run Resume-ClusterNode -Name <node> -Failback Immediate.
    - If it is still blocked, force-move everything off the node: Get-ClusterGroup | Move-ClusterGroup -Node <other-node>.
    - If that still doesn't work, it's stale cluster state: restart the Cluster service on the node, or as a last resort use Clear-ClusterNode.
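    As a rough sketch, those checks can be run in one pass. "<node>" and "<other-node>" are placeholders for your actual node names, and the commands assume the FailoverClusters and Storage modules are available:

    ```powershell
    # Anything still owned by the affected node?
    Get-ClusterGroup    | Where-Object { $_.OwnerNode -eq "<node>" }
    Get-ClusterResource | Where-Object { $_.OwnerNode -eq "<node>" }

    # Running storage jobs will block the drain
    Get-StorageJob | Where-Object { $_.JobState -ne "Completed" }

    # Move any CSVs the node still owns to another node
    Get-ClusterSharedVolume | Where-Object { $_.OwnerNode -eq "<node>" } |
        Move-ClusterSharedVolume -Node "<other-node>"

    # Clear a stuck paused/draining state, then force-move everything off
    Resume-ClusterNode -Name "<node>" -Failback Immediate
    Get-ClusterGroup | Move-ClusterGroup -Node "<other-node>"
    ```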

    rgds,

    Alex

    P.S. Please accept my answer if it helps.
    

  2. Ankit Yadav 14,165 Reputation points Microsoft External Staff Moderator
    2026-04-14T16:20:18.9166667+00:00

    In Azure Local, draining is based on cluster ownership, not just visible VM roles. A node can remain in the Draining state if the cluster still considers it the owner of infrastructure resources, or if maintenance activity hasn't fully completed.

    Use the steps below to identify and clear the condition.

    1. Check whether the node still owns any cluster groups

    Even if no VMs are present, the node may still own CSV or other cluster groups.

    Get-ClusterGroup | Format-Table Name, OwnerNode, State
    

    What to look for: Any group where OwnerNode is the affected host. If ownership exists, drain will not complete.
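    To surface only the groups that still block the drain, you can filter on the affected host ("<NodeName>" is a placeholder for your node name):

    ```powershell
    # List only the cluster groups still owned by the node being drained
    Get-ClusterGroup | Where-Object { $_.OwnerNode -eq "<NodeName>" } |
        Format-Table Name, OwnerNode, State
    ```

    An empty result here means ownership is clear and the drain should be able to proceed.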

    2. Move any remaining groups off the node

    If ownership is found, move the group manually:

    Move-ClusterGroup -Name "<GroupName>" -Node "<OtherNodeName>"
    

    Re‑run step 1 and confirm the node no longer owns any groups.

    3. Re‑attempt the supported drain operation

    Once ownership is clear, retry the documented drain action:

    Suspend-ClusterNode -Name "<NodeName>" -Drain
    

    What to expect: The node should transition to Paused after draining completes.
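    You can confirm the transition from the node object itself; DrainStatus reports values such as NotInitiated, InProgress, Completed, or Failed:

    ```powershell
    # Verify the node reached Paused and the drain completed
    Get-ClusterNode -Name "<NodeName>" |
        Format-List Name, State, DrainStatus, DrainTarget
    ```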

    4. Resume and drain again if the state appears stuck

    If the node remains in Draining despite no owned resources, reset the pause state and retry:

    Resume-ClusterNode -Name "<NodeName>"
    Suspend-ClusterNode -Name "<NodeName>" -Drain
    

    This re-applies the supported pause/drain workflow without using undocumented force actions.

    5. Check for active storage jobs (S2D environments)

    Background storage maintenance or repair activity can delay maintenance transitions.

    Get-StorageJob
    

    What to look for: Any running jobs. Allow them to complete before retrying drain.
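    If repair or rebalance jobs are present, one option is to poll until they finish before retrying the drain (a simple sketch; the 60-second interval is arbitrary):

    ```powershell
    # Wait for all running storage jobs (e.g. S2D repair) to finish
    while (Get-StorageJob | Where-Object { $_.JobState -eq "Running" }) {
        Get-StorageJob | Format-Table Name, JobState, PercentComplete
        Start-Sleep -Seconds 60
    }
    ```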

    6. Restart the cluster service on the node (if quorum is safe)

    If the node is still stuck and no groups or storage jobs are present, restart the cluster service on that node only:

    Stop-ClusterNode -Name "<NodeName>"
    Start-ClusterNode -Name "<NodeName>"
    

    Important: Only do this if the cluster will remain in quorum.
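    As a quick sanity check before restarting the service, count the nodes that are Up and confirm the cluster keeps a majority with this node down (a rough check; the witness configuration also matters):

    ```powershell
    # Nodes currently up; the cluster must retain quorum with one fewer
    $up    = (Get-ClusterNode | Where-Object { $_.State -eq "Up" }).Count
    $total = (Get-ClusterNode).Count
    "Up: $up of $total nodes"
    Get-ClusterQuorum   # shows the quorum type and any witness resource
    ```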

    7. Validate cluster health if the issue persists

    Run cluster validation to surface configuration or health issues that may block maintenance:

    Test-Cluster -Node "<NodeName>"
    

    Review storage, network, and system results.

