How to manage and monitor a Failover Cluster with Windows Admin Center

In my previous article I explained how to prepare and install Failover Clustering in Windows Server 2016.

I hope it helps someone build a Failover Cluster, or improve the infrastructure for those who have already built one.

But after you build your Failover Cluster you must monitor it, especially in the beginning, to catch issues that may arise and to prevent serious problems in your Nodes.

Today I will explain how to manage and monitor the Failover Cluster that you have built, or one that you already have but without a Monitoring Tool in place.

So let's start

How to install the Windows Admin Center

Before you start the installation of Windows Admin Center you must decide where to install it.

If you haven't tried Windows Admin Center yet, now is the time. You can monitor all your Servers, Failover Clusters, Hyper-V Hosts, Azure Stack HCI and more.

To proceed, first read the article Windows Admin Center (Project Honolulu) - Setup Guide to set up Windows Admin Center.

 

How to Add Failover Cluster in Windows Admin Center

After the installation finishes, follow these steps to add your Failover Cluster in Windows Admin Center.

  • Open Windows Admin Center
  • Click the Add button

 

  • Click Add in Server Clusters.

 

  • Type the name of the Cluster Access Point. Be careful: this is not the name of any of the nodes.

 

  • Select Use Another account for the connection
  • Type the Domain Admin credentials to give access.
  • Click Connect with account

 

  • After some time you will see a green check mark verifying that the cluster name was found.
  • Click Add to proceed
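
If you are not sure which name to type as the Cluster Access Point, a quick check from PowerShell can help. This is only a minimal sketch, assuming you run it on one of the cluster nodes and that the FailoverClusters PowerShell module is already available there. It also assumes that your Cluster Access Point uses the cluster name, which is the usual case.

```powershell
# Run on one of the cluster nodes (assumes the FailoverClusters module is present).
Import-Module FailoverClusters

# Without parameters, Get-Cluster returns the cluster the local node belongs to.
$cluster = Get-Cluster
"Cluster name to type in Windows Admin Center: $($cluster.Name)"

# These are the node names, which you should NOT type in the Add dialog.
Get-ClusterNode | Select-Object Name, State
```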

 

  • While opening the Overview page you may get an error.
  • This is something I observed after removing and adding the Cluster multiple times.

 

  • The reason is that you must install the feature Failover Cluster Module for Windows PowerShell.
  • From the left menu click Roles or any other menu, and a request to install the feature may appear.

 

  • Otherwise, open PowerShell as Administrator and type the following command
    Install-WindowsFeature -Name RSAT-Clustering-PowerShell
  • After the feature is installed you can use the left menu to explore the different tools that Windows Admin Center has for the Failover Cluster in Windows Server 2016
  • The Tools that you have in Windows Admin Center for the Failover Clusters are:
    • Roles
    • Nodes
    • Disks
    • Storage Replica
    • Networks
    • Updates
    • Performance Monitor
    • Azure Monitor
  • You can go through each of the above tools and see what features it has.
  • After you add the Failover Cluster, Windows Admin Center also lists the Nodes of the Failover Cluster as single Servers, which you can manage and monitor as well.
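
If you prefer to check before installing, the feature state can be queried first. The following is a minimal sketch, assuming an elevated PowerShell session on a cluster node running Windows Server. The variable names are only for illustration.

```powershell
# Run in an elevated PowerShell session on each cluster node.
$featureName = 'RSAT-Clustering-PowerShell'
$feature = Get-WindowsFeature -Name $featureName

if ($feature.Installed) {
    Write-Output "$featureName is already installed."
}
else {
    # Same install command as in the steps above, executed only when the feature is missing
    Install-WindowsFeature -Name $featureName
}

# Confirm that the cluster cmdlets are available
Get-Command -Module FailoverClusters | Select-Object -First 5 Name
```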

 

How to Monitor a Failover Cluster with Windows Admin Center

 

Until now we have seen how to manage a Failover Cluster and how to monitor very basic things.

Now it's time to dive into Monitoring and explain Performance Monitoring in more detail.

As you may know, Windows Admin Center includes an updated, better version of the classic Windows Performance Monitor.

For more details you can read Introducing the new Performance Monitor for Windows from Microsoft Docs.

So let's see how it can help us resolve issues in our Failover Cluster.

  • Open the Windows Admin Center
  • Select the Server Clusters connection
  • From the left side go to Performance Monitor.

 

This is where a very long discussion starts, with questions like: what must be monitored in a Failover Cluster?

To be honest you can monitor a lot of counters, but today I will give you the most important ones.

Let's go through the counters that are important to monitor in order to avoid downtime, troubleshoot issues and improve performance.

 

Cluster Database

All the changes that happen in the cluster are written to the Cluster Database.

Let's see in more detail how this works. The cluster writes every change to the transaction log files first and then to the database.

Based on the Microsoft Docs article Failover Clustering Performance Counters - Part 1, the cluster database is flushed every 20 seconds.

These counters help you monitor how often database flushes happen.

Let's start adding the performance counters

  • First click the button Blank Workspace
  • Click Add Counter
  • In the Source select one of the Nodes. In my Lab I select SRCL01.
  • In the Object search field type Cluster Database and select Cluster Database.
  • In Instance select all your Nodes.
  • In the Counter select all the Counters.
  • In the Graph Type you can select only Report, because you have a lot of Counters selected.
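
The same counters can also be read from PowerShell with Get-Counter, which is handy for a quick check without opening a Workspace. This is a minimal sketch, assuming you run it on one of the cluster nodes. The wildcard 'Cluster Database*' is my assumption for finding the counter set, and the 20 second interval is chosen only to match the flush interval mentioned above.

```powershell
# Run on one of the cluster nodes.
# Discover the counter set instead of hard-coding the counter names.
$set = Get-Counter -ListSet 'Cluster Database*'
$set | Select-Object CounterSetName, Description

# Collect every counter path that belongs to the set(s) found
$paths = $set | ForEach-Object { $_.Paths }

# Take 3 samples, 20 seconds apart
Get-Counter -Counter $paths -SampleInterval 20 -MaxSamples 3 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue
```

If the first command finds nothing, Get-Counter -ListSet 'Cluster*' lists all the cluster counter sets that exist on the node.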

 

Global Update Manager

Global Update Manager is responsible for managing Cluster Database updates. Let's explain it with an example. In a failover cluster every node must be updated about every state change on the other nodes. If a node goes offline for any reason, all the other nodes must be informed that this node is offline. This is exactly the job of the Global Update Manager.

It's good to monitor how the Global Update Manager is doing

  • First click the button Blank Workspace
  • Click Add Counter
  • In the Source select all your Nodes. In my Lab they are SRCL01 and SRCL02

 

  • In the Object search field type Cluster Global and select Cluster Global Update Manager.

 

  • In the Instance field select all your Nodes.

 

  • In the Counter field  select all the Counters.

 

  • In the Graph Type field you have only the Report option, because you have a lot of Counters selected.
  • Now look at the numbers in Database Update Messages.
  • Let's Live Migrate a Virtual Machine from one node to another from Failover Cluster Manager.
  • When the Live Migration finishes, take another look at the numbers in Database Update Messages.
  • They have changed, because a Virtual Machine changed owner and all the nodes had to be updated.
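
The before and after comparison can also be scripted. The following is only a sketch under a few assumptions: it runs on one of the cluster nodes, the wildcard 'Cluster Global Update Manager*' finds the counter set on your system, and 'VM01' is a hypothetical Virtual Machine name that you must replace with one of your own. Whether a value is a running total or a rate depends on the type of each counter, so treat the output as an indication only.

```powershell
# Run on one of the cluster nodes.
$set   = Get-Counter -ListSet 'Cluster Global Update Manager*'
$paths = $set | ForEach-Object { $_.Paths }

function Get-GumSnapshot {
    # Returns the current counter samples as Path / CookedValue pairs
    (Get-Counter -Counter $paths).CounterSamples |
        Select-Object Path, CookedValue
}

$before = Get-GumSnapshot

# Live migrate a VM to the other node. 'VM01' is a hypothetical VM name.
Move-ClusterVirtualMachineRole -Name 'VM01' -Node 'SRCL02' -MigrationType Live

$after = Get-GumSnapshot

# Show only the counters whose value changed during the migration
foreach ($sample in $after) {
    $old = $before | Where-Object { $_.Path -eq $sample.Path }
    if ($old -and $old.CookedValue -ne $sample.CookedValue) {
        [pscustomobject]@{
            Path   = $sample.Path
            Before = $old.CookedValue
            After  = $sample.CookedValue
        }
    }
}
```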

 

 

Cluster Network System

The cluster network is responsible for reliable communication between the nodes.

With these counters you can monitor the traffic between your nodes.

  • Click Add Counter
  • In the Source select one of your Nodes. In my Lab I select the Node SRCL01

 

  • In the Object search field type cluster network and select Cluster Network System
  • In the Instance field select the following:
    • The column _Network tells us how much traffic went through the networking stack between srcl01 and srcl02, not including the loopback interface.
    • The column SRCL01(srcl01) tells us how much traffic srcl01 sent to itself. Note that this is loopback traffic that is completed inside the cluster service.
    • The column SRCL02(srcl01) tells us how much traffic was sent between srcl01 and srcl02 (communication between the cluster nodes)

 

  • In the Counter field select all the Counters
  • Let's monitor for a few minutes to see the traffic in numbers
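
For a quick look from PowerShell, the cluster networks and their counters can be listed together. This is a minimal sketch, assuming you run it on one of the cluster nodes. The wildcard 'Cluster Network*' is an assumption to catch the cluster network counter sets on your system, and the one minute sampling window is an arbitrary choice.

```powershell
# Run on one of the cluster nodes.
# The networks the cluster knows about and the role of each one
Get-ClusterNetwork | Select-Object Name, Role, State, Address

# Discover the cluster network counter sets available on this node
$sets = Get-Counter -ListSet 'Cluster Network*'
$sets | Select-Object CounterSetName
$paths = $sets | ForEach-Object { $_.Paths }

# Sample every 5 seconds for one minute and keep only the non-zero values
Get-Counter -Counter $paths -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Where-Object { $_.CookedValue -ne 0 } |
    Select-Object Timestamp, Path, CookedValue
```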

 

All the above counters have to do with Cluster Monitoring.

What about the Cluster Shared Volume? It's very important to have some metrics for the CSV to troubleshoot bottlenecks, increase performance and keep the Cluster Shared Volume healthy.

Let's start

Cluster Shared Volume Performance Counters

On the coordinator node you can use the SMB Server Shares performance counters to monitor all the traffic that comes to the coordinator node on the specific share.

Which one is the Coordinator Node? It is the Node that has a direct connection with the CSV. The Coordinator Node is the name you see in the Owner Node field in Failover Cluster Manager when you expand Storage and click Disks.
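
The Owner Node can also be read from PowerShell. A minimal sketch, assuming you run it on one of the cluster nodes with the FailoverClusters module available:

```powershell
# Run on one of the cluster nodes.
# OwnerNode is the coordinator node of each Cluster Shared Volume
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State
```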

 

For all the counters in SMB Server Shares and SMB Client Shares and their explanation, there is a useful article: Windows Server 2012 File Server Tip: New per-share SMB client performance counters provide great insight.

 

SMB Server Shares

For the Cluster Shared Volume I have created a new Workspace, but how you organize it is up to you.

In my Lab and Production I have separated them into 2 Workspaces: Cluster and CSV Traffic.

  • Decide if you want to use another Workspace or the one that you have already created
  • Click Add Counter
  • In the Source select the Coordinator Node. In my Lab it is SRCL01
  • In the Object search field type SMB Server Shares and select it
  • In the Instance field select the ID of the CSV
  • In the Counter field select all the Counters.
  • In the Graph Type field you can select only Report, because you have a lot of Counters selected.

 

 

SMB Client Shares

On the non-coordinator node you can use the SMB Client Shares performance counters to monitor the traffic.

  • Click Add Counter
  • In the Source select the non-Coordinator Node. In my Lab it is SRCL02
  • In the Object search field type SMB Client Shares and select it
  • In the Instance field select the ID of the CSV
  • In the Counter field select all the Counters.
  • In the Graph Type field you can select only Report, because you have a lot of Counters selected.
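
Both sides of the CSV traffic can be pulled in one go from PowerShell. This is a minimal sketch, assuming that SRCL01 is the coordinator node and SRCL02 the non-coordinator node as in my Lab, and that your account is allowed to read performance counters remotely on both nodes. It reads all share instances, so filter on InstanceName if you want only one CSV.

```powershell
# Node names from my Lab; replace them with your own
$coordinator    = 'SRCL01'
$nonCoordinator = 'SRCL02'

# All SMB Server Shares counters, read from the coordinator node
$server = Get-Counter -Counter '\SMB Server Shares(*)\*' -ComputerName $coordinator

# All SMB Client Shares counters, read from the non-coordinator node
$client = Get-Counter -Counter '\SMB Client Shares(*)\*' -ComputerName $nonCoordinator

# Show both results in one report-style table
@($server.CounterSamples) + @($client.CounterSamples) |
    Sort-Object Path |
    Select-Object Path, InstanceName, CookedValue |
    Format-Table -AutoSize
```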

 

 

Cluster CSV File System

The Cluster CSV File System has a lot of counters, so it is better to split them into different categories to understand them more easily.

Before we start, let's understand a very important point, which is the Mode of the Cluster Shared Volume.

The Cluster Shared Volume will be in one of the following modes

  • Direct Mode = In Direct Mode all the IO is sent directly to the storage, bypassing the NTFS or ReFS stack
  • File System Redirected IO Mode = In this mode all the IO is sent through the NTFS stack
  • Block Level Redirected IO Mode = In this mode all the IO passes through the local CSVFS proxy and is written to disk.sys on the coordinating node. This avoids traversing the NTFS or ReFS stack twice.

To find the mode of the Cluster Shared Volume, open PowerShell as Administrator on one of the Nodes and type the following command.

Change Cluster Disk 1 based on your Cluster Disk names

Get-ClusterSharedVolume "Cluster Disk 1" | Get-ClusterSharedVolumeState
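
To check every Cluster Shared Volume at once instead of a single disk, the same two cmdlets can be combined without a disk name. A minimal sketch, assuming you run it on one of the cluster nodes. The property selection is only my choice to keep the output readable.

```powershell
# Run on one of the cluster nodes.
# One row per volume and node, with the current IO mode and the reason for any redirection
Get-ClusterSharedVolume |
    Get-ClusterSharedVolumeState |
    Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason |
    Format-Table -AutoSize
```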

 

Now let's see how to add the Performance Counters

  • Click Add Counter
  • In the Source select the non-Coordinator Node. In my Lab it is SRCL02
  • In the Object search field type csv and select CSV File System
  • In the Instance select the Volume
  • In the Counter select all the Counters.
  • In the Graph Type you can select only Report, because you have a lot of Counters selected.

 

 

Now let's split the Counters into different categories based on the Microsoft Docs article Cluster Shared Volume Performance Counters

Note that I did not create these Counters. I read a lot until I understood them all, and I use them in my Production and Lab Environments.

  • Redirected = These counters show whether IO is forwarded using File System Redirected IO. Be careful: these counters do not include Block Level Redirected IO. With these counters you can see how much time the IO spends being sent over SMB from a non-coordinating node, or in NTFS on the coordinating node.

 

  • IO = These counters show whether IO is sent using Block Level Redirected IO or Direct IO

 

  • Volume = These counters show the state of the Volume. The Volume State is represented as follows
    • 0 = Init. In this state all files are invalidated and all IOs are failing
    • 1 = Paused. In this state all new IO is paused and the down-level state (open files, locks, oplock states ...) is cleaned
    • 2 = Draining. In this state all new IO is paused, but some IO may still be in process until it completes
    • 3 = Set Down Level. In this state all new IO is paused and the down-level state has been reapplied
    • 4 = Active. In this state all IO is processed normally

 

  • Latency = These counters help you monitor how much time IO spends inside the CSV waiting for its turn or for its completion.
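
The split into categories can be sketched in PowerShell as well. This is only an illustration under several assumptions: it runs on one of the cluster nodes, the wildcard 'Cluster CSV File System*' finds the counter set, the categories can be recognized from the first word of each counter name, and a counter named Volume State exists. The state labels in the hashtable are my own shorthand for the five states described above.

```powershell
# Run on one of the cluster nodes.
$set   = Get-Counter -ListSet 'Cluster CSV File System*'
$paths = $set | ForEach-Object { $_.Paths }

# Group the counter paths by the first word of the counter name (assumed convention)
$paths | Group-Object {
    $counterName = ($_ -split '\\')[-1]      # text after the last backslash
    switch -Regex ($counterName) {
        '^Redirected' { 'Redirected'; break }
        '^IO'         { 'IO';         break }
        '^Volume'     { 'Volume';     break }
        'Latency'     { 'Latency';    break }
        default       { 'Other' }
    }
} | Select-Object Name, Count

# Translate the numeric Volume State into a readable label
$stateNames = @{ 0 = 'Init'; 1 = 'Paused'; 2 = 'Draining'; 3 = 'Set Down Level'; 4 = 'Active' }
$statePath  = $paths | Where-Object { $_ -like '*Volume State' }

(Get-Counter -Counter $statePath).CounterSamples |
    Select-Object InstanceName,
        @{ Name = 'State'; Expression = { $stateNames[[int]$_.CookedValue] } }
```

If the counter set has a different name on your system, Get-Counter -ListSet 'Cluster CSV*' shows what is available.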

 

If you find Windows Admin Center interesting, you can read more articles that I wrote in the following links

Of course, for in-depth instructions you can look in Microsoft Docs, or you can download the Altaro ebook How to Get the Most Out of Windows Admin Center by Eric Siron

Until the next article, have a nice weekend !!!

I invite you to follow me on Twitter or Facebook and leave your comments there. If you have any questions, send me an email at info@askme4tech.com.