After having participated in many Nutanix projects over the years, I thought it would be useful to some colleagues out there to sum up some of the tips I've learned regarding the sizing process of a Nutanix HCI solution. I believe some of them apply to any HCI sizing process (I've also worked on more VMware vSAN projects than I can remember, which over time led to some form of HCI schizophrenia).
Back in the day when Nutanix Sizer wasn't available, Nutanix sizing was performed almost entirely by hand. First, the focus was dead on profiling each workload and its requirements as accurately as possible; from there, the challenge narrowed down to matching the resulting workload with the corresponding hardware configuration.
These days we are lucky enough to have the Nutanix Sizer tool around, so at least this last part of the puzzle can be easily solved.
What hasn't changed over the years is the need to profile the required workloads as accurately as possible, since the proposed hardware configuration is (obviously) a direct result of this profiling.
Let's dive into a few things to consider when defining workloads and tweaking Sizer options:
When defining workloads, it's almost silly to emphasize that the customer's input is CRUCIAL. They are the ones who know their applications best, so at this stage we need to open our ears and listen carefully. Taking notes like a madman won't hurt either.
Pay attention to the different types of applications (cloud native, legacy) and in particular the storage requirements for each. If you are lucky enough, you may even get some storage metrics for existing apps in current environments (I/O access patterns, IOPS, throughput, latencies, etc.).
A simple way to estimate the active working set of any given workload is to take a look at its backup statistics. The logic behind this is thoroughly explained in the following blog from Josh Odgers:
Basically, you can calculate the delta between backups in order to estimate the active working set of the workload.
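The delta-based estimate above can be sketched as a quick calculation. This is a minimal illustration, assuming that each incremental backup size roughly equals the data changed since the previous backup; the function name and sample sizes are made up for the example:

```python
# Assumption: incremental backup sizes approximate the data changed
# between backups, so their average gives a rough daily working set.
def estimate_working_set_gb(incremental_backup_sizes_gb):
    """Estimate a workload's active working set (GB/day) as the
    average delta captured by its incremental backups."""
    if not incremental_backup_sizes_gb:
        raise ValueError("need at least one incremental backup size")
    return sum(incremental_backup_sizes_gb) / len(incremental_backup_sizes_gb)

# Example: a VM whose last three daily incrementals were 40, 55 and 45 GB
print(estimate_working_set_gb([40, 55, 45]))  # ~46.7 GB of changed data per day
```

The more backup intervals you average over, the less a single busy day (patching, reindexing) will skew the estimate.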
You should end up with all workloads categorized into some of the following groups (just a guideline list):
- DB Server VMs
- Mailbox Server VMs
- Mission-critical VMs
- Application Server VMs
- Web Server VMs
- Core Services (AD, DNS, DHCP, etc.)
- General Purpose VMs
- Splunk VMs
Note: for profiling Oracle workloads you can use the Oracle Automatic Workload Repository (AWR) report (requires additional Oracle Diagnostics Pack licensing).
Once these workloads are profiled accordingly, we need to work with the customer on the definition of vCPU:pCPU ratios, compression/deduplication/EC enablement, etc.
Manual vs Auto
A BIG discussion topic among my peers in the past. As a rule of thumb, once the workloads are defined you should start with the automatic configuration and work your way from there. The Sizer tool recently added the ability to modify Auto configs without switching them to Manual Mode.
Also, in Automatic Mode you can specify a really cool subset of options, most of them self-explanatory:
- Type of models
- Cluster type
- Failover level
- Max node count per cluster
- Max budget per cluster
- Utilization Thresholds
- Solution Options
Manual scenarios might come in handy when working with existing heterogeneous configurations.
Following are some considerations regarding workloads imported from RVtools reports:
- Powered-off VMs are not considered/included
- Snapshot / Point in time data (no analytics)
- Many auto-generated, not-so-meaningful categories
Sometimes, getting detailed data on all workloads from the customer is harder than it should be. If you are at a dead end and all you're ever going to get from the customer is some RVtools reports, Raw Input is your friend.
You can categorize each workload in the spreadsheets, create workloads based on the sum of allocated/utilized resources, and then load them into Sizer as Raw Input.
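The categorize-then-sum step can be sketched as follows. Note the assumptions: RVtools has no "category" column (you add that yourself while triaging the spreadsheet), and the field names below are simplified stand-ins for the actual export columns:

```python
from collections import defaultdict

# Sketch: collapse per-VM rows (exported from an RVtools report and
# tagged with a category by hand) into one Raw Input workload per
# category. Field names here are assumptions, not RVtools' actual headers.
def summarize_for_raw_input(vm_rows):
    totals = defaultdict(lambda: {"vcpus": 0, "ram_gib": 0, "disk_gib": 0})
    for vm in vm_rows:
        if vm["powerstate"] != "poweredOn":  # powered-off VMs are not considered
            continue
        t = totals[vm["category"]]
        t["vcpus"] += vm["cpus"]
        t["ram_gib"] += vm["memory_gib"]
        t["disk_gib"] += vm["provisioned_gib"]
    return dict(totals)

rows = [
    {"category": "Web Server VMs", "powerstate": "poweredOn",
     "cpus": 4, "memory_gib": 16, "provisioned_gib": 100},
    {"category": "Web Server VMs", "powerstate": "poweredOff",
     "cpus": 2, "memory_gib": 8, "provisioned_gib": 50},
]
print(summarize_for_raw_input(rows))
# {'Web Server VMs': {'vcpus': 4, 'ram_gib': 16, 'disk_gib': 100}}
```

The per-category totals map directly onto one Raw Input workload each.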
Keep in mind all existing best practices for each workload. Specifically, for MS Exchange Mailbox Servers and MS SQL Server DB Servers, the maximum vCPU:pCPU ratio supported by Microsoft for production environments is 2:1.
As a starting point I'd suggest going with these values:
- VDI: Starting at 8:1
- Production Exchange Mailboxes: 1:1 to 2:1
- Production MS SQL Server DB Servers: 1:1 to 2:1
- Non-Production MS SQL Server DB Servers: 2:1 to 4:1
- Mission-critical VMs: 1:1
- Oracle DB / SAP HANA: 1:1
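Turning these ratios into a physical core count is simple arithmetic; here is a minimal sketch. The ratio values follow the guideline list above, but the workload names and vCPU totals are invented examples (your real inputs come out of the customer discussion):

```python
import math

# Guideline ratios from the list above (vCPU per physical core).
RATIOS = {
    "VDI": 8.0,
    "Production MS SQL Server": 2.0,
    "Mission-critical": 1.0,
    "Oracle DB / SAP HANA": 1.0,
}

def required_pcores(workloads):
    """workloads: dict of workload name -> total profiled vCPUs.
    Returns the physical cores needed at the chosen oversubscription."""
    return sum(math.ceil(vcpus / RATIOS[name]) for name, vcpus in workloads.items())

print(required_pcores({"VDI": 400,
                       "Production MS SQL Server": 64,
                       "Mission-critical": 24}))  # 50 + 32 + 24 = 106 cores
```

Rounding up per workload (rather than once at the end) keeps each workload honest against its own ratio.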
Compression, Deduplication and Erasure Coding
Some particular workloads get the most out of each of these features, so I'd suggest taking the time to enable the right feature for the right workload.
Most VDI workloads benefit heavily from enabling deduplication, compression won't be effective on digitized / video surveillance data, and EC only works for really cold data.
Also keep in mind the subtleties of Inline vs Post-Process.
I always enable compression with a 30% value, which I think is fairly conservative and can be easily achieved by most workloads. Enabling compression also improves the efficiency of the cache tier, so I keep Inline Compression on by default.
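To make the 30% figure concrete, here is a rough sketch of what that saving means for effective capacity. The capacity numbers are made-up examples, not a Nutanix formula:

```python
# Sketch: effective capacity under an assumed compression saving.
# A 30% saving means the same logical data occupies 70% of its raw size,
# so usable capacity is divided by (1 - savings_ratio).
def effective_capacity_tib(raw_usable_tib, savings_ratio=0.30):
    return raw_usable_tib / (1 - savings_ratio)

print(round(effective_capacity_tib(70), 1))  # 100.0 TiB effective from 70 TiB usable
```

If a workload compresses worse than assumed, the shortfall comes straight out of your capacity headroom, which is why a conservative value is the safer default.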
Erasure Coding won't do anything for your hot data, and according to Nutanix, read performance may be degraded during failure scenarios.
Its benefits won´t be evident right away. It might take some time to see actual benefits, and more often than not these won´t be as flamboyant as expected.
If you don't have a specific use case for it, I would leave it alone.
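For context on why EC is tempting in the first place, here is a sketch of the raw-to-usable math for replication versus erasure coding. The 4/1 strip (four data blocks plus one parity) is an assumption for illustration; actual Nutanix EC-X strip sizes depend on cluster size:

```python
# Sketch: fraction of raw capacity that remains usable.
# RF keeps full copies; EC stores data blocks plus parity blocks.
def usable_fraction_rf(replication_factor):
    return 1 / replication_factor

def usable_fraction_ec(data_blocks, parity_blocks):
    return data_blocks / (data_blocks + parity_blocks)

print(usable_fraction_rf(2))     # 0.5 -> 50% of raw capacity usable under RF2
print(usable_fraction_ec(4, 1))  # 0.8 -> 80% usable for cold data on a 4/1 strip
```

The gap between 50% and 80% is the whole sales pitch, but it only materializes on data cold enough to actually get coded.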
CVM CPU Overhead
CVMs will use all of their assigned resources when needed, and certain processes, like Curator MapReduce scans, can be CPU-intensive.
If you opt for smaller processors with fewer cores, keep in mind that the minimum CPU allocation for CVMs on most NX platforms is 8 cores, and the CVMs will eventually consume 100% of them if needed.
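The practical effect on sizing can be sketched as a per-node subtraction. The 8-core figure follows the minimum mentioned above; the node counts and core counts are example numbers:

```python
# Sketch: cores left for guest VMs once the CVM reservation is carved out.
# Assumes 8 cores per CVM (the minimum noted above for most NX platforms);
# remember the CVM can burst to 100% of them under heavy Curator activity.
def vm_cores_per_cluster(nodes, cores_per_node, cvm_cores=8):
    if cores_per_node <= cvm_cores:
        raise ValueError("node too small to host a CVM plus workloads")
    return nodes * (cores_per_node - cvm_cores)

print(vm_cores_per_cluster(nodes=4, cores_per_node=32))  # 96 cores left for guest VMs
```

On small-core nodes the CVM tax is proportionally brutal: 8 of 16 cores is half the node, while 8 of 32 is only a quarter.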
Budget-Minded vs Workload-Oriented Sizing
Last but not least! At some point you will be seduced by the enchanting music of the Budget-Minded Sizing Sirens (wait, what a fantastic name for a band!).
Just say no. Focus on defining those workloads; it is the only way to be absolutely sure that your proposed solution will perform as expected. If it goes over budget but it works, then let the End-of-Quarter Discount Gods (I know, I have to stop) do their thing.
Being able to perform an accurate and realistic sizing will pay off in the customer's confidence, not only in the technologies involved but also in the added value of your professional services.
I leave you with some words of advice from our pal Gene Kranz:
Hope you find this information helpful!