VMC Design

This blog post aims to provide a quick reference to the design considerations for architecting and implementing VMware Cloud on AWS.

The post is divided into two sections. The first section lists the key design considerations in a tabular format, and the second provides more details on selected features and considerations.

List of key design considerations / features

Design Consideration

Details

Subscription

The Subscription model is similar to an AWS Regional Reserved Instance.

Subscriptions are fixed on the following attributes:

AWS Region

VMware Cloud Org

Host instance type

Shared Responsibility

Customers should understand the shared responsibility model.

VMware is responsible for the lifecycle of vSphere, vSAN and NSX.

Customers are responsible for configuration of VMs, networks and applications they manage.

Cloud Services Console

The VMware Cloud Services Console provides a UI to manage:

  • Organization
  • Billing
  • Identity and access to VMC.

AWS Accounts

  • VMware AWS Account: The SDDC is deployed in an AWS account managed by VMware.
  • Customer AWS Account: The SDDC must be connected to a customer account so that the SDDC can consume the AWS services in that account.

Management Subnet

When deploying an SDDC, customers must provide the number of hosts and a management IP range (CIDR).

CIDR | Max hosts in single AZ | Max hosts in multi AZ
/23 | 27 | 22
/20 | 251 | 246
/16 | See config max. | See config max.

A VPC is created in the VMware-managed AWS account using the SDDC management IP address range provided during deployment. Several additional subnets are created within that VPC at the same time.
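The sizing table above can be expressed as a small lookup. This is a minimal Python sketch, not a VMware tool: the names `HOST_LIMITS` and `max_hosts` are hypothetical, the example addresses are arbitrary, and the /16 row returns `None` because the table defers to the published configuration maximums.

```python
import ipaddress

# Host limits copied from the table above, keyed by prefix length.
# Values are (single-AZ, multi-AZ); /16 defers to the config maximums.
HOST_LIMITS = {
    23: (27, 22),
    20: (251, 246),
    16: (None, None),
}

def max_hosts(management_cidr, multi_az=False):
    """Return the host limit for a management CIDR, or None for 'see config max'."""
    network = ipaddress.ip_network(management_cidr)
    if network.prefixlen not in HOST_LIMITS:
        raise ValueError(f"/{network.prefixlen} is not a size listed in the table")
    single_az_limit, multi_az_limit = HOST_LIMITS[network.prefixlen]
    return multi_az_limit if multi_az else single_az_limit

print(max_hosts("10.2.0.0/23"))                 # single-AZ limit for a /23
print(max_hosts("10.2.0.0/20", multi_az=True))  # multi-AZ limit for a /20
```

Because the management range cannot easily be changed later, a check like this is most useful at planning time, before the SDDC is deployed.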

Connected VPC

Any AWS services used in the customer account that owns the connected VPC are paid for by the customer.

Host Types

i3 (36 cores, 512 GiB RAM, and 10.37 TiB raw storage). Suitable for:

  • General applications
  • Databases

i3en (96 cores, 768 GiB RAM, and 45.84 TiB raw storage). Suitable for:

  • NoSQL Databases
  • Distributed File Systems
  • Data Warehouse with high random I/O

Cluster conversion from i3 to i3en can be done by contacting VMware.

Cluster

Types:

  • Single Host
  • Two-Host
  • Multi-Host. Maximum number of hosts: 16.

Once a cluster grows to three or more hosts, it cannot be shrunk back to one or two hosts.

Allowed transformations:

  • 1 → 2 → 3 ↔ 4+
  • 1 → 3 ↔ 4+
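The allowed transformations can be checked with a few lines of code. The following Python sketch is only an illustration of the rules listed above; the function name `is_allowed_resize` is hypothetical. It assumes that any resize between sizes of 3 and 16 hosts is allowed, and it conservatively rejects jumps the list does not show (for example 1 to 4 or 2 to 4 in one step).

```python
MAX_HOSTS = 16  # maximum hosts per cluster, as stated above

def is_allowed_resize(current, target):
    """Return True if a cluster of `current` hosts may be resized to `target` hosts."""
    if not (1 <= current <= MAX_HOSTS and 1 <= target <= MAX_HOSTS):
        return False
    if current == target:
        return False  # not a resize
    if current >= 3:
        # 3 <-> 4+: free movement, but never back down to 1 or 2 hosts.
        return target >= 3
    if current == 2:
        return target == 3  # 2 -> 3
    return target in (2, 3)  # 1 -> 2 or 1 -> 3

print(is_allowed_resize(1, 3))  # single host converted to a three-host cluster
print(is_allowed_resize(4, 2))  # shrinking below three hosts
```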

Stretched Cluster

Does not protect against regional level failures.

Hosts are added or removed in pairs, one in each AZ.

Max hosts per cluster is the same as a standard cluster (16).

Minimum hosts supported for a stretched cluster is 2.

Once a stretched cluster increases to 6 or more, the cluster cannot be shrunk to 2 or 4 hosts.
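The stretched cluster sizing rules can be sketched the same way. This Python snippet only encodes the rules stated above (hosts come in pairs, 2 to 16 hosts, no shrinking from 6 or more hosts down to 2 or 4); the function name is hypothetical, and any case the rules do not mention, such as shrinking from 4 to 2, is allowed here purely by assumption.

```python
MIN_HOSTS = 2   # minimum hosts for a stretched cluster
MAX_HOSTS = 16  # same maximum as a standard cluster

def is_allowed_stretched_resize(current, target):
    """Return True if a stretched cluster may go from `current` to `target` hosts."""
    for size in (current, target):
        # Hosts are added or removed in pairs, so every valid size is even.
        if size % 2 != 0 or not (MIN_HOSTS <= size <= MAX_HOSTS):
            return False
    if current == target:
        return False  # not a resize
    if current >= 6 and target < 6:
        return False  # cannot shrink back to 2 or 4 hosts
    return True

print(is_allowed_stretched_resize(4, 6))  # growing by one pair
print(is_allowed_stretched_resize(6, 4))  # shrinking below six hosts
```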

Base / Primary Cluster (Cluster-1)

The base cluster, the first cluster deployed in an SDDC, hosts the management appliances and NSX edges.

Datastores created in a base cluster:

  • vsanDatastore
  • workloadDatastore

Resource pools created in a base cluster:

  • Mgmt-Resourcepool for management appliances
  • Compute-Resourcepool for workloads

Additional Clusters

Logical networks created in SDDC are automatically shared across all clusters.

Additional clusters contain only workloadDatastores.

Custom Cluster

Specify just the number of CPU cores you need per host.

Reduces costs for running applications licensed per-core.

Changing the number of cores does not affect the price of the host.

Number of clusters

Large clusters vs Small clusters:

  • consolidation ratio
  • Separate clusters for production and non-production workloads

Island clusters:

  • Licensing: dedicated clusters for running Windows, RHEL, SQL Server, Oracle, etc.
  • Security requirements: for example, keeping DMZ and production workloads from co-existing on the same host or cluster.
  • Extremely resource intensive workloads can impact other workloads on the cluster.

Elastic DRS

Elastic DRS automatically adds or removes hosts from a cluster in response to CPU, RAM and storage utilization.

The eDRS algorithm runs every 5 minutes.

Configured at the cluster level (in vCenter).

The Elastic DRS baseline policy is always running and cannot be disabled.

Elastic DRS Thresholds

Resource Pools

All workloads should be placed within Compute-ResourcePool.

It is possible to create workloads outside of Compute-ResourcePool. However, this is strongly discouraged because it can create resource contention between workloads inside resource pools and those outside them.

Management appliance sizes

Supported sizes: medium and large.

By default, a new SDDC is created with medium-sized NSX Edge and vCenter Server appliances.

Large-sized appliances are recommended for deployments with more than 30 hosts or 3000 VMs.
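As a quick illustration of the sizing guidance above, here is a minimal Python sketch. The function name `recommended_appliance_size` is hypothetical, and the thresholds are simply the ones quoted in this post; treat the result as a starting point rather than an official sizing rule.

```python
LARGE_HOST_THRESHOLD = 30    # more than 30 hosts -> large appliances
LARGE_VM_THRESHOLD = 3000    # more than 3000 VMs -> large appliances

def recommended_appliance_size(host_count, vm_count):
    """Return 'large' or 'medium' based on the thresholds quoted above."""
    if host_count > LARGE_HOST_THRESHOLD or vm_count > LARGE_VM_THRESHOLD:
        return "large"
    return "medium"

print(recommended_appliance_size(host_count=12, vm_count=800))
print(recommended_appliance_size(host_count=12, vm_count=3500))
```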

Storage options

Directly attached storage using vSAN

AWS Storage Services

  • FSx for Windows File Server
  • FSx for Lustre
  • FSx for NetApp ONTAP
  • EFS
  • S3

Managed Services Provider Storage

  • Rackspace
  • Faction

APN Storage partners

vSAN Logical Datastores

vSAN has been modified to present two logical Datastores from the same underlying physical storage.

  • vsanDatastore for management appliances
  • workloadDatastore for workloads

The separation exists purely as a means of enforcing permissions on the storage used by the management appliances.

The free space, used space, and total capacity numbers will be identical between the two logical Datastores, because both reflect the same underlying pool of capacity.

vSAN Slack space

vSAN requires slack space for operations such as deduplication, object re-balancing, and recovering from hardware outages.

Since November 2021 the required slack space is 20%; previously it was 30%.

To enforce the slack space requirement, eDRS automatically adds hosts to the SDDC if storage consumption crosses 79%.
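The slack space figures translate into simple arithmetic. The Python sketch below uses only the two numbers quoted above (20% slack, 79% consumption threshold); the function names are hypothetical, capacity is taken as the datastore capacity in TiB, and storage policy overhead is deliberately ignored.

```python
SLACK_PCT = 20              # slack space vSAN keeps free (since November 2021)
SCALE_OUT_THRESHOLD_PCT = 79  # consumption level above which eDRS adds a host

def usable_capacity_tib(capacity_tib):
    """Capacity left for data once the slack space is set aside."""
    return capacity_tib * (100 - SLACK_PCT) / 100

def triggers_scale_out(used_tib, capacity_tib):
    """True if consumption has crossed the threshold quoted above."""
    # Compare without dividing, so exact boundary values are not hit by rounding.
    return used_tib * 100 > capacity_tib * SCALE_OUT_THRESHOLD_PCT

print(usable_capacity_tib(100))     # capacity usable for data out of 100 TiB
print(triggers_scale_out(80, 100))  # 80% consumed
```

In practice this means capacity planning should target the usable figure, not the raw datastore size, otherwise eDRS will add hosts (and cost) earlier than expected.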

vSAN Storage policies

vSAN Encryption

Interconnectivity options

VPN

DX

VTGW

HCX

SDDC – DX Connectivity

An SDDC can connect to AWS Direct Connect (DX). Both private VIF and public VIF are supported.

An SDDC cannot be connected directly to a Direct Connect Gateway (DXGW). Use an SDDC group to connect to the DXGW through the VTGW.

SDDC Group

An SDDC group can only have SDDC members in the same region.

VTGW

Attachments

  • VPC attachments
  • VPN attachments
  • DXGW attachments
  • Peering attachments
    • Inter Region VTGW to VTGW
    • Inter Region VTGW to TGW
    • Intra Region VTGW to TGW

Provides connectivity between:

  • SDDCs in an SDDC group
  • SDDCs and one or more VPCs
  • SDDCs and on-premises via DXGW
  • SDDCs in other Regions (inter-Region)

Doesn’t allow native VPC to native VPC communication.

One Direct Connect VIF for on-prem connectivity to all SDDCs in the same group.

vSAN Policy

Compute Policies

Additional Bandwidth (for N-S traffic)

In the default configuration, the SDDC network has a single T-0 router (VMXNET3 NIC).

Additional T-0 routers can be created by creating traffic groups.

Requirements for traffic groups

– The SDDC must be connected to a VTGW by creating an SDDC Group.

– The SDDC should have large-size management appliances and at least four hosts.

NIC Bandwidths

– vmxnet3: 10 Gbps

– TGW attachment: 50 Gbps

Default Route

The SDDC is configured with a default route that points to its upstream Internet Gateway (IGW).

Policy based VPN

When new routes are added, the routing tables must be updated on both ends of the network.

Appropriate choice if

– On-prem VPN does not support BGP

– Only a few networks exist on either end of the VPN.

Layer 2 extension

Two ways to extend a network from on-prem.

– Layer-2 VPN Tunnel using NSX Autonomous edge appliance

– HCX network extension

BGP route filtering on VMC side

IP Addresses

Subnet for ENIs in Connected VPC

– Recommended subnet size

Management Subnet

Connected VPC

DHCP

  1. Native NSX DHCP server running in VMC.
    1. When an SDDC is launched, an NSX DHCP server is automatically attached to the CGW.
  2. External DHCP servers using DHCP Relay
    1. The DHCP Relay feature lets customers forward DHCP requests to a third-party DHCP server.
    2. One or more DHCP servers can be specified. If the first server doesn’t respond, the request goes to the next server.
    3. DHCP servers can run in VMC, on-prem, or in the connected VPC.
    4. DHCP Relay cannot be enabled if any existing network segments are using the native NSX DHCP capabilities.

DNS Forwarders in VMC

  1. VMC comes with two DNS forwarders. We cannot add more forwarders or delete these two.
    1. CGW DNS Forwarder
    2. MGW DNS Forwarder
  2. The forwarders support conditional forwarding.
  3. Each forwarder
    1. has one IP address, the service IP
    2. is attached to one default DNS zone
    3. can be attached to up to 5 FQDN zones
  4. Service IP: an IP address on the NSX edge that represents the DNS forwarder. To use the VMC DNS forwarders, configure this IP as the DNS server in the VMs.

Conditional Forwarding

How does conditional forwarding work?

When a DNS query is received, the DNS forwarder compares the domain name in the query with the domain names in the FQDN DNS zones.

  1. If a match is found, the query is forwarded to the DNS servers specified in the FQDN DNS zone.
  2. If a match is not found, the query is forwarded to the DNS servers specified in the default DNS zone.
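The matching logic described above can be sketched in a few lines of Python. This is an illustration only, not how NSX implements it: the `DnsZone` class and `pick_upstream` function are hypothetical, the domains and IP addresses are documentation placeholders, and the suffix match with longest-match preference is an assumption about how overlapping FQDN zones would be resolved.

```python
from dataclasses import dataclass, field

@dataclass
class DnsZone:
    upstream_servers: list
    domains: list = field(default_factory=list)  # empty for the default zone

def pick_upstream(query_name, fqdn_zones, default_zone):
    """Return the upstream DNS servers a forwarder would use for `query_name`."""
    name = query_name.rstrip(".").lower()
    best_zone, best_len = None, -1
    for zone in fqdn_zones:
        for domain in zone.domains:
            suffix = domain.rstrip(".").lower()
            # Match the domain itself or any name underneath it.
            if name == suffix or name.endswith("." + suffix):
                if len(suffix) > best_len:  # assumed: most specific zone wins
                    best_zone, best_len = zone, len(suffix)
    if best_zone is None:
        return default_zone.upstream_servers  # no FQDN zone matched
    return best_zone.upstream_servers

# Placeholder zones: one on-prem domain plus a default zone.
onprem = DnsZone(upstream_servers=["192.0.2.10"], domains=["corp.example"])
default = DnsZone(upstream_servers=["198.51.100.53"])

print(pick_upstream("app01.corp.example", [onprem], default))
print(pick_upstream("www.example.org", [onprem], default))
```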

Zone Types

  1. FQDN Zone
  2. Default Zone

Each zone contains the following details

  1. List of domains
  2. Upstream DNS Servers for the domains
  3. Source IP

Source IP:

  1. If you specify a source IP, the DNS forwarder uses the IP when forwarding DNS queries to an upstream DNS server.
  2. If you do not specify a source IP, the DNS query packet’s source IP will be the DNS forwarder’s listener IP.
  3. When to use?
    1. If the listener IP is an internal address that is not reachable from the external upstream DNS server.

VMC DNS Gotcha

  1. Internal only DNS (link: here)

DNS Forwarder

NSX-T Tier 1 Gateways (MGW and CGW) only act as forwarders, relaying the queries from VMs to the actual DNS servers specified.

– On-prem DNS Servers

– Local DNS Servers

– Customer managed DNS on EC2

– AWS managed DNS using AWS Directory services

– Route53 resolver inbound endpoint

Don’t use the Amazon-provided DNS (the .2 resolver). It has a limit of 1,024 packets per second per network interface, and in the case of VMC that limit is shared by all the VMs running within the SDDC.

Route 53

  1. Authoritative DNS Servers contain the final answer to a DNS query, generally an IP address.
  2. Recursive DNS servers (also known as DNS resolvers) find the correct authoritative answer for any DNS query
  3. Route53 is both an authoritative DNS and Recursive DNS Service
  4. Route53 resolver is a recursive DNS Server

Route 53 Resolver

  1. Route 53 Resolver is the Amazon DNS server (also sometimes referred to as “AmazonProvidedDNS” or the “.2 resolver”) that is available by default in all Amazon VPCs.
  2. When you create a VPC, a Route 53 Resolver is created by default. The DNS server does not reside within a specific subnet or Availability Zone in the VPC.
  3. Hosted zones: when a hosted zone is associated with a VPC, the Resolver uses that hosted zone for name resolution.
  4. DNS queries to Route 53 Resolver from outside the VPC are blocked. Use an inbound endpoint instead.
  5. Endpoints: carry DNS queries to and from your on-premises environment.
  6. Forwarding rules: DNS resolvers on your network can forward DNS queries to Route 53 Resolver via an inbound endpoint. Resolver conditionally forwards queries to resolvers on your network via an outbound endpoint.

Route 53 Resolver DNS Firewall

The following traffic is filtered by the firewall.

  1. DNS queries originating within that VPC.
  2. DNS queries that pass through Resolver endpoints from on-premises resources.

DNS Forwarders in Windows DNS Servers

  1. Forwarders
  2. Conditional Forwarders

Public IPs

Options

– Use public IPs provided by VMC

– BYOIP

To use the IPs provided by VMC, request them in the VMC console and configure NAT.

For BYOIP, NLBs have to be used.

BYOIPs

– In single-SDDC environments, bring the IPs to the AWS account that owns the connected VPC and assign them to NLBs that have VMC VMs as targets.

– In multi-region environments, create NLBs in custom VPCs and attach the VPCs to the VTGW.

Reference architecture here

Backup

Disaster Recovery

Upgrades and Maintenance

VMware sends a notification email 7 days before a regular update and 2 days before an emergency update.

During upgrades the following operations fail:

– Provisioning new VMs

– Hot or cold migrations

Add Ons

HCX is provided free to all SDDCs.

Log Insight trial version is enabled by default.

vRA

Carbon Black

NSX Advanced firewall

Monitoring SDDC traffic

VMC supports IPFIX and port mirroring.

Hybrid linked mode

Every cloud SDDC has a dedicated vCenter instance that can, if required, be linked to an on-premises vCenter instance using the Hybrid Linked Mode (HLM) feature.

Additional roles in vCenter

CloudAdmin

CloudGlobalAdmin

Integration with native AWS services

RDS

ALB

Monitoring

  1. CloudWatch agent -> CloudWatch
  2. Kinesis agent -> OpenSearch

Windows licensing

SQL server licensing

Models

  • Core based licensing: Standard and Enterprise editions
  • Server + CAL: Standard edition

vCenter Inventory

  • Datacenter: SDDC-Datacenter
    • Supports single datacenter
    • Cannot be renamed
  • Clusters: Cluster-n
    • Cluster-1 : hosts both management appliances and end-user workloads
  • Hosts
    • Mixed hosts in a cluster not supported
  • Resource pools
    • Mgmt-ResourcePool
    • Compute-ResourcePool
  • Datastores
    • vsanDatastore
    • workloadDatastore(1-n)
  • Networks: NSX-T virtual distributed switches

vCenter Permissions

  • Administrator
  • CloudAdmin Role
  • CloudAdmin Group
  • CloudAdmin User (cloudadmin@vmc.local)

VMware AWS Account

Whenever a new cloud services Organization (Org) is created, VMware creates a sub-account within its master AWS account, which acts as the parent for all AWS resources used by that Org.

SDDC Underlay VPC

  • For every SDDC, VMware creates a new VPC within the VMware-owned AWS account for that SDDC’s Org.
  • This VPC will be created using the SDDC management IP address range provided during the provisioning process and several subnets will be created within that VPC.
  • An IGW and VGW will also be created for this VPC. These gateways enable internet and Direct Connect connectivity to the VPC.

ESXi Networking

  • ESXi hosts are provisioned with an Elastic Network Adapter (ENA), which provides their connectivity to the AWS underlay VPC.
  • ESXi hosts have a number of vmkernel interfaces:
    • management (vmk0)
    • vSAN (vmk1)
    • vMotion (vmk2)
    • AWS API (vmk4)
    • NSX Tunnel End Point

VPC Cross-Link

Cross-account ENIs are used to connect every host in the base cluster of the SDDC to a subnet within the cross-linked VPC.

The cross-link provides the SDDC with a network forwarding path to services maintained within the customer-owned AWS account.

The Availability Zone (AZ) of the cross-link subnet will be used to determine the AZ placement of the hosts.

vSAN Encryption

vSAN encryption is enabled by default on each cluster deployed in your SDDC, and can’t be turned off.

vSAN uses AWS KMS to generate a Customer Master Key (CMK). The CMK is used to generate a Key Encryption Key (KEK).

The KEK is used to encrypt the Data Encryption Keys (DEKs) generated for each disk.

  • Can we change KEKs? Yes, by using the vSphere UI or the vSAN API. This is called a shallow rekey.
  • Changing the CMK or DEKs is not supported. If you must change the CMK or DEKs, create a new cluster and migrate your VMs and data to it.
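To see why a shallow rekey is a light operation, the key hierarchy can be modelled with a toy example. The Python sketch below is purely illustrative: the XOR "wrap" is a stand-in for real key wrapping and is not secure, the CMK level is left out, and none of the names correspond to vSAN or AWS KMS APIs. It only shows that replacing the KEK re-wraps the per-disk DEKs without changing them, so data encrypted with those DEKs does not have to be rewritten.

```python
import secrets

KEY_LEN = 32  # bytes; arbitrary size chosen for this toy model

def xor_wrap(key, wrapping_key):
    """Toy stand-in for key wrapping. XOR is its own inverse, so it also unwraps."""
    return bytes(a ^ b for a, b in zip(key, wrapping_key))

def shallow_rekey(wrapped_deks, old_kek, new_kek):
    """Re-wrap every DEK under a new KEK; the DEKs themselves do not change."""
    return [xor_wrap(xor_wrap(w, old_kek), new_kek) for w in wrapped_deks]

# One DEK per disk, each wrapped by the cluster's KEK.
deks = [secrets.token_bytes(KEY_LEN) for _ in range(3)]
old_kek = secrets.token_bytes(KEY_LEN)
wrapped = [xor_wrap(dek, old_kek) for dek in deks]

# Shallow rekey: generate a new KEK and re-wrap the existing DEKs.
new_kek = secrets.token_bytes(KEY_LEN)
rewrapped = shallow_rekey(wrapped, old_kek, new_kek)

# Unwrapping with the new KEK yields the original DEKs.
print([xor_wrap(w, new_kek) for w in rewrapped] == deks)
```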

Uplinks of tier-0 edge router

  • Internet Uplink: Connects to IGW
  • VPC uplink: Connects to cross-linked VPC
  • Direct Connect Uplink: To connect to Direct Connect VIF.

Windows and SQL License

Logging

  • vRealize Log Insight
  • Syslog
    • vCenter
    • ESXi
    • NSX-T

Additional details for the above points

Stretched Cluster

Reference: https://vmc.techzone.vmware.com/vmc-arch/docs/compute/vmc-aws-stretched-cluster#section1

Shared Responsibility Model

Source: https://docs.vmware.com/en/VMware-Cloud-on-AWS/solutions/VMware-Cloud-on-AWS.39646badb412ba21bd6770ef62ae00a2/GUID-31CC90E5EB22075B2313FA674D567F2A.html

VMware Cloud Organization

Direct Connection

SDDC to DX over public VIF

SDDC to DX over private VIF

Source: https://vmc.techzone.vmware.com/vmc-arch/docs/network/vmc-aws-direct-connect#section2

SDDC Group

Multi-Edge SDDC

DNS Servers

https://docs.vmware.com/en/VMware-Cloud-on-AWS/solutions/GUID-25B7F9346825C50F67BF60403CCCAE21.html

https://aws.amazon.com/blogs/apn/choosing-the-right-dns-architecture-for-vmware-cloud-on-aws/

Elastic DRS

https://blogs.vmware.com/cloud/2019/05/15/elastic-drs-vmware-cloud-aws/

