NSX Design – Best Practices Reference Notes

VMWorld 2017 – NET1535BE – VMware NSX Design: Reference Design for SDDC with NSX and vSphere: Part 1

  1. NSX Manager: the DB schema is version-specific – a backup can be restored only to the same version (like-to-like)
  2. Controllers:
    1. Storage resiliency is required in terms of access, paths, disks and LUNs.
    2. Consider I/O oversubscription (frequent writes and random reads)
  3. VDS & Transport Zone:
    1. 1 VDS for the compute cluster and 1 VDS for the Edge cluster – recommended configuration
      1. Flexibility in choosing the NIC teaming mode
      2. Edge VLANs (which talk to the physical network) need not be configured on the compute cluster (restricts VLAN proliferation).
      3. SPAN, IPFIX, and packet-capture configuration flexibility
    2. Transport zones span Edge cluster and compute cluster.
    3. Transport zone is provisioning and management boundary – not security and data boundary
  4. MTU:
    1. 1600 is required. 9000 is recommended (future-proofs for high-throughput traffic; e.g. 8900 MTU on a storage VM)
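The 1600-byte requirement falls out of the VXLAN encapsulation overhead. A quick sketch of the arithmetic (the 50-byte figure is the standard VXLAN-over-IPv4 overhead; the extra 4 bytes apply only when the transport VLAN is tagged):

```python
# VXLAN encapsulation overhead added on top of the inner Ethernet frame:
outer_ethernet = 14   # outer MAC header
outer_ip = 20         # outer IPv4 header
outer_udp = 8         # outer UDP header
vxlan_header = 8      # VXLAN header (carries the 24-bit VNI)
overhead = outer_ethernet + outer_ip + outer_udp + vxlan_header  # 50 bytes
vlan_tag = 4          # extra if the transport VLAN is tagged

inner_mtu = 1500                                     # standard guest MTU
min_transport_mtu = inner_mtu + overhead + vlan_tag  # 1554
print(overhead, min_transport_mtu)                   # 50 1554
```

1554 bytes is the true minimum; 1600 rounds it up with headroom, which is why the design guidance states 1600 as the floor.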
  5. VXLAN VLAN Id – Consistent across transport zone
    1. L2 design – vlan and subnet spans across racks
    2. L3 design –subnets are specific to rack, but vlan spans across racks
  6. VDS Uplink design
    1. Source-port-ID-based teaming: recommended
    2. LACP teaming mode: discouraged
      1. LACP with the NSX Edge gateway has supportability issues (support is required from the hardware switch vendor)
      2. LACP can be used in the compute cluster
      3. Brownfield: if LACP cannot be avoided, it is acceptable to proceed
  7. VTEP design
    1. # of VTEPs:
      1. If the VXLAN throughput requirement is more than 10 Gbps, you need more than 1 VTEP
      2. Multiple VTEPs are also needed if deterministic traffic-to-uplink mapping is desired (explicit-failover teaming only)
    2. IP Addressing:
      1. L2 fabric: single subnet. Reserve IP addresses for future growth: /22 recommended
      2. L3 fabric: multiple subnets, one per rack
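A /22 leaves generous headroom for the L2-fabric case. A sketch of the sizing using Python's stdlib `ipaddress` module (the 192.168.40.0/22 prefix is an arbitrary example, not from the session):

```python
import ipaddress

# Example VTEP pool for an L2 fabric: one /22 covering current and future hosts.
vtep_pool = ipaddress.ip_network("192.168.40.0/22")
usable = vtep_pool.num_addresses - 2   # minus network and broadcast addresses
print(usable)                          # 1022 usable VTEP addresses
# Even at 2 VTEPs per host, a /22 accommodates roughly 500 hosts.
```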
  8. Edge Cluster Design
    1. Consider Rack availability
  9. ESG Design
    1. Active – Standby
      1. Heartbeat: 9 seconds. L2 connectivity is required between the active and standby edges
      2. Protocol timers: hello/hold 40/120. If the timers are not relaxed, the ToR switches lose adjacency during failover, inducing a second failure.
    2. ECMP
      1. No Heartbeat.
      2. Protocol timers: hello/hold 1/3, giving roughly 4-second convergence
      3. Sub-second convergence is not possible – a consideration for real-time traffic (VoIP, video conferencing, etc.)
      4. Bidirectional Forwarding Detection (BFD) is not yet available.
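The two failover models above converge on very different timescales. A rough sketch of the arithmetic (timer values are from the session notes; the one-second route-withdrawal figure is an assumption for illustration):

```python
# Active-standby: failover is driven by the HA heartbeat dead time (9 s);
# the routing timers are relaxed to 40/120 precisely so the adjacency
# survives the switchover rather than driving it.
ha_dead_time = 9
active_standby_outage = ha_dead_time          # roughly 9 s of traffic loss

# ECMP: no HA heartbeat; the peer declares a failed edge down after the
# hold time (3 s), then withdraws its routes.
hold_time = 3
route_withdrawal = 1                          # assumed reprogramming time
ecmp_outage = hold_time + route_withdrawal    # ~4 s convergence, per the notes

assert ecmp_outage < active_standby_outage
```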
  10. DLR
    1. DLR Control VM and NSX controllers are not in data path
  11. ECMP with DLR and Edge
    1. Don’t put ECMP NSX Edge VMs and DLR Control VM on same host.
    2. Consider graceful restart design options.
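The anti-affinity rule above can be expressed as a simple validation. A hypothetical sketch (the VM names, host names, and `placement` map are invented for illustration; in practice this is enforced with DRS anti-affinity rules):

```python
# Hypothetical placement map: VM name -> ESXi host.
placement = {
    "ecmp-edge-1": "esx-edge-01",
    "ecmp-edge-2": "esx-edge-02",
    "dlr-control-active": "esx-edge-03",
}

def violates_anti_affinity(placement):
    """Return True if any ECMP edge shares a host with a DLR Control VM.

    A single host failure that takes out both an ECMP edge and the DLR
    Control VM defeats the purpose of running them redundantly.
    """
    dlr_hosts = {h for vm, h in placement.items() if vm.startswith("dlr-control")}
    edge_hosts = {h for vm, h in placement.items() if vm.startswith("ecmp-edge")}
    return bool(dlr_hosts & edge_hosts)

print(violates_anti_affinity(placement))  # False -- this placement is safe
```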
  12. Graceful Restart Guidance
    1. Active-Standby: Enable Graceful restart.
    2. ECMP: Disable Graceful restart
      1. ToR has a single control plane (no dual supervisor): disable graceful restart
      2. ToR has dual supervisors: choose to disable graceful restart at the physical router or at the ESG. Recommended: disable at the ESG.
  13. NSX edge routing design with Rack Mount Server
    1. Edge uplink = Host uplink = VLAN = Adjacency
  14. Routing Protocol & Topology
    1. The NSX domain acts as a stub network
      1. Send a default route to the NSX edges; the NSX edges send summarized routes back.
      2. For OSPF this is a stub area
    2. Use one consistent protocol end-to-end, between ESG–physical and ESG–DLR
    3. Recommended: BGP
    4. OSPF
      1. Multi Tenancy is difficult
    5. BGP
      1. Multi Tenancy is possible.
  15. NSX connectivity with BGP
    1. Advertise summarized routes from NSX domain to Physical
    2. Advertise default route to NSX domain.
    3. Recommended: run eBGP end-to-end, from the physical network to the Edge to the DLR Control VM
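The summarize-out / default-in pattern above can be sketched with the stdlib `ipaddress` module (the 172.16.x.0/24 logical-switch subnets are hypothetical examples):

```python
import ipaddress

# Hypothetical logical-switch subnets inside the NSX domain.
nsx_subnets = [
    ipaddress.ip_network("172.16.0.0/24"),
    ipaddress.ip_network("172.16.1.0/24"),
    ipaddress.ip_network("172.16.2.0/24"),
    ipaddress.ip_network("172.16.3.0/24"),
]

# Advertise one summary toward the physical network instead of per-segment routes.
summary = list(ipaddress.collapse_addresses(nsx_subnets))
print(summary)  # [IPv4Network('172.16.0.0/22')]

# In return, the physical network advertises only a default route into the NSX domain.
default_route = ipaddress.ip_network("0.0.0.0/0")
```

Keeping the NSX domain a stub this way means a topology change inside NSX never churns the physical routing tables, and vice versa.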
  16. Bridging

VMWorld 2017 – NET1536BE – VMware NSX Design: Reference Design for SDDC with NSX and vSphere: Part 2

  1. DC Design consideration – Compute Cluster
    1. Rack based vs Multi rack (horizontal) striping.
  2. Small design
    1. Use case: Single rack design, Only Micro segmentation.
    2. VM mobility stays within the rack – no need for VXLAN or a DLR
    3. Centralized edge: active–standby edge (can use stateful services)
  3. Medium design
    1. Separate Edge cluster and compute cluster (medium & large)
    2. You can combine the Management & Edge clusters; as you grow, the Edge cluster can be separated from Management
    3. Don't combine the Edge and compute clusters: the compute cluster may grow (hardware type may change, operational boundary, VLAN sprawl)
    4. Edge cluster: minimum 3 hosts: ECMP Edge 1 (Host 1), ECMP Edge 2 (Host 2), active DLR Control VM (Host 3)
  4. Large Design
    1. For Cross-VC and SRM Deployments: Separation of Management cluster is inevitable.
      1. Dedicated Edge cluster
    2. Edge Cluster
      1. Minimum four hosts: ECMP Edges 1–2 (Host 1), ECMP Edges 3–4 (Host 2), active DLR Control VM (Host 3), standby DLR Control VM (Host 4)
      2. (Optional): NSX Controllers can be hosted on Edge cluster for optimizing Edge host utilization.
  5. NIC Card performance
    1. Core limits
    2. For higher throughput
      1. Higher MTU
      2. TSO-, LRO-, and RSS-capable cards
      3. Disable CPU power-saving modes
      4. Disable hyper-threading on the host
  6. Edge cluster design: Oversubscription
    1. Choices

VMWorld 2017 – NET1775BU – Advanced VMware NSX: Demystifying the VTEP, MAC, and ARP Tables

  1. NSX Controller tables
  2. Controller Disconnected Operation (CDO) mode: when the controllers are unavailable, all hosts use the CDO logical switch for BUM traffic

