Cloud Best Practices

Design Principles

  • Scalability
  • Disposable Resources Instead of Fixed Servers
  • Automation
  • Loose Coupling
  • Services, Not Servers
  • Databases
  • Removing Single Points of Failure
  • Optimize for Cost
  • Caching
  • Security
  1. Scalability
    • A scalable architecture supports growth in users, traffic, or data size with no drop in performance
    • Scaling Vertically: Increase the specifications of an individual resource (CPU, hard drive, etc.)
      • Pros: Easy to implement. Sufficient for many use cases in the short term
      • Cons: Eventually hits a limit. Not cost effective. Not highly available
    • Scaling Horizontally: Increase the number of resources, e.g. add more drives to an array or more servers to an application
      • Pros: Great for Internet-Scale applications
      • Cons: Not all applications are designed for this architecture
    • Stateless Applications: A stateless application needs no knowledge of previous interactions and stores no session information. Add resources when more capacity is required and terminate them when the added capacity is no longer needed. To distribute load across them, use one of these methods:
      • Push model:
        1. Load balancing solutions (ELB)
        2. Round robin DNS (Route 53): easy to implement, but caching DNS resolvers can delay the effect of changes
      • Pull model: Data that needs to be processed can be stored as messages. Multiple compute nodes can pull and consume those messages
        1. Amazon Simple Queue Service (SQS)
        2. Amazon Kinesis
    • Stateless components. Some components can be made stateless
      • User session information: Browser cookies, DynamoDB
      • Storage of large files : S3 , EFS
      • Multistep workflow : Amazon Simple Workflow service (SWF)
    • Stateful components. Some components cannot be made stateless. Consider the following
      • ELB sticky sessions for HTTP/HTTPS session affinity
      • Client side load balancing
    • Distributed processing: Divide a task and its data into small fragments of work and execute them on a large set of available compute resources
      • Offline batch jobs: Apache Hadoop, Amazon Elastic MapReduce (EMR)
      • Real-time processing: Amazon Kinesis
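The pull model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library `queue.Queue` stands in for a managed queue such as SQS, threads stand in for compute nodes, and `run_workers` and `handler` are hypothetical names, not part of any AWS API.

```python
import queue
import threading


def run_workers(messages, worker_count, handler):
    """Process messages with stateless workers that pull from a shared queue.

    queue.Queue is an in-process stand-in for a real message queue.
    """
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps no state between messages, so workers can be
        # added or removed freely to match the load.
        while True:
            try:
                message = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull
            outcome = handler(message)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the workers pull at their own pace, a slow worker simply takes fewer messages; no component has to decide up front which worker gets which message. Results arrive in completion order, not submission order.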
  2. Disposable resources instead of fixed servers: When designing your application for AWS, think of servers and other components as temporary resources.
    • Immutable infrastructure pattern: Once a server is launched, it is never updated during its lifetime. If there is a problem or it needs an update, replace it with a new server.
    • Approaches to achieve an automated and repeatable process
      • Bootstrapping: Execute automated bootstrapping actions. Parameterize configuration details
        1. User data scripts
        2. AWS OpsWorks
        3. Configuration management tools
        4. AWS APIs
        5. AWS CloudFormation
      • Golden Images:
        1. AMI: Customize an EC2 instance and save its configuration by creating an AMI. Prefer customizing the instance with a bootstrap script rather than by hand, so that the AMI can be rebuilt repeatably
        2. Amazon RDS Snapshot
        3. Container
      • Hybrid: A combination of bootstrapping and golden images
    • Infrastructure as code: Since AWS assets are programmable, apply software development principles and practices to make infrastructure reusable, maintainable, extensible, and testable
      • AWS CloudFormation
  3. Automation
    • AWS Elastic Beanstalk
    • EC2 Auto Recovery: Create a CloudWatch alarm that monitors the instance and automatically recovers it if it becomes impaired
    • Auto Scaling
    • Amazon CloudWatch Events: Route each type of event to a target such as an AWS Lambda function, Amazon Kinesis, or SNS
    • AWS OpsWorks lifecycle events
    • AWS Lambda scheduled events
  4. Loose Coupling: Design IT systems in a way that reduces interdependencies, so that a change or failure in one component does not cascade to other components
    • Well-defined interfaces
      • Components should interact with each other through technology agnostic interfaces like REST APIs.
      • Consider Amazon API Gateway
    • Service Discovery: Services of an application could be running on multiple compute resources. Each service should be discoverable without prior knowledge of the network topology
      • Elastic Load Balancing service
      • Service registration and discovery methods: custom solutions using tags, a database, or scripts calling the AWS API; open-source tools like Netflix Eureka, Airbnb Synapse, and HashiCorp Consul
    • Asynchronous Integration: One component generates events and other components consume them. Suitable for interactions that do not need an immediate response, where an acknowledgement will suffice.
      • Amazon SQS
      • Amazon Kinesis Streams
      • Amazon SWF
      • AWS Lambda functions
    • Graceful failure: Handle component failure in a graceful manner
      • Retry with an exponential backoff and jitter strategy
      • Store in queue for later processing
      • Provide alternate or cached content
      • Automatically route to backup site
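As an illustration of the retry strategy listed above, here is a minimal sketch of exponential backoff with "full jitter" (each wait is a random fraction of an exponentially growing cap). The function name, parameters, and default values are assumptions chosen for the example, not values prescribed by AWS.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying failures with capped exponential backoff.

    sleep and rng are injectable so the behaviour can be tested without
    real waiting or real randomness.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles on every attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random fraction of the cap so that many
            # clients failing together do not all retry at the same moment.
            sleep(rng() * cap)
```

A real caller would usually catch only errors known to be transient (timeouts, throttling) rather than every `Exception`; the broad catch here just keeps the sketch short.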
  5. Services, Not Servers
    • Managed Services
      • SQS
      • S3
      • CloudFront
      • ELB
      • etc.
    • Serverless architectures
      • AWS Lambda
      • Amazon Cognito: Mobile apps
  6. Databases
    • Relational Databases
      • Scalability: Scale vertically by upgrading to a larger instance. Amazon Aurora is designed to deliver higher throughput than standard MySQL. For read-heavy applications, create read replicas. Scale write capacity through data partitioning or sharding.
      • High Availability: For production workloads, AWS recommends using the Amazon RDS Multi-AZ deployment feature
      • Anti-Patterns: For an application that primarily indexes and queries data with no need for joins or complex transactions, consider a NoSQL engine. For blobs, store the data in S3 and use the database only for metadata
    • NoSQL Databases
      • Scalability: NoSQL database engines typically perform data partitioning and replication to scale both reads and writes horizontally. Example: Amazon DynamoDB
      • High Availability: DynamoDB synchronously replicates data across Availability Zones in a Region
      • Anti-Patterns: If the schema cannot be denormalized and your application requires joins or complex transactions, consider a relational database.
    • Data Warehouse: A specialized type of relational database optimized for analysis and reporting on large amounts of data. Example: Amazon Redshift
      • Scalability: Amazon Redshift's massively parallel processing (MPP) architecture lets you increase performance by increasing the number of nodes in the data warehouse cluster
      • High Availability: Deploy production workloads in a multi-node cluster. Data is continuously backed up to S3. Amazon Redshift continuously monitors the cluster, re-replicates data from failed drives, and replaces nodes as necessary
      • Anti-Patterns: Redshift is a SQL-based RDBMS, but it is not designed for OLTP. For high-concurrency workloads that read and write all columns of a small set of rows at a time, use RDS or DynamoDB
    • Search: Amazon Elasticsearch Service and Amazon CloudSearch. A search service can be used to index and search both structured and free text. It supports functionality that is not available in other databases, such as customizable result ranking.
      • Scalability: Both services use data partitioning and replication to scale horizontally.
      • High Availability: Both services store data redundantly across Availability Zones
      • Elasticsearch vs. CloudSearch:
        1. Elasticsearch: Gives more control and provides an open-source API for configuration. Has evolved into an analytics engine for some use cases.
        2. CloudSearch: Managed service. Scales automatically.
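The data partitioning (sharding) mentioned above for scaling writes can be illustrated with a simple hash-based router. This is a sketch only: `shard_for` is a hypothetical helper, and real engines such as DynamoDB handle partitioning internally rather than exposing a function like this.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a record key to a shard index in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # A cryptographic hash is used here only because it is stable across
    # processes and spreads keys evenly; Python's built-in hash() is
    # randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

The same key always lands on the same shard, so reads and writes for one record go to one place while different records spread across all shards. One known limitation of plain modulo hashing is that changing `shard_count` remaps most keys, which is why production systems often use consistent hashing or a lookup table instead.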
  7. Removing Single points of failure
    • Introducing Redundancy: Implemented as either standby redundancy (active-passive) or active redundancy (active-active)
    • Detect Failure: Configure health checks and set up alarms
    • Durable Data Storage: Replicate data. Replication modes: synchronous replication, asynchronous replication, and quorum-based replication (combines synchronous and asynchronous replication)
      Note 1: Durability is not a replacement for backups
      Note 2: The type of replication to choose depends on the recovery time objective (RTO) and recovery point objective (RPO)
    • Automated Multi-Datacenter Resilience: Options include a DR plan, replicating data across data centers (AZs), and active redundancy
    • Fault Isolation and Traditional Horizontal Scaling: Options include sharding and shuffle sharding
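Shuffle sharding, named in the last bullet above, gives each customer its own small, pseudo-random subset of nodes, so that a problem caused by one customer affects only the few customers who happen to share all of the same nodes. The sketch below assumes a hypothetical `shuffle_shard` helper and string node names; it illustrates the idea and is not how any particular AWS service implements it.

```python
import hashlib
import random


def shuffle_shard(customer_id, nodes, shard_size):
    """Pick a stable pseudo-random subset of nodes for one customer."""
    if shard_size > len(nodes):
        raise ValueError("shard_size cannot exceed the number of nodes")
    # Seed a private random generator from the customer id so the same
    # customer always gets the same subset.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Sort the input so the result does not depend on the caller's ordering.
    return sorted(rng.sample(sorted(nodes), shard_size))
```

With 8 nodes and a shard size of 2 there are 28 possible shards, so two customers overlap completely far less often than with plain sharding into 4 fixed pairs.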
  8. Optimize for Cost
    • Right Sizing: Select the right instance type and the right storage solution. Implement tagging and use AWS-provided tools to identify cost-saving opportunities.
    • Elasticity: Use Auto Scaling. Turn off non-production workloads that are not in use. Where possible, replace EC2 workloads with AWS managed services
    • Take advantage of purchasing options: Reserved capacity and Spot Instances. With a Spot bidding strategy you are charged the Spot price, not your bid price. Mix Spot with On-Demand.
  9. Caching
    • Application Data Caching: Applications can be designed to store and retrieve information from in-memory caches. Amazon ElastiCache supports Memcached and Redis (open-source caching engines)
    • Edge Caching: Copies of static and dynamic content can be cached at edge locations using CloudFront
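One common way to apply application data caching is the cache-aside (lazy loading) pattern: look in the cache first, and only on a miss load from the backing store and remember the result. The sketch below is an assumption-laden illustration: a plain dictionary stands in for Memcached or Redis, and the `CacheAside` class, `loader` callback, and TTL value are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside wrapper around a slow loader function."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # fetches the value from the source of truth
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit
        value = self._loader(key)  # cache miss or expired: go to the source
        self._store[key] = (value, now + self._ttl)
        return value
```

The TTL bounds how stale a value can get; choosing it is a trade-off between load on the backing store and freshness.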
  10. Security
    • Utilize AWS features for defense in depth: VPC, Web Application Firewall (WAF), IAM, and in-transit and at-rest security features
    • Offload security responsibility to AWS: AWS operates on a shared security responsibility model. Use AWS managed services where possible, so that AWS takes on more of the security work for them
    • Reduce Privileged Access:
      1. Treat servers as programmable resources. Eliminate the need for guest operating system access to production environments. Implement just-in-time access by using an API.
      2. Use IAM roles instead of service accounts. For mobile applications, use Amazon Cognito tokens.
    • Security as Code: Capture security policy in an AWS CloudFormation template and reuse it. Use the template in security testing and continuous integration. Import AWS CloudFormation templates into AWS Service Catalog and apply IAM permissions to control access to them.
    • Real-Time Auditing: Implement continuous monitoring and automation of controls to minimize exposure to security risks
      1. AWS Config
      2. Amazon Inspector
      3. AWS Trusted Advisor
      4. Extensive logging with CloudWatch and AWS CloudTrail. Process logs to find any non-compliance.

