Google Cloud Digital Leader Notes and questions



GCP has 200+ services :

This exam expects knowledge of 40+ Services

Exam tests your decision making abilities:

Which service do you choose in which situation?

This course is designed to help you make these choices

Our Goal : Help you start your cloud journey AND get certified 

Challenging certification - Expects you to understand and REMEMBER a number of services

As time passes, humans forget things. How do you improve your chances of remembering things?

Active learning - think and take notes

Review the notes every once in a while


 Challenges:

  1. Peak usage during holidays and weekends
  2. Less load during the rest of the time
  3. Solution (before the Cloud): PEAK LOAD provisioning - Procure (Buy) infrastructure for peak load
  4. What would the infrastructure be doing during periods of low load?

  1. A startup suddenly becomes popular
  2. How to handle the sudden increase in load?
  3. Solution (before the Cloud): Procure (Buy) infrastructure assuming they would be successful
  4. What if they are not successful?


High cost of procuring infrastructure

Needs ahead-of-time planning (Can you guess the future?)

Low infrastructure utilization (PEAK LOAD provisioning)

Dedicated infrastructure maintenance team (Can a startup afford it?)

How about provisioning (renting) resources when you want them and releasing them back when you do not need them?

On-demand resource provisioning - also called Elasticity

Trade "capital expense" for "variable expense"

Benefit from massive economies of scale

Stop guessing capacity

Stop spending money running and maintaining data centers

"Go global" in minutes

 

GCP is one of the top 3 cloud service providers and provides a number of services (200+)

Reliable, secure and highly-performant:

Infrastructure that powers 8 services with over 1 Billion Users: Gmail, Google Search, YouTube etc

One thing I love : "cleanest cloud"

Net carbon-neutral cloud (electricity used matched 100% with renewable energy)

The entire course is all about GCP. You will learn it as we go further.

 

Cloud applications make use of multiple GCP services

There is no single path to learn these services independently. HOWEVER, we've worked out a simple path!

Create GCP Account

Regions and Zones


Imagine that your application is deployed in a data center in London. What would be the challenges?

Challenge 1 : Slow access for users from other parts of the world (high latency)

Challenge 2 : What if the data center crashes?

Your application goes down (low availability)

 

Let's add one more data center in London. What would be the challenges?

Challenge 1 : Slow access for users from other parts of the world

Challenge 2 (SOLVED) : What if one data center crashes?

Your application is still available from the other data center

Challenge 3 : What if entire region of London is unavailable?

Your application goes down

 

Let's add a new region: Mumbai. What would be the challenges?

Challenge 1 (PARTLY SOLVED) : Slow access for users from other parts of the world

You can solve this by adding deployments for your applications in other regions

Challenge 2 (SOLVED) : What if one data center crashes?

Your application is still live from the other data centers

Challenge 3 (SOLVED) : What if entire region of London is unavailable?

Your application is served from Mumbai

 Imagine setting up data centers in different regions around the world

Would that be easy?


Solution

  1. Google provides 20+ regions around the world
  2. Expanding every year
  3. Region: a specific geographical location to host your resources
  4. Advantages: High Availability, Low Latency, Global Footprint, Adherence to government regulations

 

How to achieve high availability in the same region (or geographic location)?

Enter Zones

Each Region has three or more zones

(Advantage) Increased availability and fault tolerance within same region

(Remember) Each Zone has one or more discrete clusters

Cluster : distinct physical infrastructure that is housed in a data center

(Remember) Zones in a region are connected through low-latency links

 

Compute

In corporate data centers, applications are deployed to physical servers

Where do you deploy applications in the cloud?

Rent virtual servers

Virtual Machines - Virtual servers in GCP

Google Compute Engine (GCE) - Provision & Manage Virtual Machines

Create and manage lifecycle of Virtual Machine (VM) instances

Load balancing and auto scaling for multiple VM instances

Attach storage (& network storage) to your VM instances

Manage network connectivity and configuration for your VM instances

Our Goal:

  1. Setup VM instances as HTTP (Web) Server
  2. Distribute load with Load Balancers

 

Let's create a few VM instances and play with them

Let's check out the lifecycle of VM instances

Let's use SSH to connect to VM instances

Commands:

  1. sudo su - Execute commands as the root user
  2. apt update - Update the package index (pull the latest changes from the APT repositories)
  3. apt -y install apache2 - Install the Apache 2 web server
  4. sudo service apache2 start - Start the Apache 2 web server
  5. echo "Hello World" > /var/www/html/index.html - Write to index.html
  6. $(hostname) - Get the host name
  7. $(hostname -I) - Get the host's internal IP address

 

IP address types:

Internal IP Address: Permanent internal IP address that does not change during the lifetime of an instance

Ephemeral External IP Address: External IP address that changes when an instance is stopped

Static IP Address: Permanent external IP address that can be attached to a VM
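The difference between the address types can be sketched as a toy simulation (illustrative Python, not a GCP API; all addresses here are made up):

```python
# Illustrative sketch (not a GCP API): how ephemeral vs static external
# IPs behave across an instance stop/start cycle.
import itertools

_ip_pool = itertools.count(1)

def new_ip():
    # Hypothetical allocator handing out fresh addresses
    return f"203.0.113.{next(_ip_pool)}"

class Instance:
    def __init__(self, static_ip=None):
        self.internal_ip = "10.128.0.2"    # stays for the instance lifetime
        self.static_ip = static_ip         # reserved address, if any
        self.external_ip = static_ip or new_ip()

    def stop_and_start(self):
        # An ephemeral external IP is released on stop and a new one is
        # assigned on start; a reserved static IP survives the cycle.
        if self.static_ip is None:
            self.external_ip = new_ip()

vm = Instance()
before = vm.external_ip
vm.stop_and_start()
print(before != vm.external_ip)   # ephemeral IP changed

vm2 = Instance(static_ip="203.0.113.250")
vm2.stop_and_start()
print(vm2.external_ip)            # static IP unchanged
```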

 

How do we reduce the number of steps in creating a VM instance and setting up an HTTP server?

Let's explore a few options:

  1. Startup script
  2. Instance Template
  3. Custom Image

 

Bootstrapping: Install OS patches or software when a VM instance is launched.

In a VM, you can configure a startup script to bootstrap

DEMO - Using Startup script

 

Why do you need to specify all the VM instance details (image, machine type etc) every time you launch an instance?

How about creating an Instance Template?

Define machine type, image, labels, startup script and other properties

Used to create VM instances and managed instance groups

Provides a convenient way to create similar instances

CANNOT be updated

To make a change, copy an existing template and modify it

(Optional) Image family can be specified (example - debian-9):

Latest non-deprecated version of the family is used

DEMO - Launch VM instances using Instance templates

 

Installing OS patches and software at launch of VM instances increases boot-up time

How about creating a custom image with OS patches and software pre-installed?

Can be created from an instance, a persistent disk, a snapshot, another image, or a file in Cloud Storage

Can be shared across projects

(Recommendation) Deprecate old images (& specify replacement image)

(Recommendation) Hardening an Image - Customize images to your corporate security standards

Prefer using Custom Image to Startup script

DEMO : Create a Custom Image and using it in an Instance Template

 

Sustained use discounts:

Automatic discounts for running VM instances for a significant portion of the billing month

Example: If you use N1 or N2 machine types for more than 25% of a month, you get a 20% to 50% discount on every incremental minute

Discount increases with usage - no action required on your part!

Applicable for instances created by Google Kubernetes Engine and Compute Engine

RESTRICTION: Does NOT apply to certain machine types (example: E2 and A2)

RESTRICTION: Does NOT apply to VMs created by App Engine flexible and Dataflow
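The incremental-discount math can be sketched like this (the tier rates below follow the classic N1 schedule, where each quarter of the month is billed at a lower rate; treat them as illustrative, not a price list):

```python
# Sketch of sustained use discount math: each quarter of the month is
# billed at a progressively lower rate (illustrative N1-style tiers).
TIERS = [(0.25, 1.00), (0.50, 0.80), (0.75, 0.60), (1.00, 0.40)]

def sustained_use_cost(base_monthly_price, usage_fraction):
    """Cost for running one VM for usage_fraction of the month."""
    cost, prev = 0.0, 0.0
    for upper, rate in TIERS:
        if usage_fraction <= prev:
            break
        billable = min(usage_fraction, upper) - prev
        cost += billable * base_monthly_price * rate
        prev = upper
    return cost

print(sustained_use_cost(100.0, 0.25))  # 25.0 -> no discount yet
print(sustained_use_cost(100.0, 1.0))   # 70.0 -> 30% effective discount
```

Running a full month at these tiers yields a 30% effective discount, matching "discount increases with usage".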

Committed use discounts:

For workloads with predictable resource needs

Commit for 1 year or 3 years

Up to 70% discount based on machine type and GPUs

Applicable for instances created by Google Kubernetes Engine and Compute Engine

(Remember) You CANNOT cancel commitments

Reach out to Cloud Billing Support if you made a mistake while purchasing commitments

 

Preemptible VMs: Short-lived, cheaper (up to 80% off) compute instances

Can be stopped by GCP any time (preempted) within 24 hours

Instances get 30 second warning (to save anything they want to save)

Use Preempt VM's if:

  1. Your applications are fault tolerant
  2. You are very cost sensitive
  3. Your workload is NOT immediate
  4. Example: Non immediate batch processing jobs

RESTRICTIONS:

  1. NOT always available
  2. NO SLA and CANNOT be migrated to regular VMs
  3. NO automatic restarts
  4. Free Tier credits not applicable

 

Shared Tenancy (Default)

Single host machine can have instances from multiple customers

Sole-tenant Nodes: Virtualized instances on hardware dedicated to one customer

Use cases:

  1. Security and compliance requirements: You want your VMs to be physically separated from those in other projects
  2. High performance requirements: Group your VMs together
  3. Licensing requirements: Using per-core or per-processor "Bring your own licenses"

 

What do you do when predefined VM options are NOT appropriate for your workload?

Create a machine type customized to your needs (a Custom Machine Type)

Custom Machine Type: Adjust vCPUs, memory and GPUs

Choose between E2, N2, or N1 machine types

Supports a wide variety of Operating Systems: CentOS, CoreOS, Debian, Red Hat, Ubuntu, Windows etc

Billed per vCPUs, memory provisioned to each instance

Example Hourly Price: $0.033174 / vCPU + $0.004446 / GB
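A quick sanity check of the per-resource pricing example above (the unit prices come from the example line; actual rates vary by region and machine family):

```python
# Per-resource billing for custom machine types: price scales with the
# vCPUs and memory you provision (rates from the example above).
VCPU_HOUR = 0.033174
GB_HOUR = 0.004446

def custom_machine_hourly(vcpus, memory_gb):
    return vcpus * VCPU_HOUR + memory_gb * GB_HOUR

# e.g. a hypothetical 4 vCPU / 16 GB custom machine:
price = custom_machine_hourly(4, 16)
print(round(price, 6))   # 0.203832 per hour
```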


 2 primary costs in running VMs using GCE:

  1. Infrastructure cost to run your VMs
  2. Licensing cost for your OS (ONLY for Premium Images)

Premium Image Examples: Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Ubuntu Pro, Windows Server, ..


Options For Licensing:

  1. You can use Pay-as-you-go model (PAYG) OR
  2.  (WITHIN A LOT OF CONSTRAINTS) You can use your existing license/subscription (Bring your own subscription/license - BYOS/BYOL)

(RECOMMENDED) If you have existing license for a premium image, use it while your license is valid

After that you can shift to Pay-as-you-go model (PAYG)

 

Compute Engine - a quick review:

  1. Image: What operating system and what software do you want on the VM instance? Reduce boot time and improve security by creating custom hardened Images. You can share an Image with other projects.
  2. Machine Types: Optimized combination of compute (CPU, GPU), memory, disk (storage) and networking for specific workloads. You can create your own Custom Machine Types when existing ones don't fit your needs.
  3. Static IP Addresses: Get a constant IP address for VM instances
  4. Instance Templates: Pre-configured templates simplifying the creation of VM instances
  5. Sustained use discounts: Automatic discounts for running VM instances for a significant portion of the billing month
  6. Committed use discounts: 1 year or 3 year reservations for workloads with predictable resource needs
  7. Preemptible VMs: Short-lived, cheaper (up to 80% off) compute instances for non-time-critical, fault-tolerant workloads

 

  • How do you create a group of VM instances?
  • Instance Group - Group of VM instances managed as a single entity
  • Manage group of similar VMs having similar lifecycle as ONE UNIT
  • Two Types of Instance Groups:
  • Managed: Identical VMs created using a template
  • Features: Auto scaling, auto healing and managed releases
  • Unmanaged: Different configurations for VMs in the same group
  • Does NOT offer auto scaling, auto healing & other services
  • NOT recommended unless you need different kinds of VMs
  • Location can be Zonal or Regional
  • Regional gives you higher availability (RECOMMENDED)

 

Managed Instance Group - Identical VMs created using an instance template

Important Features:

  1. Maintain a certain number of instances: if an instance crashes, the MIG launches another instance
  2. Detect application failures using health checks (Self Healing)
  3. Increase and decrease instances based on load (Auto Scaling)
  4. Add a Load Balancer to distribute load
  5. Create instances in multiple zones (regional MIGs): regional MIGs provide higher availability compared to zonal MIGs
  6. Release new application versions without downtime:
  7. Rolling updates: Release the new version step by step (gradually). Update a percentage of instances to the new version at a time.
  8. Canary Deployment: Test the new version with a group of instances before releasing it across all instances.
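The rolling-update idea (update at most a given percentage of instances per batch until every instance runs the new version) can be sketched as:

```python
# Sketch of how a rolling update walks a managed instance group:
# update at most step_pct of the instances in each batch.
import math

def rolling_update_batches(total_instances, step_pct):
    per_batch = max(1, math.floor(total_instances * step_pct / 100))
    batches, updated = [], 0
    while updated < total_instances:
        batch = min(per_batch, total_instances - updated)
        batches.append(batch)
        updated += batch
    return batches

print(rolling_update_batches(10, 25))  # [2, 2, 2, 2, 2]
```

A canary is the same idea with a deliberately tiny first step, e.g. a 10% step updates one instance at a time for a 10-instance group.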

 

Instance template is mandatory :

  1. Configure auto-scaling to automatically adjust number of instances based on load:
  2. Minimum number of instances
  3. Maximum number of instances
  4. Autoscaling metrics: CPU Utilization target, Load Balancer Utilization target, or any other metric from Cloud Monitoring (formerly Stackdriver)
  5. Cool-down period: How long to wait before looking at auto scaling metrics again?
  6. Scale In Controls: Prevent a sudden drop in the number of VM instances
  7. Example: Don't scale in by more than 10% or 3 instances in 5 minutes
  8. Autohealing: Configure a Health check with Initial delay (How long should you wait for your app to initialize before running a health check?)
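The scale-in control example above ("don't scale in by more than 10% or 3 instances") can be sketched as a cap on how many VMs the autoscaler may remove at once (one plausible interpretation, shown here as whichever limit is lower):

```python
# Sketch of scale-in controls: cap the number of instances the
# autoscaler may remove in one window.
import math

def max_scale_in(current_instances, max_pct=10, max_count=3):
    pct_cap = math.floor(current_instances * max_pct / 100)
    return min(pct_cap, max_count)

print(max_scale_in(50))   # 10% of 50 = 5, but capped at 3 instances
print(max_scale_in(20))   # 10% of 20 = 2
```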

 

Cloud Load Balancing: Distribute traffic across VM instances in one or more regions

Managed service:

Google Cloud ensures that it is highly available

Auto scales to handle huge loads

Load Balancers can be public or private

Types:

External HTTP(S)

Internal HTTP(S)

SSL Proxy

TCP Proxy

External Network TCP/UDP

Internal TCP/UDP


Managed Services

Do you want to continue running applications in the cloud, the same way you run them in your data center?

OR are there OTHER approaches?

You should understand some terminology used with cloud services:

IaaS (Infrastructure as a Service)

PaaS (Platform as a Service)

FaaS (Function as a Service)

CaaS (Container as a Service)

Serverless

Let's get on a quick journey to understand these!

 

IaaS (Infrastructure as a Service): Use only infrastructure from the cloud provider

Example: Using VM to deploy your applications or databases

You are responsible for:

  1. Application Code and Runtime
  2. Configuring load balancing
  3. Auto scaling
  4. OS upgrades and patches
  5. Availability
  6. etc.. (and a lot of things!)

 

PaaS (Platform as a Service): Use a platform provided by the cloud

Cloud provider is responsible for:

  1. OS (incl. upgrades and patches)
  2. Application Runtime
  3. Auto scaling, Availability & Load balancing etc..

You are responsible for:

  1. Configuration (of Application and Services)
  2. Application code (if needed)

Varieties:

  1. CAAS (Container as a Service): Containers instead of Apps
  2. FAAS (Function as a Service): Functions instead of Apps
  3. Databases - Relational & NoSQL (Amazon RDS, Google Cloud SQL, Azure SQL Database etc), Queues, AI, ML, Operations etc!

 


Enterprises are heading towards microservices architectures

Build small focused microservices

Flexibility to innovate and build applications in different programming languages (Go, Java, Python, JavaScript, etc)

BUT deployments become complex!

How can we have one way of deploying Go, Java, Python or JavaScript .. microservices?

Enter containers!

 

Create Docker images for each microservice

A Docker image has everything a microservice needs:

Application Runtime (JDK or Python or NodeJS)

Application code and Dependencies

Runs the same way on any infrastructure:

Your local machine

Corporate data center

Cloud

Advantages

Docker containers are light weight

Compared to Virtual Machines as they do not have a Guest OS

Docker provides isolation for containers

Docker is cloud neutral

 

Requirement: I want 10 instances of the Microservice A container, 15 instances of the Microservice B container, and so on

Enter Container Orchestration! Typical features:

  1. Auto Scaling - Scale containers based on demand
  2. Service Discovery - Help microservices find one another
  3. Load Balancer - Distribute load among multiple instances of a microservice
  4. Self Healing - Do health checks and replace failing instances
  5. Zero Downtime Deployments - Release new versions without downtime

 

What do we think about when we develop an application?

Where to deploy? What kind of server? What OS?

How do we take care of scaling and availability of the application?

What if you don't need to worry about servers and focus on your code?

Enter Serverless

Remember: Serverless does NOT mean "No Servers"

Serverless for me:


You don't worry about infrastructure (ZERO visibility into infrastructure)

Flexible scaling and automated high availability

Most Important: Pay for use

Ideally ZERO REQUESTS => ZERO COST

You focus on code and the cloud managed service takes care of all that is needed to scale your code to serve millions of requests!

And you pay for requests and NOT servers!

SaaS (Software as a Service): Centrally hosted software (mostly on the cloud)

Offered on a subscription basis (pay-as-you-go)

Examples:

Email, calendaring & office tools (such as Outlook 365, Microsoft Office 365, Gmail, Google Docs)

Cloud provider is responsible for:

  1. OS (incl. upgrades and patches)
  2. Application Runtime
  3. Auto scaling, Availability & Load balancing etc..
  4. Application code and/or Application Configuration (How much memory? How many instances? ..)

Customer is responsible for:

  1. Configuring the software!
  2. And the content (example: docs, sheets etc)

 

Security in cloud is a Shared Responsibility:

Between GCP and the Customer

GCP provides features to make security easy:

  1. Encryption at rest by default
  2. IAM
  3. KMS etc

Customer responsibilities vary with the model:

  1. SaaS: Content + Access Policies + Usage
  2. PaaS: SaaS + Deployment + Web Application Security
  3. IaaS: PaaS + Operations + Network Security + Guest OS

Google Cloud is always responsible for Hardware, Network, Audit Logging etc.

 

Serverless platform using open and familiar languages and tools:

Cloud Functions: Build event-driven applications using simple, single-purpose functions

Cloud Run: Develop and deploy highly scalable containerized applications. Does NOT need a cluster!

 


App Engine: Managed Compute Service in GCP

  1. Simplest way to deploy and scale your applications in GCP
  2. Provides end-to-end application management

Supports:

  1. Go, Java, .NET, Node.js, PHP, Python, Ruby using pre-configured runtimes
  2. Use custom run-time and write code in any language
  3. Connect to variety of Google Cloud storage products (Cloud SQL etc)
  4. No usage charges - Pay for resources provisioned

Features:

  1. Automatic load balancing & auto scaling
  2. Managed platform updates & application health monitoring
  3. Application versioning
  4. Traffic splitting

 

Compute Engine is IaaS:

MORE Flexibility, MORE Responsibility:

Choosing Image

Installing Software

Choosing Hardware

Fine-grained Access/Permissions (Certificates/Firewalls)

Availability etc

App Engine is PaaS and Serverless:

LESSER Responsibility, LOWER Flexibility

 

Standard: Applications run in language-specific sandboxes

  1. V1: Java, Python, PHP, Go (OLD versions)
  2. V2: Java, Python, PHP, Node.js, Ruby, Go (NEWER versions)
  3. Complete isolation from OS/Disk
  4. Supports scale down to ZERO instances

Flexible: Application instances run within Docker containers

  1. Makes use of Compute Engine virtual machines
  2. Supports ANY runtime (with built-in support for Python, Java, Node.js, Go, Ruby, PHP, or .NET)
  3. CANNOT scale down to ZERO instances


Google Kubernetes Engine (GKE): Managed Kubernetes service

  1. Minimize operations with auto-repair (repair failed nodes) and auto-upgrade (always use the latest K8S version) features
  2. Provides Pod and Cluster Autoscaling
  3. Enable Cloud Logging and Cloud Monitoring with simple configuration
  4. Uses Container-Optimized OS, a hardened OS built by Google
  5. Provides support for Persistent Disks and Local SSDs

 

Let's Have Some Fun: Let's get on a journey with Kubernetes:

Let's create a cluster, deploy a microservice and play with it in 13 steps!

  1. Create a Kubernetes cluster with the default node pool: gcloud container clusters create (or use cloud console)
  2. Login to Cloud Shell
  3. Connect to the Kubernetes cluster: gcloud container clusters get-credentials my-cluster --zone us-central1-a --project solid-course-258105
  4. Deploy Microservice to Kubernetes


Create deployment & service using kubectl commands :

  1. kubectl create deployment hello-world-rest-api --image=in28min/hello-world-rest-api:0.0.1.RELEASE
  2. kubectl expose deployment hello-world-rest-api --type=LoadBalancer --port=8080

Increase number of instances of your microservice:

  1. kubectl scale deployment hello-world-rest-api --replicas=2

Increase number of nodes in your Kubernetes cluster:

  1. gcloud container clusters resize my-cluster --node-pool my-node-pool --num-nodes 5

You are NOT happy about manually increasing number of instances and nodes!

 Setup auto scaling for your microservice:

  1. kubectl autoscale deployment hello-world-rest-api --max=10 --cpu-percent=70

Also called horizontal pod autoscaling - HPA - kubectl get hpa

Setup auto scaling for your Kubernetes Cluster

  1. gcloud container clusters update cluster-name --enable-autoscaling --min-nodes=1 --max-nodes=10

Delete the Microservice

  1. Delete service - kubectl delete service hello-world-rest-api
  2. Delete deployment - kubectl delete deployment hello-world-rest-api

Delete the Cluster

  1. gcloud container clusters delete my-cluster --zone us-central1-a

 

Cloud Functions

Imagine you want to execute some code when an event happens:

  1. A file is uploaded in Cloud Storage
  2. An error log is written to Cloud Logging
  3. A message arrives at Cloud Pub/Sub

Enter Cloud Functions:

  1. Run code in response to events
  2. Write your business logic in Node.js, Python, Go, Java, .NET, or Ruby
  3. Don't worry about servers, scaling or availability (only worry about your code)
  4. Pay only for what you use:
  5. Number of invocations
  6. Compute time of the invocations
  7. Amount of memory and CPU provisioned
  8. Time bound - default 1 min and MAX 60 minutes (3600 seconds)
  9. Each execution runs in a separate instance: no direct sharing between invocations

 

Cloud Run - "Container to Production in Seconds"

  1. Built on top of an open standard - Knative
  2. Fully managed serverless platform for containerized applications
  3. ZERO infrastructure management
  4. Pay-per-use (for used CPU, Memory, Requests and Networking)
  5. Fully integrated end-to-end developer experience:
  6. No limitations in languages, binaries and dependencies
  7. Easily portable because of container-based architecture
  8. Cloud Code, Cloud Build, Cloud Monitoring & Cloud Logging integrations

Anthos - Run Kubernetes clusters anywhere:

  1. Cloud, Multi Cloud and On-Premise
  2. Cloud Run for Anthos: Deploy your workloads to Anthos clusters running on-premises or on Google Cloud
  3. Leverage your existing Kubernetes investment to quickly run serverless workloads

How can you centrally manage multi-cloud and on-premise Kubernetes clusters ?

Anthos

Storage

What is the type of storage of your hard disk?

Block Storage

You've created a file share to share a set of files with your colleagues in an enterprise. What type of storage are you using?

File Storage

Use case: Hard disks attached to your computers

Typically, ONE block storage device can be connected to ONE virtual server

(EXCEPTIONS) You can attach read-only block devices to multiple virtual servers, and certain cloud providers are exploring multi-writer disks as well!

HOWEVER, you can connect multiple different block storage devices to one virtual server

Used as:

Direct-attached storage (DAS) - Similar to a hard disk

Storage Area Network (SAN) - High-speed network connecting a pool of storage devices

Used by Databases - Oracle and Microsoft SQL Server

 

Media workflows need huge shared storage for supporting processes like video editing

Enterprise users need a quick way to share files in a secure and organized way

These file shares are shared by several virtual servers

Block Storage:

Persistent Disks: Network Block Storage

Zonal: Data replicated in one zone

Regional: Data replicated in multiple zone

Local SSDs: Local Block Storage

File Storage:

Filestore:

  1. High performance file storage

Object Storage - Cloud Storage:

  1. Most popular, very flexible & inexpensive storage service
  2. Serverless: Autoscaling and infinite scale

Store large objects using a key-value approach:

  1. Treats entire object as a unit (Partial updates not allowed)
  2. Recommended when you operate on entire object most of the time

Access Control at Object level

Also called Object Storage

  1. Provides REST API to access and modify objects
  2. Also provides CLI (gsutil) & Client Libraries (C++, C#, Java, Node.js, PHP, Python & Ruby)
  3. Store all file types - text, binary, backup & archives:
  4. Media files and archives, Application packages and logs
  5. Backups of your databases or storage devices
  6. Staging data during on-premise to cloud database migration

 

  • Objects are stored in buckets
  • Bucket names are globally unique
  • Bucket names are used as part of object URLs => can contain ONLY lower case letters, numbers, hyphens, underscores and periods
  • 3-63 characters max. Can't start with the goog prefix and should not contain google (even misspelled)
  • Unlimited objects in a bucket
  • Each bucket is associated with a project
  • Each object is identified by a unique key
  • Key is unique in a bucket
  • Max object size is 5 TB
  • BUT you can store unlimited number of such objects
  • Different kinds of data can be stored in Cloud Storage
  • Media files and archives
  • Application packages and logs
  • Backups of your databases or storage devices
  • Long term archives
  • Huge variations in access patterns
  • Can I pay a cheaper price for objects I access less frequently?
  • Storage classes help to optimize your costs based on your access needs
  • Designed for durability of 99.999999999%(11 9’s)
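The bucket-naming rules above can be sketched as a simple validator (the "misspelled google" rule is only approximated with a few variants here; the real service checks more than this):

```python
# Sketch of Cloud Storage bucket-naming rules: lower case letters,
# numbers, hyphens, underscores, periods; 3-63 chars; no goog prefix;
# no "google" (misspellings only approximated below).
import re

FORBIDDEN = ("google", "g00gle", "go0gle", "g0ogle")

def is_valid_bucket_name(name):
    if not 3 <= len(name) <= 63:
        return False
    if not re.fullmatch(r"[a-z0-9._-]+", name):
        return False
    if name.startswith("goog"):
        return False
    return not any(bad in name for bad in FORBIDDEN)

print(is_valid_bucket_name("my-archive.backups_2024"))  # True
print(is_valid_bucket_name("MyBucket"))                 # False: upper case
print(is_valid_bucket_name("goog-stuff"))               # False: goog prefix
```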

 

Cloud Storage features:

  • High availability (99.95% in multi-region, 99.9% in regions)
  • High durability (99.999999999% annual durability)
  • Low latency (first byte typically in tens of milliseconds)
  • Unlimited storage, autoscaling (no configuration needed)
  • NO minimum object size
  • Same APIs across storage classes
  • Committed SLA is 99.95% for multi-region and 99.9% for single region for Standard, Nearline and Coldline storage classes
  • No committed SLA for Archive storage
  •  
  • Files are frequently accessed when they are created
  • Generally usage reduces with time
  • How do you save costs by moving files automatically between storage classes?
  • Solution: Object Lifecycle Management
  • Identify objects using conditions based on:
  • Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions etc
  • Set multiple conditions: all conditions must be satisfied for action to happen
  • Two kinds of actions:
  • SetStorageClass actions (change from one storage class to another)
  • Deletion actions (delete objects)
  • Allowed Transitions:
  • (Standard or Multi-Regional or Regional) to (Nearline or Coldline or Archive)
  • Nearline to (Coldline or Archive)
  • Coldline to Archive
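The allowed transitions can be captured as a small lookup table (a sketch of the rule, not the service's actual validation code) - lifecycle rules may only move objects "down" to colder classes:

```python
# Allowed storage-class transitions for lifecycle rules: objects may
# only move to colder (cheaper, less frequently accessed) classes.
ALLOWED = {
    "STANDARD": {"NEARLINE", "COLDLINE", "ARCHIVE"},
    "MULTI_REGIONAL": {"NEARLINE", "COLDLINE", "ARCHIVE"},
    "REGIONAL": {"NEARLINE", "COLDLINE", "ARCHIVE"},
    "NEARLINE": {"COLDLINE", "ARCHIVE"},
    "COLDLINE": {"ARCHIVE"},
    "ARCHIVE": set(),
}

def can_transition(src, dst):
    return dst in ALLOWED.get(src, set())

print(can_transition("STANDARD", "NEARLINE"))  # True
print(can_transition("COLDLINE", "NEARLINE"))  # False: can't move warmer
```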

 

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 30,
          "isLive": true
        }
      },
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "NEARLINE"
        },
        "condition": {
          "age": 365,
          "matchesStorageClass": ["STANDARD"]
        }
      }
    ]
  }
}
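How rules like the ones above are evaluated - an action fires only when ALL of its conditions match the object - can be sketched as:

```python
# Sketch of lifecycle rule evaluation: every condition in a rule must
# match the object for its action to fire.
def rule_matches(condition, obj):
    if "age" in condition and obj["age_days"] < condition["age"]:
        return False
    if "isLive" in condition and obj["is_live"] != condition["isLive"]:
        return False
    if ("matchesStorageClass" in condition
            and obj["storage_class"] not in condition["matchesStorageClass"]):
        return False
    return True

rule = {"age": 365, "matchesStorageClass": ["STANDARD"]}

old_standard = {"age_days": 400, "is_live": True, "storage_class": "STANDARD"}
old_nearline = {"age_days": 400, "is_live": True, "storage_class": "NEARLINE"}
print(rule_matches(rule, old_standard))  # True -> SetStorageClass fires
print(rule_matches(rule, old_nearline))  # False -> wrong storage class
```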

 

The most popular data destination is Google Cloud Storage

Options:

Online Transfer: 

  1. Use gsutil or API to transfer data to Google Cloud Storage
  2. Good for one time transfers

Storage Transfer Service

  1. Recommended for large-scale (petabytes) online data transfers from your private data centers, AWS, Azure, and Google Cloud
  2. You can set up a repeating schedule
  3. Supports incremental transfer (only transfer changed objects)
  4. Reliable and fault tolerant - continues from where it left off in case of errors

Storage Transfer Service vs gsutil:

gsutil is recommended only when you are transferring less than 1 TB from on-premises or another GCS bucket

Storage Transfer Service is recommended if either of the conditions is met:

  1. Transferring more than 1 TB from anywhere
  2. Transferring from another cloud

Transfer Appliance: Physical transfer using an appliance

Copy, ship and upload data to GCS

Recommended if your data size is:

  1. greater than 20TB
  2. OR online transfer takes > 1 week
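The decision guidance above can be sketched as a helper that estimates transfer time and picks a tool (thresholds are from the notes; the bandwidth math assumes decimal TB and Mbps):

```python
# Sketch of the transfer-tool decision: estimate how long an online
# transfer would take, then apply the size/time thresholds above.
def transfer_days(size_tb, bandwidth_mbps):
    bits = size_tb * 8 * 10**12            # decimal terabytes to bits
    return bits / (bandwidth_mbps * 10**6) / 86400

def choose_transfer_tool(size_tb, bandwidth_mbps, from_cloud=False):
    if size_tb > 20 or transfer_days(size_tb, bandwidth_mbps) > 7:
        return "Transfer Appliance"
    if size_tb > 1 or from_cloud:
        return "Storage Transfer Service"
    return "gsutil"

print(choose_transfer_tool(0.5, 1000))   # gsutil
print(choose_transfer_tool(5, 1000))     # Storage Transfer Service
print(choose_transfer_tool(30, 1000))    # Transfer Appliance
print(round(transfer_days(20, 100), 1))  # ~18.5 days at 100 Mbps
```

The last line shows why the appliance exists: even 20 TB takes weeks over a modest link.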

Process:

  • Request an appliance
  • Upload your data
  • Ship the appliance back
  • Google uploads the data
  • Fast copy (up to 40 Gbps)
  • AES 256 encryption - customer-managed encryption keys
  • Order multiple devices (TA40, TA300) if needed

Database Fundamentals

There are several categories of databases:

Relational (OLTP and OLAP), Document, Key Value, Graph, In Memory among others

Choosing type of database for your use case is not easy. A few factors:

Do you want a fixed schema?

Do you want flexibility in defining and changing your schema? (schemaless)

What level of transaction properties do you need? (atomicity and consistency)

What kind of latency do you want? (seconds, milliseconds or microseconds)

How many transactions do you expect? (hundreds or thousands or millions of transactions per second)

How much data will be stored? (MBs or GBs or TBs or PBs) and a lot more...

 

Relational databases - this was the only option until a decade back!

Most popular (or unpopular) type of databases

Predefined schema with tables and relationships

Very strong transactional capabilities

Used for:

OLTP (Online Transaction Processing) use cases

OLAP (Online Analytics Processing) use cases

 

OLTP: Applications where a large number of users make a large number of small transactions (small data reads, updates and deletes)

Use cases:

Most traditional applications, ERP, CRM, e-commerce, banking applications

Popular databases:

MySQL, Oracle, SQL Server etc

Recommended Google Managed Services:

Cloud SQL: Supports PostgreSQL, MySQL, and SQL Server for regional relational databases (up to a few TBs)

Cloud Spanner: Unlimited scale (multiple PBs) and 99.999% availability for global applications with horizontal scaling

 

OLAP: Applications allowing users to analyze petabytes of data

Examples: Reporting applications, data warehouses, business intelligence applications, analytics systems

Sample application : Decide insurance premiums analyzing data from last hundred years

Data is consolidated from multiple (transactional) databases

Recommended GCP Managed Service

BigQuery: Petabyte-scale distributed data warehouse

 

OLAP and OLTP use similar data structures

BUT very different approach in how data is stored

OLTP databases use row storage

Each table row is stored together

Efficient for processing small transactions

OLAP databases use columnar storage

Each table column is stored together

High compression - store petabytes of data efficiently

Distribute data - one table in multiple cluster nodes

Execute single query across multiple nodes - Complex queries can be executed efficiently
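The row-vs-columnar contrast can be illustrated with a toy table (pure-Python sketch; real engines add compression, encoding and distribution on top):

```python
# Toy illustration of row vs columnar layout for the same table: OLTP
# engines keep each row together; OLAP engines keep each column together.
rows = [
    {"id": 1, "name": "alice", "amount": 10},
    {"id": 2, "name": "bob",   "amount": 20},
]

# Row storage: whole records side by side (fast single-record updates)
row_layout = [tuple(r.values()) for r in rows]

# Columnar storage: each column contiguous (fast scans, compresses well)
col_layout = {key: [r[key] for r in rows] for key in rows[0]}

print(row_layout)                 # [(1, 'alice', 10), (2, 'bob', 20)]
print(col_layout["amount"])       # [10, 20] -> SUM(amount) reads one array
print(sum(col_layout["amount"]))  # 30
```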

 

New approach (actually NOT so new!) to building your databases

NoSQL = not only SQL

Flexible schema

Structure data the way your application needs it

Let the schema evolve with time

Horizontally scale to petabytes of data with millions of TPS

NOT a 100% accurate generalization but a great starting point:

Typical NoSQL databases trade-off "Strong consistency and SQL features" to achieve "scalability and high-performance"

Google Managed Services:

  1. Cloud Firestore (Datastore)
  2. Cloud BigTable

 

Cloud Datastore - Managed serverless NoSQL document database

Provides ACID transactions, SQL-like queries, indexes

Designed for transactional mobile and web applications

Firestore (next version of Datastore) adds:

Strong consistency

Mobile and Web client libraries

Recommended for small to medium databases (0 to a few Terabytes)

Cloud BigTable - Managed, scalable NoSQL wide column database

NOT serverless (You need to create instances)

Recommend for data size > 10 Terabytes to several Petabytes Recommended for large analytical and operational workloads:

NOT recommended for transactional workloads (does NOT support multi-row transactions - supports ONLY single-row transactions)

 

Retrieving data from memory is much faster than retrieving data from disk

In-memory databases like Redis deliver microsecond latency by storing persistent data in memory

Recommended GCP Managed Service

Memorystore

Use cases: Caching, session management, gaming leaderboards, geospatial applications
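The caching use case typically follows the cache-aside pattern; here is a minimal sketch with an in-process dict standing in for Memorystore/Redis (the `load_from_database` function and the TTL are made-up stand-ins):

```python
import time

cache = {}          # stands in for Memorystore/Redis
TTL_SECONDS = 60    # assumed expiry, like a Redis TTL

def load_from_database(key):
    # Hypothetical slow lookup against the primary database.
    return f"value-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL_SECONDS:
            return value             # cache hit: served from memory
    value = load_from_database(key)  # cache miss: fall back to the database
    cache[key] = (value, time.time())
    return value

print(get("user:42"))  # miss -> loads from the database
print(get("user:42"))  # hit  -> served from the cache
```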

 

Databases/caches

A startup with a quickly evolving schema (table structure): Cloud Datastore/Firestore

A non-relational database with small storage needs (around 10 GB): Cloud Datastore

A transactional global database with a predefined schema needing to process millions of transactions per second: Cloud Spanner

A transactional local database processing thousands of transactions per second: Cloud SQL

Cache data (from a database) for a web application: Memorystore

Database for analytics processing of petabytes of data: BigQuery

Database for storing huge volumes of stream data from IoT devices: BigTable

Database for storing huge streams of time-series data: BigTable
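The scenarios above can be condensed into a quick-reference mapping (the shorthand labels are my own; the service choices come straight from the notes):

```python
# Scenario shorthand -> recommended GCP service.
recommended = {
    "quickly evolving schema":            "Cloud Datastore/Firestore",
    "non-relational, small (~10 GB)":     "Cloud Datastore",
    "global transactional, millions TPS": "Cloud Spanner",
    "local transactional, thousands TPS": "Cloud SQL",
    "cache for a web application":        "Memorystore",
    "analytics over petabytes":           "BigQuery",
    "IoT streams / time-series data":     "BigTable",
}
print(recommended["analytics over petabytes"])
```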

IAM

 You have resources in the cloud (examples - a virtual server, a database etc)

You have identities (human and non-human) that need to access those resources and perform actions

For example: launch (stop, start or terminate) a virtual server

How do you identify users in the cloud?

How do you configure resources they can access?

How can you configure what actions to allow?

In GCP: Identity and Access Management (Cloud IAM) provides this service

 

Authentication (is it the right user?) and Authorization (do they have the right access?)

Identities can be:

A GCP User (Google Account or Externally Authenticated User)

A Group of GCP Users

An Application running in GCP

An Application running in your data center

Unauthenticated users

Provides very granular control

Limit a single user:

to perform a single action

on a specific cloud resource, from a specific IP address, during a specific time window

 

I want to provide access to manage a specific cloud storage bucket to a colleague of mine:

Important Generic Concepts:

Member: My colleague

Resource: Specific cloud storage bucket

Action: Upload/Delete Objects

In Google Cloud IAM:

Roles: A set of permissions (to perform specific actions on specific resources)

Roles do NOT know about members. It is all about permissions!

How do you assign permissions to a member?

Policy: You assign (or bind) a role to a member

1: Choose a Role with right permissions (Ex: Storage Object Admin)

2: Create a Policy binding the member (your colleague) with the role (permissions)

IAM in AWS is very different from GCP (Forget AWS IAM & Start FRESH!)

Example: Role in AWS is NOT the same as Role in GCP

 

Member : Who?

Roles : Permissions (What Actions? What Resources?)

Policy : Assign Permissions to Members

Map Roles (What?), Members (Who?) and Conditions (Which Resources? When? From Where?)

Remember: Permissions are NOT directly assigned to Member

Permissions are represented by a Role

Member gets permissions through Role!

A Role can have multiple permissions

You can assign multiple roles to a Member

 

Roles are assigned to members through IAM Policy documents

Represented by a policy object

Policy object has list of bindings

A binding binds a role to a list of members

Member type is identified by prefix:

Example: user, serviceAccount, group or domain

{
  "bindings": [
    {
      "role": "roles/storage.objectAdmin",
      "members": [
        "user:you@in28minutes.com",
        "serviceAccount:myAppName@appspot.gserviceaccount.com",
        "group:administrators@in28minutes.com",
        "domain:google.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:you@in28minutes.com"
      ],
      "condition": {
        "title": "Limited time access",
        "description": "Only upto Feb 2022",
        "expression": "request.time < timestamp('2022-02-01T00:00:00.000Z')"
      }
    }
  ]
}
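A policy like this can be inspected programmatically. Here is a sketch that mirrors the example as a Python dict and lists which roles a member currently holds; the expiry check is a simplified stand-in for GCP's actual CEL condition expression (`request.time < timestamp(...)`):

```python
from datetime import datetime, timezone

# Mirrors the shape of the example IAM policy (members trimmed for brevity).
policy = {
    "bindings": [
        {"role": "roles/storage.objectAdmin",
         "members": ["user:you@in28minutes.com",
                     "group:administrators@in28minutes.com"]},
        {"role": "roles/storage.objectViewer",
         "members": ["user:you@in28minutes.com"],
         # Simplified condition: real IAM evaluates a CEL expression.
         "condition": {"title": "Limited time access",
                       "expiry": "2022-02-01T00:00:00+00:00"}},
    ]
}

def roles_for(member, policy, now):
    """Return the roles a member holds under this policy at time `now`."""
    roles = []
    for binding in policy["bindings"]:
        if member not in binding["members"]:
            continue
        condition = binding.get("condition")
        if condition:
            expiry = datetime.fromisoformat(condition["expiry"])
            if now >= expiry:
                continue  # time-bound grant has expired
        roles.append(binding["role"])
    return roles

now = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(roles_for("user:you@in28minutes.com", policy, now))
```

Note how permissions never attach to the member directly: the member only ever gets roles through bindings, exactly as the notes describe.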

 

Scenario: An Application on a VM needs access to cloud storage

You DON'T want to use personal credentials to allow access

(RECOMMENDED) Use Service Accounts

Identified by an email address (Ex: id-compute@developer.gserviceaccount.com)

Does NOT have password

Has private/public RSA key pairs

Can't login via browsers or cookies


Service account types:

Default service account - Automatically created when some services are used

(NOT RECOMMENDED) Has Editor role by default

User Managed - User created

(RECOMMENDED) Provides fine grained access control

Google-managed service accounts - Created and managed by Google

Used by GCP to perform operations on user's behalf

In general, we DO NOT need to worry about them


 
