Commonly Used GCP Services#
Below are some key Google Cloud Platform (GCP) services that can be used for your project:
Google Cloud Storage (GCS)#
GCS is a scalable and secure object storage service for data files, datasets, and ML-ready data.
Why Use GCS?#
- Centralized storage for raw and processed datasets.
- Facilitates data sharing across team members.
- Integration with other GCP services like BigQuery and AI/ML tools.
Setting Up and Using GCS#
Ensure that you have completed How to Use GCP before starting this process.
Use the GCP console, gcloud CLI, or API to create a bucket:
Creating a GCS Bucket via Cloud Console#
- Follow the instructions in this documentation to create buckets.
Creating a GCS Bucket via Terminal#
- In your development environment, run the `gcloud storage buckets create` command, where:
  - `<BUCKET_NAME>` is the name you want to give your bucket, subject to bucket naming requirements. For example, `my-bucket`.
  - `<BUCKET_LOCATION>` is the location of your bucket. For example, `us-east1`.
- If the request is successful, the command returns a confirmation message.
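A minimal sketch of the command, assuming the `gs://` URL form; `<BUCKET_NAME>` and `<BUCKET_LOCATION>` are the placeholders described above:

```shell
# Create a bucket; replace <BUCKET_NAME> and <BUCKET_LOCATION>
# with your own values (e.g., my-bucket, us-east1).
gcloud storage buckets create gs://<BUCKET_NAME> --location=<BUCKET_LOCATION>
```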
Transferring Data To and From GCS#
- From Local to GCS:
- From GCS to Local:
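Both directions can be sketched with `gcloud storage cp` (the file paths and bucket name here are illustrative):

```shell
# Local to GCS
gcloud storage cp ./data/train.csv gs://<BUCKET_NAME>/data/train.csv

# GCS to local
gcloud storage cp gs://<BUCKET_NAME>/data/train.csv ./data/train.csv
```

Add `--recursive` to copy whole directories.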
BigQuery#
BigQuery is a powerful SQL-based data warehouse that allows you to process, load, and analyze large datasets efficiently using SQL queries.
Why Use BigQuery?#
- Quickly preprocess and explore large datasets using SQL-like queries.
- Simplifies aggregation, feature extraction, and preparation for ML models.
- You can load data directly from a GCS bucket into an SQL-based data warehouse (BigQuery). It supports structured, semi-structured, and unstructured data, including tsv, csv, parquet, avro, xlsx, and many more.
- To use BigQuery with a client library, follow this link for a detailed guide.
Setting Up and Using BigQuery#
Ensure that you have completed How to Use GCP before starting this process.
Loading Data into BigQuery#
- From GCS:
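One way to sketch the load step is with the `bq` CLI; the dataset, table, and file names below are illustrative, and `--autodetect` asks BigQuery to infer the schema from the file:

```shell
# Load a CSV file from GCS into a BigQuery table
bq load \
  --source_format=CSV \
  --autodetect \
  <DATASET>.<TABLE> \
  gs://<BUCKET_NAME>/data/train.csv
```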
Querying Data#
- Use BigQuery's web interface or CLI to run SQL queries for data cleaning, feature engineering, and exploratory analysis.
- Example:
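A minimal sketch of such a query via the `bq` CLI (the project, dataset, table, and column names are illustrative):

```shell
# Count rows per label in standard SQL
bq query --use_legacy_sql=false \
  'SELECT label, COUNT(*) AS n
   FROM `<PROJECT_ID>.<DATASET>.<TABLE>`
   GROUP BY label
   ORDER BY n DESC'
```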
Follow the instructions on this page to learn more about BigQuery.
Cloud SQL for MySQL, PostgreSQL, and Microsoft SQL Server#
Google Cloud SQL is a fully-managed relational database service for MySQL, PostgreSQL, and Microsoft SQL Server. It eliminates the need for database maintenance while offering high availability, scalability, and security. Below is a comprehensive guide to using Cloud SQL effectively for your projects.
Why Use Cloud SQL?#
- Managed Service: Automated backups, updates, and maintenance.
- Scalability: Seamless scaling for growing workloads.
- Security: Built-in encryption, IAM-based access, and network security.
- Integration: Works seamlessly with GCP services like Compute Engine, Kubernetes Engine, and BigQuery.
- Flexibility: Supports popular relational databases: MySQL, PostgreSQL, and Microsoft SQL Server.
Setting Up and Using Cloud SQL#
Ensure that you have completed How to Use GCP before starting this process.
Enabling the Cloud SQL API#
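The API can be enabled from the console or with `gcloud`; `sqladmin.googleapis.com` is the service name of the Cloud SQL Admin API:

```shell
# Enable the Cloud SQL Admin API for the current project
gcloud services enable sqladmin.googleapis.com
```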
Creating a Cloud SQL Instance#
- Example using MySQL:
- Example using PostgreSQL:
- Parameters:
  - `--cpu`: Number of vCPUs (e.g., 2).
  - `--memory`: RAM allocation (e.g., 4GB).
  - `--region`: Choose a region (e.g., `us-central1`).
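Both variants can be sketched with `gcloud sql instances create`, using the parameters listed above; `<INSTANCE_NAME>` and the database versions are illustrative:

```shell
# MySQL instance
gcloud sql instances create <INSTANCE_NAME> \
  --database-version=MYSQL_8_0 \
  --cpu=2 \
  --memory=4GB \
  --region=us-central1

# PostgreSQL instance
gcloud sql instances create <INSTANCE_NAME> \
  --database-version=POSTGRES_15 \
  --cpu=2 \
  --memory=4GB \
  --region=us-central1
```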
Configuring Users and Databases#
Using the same `INSTANCE_NAME` as configured in the previous step:
- Create a Database with the command below:
- Add a User with the command below:
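A sketch of both commands; `<DATABASE_NAME>`, `<USER_NAME>`, and `<PASSWORD>` are placeholders:

```shell
# Create a database on the instance
gcloud sql databases create <DATABASE_NAME> --instance=<INSTANCE_NAME>

# Add a user to the instance
gcloud sql users create <USER_NAME> \
  --instance=<INSTANCE_NAME> \
  --password=<PASSWORD>
```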
For a detailed guide on using client services, please refer to this link.
GCP Virtual Machines (VMs)#
A Cloud VM is a scalable, on-demand virtual machine hosted in the cloud. It functions like a physical computer, providing compute power, memory, storage, and network connectivity. Cloud VMs are versatile and can be used for a variety of tasks, from running applications and hosting websites to managing databases and performing intensive data processing.
In summary, VMs offer flexible compute instances to run custom ML experiments, manage pipelines, or host applications.
Why Use Cloud VMs?#
- Ideal for workloads requiring full control over the environment, OS, and configurations.
- Provides isolated environments for training ML models.
- Supports GPU/TPU acceleration for deep learning tasks.
- Can host containerized ML workflows using Docker.
Setting Up and Using GCP VMs#
Ensure that you have completed How to Use GCP before starting this process.
Creating a GPU-Enabled VM for Model Training#
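The create command below references shell variables that must be set first. One way to set them, with illustrative values (`pytorch-latest-gpu` is just one example Deep Learning VM image family):

```shell
export INSTANCE_NAME="my-training-vm"      # any valid instance name
export ZONE="us-central1-a"                # a zone where V100 GPUs are available
export IMAGE_FAMILY="pytorch-latest-gpu"   # an example Deep Learning VM image family
```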
```shell
gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-v100,count=1" \
  --metadata="install-nvidia-driver=True"
```
Parameters:
- `--image-family` must be one of the GPU-specific image types. For more information, see Choosing an Image.
- `--image-project` must be `deeplearning-platform-release`.
- `--maintenance-policy` must be `TERMINATE`. For more information, see GPU Restrictions.
- `--accelerator` specifies the GPU type to use. Must be specified in the format `--accelerator="type=TYPE,count=COUNT"`. Supported values of `TYPE` are:
  - `nvidia-tesla-v100` (`count=1` or `8`)
  - `nvidia-tesla-p100` (`count=1`, `2`, or `4`)
  - `nvidia-tesla-p4` (`count=1`, `2`, or `4`)
Google Cloud Artifact Registry#
Google Cloud Artifact Registry is a fully-managed service for storing and managing container images, as well as other software artifacts like Maven, npm, and Python packages. It is designed to integrate seamlessly with GCP, providing enhanced security, authentication, and efficiency over external services like Docker Hub.
Why Use Artifact Registry Instead of Docker Hub?#
| Feature | Docker Hub | Artifact Registry |
|---|---|---|
| Security | Publicly accessible by default. Limited security features unless on paid tiers. | Private by default, with IAM-based fine-grained access control and integration with GCP security features. |
| Authentication | Separate login credentials required. | Uses GCP-managed identities (IAM roles and service accounts). |
| Network Proximity | External to GCP, introducing latency. | Hosted within GCP, reducing latency and egress costs. |
| Cost | Free tier has pull limits. Paid plans for more. | Pay only for what you store and access. |
| Integration | Limited GCP integration. | Full integration with GCP services like Cloud Build, Compute Engine, and Kubernetes Engine. |
Setting Up and Using Artifact Registry#
Ensure that you have completed How to Use GCP before starting this process.
Enabling the Artifact Registry API#
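As with the other services, the API can be enabled with `gcloud`; `artifactregistry.googleapis.com` is the Artifact Registry service name:

```shell
# Enable the Artifact Registry API for the current project
gcloud services enable artifactregistry.googleapis.com
```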
Creating an Artifact Repository#
```shell
gcloud artifacts repositories create [REPOSITORY_NAME] \
  --repository-format=docker \
  --location=[REGION] \
  --description="Repository for storing Docker images"
```
Authenticating Docker with Artifact Registry#
Run the following command to configure Docker to authenticate with your Artifact Registry:
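A sketch of the command, assuming a Docker repository hosted in `<REGION>`:

```shell
# Register gcloud as a Docker credential helper for this registry host
gcloud auth configure-docker <REGION>-docker.pkg.dev
```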
Pushing Images to Artifact Registry#
- Tag your Docker image for Artifact Registry:
- Push the image:
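Both steps can be sketched as follows; `my-image` and the bracketed placeholders are illustrative:

```shell
# Tag the local image with its Artifact Registry path
docker tag my-image:latest \
  <REGION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY_NAME>/my-image:latest

# Push the tagged image to the repository
docker push <REGION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY_NAME>/my-image:latest
```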