Create a Secure Data Lake on Cloud Storage: Challenge Lab - ARC119

Overview

In this challenge lab, you’re given a scenario and a set of tasks. Instead of following step-by-step instructions, you use the skills learned from the lab in the quest to figure out how to complete the tasks on your own! An automated scoring system (shown on this page) provides feedback on whether you have completed your tasks correctly.

When taking a challenge lab, you won't receive instruction on new Google Cloud concepts. You are expected to extend your learned skills, like changing default values and reading and researching error messages to fix your own mistakes.

To score 100% you must successfully complete all tasks within the time period!

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).

Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise result in charges to your personal account.

  • Time to complete the lab. Remember, once you start, you cannot pause a lab.

Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

Note: Once the lab environment has been fully provisioned, the tasks will become visible. The tasks that are assigned to you are independent, so you are free to perform the tasks in any order you want.

Challenge scenario

You are just starting your junior data engineer role. So far you have been helping teams understand and assign required permissions to users, and create a secure data lake on Cloud Storage.

You are expected to have the skills and knowledge for these tasks.

Your challenge

You are asked to help a newly formed development team with some of their initial work on creating a secure data lake on Cloud Storage and a BigQuery dataset. You have received a request to complete the tasks described below.

Each time you start the lab, you get a different set of tasks. Complete the set you are given to learn the data lake concepts. For all tasks:

  • Ensure that any needed APIs (such as Dataplex API) are successfully enabled.

  • Create all resources in the us-east4 region and the us-east4-c zone, unless otherwise directed.
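
The API requirement above can also be handled from Cloud Shell. The following is a minimal sketch, not the lab's official method. It assumes the service names `dataplex.googleapis.com` and `datacatalog.googleapis.com` (the second is an assumption, added because tag templates live in Data Catalog). `PROJECT_ID` and the `run` helper are placeholders introduced here for illustration.

```shell
# Placeholder: replace with the project ID shown in the lab panel.
PROJECT_ID="${PROJECT_ID:-my-project-id}"

# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Enable the Dataplex API and (assumed to be needed for tagging) the Data Catalog API.
enable_apis() {
  run gcloud services enable dataplex.googleapis.com datacatalog.googleapis.com \
    --project="$PROJECT_ID"
}

enable_apis
```

By default the script only prints the command so you can review it first; run it with `DRY_RUN=0` to apply it.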

Each task is described in detail below. Good luck!

Task 1: Create a Cloud Storage bucket

  1. Sign in to the project as User 1.

  2. Create a regional Cloud Storage bucket named [PROJECT_ID]-bucket, replacing PROJECT_ID with the project ID shown on the left side of the lab instructions.

  3. Use the bucket you created in the previous step when you attach an asset to the zone.

Click Check my progress to verify the objective.
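
The bucket naming rule above can be sketched from Cloud Shell as follows. This is a hedged example, assuming the region is us-east4 as stated earlier. `PROJECT_ID`, `bucket_name`, and the `run` dry-run helper are names introduced here for illustration only.

```shell
# Placeholders: replace with the values shown in the lab panel.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
REGION="${REGION:-us-east4}"

# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Build the bucket name required by the task: <project ID>-bucket.
bucket_name() {
  echo "$1-bucket"
}

# Create a regional bucket in the lab region.
create_bucket() {
  run gcloud storage buckets create "gs://$(bucket_name "$PROJECT_ID")" \
    --project="$PROJECT_ID" --location="$REGION"
}

create_bucket
```

Passing a single region to `--location` is what makes the bucket regional rather than multi-regional.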

Task 2: Create a lake in Dataplex and add a zone to your lake

  1. Sign in to the project as User 2.

  2. Create a lake in Dataplex using the following information:

| Property | Value |
|---|---|
| Display Name | Customer-Lake |
| ID | Leave the default value. |
| Region | Region from the Lab Details panel, located on the left side of the lab instructions |

  3. Add a zone to your lake, Customer-Lake, using the following information:

| Property | Value |
|---|---|
| Display Name | Public-Zone |
| ID | Leave the default value. |
| Type | Raw zone |
| Data locations | Regional |
| Discovery settings | Enable metadata discovery |

  4. For the labels, set key_1 to domain_type and value_1 to source_data.

Click Check my progress to verify the objective.
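
The lake and zone settings above, plus the bucket attachment mentioned in Task 1, can be sketched with the gcloud CLI. This is a minimal sketch under stated assumptions: the IDs `customer-lake` and `public-zone` are guesses at the console defaults, `customer-bucket-asset` is a hypothetical asset ID, the label is assumed to belong to the zone, and the `run` helper is introduced here for illustration.

```shell
# Placeholders: replace with the values shown in the lab panel.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
REGION="${REGION:-us-east4}"
LAKE_ID="customer-lake"            # assumed default ID for "Customer-Lake"
ZONE_ID="public-zone"              # assumed default ID for "Public-Zone"
ASSET_ID="customer-bucket-asset"   # hypothetical asset ID

# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

create_lake() {
  run gcloud dataplex lakes create "$LAKE_ID" \
    --project="$PROJECT_ID" --location="$REGION" \
    --display-name="Customer-Lake"
}

# Raw, single-region zone with metadata discovery and the requested label.
create_zone() {
  run gcloud dataplex zones create "$ZONE_ID" \
    --project="$PROJECT_ID" --location="$REGION" --lake="$LAKE_ID" \
    --display-name="Public-Zone" --type=RAW \
    --resource-location-type=SINGLE_REGION --discovery-enabled \
    --labels=domain_type=source_data
}

# Attach the <project ID>-bucket bucket to the zone as an asset.
attach_bucket() {
  run gcloud dataplex assets create "$ASSET_ID" \
    --project="$PROJECT_ID" --location="$REGION" \
    --lake="$LAKE_ID" --zone="$ZONE_ID" \
    --resource-type=STORAGE_BUCKET \
    --resource-name="projects/${PROJECT_ID}/buckets/${PROJECT_ID}-bucket" \
    --discovery-enabled
}

create_lake
create_zone
attach_bucket
```

The three steps must run in this order when executed for real, because a zone needs its lake and an asset needs its zone.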

Task 3: Environment Creation for Dataplex Lake

  1. Navigate to the lake Customer-Lake and create an environment for it.

  2. Use the following configuration to create the environment:

| Property | Value |
|---|---|
| Display Name | Dataplex-lake-env |
| Configure Compute | Number of nodes = 3; enable auto shut-down |
| Software Package | Leave the default value |

This creates the environment that you use while exploring the data.

Note: Depending on the volume of data being analyzed and the complexity of the analysis, you can tune the environment by increasing or decreasing the number of nodes, the disk size, autoscaling, and so on.

Click Check my progress to verify the objective.
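
For reference, the environment configuration above might look like this on the command line. This is a sketch, not a verified solution: the ID `dataplex-lake-env`, the lake ID `customer-lake`, and the image version `latest` are assumptions, and the auto shut-down setting is deliberately left to the console because its CLI equivalent is not confirmed here. The `run` helper is introduced for illustration.

```shell
# Placeholders: replace with the values shown in the lab panel.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
REGION="${REGION:-us-east4}"
LAKE_ID="customer-lake"       # assumed default ID for "Customer-Lake"
ENV_ID="dataplex-lake-env"    # assumed ID for "Dataplex-lake-env"

# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Three compute nodes, default software packages.
# The image version "latest" is an assumption; auto shut-down is not set here.
create_environment() {
  run gcloud dataplex environments create "$ENV_ID" \
    --project="$PROJECT_ID" --location="$REGION" --lake="$LAKE_ID" \
    --display-name="Dataplex-lake-env" \
    --os-image-version=latest --compute-node-count=3
}

create_environment
```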

Task 4: Create a tag template

To begin tagging data, an asset must be added to the zone. A tag template is a reusable structure that can be used to quickly create new tags. You are required to organize tags by topic using tag templates. In this task, you create a tag template with the fields detailed in the following table and attach it to an asset.

| Tag template name | Tag template ID | Location | Fields | Type |
|---|---|---|---|---|
| Customer Data Tag Template | customer_data_tag_template | Use the default region | Data Owner | String |
| | | | PII Data | Enumerated (Value 1: Yes, Value 2: No) |

Next, attach a tag to the Storage bucket Data Catalog entry under the CLOUD STORAGE source system. Use the Customer Data Tag Template template to tag this entry, and provide the values for the tag fields listed in the following table.

| Tag field | Value |
|---|---|
| Data Owner | Enter your name here |
| PII Data | Yes |

Finally, search for assets using the tag template Customer Data Tag Template.

Click Check my progress to verify the objective.
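
The tag template described in this task can be sketched with the gcloud CLI as shown below. The field IDs `data_owner` and `pii_data` are assumptions (the task only gives display names), as are `PROJECT_ID`, `REGION`, and the `run` helper. Attaching the tag to the bucket entry is left to the console, since it depends on looking up the entry first.

```shell
# Placeholders: replace with the values shown in the lab panel.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
REGION="${REGION:-us-east4}"

# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
# Note: the printed form drops shell quoting, so it is for review, not copy-paste.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# One string field and one enumerated field with the values Yes and No.
# The field IDs are assumed; only the display names come from the task.
create_tag_template() {
  run gcloud data-catalog tag-templates create customer_data_tag_template \
    --project="$PROJECT_ID" --location="$REGION" \
    --display-name="Customer Data Tag Template" \
    --field=id=data_owner,display-name="Data Owner",type=string \
    --field=id=pii_data,display-name="PII Data",type='enum(Yes|No)'
}

create_tag_template
```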


Solution of Lab


curl -LO raw.githubusercontent.com/ArcadeCrew/Google-Cloud-Labs/refs/heads/main/%5BForm%204%5D%20Secure%20Data%20Lake%20on%20Cloud%20Storage%20-%20Challenge%20Lab/arcadecrew.sh
sudo chmod +x arcadecrew.sh
./arcadecrew.sh

Solution alternative

# Set ZONE to the zone from the Lab Details panel before running the script.
export ZONE=

curl -LO raw.githubusercontent.com/Techcps/ARC/master/Create%20a%20Secure%20Data%20Lake%20on%20Cloud%20Storage%3A%20Challenge%20Lab/techcps119.sh
sudo chmod +x techcps119.sh
./techcps119.sh

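The script asks for a form number. If you see "Invalid form number. Please enter 1, 2, 3, or 4:", enter the number of the form whose task list matches your lab. To find it, compare your lab's tasks with the forms below (press Ctrl + G to search the page).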


🚀 Form 1

  • Task 1. Create a Cloud Storage bucket

  • Task 2. Create a lake in Dataplex and add a zone to your lake

  • Task 3. Environment Creation for Dataplex Lake

  • Task 4. Create a tag template


🚀 Form 2

  • Task 1. Create a lake in Dataplex and add a zone to your lake

  • Task 2. Environment Creation for Dataplex

  • Task 3. Attach an existing Cloud Storage bucket to the zone

  • Task 4. Create a tag template


🚀 Form 3

  • Task 1. Create a BigQuery dataset

  • Task 2. Add a zone to your lake

  • Task 3. Attach an existing BigQuery Dataset to the Lake

  • Task 4. Create a tag template


🚀 Form 4

  • Task 1. Create a lake in Dataplex and add a zone to your lake

  • Task 2. Attach an existing Cloud Storage bucket to the zone

  • Task 3. Attach an existing BigQuery Dataset to the Lake

  • Task 4. Create Entities