# Derive Insights from BigQuery Data: Challenge Lab - GSP787

## Overview

You must complete a series of tasks within the allocated time period. Instead of following step-by-step instructions, you'll be given a scenario and a set of tasks - you figure out how to complete it on your own! An automated scoring system (shown on this page) will provide feedback on whether you have completed your tasks correctly.

To score 100% you must complete all tasks within the time period!

When you take a Challenge Lab, you will not be taught Google Cloud concepts. To build the solution to the challenge presented, use skills learned from the labs in the course this challenge lab is part of. You will be expected to extend your learned skills; you will be expected to change broken queries.

This lab is recommended for students who have enrolled in the [Derive Insights from BigQuery Data](https://www.cloudskillsboost.google/course_templates/623) skill badge. Are you ready for the challenge?

## Setup and requirements

### Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click **Start Lab**, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

* Access to a standard internet browser (Chrome browser recommended).
    

**Note:** Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which could otherwise result in extra charges to your personal account.

* Time to complete the lab—remember, once you start, you cannot pause a lab.
    

**Note:** Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

## Scenario

You're part of a public health organization that is tasked with answering questions related to the COVID-19 pandemic. Obtaining the right answers will help the organization plan and focus healthcare efforts and awareness programs appropriately.

The dataset and table used for this analysis is `bigquery-public-data.covid19_open_data.covid19_open_data`. This repository contains country-level datasets of daily time-series data related to COVID-19 globally. It includes data relating to demographics, economy, epidemiology, geography, health, hospitalizations, mobility, government response, and weather.

## Task 1. Total confirmed cases

* Build a query that will answer "What was the total count of confirmed cases on `May 15, 2020`?" The query needs to return a single row containing the sum of confirmed cases across all countries. The name of the column should be **total\_cases\_worldwide**.
    

Columns to reference:

* cumulative\_confirmed
    
* date
    

Click *Check my progress* to verify the objective.
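
A minimal sketch of one way to write this query, assuming that summing every row for the date is what the checker expects. The table may hold rows at more than one geographic level, so treat the result as the lab's answer rather than a deduplicated world total.

```sql
-- Sum the cumulative confirmed cases over all rows for a single day.
SELECT
  SUM(cumulative_confirmed) AS total_cases_worldwide
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  date = '2020-05-15';
```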


## Task 2. Worst affected areas

* Build a query for answering "How many states in the US had more than `100` deaths on `May 15, 2020`?" The query needs to list the output in the field **count\_of\_states**.
    

**Note:** Don't include NULL values.

Columns to reference:

* country\_name
    
* subregion1\_name (for state information)
    
* cumulative\_deceased
    

Click *Check my progress* to verify the objective.
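
One possible shape for this query is sketched below. It assumes the country appears in `country_name` as "United States of America" and that "don't include NULL values" means dropping rows with no state name. The CTE name `deaths_by_state` and the alias `death_count` are illustrative.

```sql
WITH deaths_by_state AS (
  -- Total deaths per state on the target date, skipping rows with no state name.
  SELECT
    subregion1_name AS state,
    SUM(cumulative_deceased) AS death_count
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "United States of America"
    AND date = '2020-05-15'
    AND subregion1_name IS NOT NULL
  GROUP BY
    subregion1_name
)
-- Count the states above the threshold.
SELECT
  COUNT(*) AS count_of_states
FROM
  deaths_by_state
WHERE
  death_count > 100;
```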


## Task 3. Identify hotspots

* Build a query that will answer "List all the states in the United States of America that had more than `1500` confirmed cases on `May 15, 2020`." The query needs to return the state name and the corresponding confirmed cases, arranged in descending order. Name the returned fields **state** and **total\_confirmed\_cases**.
    

Columns to reference:

* country\_code
    
* subregion1\_name (for state information)
    
* cumulative\_confirmed
    

Click *Check my progress* to verify the objective.
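
A hedged sketch of this query follows. It assumes the United States is stored as `country_code = "US"` and that rows without a state name should be left out.

```sql
-- Confirmed cases per US state on the target date, largest first.
SELECT
  subregion1_name AS state,
  SUM(cumulative_confirmed) AS total_confirmed_cases
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  country_code = "US"
  AND date = '2020-05-15'
  AND subregion1_name IS NOT NULL
GROUP BY
  subregion1_name
HAVING
  SUM(cumulative_confirmed) > 1500
ORDER BY
  total_confirmed_cases DESC;
```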


## Task 4. Fatality ratio

1. Build a query that will answer "What was the case-fatality ratio in Italy for the month of `April` 2020?" Case-fatality ratio here is defined as (total deaths / total confirmed cases) \* 100.
    
2. Write a query to return the ratio for the month of `April` 2020 and contain the following fields in the output: **total\_confirmed\_cases**, **total\_deaths**, **case\_fatality\_ratio**.
    

Columns to reference:

* country\_name
    
* cumulative\_confirmed
    
* cumulative\_deceased
    

Click *Check my progress* to verify the objective.
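
The definition above could be expressed as in the sketch below. It assumes the lab wants the cumulative columns summed over every April row for Italy, so the two totals are inputs to the ratio rather than standalone case counts.

```sql
-- Case-fatality ratio for Italy over April 2020: (total deaths / total confirmed cases) * 100.
SELECT
  SUM(cumulative_confirmed) AS total_confirmed_cases,
  SUM(cumulative_deceased) AS total_deaths,
  (SUM(cumulative_deceased) / SUM(cumulative_confirmed)) * 100 AS case_fatality_ratio
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  country_name = "Italy"
  AND date BETWEEN '2020-04-01' AND '2020-04-30';
```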


## Task 5. Identifying specific day

* Build a query that will answer: "On what day did the total number of deaths cross `8000` in Italy?" The query should return the date in the format **yyyy-mm-dd**.
    

Columns to reference:

* country\_name
    
* cumulative\_deceased
    

Click *Check my progress* to verify the objective.


## Task 6. Finding days with zero net new cases

The following query is written to identify the number of days in India between `22, Feb 2020` and `13, March 2020` when there was no increase in the number of confirmed cases. However, it is not executing properly.

* You need to update the query to complete it and obtain the result:
    

```sql
WITH india_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name="India"
    AND date between '2020-02-22' and '2020-03-13'
  GROUP BY
    date
  ORDER BY
    date ASC
 )

, india_previous_day_comparison AS
(SELECT
  date,
  cases,
  LAG(cases) OVER(ORDER BY date) AS previous_day,
  cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases
FROM india_cases_by_date
)
```


Click *Check my progress* to verify the objective.
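
The query above defines two CTEs but never selects from them, which is the likely reason it does not run. A minimal completed sketch adds a final `SELECT` that counts the days with no change. The output alias `days_with_zero_net_new_cases` is an assumption, since the task does not name the field.

```sql
WITH india_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "India"
    AND date BETWEEN '2020-02-22' AND '2020-03-13'
  GROUP BY
    date
),
india_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases
  FROM
    india_cases_by_date
)
-- The first day has no previous day, so its net_new_cases is NULL and is not counted.
SELECT
  COUNT(*) AS days_with_zero_net_new_cases
FROM
  india_previous_day_comparison
WHERE
  net_new_cases = 0;
```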


## Task 7. Doubling rate

* Using the previous query as a template, write a query to find out the dates on which the confirmed cases increased by more than `5`% compared to the previous day (indicating doubling rate of ~ 7 days) in the US between the dates March 22, 2020 and April 20, 2020. The query needs to return the list of dates, the confirmed cases on that day, the confirmed cases the previous day, and the percentage increase in cases between the days.
    
    * Use the following names for the returned fields: **Date**, **Confirmed\_Cases\_On\_Day**, **Confirmed\_Cases\_Previous\_Day** and **Percentage\_Increase\_In\_Cases**.
        

Click *Check my progress* to verify the objective.
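
Following the Task 6 template, one way to sketch this is shown below. The CTE names and the intermediate alias `percentage_increase` are illustrative, and the country is assumed to be stored as "United States of America".

```sql
WITH us_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "United States of America"
    AND date BETWEEN '2020-03-22' AND '2020-04-20'
  GROUP BY
    date
),
us_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    -- Day-over-day change expressed as a percentage of the previous day's total.
    (cases - LAG(cases) OVER (ORDER BY date)) * 100 / LAG(cases) OVER (ORDER BY date) AS percentage_increase
  FROM
    us_cases_by_date
)
SELECT
  date AS Date,
  cases AS Confirmed_Cases_On_Day,
  previous_day AS Confirmed_Cases_Previous_Day,
  percentage_increase AS Percentage_Increase_In_Cases
FROM
  us_previous_day_comparison
WHERE
  percentage_increase > 5;
```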


## Task 8. Recovery rate

1. Build a query to list the recovery rates of countries arranged in descending order (limit to `5`) on the date May 10, 2020.
    
2. Restrict the query to only those countries having more than 50K confirmed cases.
    
    * The query needs to return the following fields: **country**, **recovered\_cases**, **confirmed\_cases**, **recovery\_rate**.
        

Columns to reference:

* country\_name
    
* cumulative\_confirmed
    
* cumulative\_recovered
    

Click *Check my progress* to verify the objective.
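
A sketch of one possible query follows. It assumes the recovery rate is meant as a percentage, (recovered cases / confirmed cases) * 100, which the task does not spell out. The CTE name `cases_by_country` is illustrative.

```sql
WITH cases_by_country AS (
  -- Confirmed and recovered totals per country on the target date.
  SELECT
    country_name AS country,
    SUM(cumulative_confirmed) AS confirmed_cases,
    SUM(cumulative_recovered) AS recovered_cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    date = '2020-05-10'
  GROUP BY
    country_name
)
SELECT
  country,
  recovered_cases,
  confirmed_cases,
  (recovered_cases * 100) / confirmed_cases AS recovery_rate
FROM
  cases_by_country
WHERE
  confirmed_cases > 50000
ORDER BY
  recovery_rate DESC
LIMIT 5;
```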


## Task 9. CDGR - Cumulative daily growth rate

* The following query tries to calculate the CDGR (Cumulative Daily Growth Rate) for France on `May 15, 2020`, measured from the day the first case was reported. The first case was reported on Jan 24, 2020.

* The CDGR is calculated as:

`((last_day_cases/first_day_cases)^(1/days_diff))-1`

Where:

* `last_day_cases` is the number of confirmed cases on May 15, 2020

* `first_day_cases` is the number of confirmed cases on Jan 24, 2020

* `days_diff` is the number of days between Jan 24 and May 15, 2020

The query isn't executing properly. Can you fix the error to make the query execute successfully?
    

```sql
WITH
  france_cases AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS total_cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name="France"
    AND date IN ('2020-01-24',
      '2020-05-15')
  GROUP BY
    date
  ORDER BY
    date)
, summary as (
SELECT
  total_cases AS first_day_cases,
  LEAD(total_cases) AS last_day_cases,
  DATE_DIFF(LEAD(date) OVER(ORDER BY date),date, day) AS days_diff
FROM
  france_cases
LIMIT 1
)

select first_day_cases, last_day_cases, days_diff, SQRT((last_day_cases/first_day_cases),(1/days_diff))-1 as cdgr
from summary
```


**Note:** Refer to the [Functions, operators, and conditionals documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators) to learn more about the `LEAD()` function used in this query.

Click *Check my progress* to verify the objective.
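
Two things stand out in the query above: `LEAD(total_cases)` has no `OVER` clause, and `SQRT` takes a single argument, so it cannot raise a value to `1/days_diff`. A corrected sketch, assuming `POWER` is the intended function, could look like this. The `ORDER BY date` before `LIMIT 1` is an addition to make sure the first-day row is the one kept.

```sql
WITH france_cases AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS total_cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "France"
    AND date IN ('2020-01-24', '2020-05-15')
  GROUP BY
    date
),
summary AS (
  SELECT
    total_cases AS first_day_cases,
    -- LEAD is a window function, so it needs an OVER clause.
    LEAD(total_cases) OVER (ORDER BY date) AS last_day_cases,
    DATE_DIFF(LEAD(date) OVER (ORDER BY date), date, DAY) AS days_diff
  FROM
    france_cases
  ORDER BY
    date
  LIMIT 1
)
SELECT
  first_day_cases,
  last_day_cases,
  days_diff,
  -- (last/first)^(1/days_diff) - 1, written with POWER instead of SQRT.
  POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1 AS cdgr
FROM
  summary;
```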


## Task 10. Create a Looker Studio report

* Create a [Looker Studio](https://datastudio.google.com/) report that plots the following for the United States:

    * Number of Confirmed Cases

    * Number of Deaths

    * Date range: `2020-03-25 to 2020-04-24`

* To add the data, use the BigQuery connector, authorize access, select **Custom Query** under your lab project (shown here as `qwiklabs-gcp-03-e96502303e35`), enter the query, then click **Add** and **Add to report**.
        

Click *Check my progress* to verify the objective.
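
For the Custom Query step, a query along these lines could supply the data. This is a sketch: the aliases `country_cases` and `country_deaths` are illustrative, and the country is assumed to be stored as "United States of America".

```sql
-- Daily confirmed cases and deaths for the US over the report's date range.
SELECT
  date,
  SUM(cumulative_confirmed) AS country_cases,
  SUM(cumulative_deceased) AS country_deaths
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  date BETWEEN '2020-03-25' AND '2020-04-24'
  AND country_name = "United States of America"
GROUP BY
  date
ORDER BY
  date;
```

In the report, `date` would serve as the dimension and the two sums as the metrics of a time-series chart.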


**Note:** Use the below image as a reference while building the report and make sure your report is similar to it.

**Note:** Do not use the **Explore with Looker Studio** option from BigQuery.

![line graph](https://cdn.qwiklabs.com/JHIWj6ObPVnrYKeTANbhaWLBtwvMUWOR5a5beYdhYaU%3D align="left")

---

## Solution of Lab

%[https://youtu.be/5dUoAKtfjEI]

```bash
# Download the lab script, then run it in the current shell session
curl -LO https://raw.githubusercontent.com/ePlus-DEV/storage/refs/heads/main/labs/GSP787/lab.sh
source lab.sh
```

**Script Alternative**

```bash
# Download the alternative script, make it executable, then run it
curl -LO https://raw.githubusercontent.com/Itsabhishek7py/GoogleCloudSkillsboost/refs/heads/main/Derive%20Insights%20from%20BigQuery%20Data%3A%20Challenge%20Lab/abhishek.sh
chmod +x abhishek.sh
./abhishek.sh
```

**Query for Task 5 (first date on which cumulative deaths in Italy exceeded 8000)**


```sql
SELECT
  DATE(date) AS date
FROM (
  SELECT
    date,
    SUM(cumulative_deceased) AS total_deaths
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = 'Italy'
    AND date >= '2020-01-01'
  GROUP BY
    date
)
WHERE
  total_deaths > 8000
ORDER BY
  date ASC
LIMIT 1;
```
