Process Data from Dirty to Clean - Module 3 challenge

Process Data from Dirty to Clean - Module 3 challenge

  1. A data professional analyzes medical data for a health insurance company. The dataset they are working with contains millions of rows of data. What tool would be most efficient for the analyst to use?

    • SQL

    • CSV

    • Word processor

    • Spreadsheet

  2. A junior data professional notices their Boolean column is incorrectly storing True/False values as strings. What SQL function can the analyst use to convert the data type from a string to Boolean?

    • LENGTH

    • CAST

    • SUBSTR

    • TRIM

  3. Fill in the blank: A data team collaborating with the HR department uses the SQL command _____ to add a row for a new employee to their organization’s database.

    • DROP TABLE IF EXISTS

    • CREATE TABLE IF NOT EXISTS

    • UPDATE

    • INSERT INTO

  4. You are working with a database table that has columns about trees, such as tree_species. Which SUBSTR function and AS command will retrieve the first 3 characters of each species name and store the result in a new column called species_ID?

    • SUBSTR(tree_species, 3) AS species_ID

    • SUBSTR AS (tree_species 1, 3) species_ID

    • SUBSTR(tree_species, 1, 3) AS species_ID

    • SUBSTR(tree_species, 1, 3 AS) species_ID

  5. In SQL, what function can be used to remove leading spaces from a piece of data?

    • TRIM

    • FORMAT

    • SUBSTR

    • AVG

  6. While working with a database table that contains the column employee_name, you notice that there are some duplicate entries. Which SQL clause would you use in a query to return the employee_name data without these duplicates?

    • DISTINCT employee_name

    • DROP employee_name

    • DUPLICATE employee_name

    • DELETE employee_name

  7. Fill in the blank: A data analyst uses the SQL command _____ to remove unnecessary tables so they do not clutter their organization’s database.

    • UPDATE

    • INSERT INTO

    • DROP TABLE IF EXISTS

    • CREATE TABLE IF NOT EXISTS

  8. You are using a database table that includes the column credit_card_numbers, and you want to check for any fraudulent activity. Which SQL clause will help you identify any credit card numbers that are more than 16 characters long?

    • COUNT(credit_card_numbers) > 16

    • LENGTH(credit_card_numbers) > 16

    • WHERE(credit_card_numbers) < 16

    • IDENTIFY(credit_card_numbers) < 16

  9. In a table of customer data, you note that some customers have not placed any orders, so the order_value column contains null values. What SQL function can you use to replace these null values with a value in a different column?

    • CAST

    • COALESCE

    • TRIM

    • CONCAT