Fill in the blank: A data scientist keeps code for data analysis pipelines in a _____, which enables them to track the evolution of the pipelines over time.
changelog
dashboard
dataset
version control system
You are a data analyst working for an e-commerce company. You have just finished cleaning data from a customer survey whose original objective was to gather feedback on specific usability issues within the mobile app checkout process. While reviewing the cleaned data and preparing for verification, you notice that a large number of respondents complain about slow website loading times on desktop computers, which was not related to the mobile app checkout.
Based on the principles of data verification and taking the "big picture view," what is the most critical first step you should take in this situation?
Remove from the dataset all survey responses that mention desktop loading times.
Immediately begin a separate analysis of the website loading time comments, as this appears to be a significant customer pain point.
Ask a teammate to double-check your cleaning process to ensure no data related to desktop loading times was accidentally duplicated or introduced.
Pause and reassess whether focusing on the desktop loading time comments aligns with the original project goal centered on the mobile app checkout usability.
Which SQL clause will consider a condition and return a value when that condition is met?
CASE column_name = 'condition' THEN 'value' END
WHEN column_name = 'condition' CASE 'value' END
WHEN
CASE
What is the process of tracking changes, additions, deletions, and errors during data cleaning?
Documentation
Observation
Cataloging
Recording
During verification, you notice an error in a dataset. You remember fixing a similar error when previously cleaning the data. What tool can you reference to find documentation about how to fix the error?
Notepad
Changelog
Data table
Text editor
A junior data analyst notices an error in a product ID number in their dataset. Using a pivot table in Google Sheets, what function will let them know how many times this error occurs within the dataset?
CONCAT
CHECK
COUNTA
CASE
You are a data analyst responsible for cleaning sales data entered manually by the sales team into a shared spreadsheet. Month after month, you notice and document in your cleaning report that a significant number of entries for
Product_ID
contain typos (e.g., transposing digits, usingO
instead of0
, etc.). These errors require considerable time to identify and correct during the cleaning process before the data can be used for reporting.During the feedback phase of the data cleaning process, what would be the most effective recommendation to propose to stakeholders or the process owner?
To implement a change in the data entry process, such as using a dropdown menu for
Product_ID
in the spreadsheet or adding a validation rule, to prevent these typos at the source.To allocate more time for data cleaning each month specifically to handle the
Product_ID
typos.To build an automated script that runs after data entry each month to specifically find and fix the common
Product_ID
typos before analysis begins.To create a comprehensive guide documenting all known Product_ID typos and distribute it to the sales team, asking them to be more careful.
Fill in the blank: To update a client's last name in their spreadsheet, a data professional uses _____ to search for any instance of “Reynolds” and change it to “Mehta.”
formatting
TRIM
find and replace
Remove duplicates