Process Data from Dirty to Clean - Module 1 challenge

  1. Fill in the blank: If a test is statistically significant, the results are less likely to be due to _____ and more likely to be due to a real difference between the groups being compared.

    • causation

    • bias

    • insufficient data

    • random chance

  2. A company has 1,000 stores around the country. Each store tracks its inventory in real time using its point-of-sale system. Every night, each store's inventory data is sent to a central database. When customers go to the company's website, they can check whether their local store has a particular item in stock; the website answers by querying the central database. Unfortunately, customers are coming to stores and finding that the item they want is sold out, even though the website said it was available. What data integrity problem does this scenario describe?

    • Gathering

    • Transfer

    • Manipulation

    • Replication

  3. In a survey about a new smartphone app, 65% of respondents report they would recommend the app to others. The margin of error for the survey is 3%. Based on that margin of error, what range reflects the population's true response?

    • 60-63%

    • 68-71%

    • 65-68%

    • 62-68%
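
    The arithmetic behind a margin of error can be sketched in a few lines of Python. This is only an illustration: the function name response_range is hypothetical, and it assumes the margin is applied symmetrically, in percentage points, around the reported figure.

```python
def response_range(reported_pct, margin_pct):
    """Return the (low, high) range implied by a symmetric margin of error.

    Both arguments are in percentage points; the name is illustrative only.
    """
    return reported_pct - margin_pct, reported_pct + margin_pct


# Figures from the survey described above: 65% reported, 3% margin of error.
low, high = response_range(65, 3)
print(f"{low}-{high}%")
```

    Subtracting the margin gives the lower bound and adding it gives the upper bound, so the range is always twice as wide as the margin itself.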

  4. A car dealer conducts a survey to understand why customers choose their dealership. They are eager for positive feedback, so they email the survey to only those customers who purchased two or more vehicles from the dealership in the past five years. What is likely to result?

    • Random sampling

    • Unbiased sampling

    • Geographically limited sampling

    • Sampling bias

  5. Fill in the blank: Hypothesis testing is a process to determine whether a survey or _____ has meaningful results.

    • data source

    • action item

    • experiment

    • sample

  6. A data professional in the logistics industry wants to calculate the margin of error for a study about transportation route efficiency. They know the sample size and confidence level. What must they also know in order to accurately calculate margin of error?

    • Correlation

    • Population size

    • Distribution

    • Testing methodology
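
    For readers who want to see how these inputs fit together, here is a minimal sketch of one common way to compute a margin of error for a proportion. It is not necessarily the method the course calculator uses. It assumes a normal approximation, a worst-case proportion of 0.5, and a finite population correction; the function name and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, population_size, confidence_level, proportion=0.5):
    """Approximate margin of error for a proportion (normal approximation).

    Assumptions (not from the quiz): proportion defaults to the worst case
    of 0.5, and a finite population correction is applied.
    """
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sqrt(proportion * (1 - proportion) / sample_size)
    # The correction shrinks the error when the sample is a large share
    # of the population.
    correction = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


# Hypothetical study: 400 routes sampled out of 10,000, 95% confidence.
moe = margin_of_error(sample_size=400, population_size=10_000, confidence_level=0.95)
print(f"Margin of error: {moe:.1%}")
```

    Under these assumptions, a larger sample or a lower confidence level reduces the margin of error, and the population size matters most when the sample covers a sizable fraction of it.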

  7. A junior data analyst copies data from one computer to another over their company network. The network connection goes down during the process, which results in an incomplete copy of the data. What data integrity problem does this scenario describe?

    • Replication

    • Cleaning

    • Transfer

    • Manipulation

  8. Which of the following statements accurately describe sample size, population, and confidence level? Select all that apply.

    • Random sampling involves selecting a sample from a population so that every member of the population has an equal chance of being chosen.

    • Having an 80% confidence level is ideal, but most industries hope for 70-75%.

    • Confidence level is the probability that a sample accurately reflects the greater population.

    • When data professionals use a sample, they are using a part of a population that is representative of the whole population.