Discussion questions

  1. Provide a hypothetical example of each type of anomaly in your data tables (semantic, syntactic, coverage) and show how it can impact the interpretation of the relationship between a response variable (Y) and a set of predictor variables (X).

  2. How can the steps in your data cleansing pipeline harm the mapping between the “real world” and your data table? Provide a specific example to illustrate your answer.