How Do We Ensure Good Data Quality and Prepare Data?

We have loads of data... but it’s all over the place! This section will guide you on how to organise it.

This is what you will learn

It’s your turn to host dinner for your friends! This time, you’re going all in. You’ve found a solid recipe for baked ziti, gathered all the right ingredients, even snagged some spicy Italian sausages from the local butcher. In the kitchen, you lay out your ingredients, the recipe, and the equipment you need.
But wait. You’re used to the metric system, and the recipe uses imperial units. What does a quarter “pound”, eight “ounces” or 375 degrees “Fahrenheit” amount to? A quick online search gives you the conversions: roughly 113 grams, 2.4 deciliters, and 190 degrees Celsius.
Working with data-driven processes is a lot like cooking a great meal. To achieve your goal (the dinner), you need a good plan (the recipe), reliable data sources, and relevant data (the ingredients), all in understandable and accurate formats (the conversion from “pounds” and “ounces” to grams and deciliters).
Everything must be prepped and processed ahead of time, and you must be clear on all the steps, before everything comes together in the end result. Preparation is key: If the rosemary isn’t prepped and you forget it in the heat of the moment, the entire meal could be ruined!
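The recipe conversions above are a small instance of a common data preparation step: bringing values into one consistent unit system before using them. Here is a minimal sketch in Python. The function names are hypothetical, chosen only for illustration, and the sketch assumes that “ounces” means US fluid ounces, since the recipe converts them to a volume.

```python
# Standard conversion factors. The constant and function names are
# illustrative assumptions, not part of any particular library.
GRAMS_PER_POUND = 453.59237
DECILITERS_PER_US_FLUID_OUNCE = 0.295735  # assumes US fluid ounces


def pounds_to_grams(pounds: float) -> float:
    """Convert a weight in pounds to grams."""
    return pounds * GRAMS_PER_POUND


def fluid_ounces_to_deciliters(fluid_ounces: float) -> float:
    """Convert a volume in US fluid ounces to deciliters."""
    return fluid_ounces * DECILITERS_PER_US_FLUID_OUNCE


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# The three quantities from the recipe.
print(round(pounds_to_grams(0.25)))               # 113
print(round(fluid_ounces_to_deciliters(8), 1))    # 2.4
print(round(fahrenheit_to_celsius(375), 1))       # 190.6
```

The exact temperature is a little over 190 degrees Celsius. Rounding it down to a convenient oven setting is itself a small data quality decision: how much precision does the task actually need?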
In this section, we’ll look closely at how we organise, tidy up and prepare data for analysis and use. Where the previous chapter was about finding the right “ingredients”, this part will be about making sense of them.
But first, let’s understand data quality, and how to ensure our data is reliable.