Basics of cleaning data
- Park Daniel
- Oct 1, 2020
Updated: Oct 16, 2020
When working with data, most people agree that insights and analysis are only as good as the data you use. In short: garbage data in, garbage analysis out. This is where data cleaning comes into play. Data cleaning, the process of fixing or removing incorrect, corrupted, duplicate, or incomplete data within a dataset, is widely considered one of the most important steps in any analysis.
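As a minimal illustration of two of those fixes, the sketch below drops exact duplicate rows and rows with blank cells from a small CSV, using only Python's standard `csv` module. The function name `clean_rows` and the sample data are hypothetical, and the sketch assumes that "incomplete" simply means an empty cell. Spotting incorrect or corrupted values usually needs rules specific to the dataset, so it is not attempted here.

```python
import csv
import io


def clean_rows(csv_text):
    """Drop rows with blank cells and exact duplicate rows (hypothetical helper).

    Assumption: a row is 'incomplete' if it has the wrong number of fields
    or any cell is empty after stripping whitespace.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seen = set()
    cleaned = []
    for row in reader:
        # Skip incomplete rows: wrong field count or an empty cell
        if len(row) != len(header) or any(cell.strip() == "" for cell in row):
            continue
        key = tuple(row)
        # Skip rows identical to one we have already kept
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return header, cleaned


# Illustrative sample data, not from any real dataset
sample = """name,age
Ana,31
Ben,
Ana,31
Cy,27
"""

header, rows = clean_rows(sample)
print(header)
print(rows)
```

Keeping the first copy of a duplicate and discarding later ones is one simple policy; on real data you may prefer to fill in missing values instead of dropping the whole row.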

Does what you're looking at make sense?
Before programmers start cleaning data with code, they ask themselves a few intuitive questions so they can make the most of their time. A common one is "Does what you're looking at make sense?" This question is the starting point for data cleaning, because we have to identify a problem before we can fix it. From there, you can come up with follow-up questions that help you get the best possible data.
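One way to start answering "does this make sense?" in code is to print a quick summary before changing anything. The sketch below is a hypothetical helper, `first_look`, built on Python's standard `csv` module. It reports the column labels, the number of data rows, rows whose field count differs from the header (often a sign of a corrupted line), and blank cells per column. The sample data is made up for illustration.

```python
import csv
import io
from collections import Counter


def first_look(csv_text, preview=3):
    """Print a quick summary of a CSV to help judge whether it makes sense."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Data rows (1-based, header excluded) whose field count differs from the header
    ragged = [i for i, r in enumerate(body, start=1) if len(r) != len(header)]
    # Count blank cells under each column label
    blanks = Counter()
    for r in body:
        for label, cell in zip(header, r):
            if cell.strip() == "":
                blanks[label] += 1
    print("columns:", header)
    print("data rows:", len(body))
    print("rows with wrong field count:", ragged)
    print("blank cells per column:", dict(blanks))
    # Show the first few rows so a person can eyeball them
    for r in body[:preview]:
        print("  ", r)
    return {"rows": len(body), "ragged": ragged, "blanks": dict(blanks)}


# Illustrative sample data only
sample = """item,price,qty
pen,1.20,10
notebook,,5
eraser,0.50,3,extra
"""

summary = first_look(sample)
```

The summary does not fix anything; it only points at the rows and columns worth asking follow-up questions about.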
Does the data match the column label?

Although the chart above might seem hard to read, there is only one row you need to focus on: the first one. After downloading a dataset from Kaggle, you can often open it in Excel, and when I opened mine, I saw the picture above. What caught my eye, however, was the header row: I went through each column to check whether its data matched the column label. I was glad to see that every label matched its data, which let me move on to the next step without any difficulties.
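The same label check can be partly automated. The sketch below assumes you write down, for each column label, the type its values should have (the `EXPECTED` mapping and the `label_mismatches` function are hypothetical), then uses Python's standard `csv.DictReader` to flag labels missing from the header and cells that cannot be converted to the expected type. It only catches type mismatches; a column of plausible but wrong values still needs a human look, as in the manual check described above.

```python
import csv
import io

# Hypothetical mapping from each column label to the type its values should have
EXPECTED = {"name": str, "age": int, "height_cm": float}


def label_mismatches(csv_text, expected):
    """Return (missing_labels, bad_cells) for a CSV with a header row.

    bad_cells lists (row_number, label, value) for cells whose value cannot
    be converted to the type implied by the column label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reading fieldnames consumes the header row
    fieldnames = reader.fieldnames or []
    missing = [label for label in expected if label not in fieldnames]
    bad_cells = []
    for row_number, row in enumerate(reader, start=1):
        for label, kind in expected.items():
            if label in missing:
                continue
            value = row[label]
            try:
                kind(value)  # e.g. int("thirty") raises ValueError
            except (TypeError, ValueError):
                # TypeError covers None, which DictReader uses for cells absent from short rows
                bad_cells.append((row_number, label, value))
    return missing, bad_cells


# Illustrative sample data only
sample = """name,age,height_cm
Ana,31,165.5
Ben,thirty,180
Cy,27,
"""

missing, bad_cells = label_mismatches(sample, EXPECTED)
print("missing labels:", missing)
print("cells that don't match their label:", bad_cells)
```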