Data Cleaning in MS Excel and SQL Server

Manjari Ganapathy
Geek Culture
Published in
4 min readMay 25, 2021

--

A comparison between MS Excel and SQL for data cleaning

Objective

The main objective of this article is to walk through a few ways to obtain clean data using MS Excel and SQL

Introduction

Data cleaning is the process of ensuring data is correct, consistent and usable. Data cleaning is really important because it improves data quality and there by increases the overall productivity. Clean data helps in accurate decision making, boost results and revenue, save money , time and minimize compliance risks. This articles presents few ways which helps in cleaning the data using MS Excel and SQL.

Null

Objective : Replace Null values with 0

EXCEL :

Image By Author

SQL
SELECT ISNULL(column_name, 0 ) FROM table1
OR
SELECT COALESCE(column_name, 0 ) FROM table1

Spelling

Objective : Replace Y to “Yes”, “N” to “No” and correct other spelling mistakes

In Excel, this can be achieved in the same way using Find and Replace method . Find Y and replace all with Yes in the required column

SQL : 
SELECT CASE WHEN IsAvailable='Y' THEN 'Yes'
WHEN IsAvailble='N' THEN 'No'
ELSE IsAvailable END
FROM IceCream

Fill

Objective : Find the first name and last name from Email

To extract the first name and last name from email ID in Excel, we use a shortcut for Autofill Ctrl+E . Select each column and do ctrl +E to autofill first name and last name . This can be also achieved using the FlashFill option in home tab from Fill which helps in filling these fields quickly. Excel is intelligent to understand the delimiter which helps in extracting first name and last name easily.

Image By Author
First Name and Last Name from Email. Image By Author

Another alternate way is to select the cells to split. Select Data->Text to Columns-> select delimiter and apply.

Sort and Filter

Objective : Sort the table based on first name in ascending order

Sort First Name in Ascending order.

To filter the employees who have salary greater than average of all employees is done by right click on salary filter and select number filters. You can see many options and choose the required option. We can also add custom filters based on user requirements.

Image By Author
SQL:
#ORDER BY clause is used for sorting in ascending or descending order
SELECT * FROM TABLE
ORDER BY First Name ASC;

Format

Objective :

  • Remove the trailing and leading spaces in product_name
  • Lower case the product_name
  • Change the sale_date to ‘year-month’ format
Source
Excel:
Lowercase formula: =lower(cellrange)
Trim formula: = Trim(cellrange)
Date and Month =LEFT(number of characters) or format the cells
SQL:
SELECT lower(TRIM(product_name))product_name
,LEFT(sale_date,7) sale_date
FROM Sales
GROUP BY product_name,sale_date

Duplicates

Objective : Find the duplicate Email and remove from the table

Source
EXCEL:
Select the full data, go to Data and click on Remove duplicates.
To visualize the duplicate values, we can use conditional formatting and highlight the duplicate cells or row
SQL:
#List of Duplicate Emails
SELECT email
,count(*) cnt_email
FROM PERSON
GROUP BY email
HAVING cnt_email > 1
#To Delete Duplicate Emails
Delete FROM TABLE
WHERE Id NOT IN
(select min(Id)
from Person
group by Email)

Transpose

Objective : Change rows to columns

Ice-cream table

If you need to shifts rows to become columns or vice-versa, select the cells you need to transpose, and copy it. Right click and click on paste special and select transpose. An alternate way to transpose using the transpose function which is using the format {=TRANSPOSE(Range)}

Transpose Ice-cream Table : Image By Author

Summary

We have seen how to clean a raw data by removing duplicates, spelling mistakes, formatting, replacing null values, sorting and filling in MS Excel and SQL.

--

--