__STYLES__
Tools used in this project
Database Normalization

About this project

Introduction

When I work with data, specially with data stored in Excel, I have to deal with not normalized data that require a lot of cleansing and some operations before they can be analyzed.

Even if I work with Excel, I use to clean, standardize and normalize the data to build a data model and analyze them.

If I have the possibility, I prefer to load the data and perform these operations in a relational database such as MySQL.

Project Scope

In this project I have a not normalized table in a MySQL database. I have to shape the data to create a normalized database. To do that I have to :

  • analyze the table to spot some pattern
  • define the new tables
  • populate them with normalized data

First Step: AS-IS Analysis and Normalization

As a first step I perform a "select *" to analyze the table, then some operations to identify the possible normalized table. As the operation shows that for the selected columns I have the same number of distinct values, I can proceed to create a new table in a new schema.

undefinedI do the same to spot the second table, the "Store Table". As there's no more dimension table to create, after the Store Table I create the Inventory Table (Fact Table)

undefinedSecond Step: Filling New Tables

After the creation of the new normalized tables I provide to fill them. As the normalized tables have the same column name of non normalized tables, I can fill them with a simple combination of INSERT INTO and SELECT DISTINCT.

undefinedThe process is the same for all the tables. After the insert I perform:

  • a SELECT * to check the update.
  • a DISTINCT COUNT of film_id, title and description (like in the AS-IS analysis)

Final Step: Indexes and Constraints

As final step, after successful checked the new tables, I update the database with indexes and constraintsundefinedOn Delete / On Update No Action

"No Action" can be considered the default value for these constraints, the one proposed if you add an index / constraint from the GUI. There's others value like CASCADE and SET NULL for more specific usage and maintenance.

Final Considerations

Database normalization optimizes structure for integrity across normal forms, minimizing redundancy. Also it identifies functional dependencies, reducing redundant data and maintaining integrity.

Note: the dataset used for the project was part of the SQL Course by Maven Analytics.

Discussion and feedback(0 comments)
2000 characters remaining