__STYLES__

SQL Database Administration: Data exploration, key mapping and database normalization

Tools used in this project
SQL Database Administration: Data exploration, key mapping and database normalization

About this project

Business need:

To explore and become familiar with the 'Maven Fuzzy Factory' database and to suggest improvements, if any.

My goal:

To conduct exploratory data analysis of the 'Maven Fuzzy Factory' database, create an EER diagram of the existing database, then create an improved EER diagram and finally implement the improvements in the actual database.

SQL functions and commands used:

JOINS, AGGREGATE FUNCTIONS, DATE FUNCTION, DDL and DML COMMANDS.

Result:

undefinedThe Process:

Exploratory data analysis to answer the following questions:

  1. Find the money. Which table(s) in the mavenfuzzyfactory database will help you analyze revenue information?

► The orders and order_items tables

  1. Run some quick analysis on the orders table. How many sales has the company made? What are the highest volume primary_product_id values?

undefined► The company has filled 32313 orders and sold 40025 items.

undefined► The highest volume primary_product_id value is 1- The Original Mr. Fuzzy, with 24226 sold, followed by values 2- The Forever Love Bear with 5796 sold, 4- The Hudson River Mini bear with 5018 sold and 3- The Birthday Sugar Panda with 4985 sold.

  1. Explore the order_items table. What is the date range covered? How many unique products do you see in the table? Are prices and cost of goods fixed for the products, or can they vary over time? Find the min and max prices.

undefined► The earliest date in the table is 2012-03-19 and the latest is 2015-03-19. Thus, the difference between the two dates in years is 3.

undefined► There are 4 unique products in the table.

undefined► The prices and cost of goods are fixed and are as follows-

undefinedundefined► The min price is 29.99 and max price is 59.99.

  1. Next, dig into website_sessions. What are the most popular values for utm_source? How about for device_type?

undefined► The most popular utm_source is gsearch with 316035 sessions coming from it, followed by NULL (unpaid traffic or traffic that did not originate from a paid campaign) with 83328 sessions, bsearch with 62823 sessions and finally, socialbook with 10685 sessions.

undefinedDesktop is the more popular device type with 327027 sessions coming from it, followed by mobile with 145844 sessions.

  1. Check out the website_pageviews table. What are the most popular pageview_url values? What is the maximum pageviews visited on one session?

undefined► The top 3 most popular pageview_url values are /products with 261231 pageviews, /the-original-mr-fuzzy with 162525 pageviews and /home with 137576 pageviews.

undefined► The maximum number of pageviews for one session is 7.

Creating an EER diagram of the existing database:

I created the following EER diagram of the existing database using the MySQL Model Editor.

undefined

Changes required to improve the existing database:

  1. As is clear from the EER diagram, there are no relationships between the entities (tables) in the database. Primary and foreign key mappings must be created between the tables.
  2. The database has not been normalized. Multiple tables include columns containing the price and cost of goods of each item being sold (price_usd and cogs_usd columns). These columns can be included in the products table and queried using table joins when necessary.

Creating the new and improved EER diagram:

I created the following improved EER diagram using the MySQL Model Editor.

undefined

Implementing the diagrammed changes in the database:

I implemented the diagrammed changes by creating a new schema to store normalized tables, setting the new schema as the default schema, creating the structures of normalized tables in the new schema, and finally, by populating the normalized tables with values from the non- normalized tables of the old schema.

undefinedundefinedundefinedNext step:

The next step was to analyze sales and website traffic data from the new and improved 'Maven Fuzzy Factory' database using SQL and to present my insights and recommendations to the management in an Excel dashboard. Check out how I did that here.

Discussion and feedback(0 comments)
2000 characters remaining