__STYLES__

USA Power Outages (Jan 2002 - Jun 23)

Tools used in this project
USA Power Outages (Jan 2002 - Jun 23)

USA Power Outages (2002-23)

About this project

1. CHALLENGE OBJECTIVES

Role: Senior Analytics Consultant

Data: US Power failure (Jan 2002 – June 2023), Survey form examples

undefinedSource: US Department of Energy

Task: Data Cleaning and Dashboard/Report

2. DATA UNDERSTANDING

Upon obtaining the dataset, the initial step involved conducting a comprehensive analysis of the sheets. As an analyst, my strategy involves juxtaposing the Data dictionary with the dataset to enhance comprehension of each column. While scrutinizing the sheets, it was essential to gain insights into the data collection process, particularly as it spanned 22 years and involved survey forms. A critical aspect of understanding the data entailed examining the structure of each sub-sheet from 2002 to 2023.

Notably, it was observed that the 22 sheets exhibited variations in the number of columns and rows.

3. SHEET GROUPING

I organized sheets with similar structures to form four distinct groups.

undefined4. DATA CONSOLIDATION

In order to amalgamate these sheets into a single main sheet, the creation of a master dataset was imperative to harmonize and unify all the individual sheets. The consolidation process involved fundamental splitting and text-to-columns operations, along with the addition of specific columns within groups 1 and 2. The established standard for the master dataset was:

undefinedDifferent set of operations were needed for different year groups.

Group 1:

  • Remove the title rows and information rows for the year.
  • Eliminate entirely empty rows.
  • Apply color coding to all lines without content, which consist of single values or words.
  • Combine and attach those words to the cell above.
  • Generate a new column labeled "Alert Criteria."

Group 2:

  • Above steps +
  • Copying this data to master dataset

Group 3: Delete month column

Group 4: Delete month column

5. DATA CLEANING AND MANIPULATION

Next is the intriguing and challenging task of standardizing each column to a consistent and uniform data type.

A. Event Date & Restoration Date

  • Separate the Date-time into distinct date and time columns using the Text-to-columns feature.
  • Convert the entire column to text.
  • Employ the IF(CONTAINS) function to determine if the string contains "/" or "-".
  • Categorize values with "/" as 0 and those with "-" as 1. Substitute "/" with "-".
  • For Group 2, split it into separate columns for day, month, and year (DD MM YYYY).
  • Concatenate these values into the standard DD-MM-YYYY format in a new column.

B. Event Time

FIND AND REPLACE:

  • a.m. / a.m with AM
  • p.m. / p.m with PM
  • noon with PM

REMOVE:

  • “approximately”
  • Extra spaces from few times

SWAP:

  • evening with 6:00 PM
  • Midnight with 3:00 AM

DELETE:

  • N/A
  • Ongoing

C. Restoration Time

SETTING A STANDARD: DD-MM-YYYY hh:mm REPLACING all date values with DD-MM-YYYY 12:00 AM

REPLACE:

  • noon with PM
  • Midnight with AM
  • PM Nov 10 with 10-11-2002 00:00

DELETE:

  • NA
  • Unknown
  • Ongoing
  • Approx

The Time column after this basic cleaning returned 3 different types of format

  • MM/DD/YY, hh:mm a.m.
  • hh:mm a.m. Month DD
  • OTHERS

GROUPING all these times into 5 different groups

= IF(ISNUMBER(SEARCH("-",A1)), IF(ISNUMBER(SEARCH("a.m.",A1)), 2, IF(ISNUMBER(SEARCH("p.m.",A1)), 3, 0)), IF(ISNUMBER(SEARCH("a.m.",A1)), 4, IF(ISNUMBER(SEARCH("p.m.",A1)), 5, 1)))

  • This formula checks for different patterns in the date and time values to determine the format and assigns the corresponding code. Here's how it works:
  • If true, it checks for "a.m." or "p.m." to differentiate between format 2 and format 3.
  • If the first IF statement is false, it means the date format is DD-MM-YYYY hh:mm.
  • The second IF statement checks for "a.m." or "p.m." to differentiate between format 4 and format 5.
  • If both conditions in the second IF statement are false, it means the date format is DD-MM-YYYY hh:mm.
  • This formula assumes that the date and time entries follow the provided patterns consistently. Adjustments may be needed if there are variations in the data. Copy this formula down the column to apply it to all rows in your dataset.

Group 1: DD-MM-YYYY hh:mm

Group 2: MM/DD/YY, hh:mm a.m.

Group 3: MM/DD/YY, hh:mm p.m

.Group 4: hh:mm a.m. Month DD

Group 5: hh:mm p.m. Month DD

The steps to clean these group were:

  • Transform "Text-to-Column (1:00 a.m. April 29)" into "01:00 | a.m. | April | 29."
  • Change the month to MM using the formula =MONTH(DATEVALUE(MONTH&"01"))
  • Extract the year using =YEAR(YEAR)
  • Correct some inaccurate dates (29-02-2006 to 29-04-2006).
  • Combine time and AM/PM as =CONCAT(TEXT(TIME,"hh:mm")," ",AM/PM)
  • Standardize all formats like 05:20 PM to 17:20 using =TEXT(TIME,"hh:mm")
  • Concatenate all columns to a standard format using =CONCAT("DD,"-", "MM,"-",YYYY," ",TIME)
  • Convert the new column to a date and time format.

D. Customers affected

This column had around 40 unique valuesundefinedundefinedundefinedundefinedConvert the column to whole number

E. Megawatt loss

This particular column comprised approximately 35 distinct values.

undefinedundefinedundefinedF. Alert Criteria

The column consisted repeated string patterns like

Employing basic TRIM, SORT, and UNIQUE functions yielded a set of unique values present in the column. The resulting unique list was structured as follows:

undefinedThe most straightforward method for organizing such strings involves utilizing VLOOKUP. In this dataset, I segmented the Alert Criteria details into 11 primary alert criteria for ease of grouping.

undefinedG. Event Type

Like alert criteria, Basic TRIM, SORT and UNIQUE functions returns a unique set of values used in the column.

undefinedThis column set was divided into 6 main Events and 25 Sub-Events

![](file:///C:/Users/pc/AppData/Local/Temp/msohtmlclip1/01/clip_image025.png)undefinedH. NERC Region

The initial task involved gaining insight into the NERC region by consulting the official website (https://www.nerc.com/AboutNERC/keyplayers/Pages/default.aspx), where a color-formatted representation of the NERC region, organized by state, was available

undefinedThe allotment of the regions to states needed an understanding of the how state boundaries were drawn using a state map of USA

undefinedUtilizing the aforementioned two maps, I created an Excel sheet in which each state was assigned to a specific NERC region. This allocation was based on the maximum area covered by each state within the respective NERC region, irrespective of state boundaries. For instance, although certain areas of Wyoming fall under WECC and others under MRO, the predominant coverage is under WECC, leading to the assignment of Wyoming to the WECC region.

To ensure the accuracy of my assumptions regarding the NERC regions for each state in the subsequent analysis, I generated a dummy Tableau map. The following map illustrates the appearance of the dummy:

undefined

The State-NERC Region was created on a back sheet.

Cleaning of the column “Areas affected

  • Assign a unique Record_ID to each record.
  • Employ the TEXT to COLUMN function to divide areas based on ":".
  • Implement an IF condition to categorize single states as 0 and multiple states as 1.
  • Utilize the IF(CONTAINS) function to group records with 0, extracting only the state names.
  • Manually identify the primary affected state for each row based on the count of affected counties, determined using the COUNTIF function.
  • Validate each Record_ID for consistency between the NERC region in the original data and the extracted data.
  • Use an IF function to assign 0 or 1 based on whether the values match or not.

For every 1 value, selecting again with respect to the counties

6. COLUMN ADDITION

New columns have been introduced to the comprehensive dataset, encompassing:

  • Major states impacted
  • Duration of restoration
  • Columns indicating errors
  • State abbreviations

7. DATA VALIDATION

  • In order to verify the dataset, introduced a new column named "Difference," indicating the time gap in hours between Restoration time and Event time. The variations in these time differences were within the standard range.
  • Subsequently, a random sample of 50 dates was selected from the original dataset to check with clean data

The final dataset looked like this:

undefined8. CONCEPTUALIZATION OF DASHBOARD

The initial concept for the theme aimed to align with elements associated with electricity and darkness. Among various choices such as Blue/White, Yellow/Black, and Green/Grey, I ultimately decided to adopt the BULB-DARKNESS theme.

9. EXPLORATORY DATA ANALYSIS

The initial phase involved the identification of dimensions and measures, followed by the creation of various combinations using these elements. I generated approximately 35 sheets featuring diverse combinations of the dataset and chart styles, encompassing Bar, Line, Text, Heatmap, Table, Scatter Plot, Treemap, Radar, Dendrogram, Donut, and Stacked Bars charts.

10. ALL CHARTS AND DASHBOARD ARRANGEMENT

undefinedCreated an Excel spreadsheet to organize all charts, outlining the significance of each chart along with key findings. Assign each chart to one of the dashboards (Dashboard 1, 2, 3, or 4).

  • Arranging all sheets designated for the dashboard
  • Removing redundant charts that contribute minimal value and are repetitive
  • Ultimately exporting the chosen dashboards in PNG format.

11. FIGMA

Creating the dashboard visuals by referencing the dashboard setup image from the Figma website to design the graphics.

undefinedExport the completed images again into SVG format.

12. ALL FORMATTING AND NAVIGATIONS

Complete the remaining formatting and all the details on dashboards and then adding User friendly navigation buttons for easy understanding.

undefined

Additional project images

Discussion and feedback(2 comments)
comment-591-avatar
Aaron Parry
Aaron Parry
5 months ago
Love the interactivity and all the details, Danish!

comment-592-avatar
Danish Sayyed
Danish Sayyed
Project owner
5 months ago
Project owner
Thanks for the compliment Aaron!
2000 characters remaining