Analysis on a survey of 630 Data Professional participants

About this project

Project goal:

The objective of this Power BI project was to explore whether education level influences the success of data professionals, framed as a scenario that would capture the attention of business stakeholders.

Data source:

The data set was obtained from the following link:

Power BI data source

Business insights:

Numerous business scenarios can be explored through the interactivity of the data professional survey dashboard. However, the business stakeholders were mainly interested in whether education level is a crucial factor in a data professional’s success.

To answer this primary question for the stakeholders, I used the range of answers to the survey question on education level, which runs from high school through PhD. For each education level, I also considered the top five earning occupations. Survey participants also provided their country of residence.

Based on the data professional survey, the results were as follows:

In the United States, among 36 survey participants holding high school degrees, the average yearly salary of data architects was $85,000.

In the United States, among 16 survey participants holding associate’s degrees, the average yearly salary of data scientists was $115,000.

In the United States, among 329 survey participants holding bachelor’s degrees, the average yearly salary of data scientists was $93,000.

In the United States, among 192 survey participants holding master’s degrees, the average yearly salary of data scientists was $104,000.

In the United States, among 5 survey participants holding PhDs, the average yearly salary of data scientists was $206,000.

Special mention:

Some survey participants chose not to select an education level; their results are shown as well.

In the United States, among 52 survey participants with an undetermined education level, the average yearly salary of data engineers was $98,000.

Based on the insights above, business stakeholders can be informed that education level is not a crucial factor in a data professional’s success: across the top five occupations, participants at every education level earned higher-than-average yearly salaries in the United States.

Other possible questions that stakeholders may think of based on the information given in the dashboard:

  • What was the top occupation for survey participants who provided their own occupation titles in the United States?
    • Systems configuration was the top earning occupation where the average yearly salary earned was $75,000.
  • What was the top occupation that female survey participants held who reside in the United States?
    • Data scientist was the top earning occupation where the average yearly salary earned was $95,000.
  • What was the top occupation for survey participants who reside in India?
    • Data scientist was the top earning occupation where the average yearly salary earned was 7,644,693.00 rupees.

Special mention:

The yearly salary data in the data set is expressed in United States dollars, so I used the following website to convert $93,000 to 7,644,693.00 rupees:

https://usd.currencyrate.today/convert/amount-93000-to-inr.html
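
The conversion above is simple arithmetic and can be sketched in a few lines of Python. The exchange rate below is an assumption: it is back-calculated from the two figures quoted (7,644,693 / 93,000 = 82.201 rupees per dollar) rather than read from the website, and the function name is hypothetical.

```python
from decimal import Decimal

# Assumed rate, implied by the figures quoted above: 7,644,693 / 93,000.
ASSUMED_INR_PER_USD = Decimal("82.201")


def usd_to_inr(amount_usd: Decimal, rate: Decimal = ASSUMED_INR_PER_USD) -> Decimal:
    """Convert a USD amount to rupees at the given rate, rounded to 2 places."""
    return (amount_usd * rate).quantize(Decimal("0.01"))


print(usd_to_inr(Decimal("93000")))  # 7644693.00
```

Because exchange rates move daily, the same salary converted on another date would give a different rupee figure.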

Import data set and data tools used:

Imported the data set into Microsoft Excel to preview the data columns. Afterwards, Microsoft Power BI was utilized for data analysis and data visualization purposes.

Data set details:

Raw data set contained 28 columns and 630 rows.

The following column names were found in the data set:

  • Unique ID
  • Email
  • Date Taken (America/New_York)
  • Time Taken (America/New_York)
  • Browser
  • OS
  • City
  • Country
  • Referrer
  • Time Spent
  • Q1 - Which Title Best Fits your Current Role?
  • Q2 - Did you switch careers into Data?
  • Q3 - Current Yearly Salary (in USD)
  • Q4 - What Industry do you work in?
  • Q5 - Favorite Programming Language
  • Q6 - How Happy are you in your Current Position with the following? (Salary)
  • Q6 - How Happy are you in your Current Position with the following? (Work/Life Balance)
  • Q6 - How Happy are you in your Current Position with the following? (Coworkers)
  • Q6 - How Happy are you in your Current Position with the following? (Management)
  • Q6 - How Happy are you in your Current Position with the following? (Upward Mobility)
  • Q6 - How Happy are you in your Current Position with the following? (Learning New Things)
  • Q7 - How difficult was it for you to break into Data?
  • Q8 - If you were to look for a new job today, what would be the most important thing to you?
  • Q9 - Male/Female?
  • Q10 - Current Age
  • Q11 - Which Country do you live in?
  • Q12 - Highest Level of Education
  • Q13 - Ethnicity

Data Cleaning methods:

Upon loading the Microsoft Excel workbook into Microsoft Power BI, I selected the Transform Data option, which opened the Power Query Editor.

Special mention:

"Power BI Desktop also comes with Power Query Editor. Use Power Query Editor to connect to one or many data sources, shape and transform the data to meet your needs, then load that model into Power BI Desktop."

The source of this quote is linked below:

Query overview in Power BI Desktop - Power BI | Microsoft Learn

The very first step in the data cleaning process was to select the entire data set and apply the Trim and Clean features from the Format command.

By applying Trim and Clean, I ensured that no leading or trailing whitespace and no non-printable characters remained anywhere in the data set.
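
For readers who want to reproduce this step outside Power BI, here is a minimal Python sketch of the same idea. It is only an approximation of what Trim and Clean do in Power Query Editor, and the function name is hypothetical.

```python
def trim_and_clean(value: str) -> str:
    """Drop non-printable characters, then strip leading/trailing whitespace."""
    printable_only = "".join(ch for ch in value if ch.isprintable())
    return printable_only.strip()


# A made-up cell value with stray spaces, a NUL character and a newline.
print(repr(trim_and_clean("  Data Analyst\x00\n ")))  # 'Data Analyst'
```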

I deleted the following empty columns in Power Query Editor:

  • Email
  • Date Taken (America/New_York)
  • Time Taken (America/New_York)
  • Browser
  • OS
  • City
  • Country
  • Referrer
  • Time Spent

Six steps were used to standardize the column: Q1 - Which Title Best Fits your Current Role?

1.) Duplicate column:

  • Renamed column: Q1 - Which Title Best Fits your Current Role? Preselected (Modified)

2.) Split column by delimiter value:

  • Other
    • Column was not deleted due to containing other specific data job title values as defined by the survey participants.

3.) Split new column by delimiter value:

  • (Please Specify)
    • deleted column

4.) Split new column by delimiter value:

  • :
    • deleted column

5.) Renamed data values for Student/Looking/None:

  • Column value was renamed because it has a range of values rather than having a specific value.
    • Renamed value:
      • Did not select

6.) Viewed all data values within renamed column:

  • Data Analyst
  • Data Architect
  • Data Engineer
  • Data Scientist
  • Database Developer
  • Did not select
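
The six steps above can be sketched in Python as follows. The raw value format "Other (Please Specify):<title>" is an assumption inferred from the delimiters used in steps 2 to 4, and the function and constant names are hypothetical.

```python
from typing import Optional, Tuple

OTHER_PREFIX = "Other (Please Specify)"  # assumed prefix of free-text answers


def split_role(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (preselected title, user-specified title) for one Q1 answer."""
    if raw.startswith(OTHER_PREFIX):
        # Everything after the prefix and the colon is the participant's own title.
        custom = raw[len(OTHER_PREFIX):].lstrip(":").strip()
        return None, custom or None
    if raw == "Student/Looking/None":
        # Step 5: this catch-all answer is renamed.
        return "Did not select", None
    return raw, None


print(split_role("Data Analyst"))                            # ('Data Analyst', None)
print(split_role("Other (Please Specify):Business Analys"))  # (None, 'Business Analys')
print(split_role("Student/Looking/None"))                    # ('Did not select', None)
```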

Three steps were used to standardize the split column: Q1 - Which Title Best Fits your Current Role? - Copy

1.) Renamed new column:

  • Q1 - Which Title Best Fits your Current Role? - (User specified - Modified)
    • Column contains other data job titles, defined by the survey participants, that were split from column: Q1 - Which Title Best Fits your Current Role?

2.) Standardize data values by renaming to proper job titles:

  • Data job titles as listed in the column
    • Some examples of job titles that I renamed:
      • Analyst Primary Market Intelligence
      • Business Analys
      • Does a social media analyst count?
      • I work with data tools and can create simple dashboards but I am not a data scientist
      • Manager of a team of Data Analysts
      • Manager, Business Intelligence Develop

3.) Viewed all data values within renamed column:

  • A small sample set based on the 86 data job entries that were corrected:
    • Account manager
    • Ads operations
    • AI Software Engineer
    • Analytics Consultant
    • Analytics Engineer
    • BI consultant
    • BI Manager
    • Billing analyst
    • Business analyst

The next column to be standardized was the Q3 - Current Yearly Salary (in USD).

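Upon viewing this column, I discovered that the data values were ranges rather than single numbers. As a result, I decided to add a custom column in Power Query Editor to obtain the average of each range. Note that custom column formulas in Power Query Editor are written in the Power Query M formula language, whereas DAX is used for measures and calculated columns in the data model.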

Special mention:

"Data Analysis Expressions (DAX) is a library of functions and operators that can be combined to build formulas and expressions in Power BI, Analysis Services, and Power Pivot in Excel data model."

The source of this quote is linked below:

Data Analysis Expressions (DAX) Reference

Six steps were used to standardize the column: Q3 - Current Yearly Salary (in USD).

1.) Duplicate column:

  • Renamed column
    • Q3 - Current Yearly Salary (in USD) - Modified

2.) Split column by delimiter value:

  • k
    • deleted column

3.) Replaced values within column Q3 - Current Yearly Salary (in USD):

  • k
    • null values
    • 225 data values

4.) Created a new custom column:

Before the new custom column was created, the Q3 - Current Yearly Salary (in USD) - Copy.1 and Q3 - Current Yearly Salary (in USD) - Copy.2 columns were changed to the Whole Number data type.

  • New custom column name
    • Average Salary
      • Custom column formula:
        • ([#"Q3 - Current Yearly Salary (in USD) - Copy.1"] + [#"Q3 - Current Yearly Salary (in USD) - Copy.2"]) / 2

5.) Deleted columns

  • Q3 - Current Yearly Salary (in USD) - Copy.1
  • Q3 - Current Yearly Salary (in USD) - Copy.2

6.) Viewed all data values within renamed column:

  • A small sample set based on the 630 average yearly salary entries that were corrected:
    • 20
    • 53
    • 75.5
    • 95.5
    • 115.5
    • 137.5
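
The midpoint calculation from step 4 can be sketched in Python. The raw range format (for example "41k-65k") is an assumption based on the "k" delimiter and the sample values above, the function name is hypothetical, and the handling of open-ended ranges here (returning None) is a choice made for this sketch rather than what the dashboard does.

```python
import re
from typing import Optional


def salary_midpoint(raw: str) -> Optional[float]:
    """Return the midpoint, in thousands of USD, of a range such as '41k-65k'."""
    numbers = [int(n) for n in re.findall(r"\d+", raw)]
    if len(numbers) != 2:
        return None  # open-ended or malformed range: no midpoint available
    low, high = numbers
    return (low + high) / 2


print(salary_midpoint("0-40k"))    # 20.0
print(salary_midpoint("41k-65k"))  # 53.0
print(salary_midpoint("66k-85k"))  # 75.5
```

The results are in thousands of dollars, matching the sample values listed in step 6.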

Two steps were used to standardize the column: Q11 - Which Country do you live in?

I filtered for the countries that users specified. 21 rows were found. Examples of countries that survey participants listed were:

  • Chile
  • Costa Rica
  • France
  • Greece
  • Mexico
  • Panama
  • Poland

I also filtered for the countries that were preselected in the survey. 26 rows were found. The preselected countries were:

  • Canada
  • India
  • United Kingdom
  • United States

Overall, I decided to use only the preselected countries because they provided more rows of data for analysis.

1.) Split column by delimiter value:

  • (
    • deleted column

2.) Viewed all data values within column:

  • Canada
  • India
  • Other
  • United Kingdom
  • United States
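
The split on "(" in step 1 can be sketched in Python as below. The raw format "Other (Please Specify):<country>" is an assumption inferred from the delimiter and the resulting values, and the function name is hypothetical.

```python
def preselected_country(raw: str) -> str:
    """Keep only the text before the first '(' and trim surrounding whitespace."""
    return raw.split("(", 1)[0].strip()


print(preselected_country("United States"))                 # United States
print(preselected_country("Other (Please Specify):Chile"))  # Other
```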

Creation of Power BI Dashboard data elements:

Overview of designing Power BI dashboard elements:

Dashboard header:

  • Text box: Data Professional Survey Dashboard
  • Alignment: Center
  • Background color: Light blue color
  • Font: Segoe UI

Card 1:

  • Data field: Unique ID
  • Field value: Count (Distinct)
  • Text: Total number of Survey Participants

Card 2:

  • Data field: Current Age
  • Field value: Average
  • Text: Average age of Survey Participant

Chart 1:

  • Title: Number of participants by Gender
  • Chart type: Stacked column chart
  • X and Y-axis:
    • X-axis:
      • Data field: Male/Female
      • Text: Gender
    • Y-axis:
      • Data field: Male/Female
      • Field value: Count
      • Text: Number of participants

Chart 2:

  • Title: Survey Participants by Country
  • Chart type: Funnel
  • Data Category: Which Country do you live in?
  • Data values: Which Country do you live in?
    • Field value: Count

Chart 3:

  • Chart type: Stacked bar chart
    • Title: Top 5 Occupations (based on preselection)
    • X-axis:
      • Data field: Average Yearly Salary
      • Field value: Average
      • Text: Average Yearly Salary
    • Y-axis:
      • Data field: Which Title Best Fits your Current Role?
      • Text: Occupation
      • Filter: Occupation
        • Filter type:
          • Top N
            • Specify number of top rows to show:
              • 5
                • Displays the top 5 rows found in column Q1 - Which Title Best Fits your Current Role? Preselected (Modified)

Chart 4:

  • Chart type: Stacked bar chart
    • Title: Top 5 Occupations (based on Participant input)
    • X-axis:
      • Data field: Average Yearly Salary
      • Field value: Average
      • Text: Average Yearly Salary
    • Y-axis:
      • Data field: Which Title Best Fits your Current Role?
      • Text: Occupation
      • Filter: Occupation
        • Filter type:
          • Top N
            • Specify number of top rows to show:
              • 5
                • Displays the top 5 rows found in column Q1 - Which Title Best Fits your Current Role? - (User specified - Modified)
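
Charts 3 and 4 both combine an average with a Top N filter. The logic can be sketched in Python as below, assuming the filter ranks occupations by their average yearly salary. The function name and the sample rows are hypothetical and are not taken from the survey data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def top_n_by_average(
    rows: Iterable[Tuple[str, float]], n: int = 5
) -> List[Tuple[str, float]]:
    """Group salaries by occupation, average each group, keep the n highest."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for occupation, salary in rows:
        groups[occupation].append(salary)
    averages = [(occupation, mean(values)) for occupation, values in groups.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up rows (occupation, salary in thousands), for illustration only.
sample = [("A", 10.0), ("A", 20.0), ("B", 30.0), ("C", 5.0)]
print(top_n_by_average(sample, n=2))  # [('B', 30.0), ('A', 15.0)]
```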

Chart 5:

  • Chart type: Stacked column chart
    • Title: Number of participants by Education level
    • X-axis:
      • Data field:
        • Highest Level of Education
      • Text: Education level
    • Y-axis:
      • Data field: Unique ID
      • Field value: Count
      • Text: Number of Participants

Completion of Power BI Dashboard:

When choosing the color scheme for this Power BI dashboard, I selected the accessible theme called Tidal. With this theme, the colors used throughout the dashboard should be easy on the eyes for as many viewers as possible.

The Power BI dashboard data elements have been organized in a manner that illustrates helpful statistical information that was collected from the data professional survey.

Areas for data set improvements:

  • Add column called motivational factors
    • Survey participants will have the chance to express themselves in an open-ended survey question that centers on their motivations for pursuing the data analytics career path.
      • Possible survey responses include but are not limited to:
        • Learning new data analytics skills
        • Money
        • Provide for family
        • Express myself artistically by using data visualization tools
        • Opportunity to meet new people
  • Add column for learning SQL skills
    • By using a drop-down combo box, survey participants can indicate how difficult they found it to develop SQL skills.
      • Survey responses will have the following range values:
        • Very easy
        • Easy
        • Moderate
        • Difficult
        • Very difficult
  • Add column for Favorite data visualization tool
    • By using an open-ended question format, survey participants can share their favorite data visualization tool.
      • Possible survey responses include but are not limited to:
        • Tableau
        • Looker
        • Qlik Sense
        • Power BI

Conclusion and thanks for viewing this data project:

Please feel free to reach out to me on LinkedIn if you have any comments or questions.

Lastly, thank you very much for viewing this Power BI data project.
