
(R): Pandemic pulse: Data driven investigation to explore strategic insights from the Covid-19 Global pandemic

Covid-19 data analysis by Paul Carmody

About this project

Full report here

Key Insights

  1. The United States has the highest number of cases and deaths, followed by India, Brazil, and several European countries.

  2. The top 10 countries with the most cases are the USA, India, France, Germany, Brazil, Japan, South Korea, Italy, the UK, and Russia. The top 10 countries with the most deaths are the USA, Brazil, India, Russia, Mexico, Peru, the UK, Italy, Germany, and France. These charts provide an overview of the countries most affected by the pandemic.

  3. Yemen has the highest case fatality rate (CFR), followed by Peru, Mexico, Syria, and Brazil. It's important to consider various factors that can influence the CFR, such as the age distribution of the population and healthcare capacity.

  4. The geographical distribution of deaths varies by region. In the Americas, the United States stands out with far more deaths than its neighbours, and only Brazil comes close. In Europe, Scandinavia and most of Eastern Europe have come out relatively unscathed, while Germany, France, the UK, and Russia have taken a big hit. In Asia, there are pockets of high cases, such as in India. In Africa, South Africa fared the worst, with the rest of the continent performing better.

Analytical Process

The analytical process for this project involved several steps:

  1. Importing Data: The project started by importing the COVID-19 dataset into RStudio. The read.csv() function was used to read the data from a CSV file.
  2. Data Preparation: The dataset was cleaned and prepared for analysis. This involved changing column types, removing missing values using the na.omit() function, and checking and modifying column names.
  3. Exploratory Data Analysis: Once the data was cleaned, various exploratory data analysis techniques were applied to gain insights into the pandemic. This included calculating the total number of cases and deaths worldwide, visualizing the distribution of cases and deaths by country using bar charts, and creating a geographical plot to display the distribution of deaths around the world.
  4. Additional Analysis: The project also identified the top 5 countries with the highest active cases and calculated the case fatality rate (CFR) for each country, with visualizations to compare the CFR between countries.
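
The import and preparation steps above can be sketched in base R as follows. This is a minimal illustration, not the code from the report: the column names, the placeholder country labels, and every number in the inline CSV are assumptions made up for the example, and the real project read its data from a file rather than from a string.

```r
# Toy CSV standing in for the real dataset. Column names and values are
# hypothetical and are NOT figures from the report.
csv_text <- 'Country,Total Cases,Total Deaths,Active Cases
Country A,"1,200",30,100
Country B,"5,000",250,"1,500"
Country C,800,,40
Country D,"2,400",48,300'

# 1. Import: read.csv() normally takes a file path; `text =` reads a string.
covid <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# 2. Prepare: tidy the column names (Total.Cases -> total_cases).
names(covid) <- gsub("\\.", "_", tolower(names(covid)))

# Counts written with thousands separators are read as text, so strip the
# commas with gsub() and convert with as.integer().
covid$total_cases  <- as.integer(gsub(",", "", covid$total_cases))
covid$active_cases <- as.integer(gsub(",", "", covid$active_cases))

# Drop rows that contain missing values (Country C has no death count).
covid <- na.omit(covid)

# 3. Explore: totals across the remaining rows.
total_cases  <- sum(covid$total_cases)
total_deaths <- sum(covid$total_deaths)
print(c(cases = total_cases, deaths = total_deaths))
```

Note that na.omit() removes a whole row when any column is missing, so it can silently discard countries; checking nrow() before and after is a cheap safeguard.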

Key Skills Used

The following key skills were used in this project; any business can benefit from having them in its analysts:

1. Data Import and Manipulation: The read.csv() function was used to import the dataset, and column types were converted and missing values removed using functions such as as.integer(), na.omit(), and gsub().

2. Data Visualization: The ggplot2 package was used to create various visualizations, including bar charts and a geographical plot. The geom_bar() and geom_polygon() functions were used to create the bar charts and the geographical plot, respectively.

3. Data Aggregation and Summary: The dplyr package was utilized for data aggregation and summary operations. Functions such as group_by(), summarise(), and arrange() were used to calculate total cases and deaths, aggregate data by country, and sort data.

4. Data Analysis and Calculation: The dplyr package was also used to calculate active cases and the CFR for each country.

5. Markdown Reporting: The project documentation was written using R Markdown, allowing for the inclusion of code, visualizations, and descriptions in a single document.
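
As a rough illustration of the aggregation and calculation skills listed above, the sketch below uses dplyr on a toy data frame. The column names and values are hypothetical, and the active-case formula (cases minus deaths minus recoveries) is an assumption; the report may instead have used an active-cases column supplied by the dataset. The CFR is computed as deaths divided by confirmed cases, expressed as a percentage.

```r
library(dplyr)

# Toy records; columns and values are hypothetical, not from the report.
covid <- data.frame(
  country   = c("A", "A", "B", "C"),
  cases     = c(600, 400, 5000, 2400),
  deaths    = c(20, 10, 250, 48),
  recovered = c(500, 300, 3000, 2000)
)

by_country <- covid %>%
  group_by(country) %>%
  summarise(
    cases     = sum(cases),
    deaths    = sum(deaths),
    recovered = sum(recovered),
    .groups   = "drop"
  ) %>%
  mutate(
    active  = cases - deaths - recovered,  # assumed definition of active cases
    cfr_pct = 100 * deaths / cases         # case fatality rate in percent
  ) %>%
  arrange(desc(cfr_pct))                   # highest CFR first

# Top 5 countries by active cases (only 3 exist in this toy data).
top_active <- by_country %>%
  arrange(desc(active)) %>%
  head(5)

print(by_country)
print(top_active)
```

Because the CFR divides by confirmed cases, countries with limited testing can show an inflated value, which is one reason a ranking like this needs to be read alongside testing and demographic context.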

These skills can help businesses in analyzing and interpreting data, identifying key insights, and presenting findings in a clear and concise manner.
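
For the visualization skill, a bar chart of deaths by country could be drawn with ggplot2 roughly as shown here. The data frame, its column names, and the styling choices are assumptions for illustration only and do not reproduce the charts in the report.

```r
library(ggplot2)

# Toy data; hypothetical country labels and values, not from the report.
top_deaths <- data.frame(
  country = c("A", "B", "C"),
  deaths  = c(30, 250, 48)
)

# geom_bar(stat = "identity") draws bar heights from the `deaths` column
# instead of counting rows; reorder() sorts the bars by value.
p <- ggplot(top_deaths, aes(x = reorder(country, deaths), y = deaths)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(x = "Country", y = "Total deaths", title = "Deaths by country (toy data)")

print(p)
```

A world map in the style of the geographical plot would typically pair geom_polygon() with country outline coordinates, for example those returned by ggplot2's map_data("world"), which relies on the maps package being installed. That step is left out here to keep the sketch self-contained.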

Additional project images

CFR rate
Covid map
sample code