
Coronavirus Graphs

  • Writer: Park Daniel
  • Oct 6, 2023
  • 2 min read

Updated: Oct 14, 2023

Goal: Analyze how COVID cases have spread over time and look at the countries with the most COVID deaths.


Method: Read the CSV data with pandas' read_csv method and load it into data frames so that we can make graphs and observe trends.

import pandas as pd
import matplotlib.pyplot as plt

confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-15-2023.csv')
# keep only the date columns (the first four hold province, country, latitude, longitude)
confirmed = confirmed_df.iloc[:, 4:]
deaths = deaths_df.iloc[:, 4:]
num_dates = len(confirmed.keys())
ck = confirmed.keys()
dk = deaths.keys()

world_cases = []
total_deaths = [] 
mortality_rate = []


for i in range(num_dates):
    confirmed_sum = confirmed[ck[i]].sum()
    death_sum = deaths[dk[i]].sum()
    
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)
    
    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)
# window size: number of days averaged together in the moving average
window = 7

# confirmed cases
world_daily_increase = daily_increase(world_cases)
world_confirmed_avg= moving_average(world_cases, window)
world_daily_increase_avg = moving_average(world_daily_increase, window)

# deaths
world_daily_death = daily_increase(total_deaths)
world_death_avg = moving_average(total_deaths, window)
world_daily_death_avg = moving_average(world_daily_death, window)

plot_data = [world_daily_increase_avg, world_daily_death_avg]
print(plot_data)
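confirmed_dates = list(ck)  # date labels for the x-axis (assumed; not defined in the original listing)
plt.plot(confirmed_dates, world_daily_increase_avg, label='Daily Increase')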
#plt.plot(deaths_dates, world_daily_death_avg, label='Daily Deaths')
plt.xlabel('Dates')
x = range(0,len(confirmed_dates), 100)
plt.xticks(x, [confirmed_dates[i] for i in x], rotation=45)
plt.ylabel('Daily Increase')
plt.title('Time vs. Daily Increase')
plt.legend()
plt.show()
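
The code above calls daily_increase and moving_average, but their definitions are not shown in this post, and they would need to be defined before that code runs. A minimal sketch of what they might look like follows. Both bodies are assumptions: the first day's increase is taken to be the first cumulative value, and the moving average uses a trailing window that is simply shorter for the first few days, so the output has the same length as the input and lines up with the date labels.

```python
def daily_increase(data):
    """Turn a cumulative series into per-day differences.

    Assumption: the first day's increase is the first cumulative value.
    """
    increases = []
    for i in range(len(data)):
        if i == 0:
            increases.append(data[0])
        else:
            increases.append(data[i] - data[i - 1])
    return increases


def moving_average(data, window_size):
    """Average each value with up to window_size - 1 values before it.

    Assumption: a trailing window that is shorter for the first few
    entries, so the result has one value per input value.
    """
    averages = []
    for i in range(len(data)):
        start = max(0, i - window_size + 1)
        chunk = data[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


cumulative = [1, 3, 6, 10]
print(daily_increase(cumulative))       # per-day differences
print(moving_average(cumulative, 2))    # smoothed cumulative series
```

Keeping the output the same length as the input matters here because plt.plot needs one y value for every date on the x-axis.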

Plot Covid Cases over time



Plot the top 20 countries with the most deaths

df_death_sum=latest_data[['Country_Region', 'Deaths']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Deaths', ascending=False) 
df_death_sum.head(20).plot.bar(x='Country_Region', y='Deaths', title='The top 20 highest death numbers')

df_death_confirmed = latest_data[['Country_Region', 'Confirmed']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Confirmed', ascending=False)
df_death_confirmed.head(3)
# df_death_ratio=latest_data[['Country_Region', 'Deaths']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Deaths', ascending=False)
df_death_ratio = df_death_confirmed.merge(df_death_sum, on='Country_Region')
df_death_ratio['death_ratio'] = df_death_ratio['Deaths'] / df_death_ratio['Confirmed']
df_death_ratio = df_death_ratio.sort_values(by='death_ratio', ascending=False)
df_death_ratio.head(3)

Analysis: I generated separate lists of the worldwide confirmed cases and deaths for each day. Additionally, I calculated the mortality rate for each day using the formula death_sum/confirmed_sum. The plot of COVID cases over time shows the 7-day average of daily increases, with peaks during the winter of 2021 and spring of 2022. Using the .groupby method followed by .sort_values with ascending=False, I ranked countries from the highest to the lowest totals. However, evaluating total deaths alone can be misleading because it lacks context such as population size. For example, the U.S. has the highest death count, largely because of its large population. It's worth noting that most other countries with high death rates tend to have lower GDPs, lower standards of living, and suboptimal sanitation systems.
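
As a side note on the mortality rate formula, the per-day totals can also be computed without an explicit loop. The sketch below is one possible way to do it, not the code used for the graphs in this post. The two small data frames are made-up stand-ins that only mimic the column layout of the CSSE time series files (four descriptive columns followed by one column per date); the country names and numbers are placeholders.

```python
import pandas as pd

# Made-up stand-ins with the same column layout as the CSSE time series files
confirmed_df = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['Country A', 'Country B'],
    'Lat': [0.0, 10.0],
    'Long': [0.0, 20.0],
    '1/22/20': [10, 30],
    '1/23/20': [20, 60],
})
deaths_df = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['Country A', 'Country B'],
    'Lat': [0.0, 10.0],
    'Long': [0.0, 20.0],
    '1/22/20': [1, 1],
    '1/23/20': [2, 4],
})

# Summing each date column gives one worldwide total per day
world_cases = confirmed_df.iloc[:, 4:].sum()
total_deaths = deaths_df.iloc[:, 4:].sum()

# Same formula as death_sum/confirmed_sum, applied to every date at once
mortality_rate = total_deaths / world_cases
print(mortality_rate)
```

Both results are pandas Series indexed by date, so the division pairs each day's deaths with the same day's cases.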






Conclusion: The final graph shows the correlation between confirmed cases and deaths across countries. Notably, the countries with the highest death ratios include Yemen and Peru, among others. An intriguing anomaly is North Korea, which appears as a data point far to the left, with 1 confirmed case and 6 deaths. Since deaths cannot exceed confirmed cases, this inconsistency likely indicates inaccurate data reporting by North Korea. Therefore, it would be prudent to exclude this data point for a more accurate analysis.
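
One way to exclude such a data point is to drop every row whose death ratio is above 1, since that can only come from a reporting problem. This is a minimal sketch on a toy table, not a step that was run for this post. Only the North Korea figures (1 confirmed case, 6 deaths) come from the text above; the other rows are made-up placeholders, and the spelling 'Korea, North' is an assumption about how the dataset names the country.

```python
import pandas as pd

# Toy stand-in for df_death_ratio; only the first row's numbers come from the post
df_death_ratio = pd.DataFrame({
    'Country_Region': ['Korea, North', 'Country A', 'Country B'],
    'Confirmed': [1, 1000, 5000],
    'Deaths': [6, 50, 100],
})
df_death_ratio['death_ratio'] = df_death_ratio['Deaths'] / df_death_ratio['Confirmed']

# A ratio above 1 means more deaths than confirmed cases, so treat it as unreliable
df_plausible = df_death_ratio[df_death_ratio['death_ratio'] <= 1]
df_plausible = df_plausible.sort_values(by='death_ratio', ascending=False)
print(df_plausible)
```

Filtering on the ratio rather than on a country name has the advantage of catching any other row with the same inconsistency.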









© 2023 by Train of Thoughts. Proudly created with Wix.com
