Coronavirus Graphs
- Park Daniel
- Oct 6, 2023
Updated: Oct 14, 2023
Goal: Analyze how COVID cases have spread over time and look at the countries with the most COVID deaths.
Method: Read the CSV data into pandas DataFrames with pd.read_csv and aggregate it so that we can make graphs and observe trends.
import pandas as pd
import matplotlib.pyplot as plt

confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-15-2023.csv')

# Keep only the date columns. Assumption: the first four columns are
# Province/State, Country/Region, Lat and Long, as in the JHU CSSE time series.
confirmed = confirmed_df.iloc[:, 4:]
deaths = deaths_df.iloc[:, 4:]

num_dates = len(confirmed.keys())
ck = confirmed.keys()
dk = deaths.keys()
world_cases = []
total_deaths = []
mortality_rate = []
for i in range(num_dates):
    # sum over all countries for one date
    confirmed_sum = confirmed[ck[i]].sum()
    death_sum = deaths[dk[i]].sum()
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)
    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)
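The code that follows calls daily_increase and moving_average, but their definitions are not shown in this post. Here is a minimal sketch of what such helpers could look like. The function bodies are an assumption, not necessarily the versions used to produce the graphs: daily_increase takes differences of a cumulative series, and moving_average uses a trailing window that is shorter at the start of the series.

```python
import numpy as np


def daily_increase(data):
    """Turn a cumulative series into per-day changes.

    The first value is kept as-is because there is no previous day.
    """
    if len(data) == 0:
        return []
    return [data[0]] + [data[i] - data[i - 1] for i in range(1, len(data))]


def moving_average(data, window_size):
    """Mean of up to `window_size` values ending at each index.

    Assumption: a trailing window. The output has the same length as
    the input, so it can be plotted against the same list of dates.
    """
    averages = []
    for i in range(len(data)):
        start = max(0, i - window_size + 1)
        averages.append(float(np.mean(data[start:i + 1])))
    return averages


# Small example with made-up cumulative counts
print(daily_increase([1, 3, 6, 10]))     # [1, 2, 3, 4]
print(moving_average([2, 4, 6, 8], 2))   # [2.0, 3.0, 5.0, 7.0]
```

Keeping the output the same length as the input is a design choice here: it means the averaged series lines up one-to-one with the date labels when plotting.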
# window size: number of days averaged together
window = 7
# confirmed cases
world_daily_increase = daily_increase(world_cases)
world_confirmed_avg = moving_average(world_cases, window)
world_daily_increase_avg = moving_average(world_daily_increase, window)
# deaths
world_daily_death = daily_increase(total_deaths)
world_death_avg = moving_average(total_deaths, window)
world_daily_death_avg = moving_average(world_daily_death, window)
plot_data = [world_daily_increase_avg, world_daily_death_avg]
print(plot_data)
# Assumption: confirmed_dates holds the date column labels (ck above)
confirmed_dates = ck
plt.plot(confirmed_dates, world_daily_increase_avg, label='Daily Increase')
#plt.plot(deaths_dates, world_daily_death_avg, label='Daily Deaths')
plt.xlabel('Dates')
x = range(0,len(confirmed_dates), 100)
plt.xticks(x, [confirmed_dates[i] for i in x], rotation=45)
plt.ylabel('Daily Increase')
plt.title('Time vs. Daily Increase')
plt.legend()
plt.show()
Plot COVID cases over time

Plot the top 20 countries with the most deaths
df_death_sum=latest_data[['Country_Region', 'Deaths']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Deaths', ascending=False)
df_death_sum.head(20).plot.bar(x='Country_Region', y='Deaths', title='The top 20 highest death numbers')

df_death_confirmed = latest_data[['Country_Region', 'Confirmed']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Confirmed', ascending=False)
df_death_confirmed.head(3)
# df_death_ratio=latest_data[['Country_Region', 'Deaths']].groupby(['Country_Region']).sum().reset_index().sort_values(by='Deaths', ascending=False)
df_death_ratio = df_death_confirmed.merge(df_death_sum, on='Country_Region')
df_death_ratio['death_ratio'] = df_death_ratio['Deaths'] / df_death_ratio['Confirmed']
df_death_ratio = df_death_ratio.sort_values(by='death_ratio', ascending=False)
df_death_ratio.head(3)

Analysis: I built separate lists of the worldwide cumulative confirmed cases and deaths for each date, and calculated the mortality rate for each date as death_sum/confirmed_sum. The line plot of COVID cases over time shows the 7-day average of daily increases, with peaks during the winter of 2021 and spring of 2022. Using .groupby to sum by country, followed by .sort_values with ascending=False, I ranked countries from highest to lowest totals. However, total deaths alone can be misleading because they lack context. For example, the U.S. has the highest death count, largely because of its large population. It's worth noting that most other countries with high death rates tend to have lower GDPs, lower standards of living, and weaker sanitation systems.


Conclusion: The final graph shows the relationship between confirmed cases and deaths across countries. The countries with the highest death ratios include Yemen and Peru, among others. One anomaly is North Korea, which appears as a data point far to the left, with 1 confirmed case and 6 deaths. More reported deaths than confirmed cases is inconsistent, which likely indicates inaccurate data reporting by North Korea. It would therefore be prudent to exclude this data point for a more accurate analysis.
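One way to exclude such a data point is to drop every row where reported deaths exceed reported confirmed cases. The sketch below uses a tiny stand-in table, not the real data: 'Country A' and 'Country B' and their numbers are made up for illustration, the label 'Korea, North' is an assumption about how the dataset names the country, and df_clean is a hypothetical variable name.

```python
import pandas as pd

# Illustrative rows only. Apart from the North Korea figures quoted in the
# conclusion (1 confirmed, 6 deaths), these numbers are made up.
df_death_ratio = pd.DataFrame({
    'Country_Region': ['Korea, North', 'Country A', 'Country B'],
    'Confirmed': [1, 1000, 50000],
    'Deaths': [6, 20, 500],
})
df_death_ratio['death_ratio'] = df_death_ratio['Deaths'] / df_death_ratio['Confirmed']

# More deaths than confirmed cases gives a ratio above 1, which signals
# a reporting problem rather than a real mortality rate
implausible = df_death_ratio['Deaths'] > df_death_ratio['Confirmed']

# Keep only the plausible rows, highest death ratio first
df_clean = df_death_ratio[~implausible].sort_values(by='death_ratio', ascending=False)
print(df_clean)
```

Filtering on a rule instead of a country name means that any other row with the same inconsistency is dropped as well.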