
Exploring and visualizing data

  • Writer: Park Daniel
  • Oct 1, 2020
  • 2 min read

Updated: Nov 22, 2020

This post explores the data by analyzing its statistics and by visualizing feature values and the correlations between features, explaining the process and the results along the way.


Data visualization definition


Data visualization refers to transforming figures and raw data into visual objects: points, bars, plots, maps, etc. By combining user-friendly and aesthetically pleasing features, these visualizations make research and data analysis much quicker and are also a powerful communication tool.


Ground rules


Rule: we will work through the dataset from left to right, one column at a time. Some datasets have a large number of attributes, and this order helps us remember to explore each column individually.
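As a rough illustration of the left-to-right rule, the sketch below walks the columns of a small made-up table in order and records a one-line summary of each. The table, its column names, and the summary fields are assumptions for illustration only, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative, not the real schema.
df = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "price": [80, 150, 60, 35],
})

# Visit the columns in their left-to-right order and summarize each one.
summary = []
for column in df.columns:
    summary.append({
        "column": column,
        "dtype": str(df[column].dtype),
        "missing": int(df[column].isna().sum()),
        "unique": int(df[column].nunique()),
    })

summary_df = pd.DataFrame(summary)
print(summary_df)
```

Keeping the summary in column order makes it easy to see which columns have already been looked at.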


Which hosts have the most listings on the Airbnb platform


Find out which hosts (by ID) have the most listings on the Airbnb platform and therefore make the most use of the service.



# count listings per host_id and keep the 10 hosts with the most listings
top_host = airbnb.host_id.value_counts().head(10)
top_host

This code counts how many listings each host ID has and keeps the ten hosts with the most, so we can see how many listings per host we are working with.
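To see what value_counts() does in isolation, here is a minimal sketch on a made-up table with one row per listing. The name airbnb_toy and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings table: one row per listing.
airbnb_toy = pd.DataFrame({"host_id": [11, 22, 11, 33, 11, 22]})

# value_counts() counts the rows for each distinct host_id and sorts
# the result from the most frequent host to the least frequent.
top_host_toy = airbnb_toy.host_id.value_counts().head(2)
print(top_host_toy)
```

Host 11 appears three times and host 22 twice, so they are the two entries kept by head(2).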



219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
16098958      96
61391963      91
22541573      87
200380610     65
7503643       52
Name: host_id, dtype: int64

The output is shown above. The left column holds the host IDs, which we use instead of the hosts' real names for privacy. The right column shows how many listings each host has. This matters because anything we compute for a host with more listings rests on more data points and is therefore less variable. You can also consider dropping hosts that have fewer than 10 listings if that is too little data for this visualization process.
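One possible way to drop hosts below a minimum count is sketched here on made-up data. The threshold name MIN_LISTINGS and the toy table are assumptions; the value 2 is used only so the small example has something to filter.

```python
import pandas as pd

# Hypothetical toy listings table: one row per listing.
airbnb_toy = pd.DataFrame({
    "host_id": [11, 22, 11, 33, 11, 22],
    "price": [100, 80, 120, 60, 90, 75],
})

MIN_LISTINGS = 2  # illustrative threshold; the text above suggests 10

# Look up, for every row, how many listings that row's host has in total.
counts = airbnb_toy["host_id"].value_counts()
listings_per_host = airbnb_toy["host_id"].map(counts)

# Keep only the rows whose host meets the threshold.
filtered = airbnb_toy[listings_per_host >= MIN_LISTINGS]
print(filtered)
```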



Setting figure size for future visualization



import seaborn as sns

#setting figure size for future visualizations (width, height in inches)
sns.set(rc={'figure.figsize':(10,8)})
sns.set_style('white')

This snippet sets the default figure size to 10 inches wide by 8 inches tall (figure.figsize is measured in inches, not pixels). The second line picks a seaborn figure style. There are five seaborn themes: darkgrid, whitegrid, dark, white, and ticks. Here we use the plain white theme because we don't want grid lines or tick marks distracting from the data in the chart.
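A small check of how the figure size setting behaves, assuming matplotlib and seaborn are installed. The Agg backend is chosen here only so the sketch runs without a display; the size in pixels depends on the figure's dpi, which is why no pixel value is hard-coded.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for this sketch
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(rc={'figure.figsize': (10, 8)})
sns.set_style('white')

# New figures pick up the default size that was just set.
fig = plt.figure()
width_in, height_in = fig.get_size_inches()

# Pixel dimensions are the size in inches multiplied by the figure's dpi.
width_px, height_px = width_in * fig.dpi, height_in * fig.dpi
print(width_in, height_in, width_px, height_px)
plt.close(fig)
```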



Plotting the Chart



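# draw one bar per host: host ID on the x-axis, number of listings on the y-axis
viz_1 = sns.barplot(x="Host_ID", y="P_Count", data=top_host_df,
                    palette='Blues_d')
# add the title and axis labels
viz_1.set_title('Hosts with the most listings in NYC')
viz_1.set_ylabel('Count of listings')
viz_1.set_xlabel('Host IDs')
# rotate the host IDs so the long numbers do not overlap
viz_1.set_xticklabels(viz_1.get_xticklabels(), rotation=45)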

These calls label the parts of the graph so the reader can interpret it: set_title, set_ylabel, and set_xlabel add the title and axis labels, and set_xticklabels rotates the host IDs by 45 degrees so they do not overlap. The call to sns.barplot is what actually draws the bar chart.
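The plotting code uses top_host_df with the columns Host_ID and P_Count, but this post does not show how that table is built. The sketch below is one possible way to get there from the value_counts() result; the counts are copied from the output shown earlier, and the rename_axis/reset_index step is an assumption rather than the author's confirmed code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for this sketch
import pandas as pd
import seaborn as sns

# Counts copied from the value_counts() output shown earlier in this post.
top_host = pd.Series(
    [327, 232, 121, 103, 96, 96, 91, 87, 65, 52],
    index=[219517861, 107434423, 30283594, 137358866, 12243051,
           16098958, 61391963, 22541573, 200380610, 7503643],
    name="host_id",
)

# Move the host IDs out of the index into a Host_ID column and
# name the column of counts P_Count.
top_host_df = top_host.rename_axis("Host_ID").reset_index(name="P_Count")

viz_1 = sns.barplot(x="Host_ID", y="P_Count", data=top_host_df,
                    palette="Blues_d")
viz_1.set_title("Hosts with the most listings in NYC")
viz_1.set_ylabel("Count of listings")
viz_1.set_xlabel("Host IDs")
```

Naming the columns explicitly avoids relying on the default names that reset_index() produces, which can differ between pandas versions.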




Above is the output of our code, and we can see that some hosts have far more listings on Airbnb than others. The bars are drawn in shades of blue, which comes from palette='Blues_d'. Note that seaborn assigns these shades by each bar's position along the x-axis rather than by its listing count, so the shade alone should not be read as a measure of how many listings a host has.
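To make the shade follow the listing count reliably, one option is to order the bars by count, since seaborn hands out palette colors in bar order. This is a sketch on the first three hosts from the output above; the variable bar_order and the use of the order argument are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for this sketch
import pandas as pd
import seaborn as sns

# First three hosts from the value_counts() output shown earlier.
top_host_df = pd.DataFrame({
    "Host_ID": [219517861, 107434423, 30283594],
    "P_Count": [327, 232, 121],
})

# Sort the host IDs from fewest to most listings and draw the bars in
# that order, so position (and therefore shade) tracks the count.
bar_order = top_host_df.sort_values("P_Count")["Host_ID"].tolist()
ax = sns.barplot(x="Host_ID", y="P_Count", data=top_host_df,
                 order=bar_order, palette="Blues_d")
```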

