Exploring and visualizing data
- Park Daniel
- Oct 1, 2020
- 2 min read
Updated: Nov 22, 2020
This post explores the data by analyzing its statistics and visualizing feature values and the correlations between features, explaining the process and the results along the way.

Data visualization definition
Data visualization refers to transforming figures and raw data into visual objects: points, bars, plots, maps, etc. By combining user-friendly and aesthetically pleasing features, these visualizations make research and data analysis much quicker and are also a powerful communication tool.
Ground rules
Rule: we will work through the columns from left to right. Some data sets have a large number of attributes, and going in order makes sure we remember to explore each column individually.
Which hosts have the most listings on the Airbnb platform
Find out which hosts (by ID) have the most listings on the Airbnb platform and so make the most use of this service.
top_host=airbnb.host_id.value_counts().head(10)
top_host
This code counts how many times each host_id appears in the data set, that is, how many listings each host has, and keeps the 10 hosts with the most listings.
219517861 327
107434423 232
30283594 121
137358866 103
12243051 96
16098958 96
61391963 91
22541573 87
200380610 65
7503643 52
Name: host_id, dtype: int64
The output is shown above. The left column holds the host IDs; we refer to hosts by ID rather than by real name for privacy. The right column shows how many listings each host has (value_counts counts rows per host_id, not reviews). This matters because hosts with more listings contribute more data points, so any statistic computed per host is likely to be less noisy for them. It is also worth mentioning that you can consider dropping hosts that have fewer than 10 listings if that is too little data for this visualization process.
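To make the counting step concrete, here is a minimal sketch on a made-up table. The toy airbnb frame, the min_listings threshold of 2, and the variable names listings_per_host, active_hosts, and filtered are assumptions for illustration only; the real data set and the cutoff of 10 mentioned above would take their place.

```python
import pandas as pd

# Toy stand-in for the Airbnb listings table: one row per listing.
airbnb = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "host_id": [101, 102, 101, 103, 101, 102],
})

# Count rows (listings) per host, most frequent host first.
listings_per_host = airbnb["host_id"].value_counts()

# Keep only the hosts with the most listings (top 2 here, top 10 in the post).
top_host = listings_per_host.head(2)

# Hypothetical threshold: drop hosts with fewer than min_listings listings.
min_listings = 2
active_hosts = listings_per_host[listings_per_host >= min_listings].index
filtered = airbnb[airbnb["host_id"].isin(active_hosts)]

print(top_host)
print(filtered)
```

In this toy table host 101 has three listings and host 102 has two, so both survive the filter, while host 103 with a single listing is dropped.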
Setting figure size for future visualization
#setting figure size for future visualizations
sns.set(rc={'figure.figsize':(10,8)})
sns.set_style('white')
This snippet sets the default figure size to 10 inches wide by 8 inches tall; figure.figsize is measured in inches, not pixels. The second line sets the seaborn figure style. There are five seaborn themes: darkgrid, whitegrid, dark, white, and ticks. Here we use the basic white, because we don't want grid lines or extra tick marks that could distract from the data in the chart.
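As a quick check of what the figure size setting means, the sketch below creates an empty figure after applying the same settings and reads its size back. The Agg backend line and the variable names are assumptions added so the snippet runs without a display; the pixel size depends on the figure's DPI, which varies by environment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Same settings as in the post.
sns.set(rc={'figure.figsize': (10, 8)})
sns.set_style('white')

# New figures now pick up the default size, reported in inches.
fig, ax = plt.subplots()
width_in, height_in = fig.get_size_inches()

# Pixel dimensions are inches multiplied by the figure's DPI.
dpi = fig.get_dpi()
width_px, height_px = width_in * dpi, height_in * dpi

print(f"{width_in} x {height_in} inches at {dpi} dpi")
print(f"{width_px} x {height_px} pixels")
```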
Plotting the Chart
viz_1=sns.barplot(x="Host_ID", y="P_Count", data=top_host_df,
palette='Blues_d')
viz_1.set_title('Hosts with the most listings in NYC')
viz_1.set_ylabel('Count of listings')
viz_1.set_xlabel('Host IDs')
viz_1.set_xticklabels(viz_1.get_xticklabels(), rotation=45)
These calls label the parts of the graph so the reader can interpret it: set_title, set_ylabel, and set_xlabel set the title and the axis labels, and set_xticklabels rotates the host IDs by 45 degrees so they do not overlap. The call to sns.barplot is what actually draws the bar chart.
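The plotting code reads from a DataFrame named top_host_df with columns Host_ID and P_Count, but the step that builds it is not shown. One possible way to get there from the top_host Series is sketched below; the toy airbnb frame is an assumption standing in for the real data, and the original author may have built the frame differently.

```python
import pandas as pd

# Toy stand-in for the listings table (assumed data, not the real set).
airbnb = pd.DataFrame({"host_id": [101, 102, 101, 103, 101, 102]})

# Same counting step as earlier in the post.
top_host = airbnb.host_id.value_counts().head(10)

# Turn the Series into a two-column frame:
# the index (host IDs) becomes Host_ID, the counts become P_Count.
top_host_df = top_host.rename_axis("Host_ID").reset_index(name="P_Count")

print(top_host_df)
```

Naming both columns explicitly avoids depending on the default names that value_counts gives its result, which differ between pandas versions.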

Above is the output of our code, and we can see that some hosts have many more listings on Airbnb than others. The bars are also drawn in different shades of blue, which comes from the argument palette='Blues_d'. Note that seaborn assigns the palette colors to the bars in order along the x-axis, so the shade tells the bars apart but is not itself computed from the listing count.