Visualizing Data
Visualizing Data¶
Two primary uses for data visualization:¶
- To explore data
- To communicate data
Data visualization is a rich field of study that deserves its own book.
In [1]:
import numpy as np
from collections import Counter
import random
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib¶
- Widely used
- Good for simple bar charts, line charts, and scatterplots
- Plotting functions live in the matplotlib.pyplot module (imported above as plt)
In [2]:
def make_chart_simple_line_chart():
    years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
    gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
    # create a line chart, years on x-axis, gdp on y-axis
    plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
    # add a title
    plt.title("Nominal GDP")
    # add a label to the y-axis
    plt.ylabel("Billions of $")
    plt.show()
In [3]:
make_chart_simple_line_chart()
Bar Charts¶
A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items.
In [4]:
def make_chart_simple_bar_chart():
    movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
    num_oscars = [5, 11, 3, 8, 10]
    # plt.bar centers each bar on its x-coordinate by default (align='center'),
    # so the integer positions 0, 1, 2, ... can be used directly
    xs = range(len(movies))
    # plot bars centered at x-coordinates [xs], with heights [num_oscars]
    plt.bar(xs, num_oscars)
    plt.ylabel("# of Academy Awards")
    plt.title("My Favorite Movies")
    # label x-axis with movie names at bar centers
    plt.xticks(xs, movies)
    plt.show()
In [5]:
make_chart_simple_bar_chart()
- A bar chart can also be a good choice for plotting histograms of bucketed numeric values, in order to visually explore how the values are distributed
In [6]:
def make_chart_histogram():
    grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
    # round each grade down to its decile (a grade of 100 gets its own bucket)
    decile = lambda grade: grade // 10 * 10
    histogram = Counter(decile(grade) for grade in grades)
    plt.bar(
        list(histogram.keys()),    # bars are centered on their x-coordinate by default
        list(histogram.values()),  # give each bar its correct height
        8                          # give each bar a width of 8
    )
    plt.axis([-5, 105, 0, 5])  # x-axis from -5 to 105, y-axis from 0 to 5
    plt.xticks([10 * i for i in range(11)])  # x-axis labels at 0, 10, ..., 100
    plt.xlabel("Decile")
    plt.ylabel("# of Students")
    plt.title("Distribution of Exam 1 Grades")
    plt.show()
In [7]:
make_chart_histogram()
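To check the bucketing without drawing anything, here is a minimal sketch that repeats the Counter step from the cell above and prints a text histogram. The `decile` helper is written as a regular function here; that is a stylistic assumption, not a change in behavior.

```python
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

def decile(grade):
    """Round a grade down to the start of its 10-point bucket."""
    return grade // 10 * 10

# Count how many grades fall into each bucket
histogram = Counter(decile(g) for g in grades)

# Print one row per bucket, with one '#' per student
for bucket in sorted(histogram):
    print(f"{bucket:>3}: {'#' * histogram[bucket]}")
```

The keys of `histogram` are the x-positions passed to plt.bar and the values are the bar heights.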
Misleading bar chart¶
In [8]:
def make_chart_misleading_y_axis(mislead=True):
    mentions = [500, 505]
    years = [2013, 2014]
    # bars are centered on their x-coordinate by default, so use the years directly
    plt.bar(years, mentions, 0.8)
    plt.xticks(years)
    plt.ylabel("# of times I heard someone say 'data science'")
    # if you don't do this, matplotlib will label the x-axis 0, 1
    # and then add a +2.013e3 off in the corner (bad matplotlib!)
    plt.ticklabel_format(useOffset=False)
    if mislead:
        # misleading y-axis only shows the part above 499
        plt.axis([2012.5, 2014.5, 499, 506])
        plt.title("Look at the 'Huge' Increase!")
    else:
        plt.axis([2012.5, 2014.5, 0, 550])
        plt.title("Not So Huge Anymore.")
    plt.show()
In [9]:
make_chart_misleading_y_axis()
- With more sensible axes that start at 0, the increase looks far less dramatic:
In [10]:
make_chart_misleading_y_axis(mislead=False)
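How much does the truncated axis exaggerate? A rough sketch: compare the ratio of the visible bar heights when the y-axis starts at 499 with the ratio when it starts at 0. The helper name `drawn_ratio` is made up for this illustration.

```python
mentions = [500, 505]

def drawn_ratio(values, y_min):
    """Ratio of the visible bar heights (second / first) when the y-axis starts at y_min."""
    first, second = values
    return (second - y_min) / (first - y_min)

truncated = drawn_ratio(mentions, y_min=499)  # y-axis used in the misleading chart
honest = drawn_ratio(mentions, y_min=0)       # y-axis starting at zero

print(f"truncated axis: second bar looks {truncated:.2f}x as tall")
print(f"zero-based axis: second bar looks {honest:.2f}x as tall")
```

A 1% increase is drawn as a bar six times taller, which is why the first chart is misleading.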
Line Charts¶
- Line charts using plt.plot()
- A good choice for showing trends
In [11]:
def make_chart_several_line_charts():
    variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
    bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
    total_error = [x + y for x, y in zip(variance, bias_squared)]
    xs = range(len(variance))
    # we can make multiple calls to plt.plot to show multiple series on the same chart
    plt.plot(xs, variance, 'g-', label='variance')       # green solid line
    plt.plot(xs, bias_squared, 'r-.', label='bias^2')    # red dot-dashed line
    plt.plot(xs, total_error, 'b:', label='total error') # blue dotted line
    # because we've assigned labels to each series
    # we can get a legend for free
    # loc=9 means "top center"
    plt.legend(loc=9)
    plt.xlabel("model complexity")
    plt.title("The Bias-Variance Tradeoff")
    plt.show()
In [12]:
make_chart_several_line_charts()
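The trend the chart shows can also be read off numerically. A small sketch, using the same toy series as the cell above, that finds where the total error is smallest:

```python
variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]

# Total error is the element-wise sum of the two series
total_error = [v + b for v, b in zip(variance, bias_squared)]

# Index (model complexity) at which the total error is lowest
best = min(range(len(total_error)), key=total_error.__getitem__)

print(f"lowest total error {total_error[best]} at complexity {best}")
```

This is the bottom of the dotted curve, where the variance and bias^2 lines cross.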
Scatter plots¶
- Right choice for visualizing the relationship between two paired sets of data.
- Example: the relationship between the number of friends your users have and the number of minutes they spend on the site every day
In [13]:
def make_chart_scatter_plot():
    friends = [ 70,  65,  72,  63,  71,  64,  60,  64,  67]
    minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
    plt.scatter(friends, minutes)
    # label each point
    for label, friend_count, minute_count in zip(labels, friends, minutes):
        plt.annotate(
            label,
            xy=(friend_count, minute_count),  # put the label with its point
            xytext=(5, -5),                   # but slightly offset
            textcoords='offset points'
        )
    plt.title("Daily Minutes vs. Number of Friends")
    plt.xlabel("# of friends")
    plt.ylabel("daily minutes spent on the site")
    plt.show()
In [14]:
make_chart_scatter_plot()
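A scatter plot shows the relationship visually; one way to put a number on it is the Pearson correlation coefficient. The sketch below computes it by hand with the standard library. The `pearson` helper is written for this illustration and is not part of the original notebook.

```python
from math import sqrt

friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(friends, minutes)
print(f"correlation between friends and minutes: {r:.2f}")
```

A value close to 1 matches what the plot suggests: users with more friends tend to spend more minutes on the site.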
- If you’re scattering comparable variables, you might get a misleading picture if you let matplotlib choose the scale
In [15]:
def make_chart_scatterplot_axes(equal_axes=False):
    test_1_grades = [ 99, 90, 85, 97, 80]
    test_2_grades = [100, 85, 60, 90, 70]
    plt.scatter(test_1_grades, test_2_grades)
    plt.xlabel("test 1 grade")
    plt.ylabel("test 2 grade")
    if equal_axes:
        plt.title("Axes Are Comparable")
        plt.axis("equal")
    else:
        plt.title("Axes Aren't Comparable")
    plt.show()
In [16]:
make_chart_scatterplot_axes()
In [17]:
make_chart_scatterplot_axes(equal_axes=True)
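Why the default scale misleads here: each axis is stretched to fit its own data, so two very different spreads end up looking alike. A quick sketch that compares the two ranges (the `spread` helper is made up for this illustration):

```python
test_1_grades = [99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

def spread(values):
    """Distance between the smallest and largest value."""
    return max(values) - min(values)

spread_1 = spread(test_1_grades)
spread_2 = spread(test_2_grades)

print(f"test 1 spans {spread_1} points, test 2 spans {spread_2} points")
```

Test 2 varies about twice as much as test 1, which only becomes visible once plt.axis("equal") gives both axes the same scale.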
Always try reading the help page¶
In [18]:
help(plt.axis)
Help on function axis in module matplotlib.pyplot:

axis(*args, emit=True, **kwargs)
    Convenience method to get or set some axis properties.

    Call signatures::

      xmin, xmax, ymin, ymax = axis()
      xmin, xmax, ymin, ymax = axis([xmin, xmax, ymin, ymax])
      xmin, xmax, ymin, ymax = axis(option)
      xmin, xmax, ymin, ymax = axis(**kwargs)

    Parameters
    ----------
    xmin, xmax, ymin, ymax : float, optional
        The axis limits to be set.  This can also be achieved using ::

            ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

    option : bool or str
        If a bool, turns axis lines and labels on or off. If a string,
        possible values are:

        ======== ==========================================================
        Value    Description
        ======== ==========================================================
        'on'     Turn on axis lines and labels. Same as ``True``.
        'off'    Turn off axis lines and labels. Same as ``False``.
        'equal'  Set equal scaling (i.e., make circles circular) by
                 changing axis limits. This is the same as
                 ``ax.set_aspect('equal', adjustable='datalim')``.
                 Explicit data limits may not be respected in this case.
        'scaled' Set equal scaling (i.e., make circles circular) by
                 changing dimensions of the plot box. This is the same as
                 ``ax.set_aspect('equal', adjustable='box', anchor='C')``.
                 Additionally, further autoscaling will be disabled.
        'tight'  Set limits just large enough to show all data, then
                 disable further autoscaling.
        'auto'   Automatic scaling (fill plot box with data).
        'image'  'scaled' with axis limits equal to data limits.
        'square' Square plot; similar to 'scaled', but initially forcing
                 ``xmax-xmin == ymax-ymin``.
        ======== ==========================================================

    emit : bool, default: True
        Whether observers are notified of the axis limit change.
        This option is passed on to `~.Axes.set_xlim` and
        `~.Axes.set_ylim`.

    Returns
    -------
    xmin, xmax, ymin, ymax : float
        The axis limits.

    See Also
    --------
    matplotlib.axes.Axes.set_xlim
    matplotlib.axes.Axes.set_ylim
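The same information is available programmatically. As a sketch, the standard library's inspect module returns a function's signature and docstring as objects you can print or search. `Counter.most_common` is used only as a convenient example here; any function, including plt.axis, can be passed instead.

```python
import inspect
from collections import Counter

# The call signature, as shown on the first line of help()
sig = inspect.signature(Counter.most_common)

# The cleaned-up docstring, as shown in the body of help()
doc = inspect.getdoc(Counter.most_common)

print(f"most_common{sig}")
print(doc.splitlines()[0])  # first line of the docstring is the summary
```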
Pie Charts¶
- Circle divided into slices to illustrate numerical proportion
In [19]:
def make_chart_pie_chart():
    plt.pie([0.95, 0.05], labels=["Uses pie charts", "Knows better"])
    # make sure pie is a circle and not an oval
    plt.axis("equal")
    plt.show()
In [20]:
make_chart_pie_chart()
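Each slice's size is its share of the total, so what matters is the ratio between the values, not their absolute size. A minimal sketch of that arithmetic; the `wedge_angles` helper and the raw counts `[19, 1]` are hypothetical and only serve the illustration.

```python
def wedge_angles(values):
    """Return each value's share of the circle in degrees (value / total * 360)."""
    total = sum(values)
    return [v / total * 360 for v in values]

# The fractions used in the chart above
angles = wedge_angles([0.95, 0.05])

# Hypothetical raw counts with the same 95:5 split give the same wedges
angles_from_counts = wedge_angles([19, 1])

print([round(a, 1) for a in angles])
print([round(a, 1) for a in angles_from_counts])
```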
Data scientists move to Bokeh¶
- Bokeh is a newer library that brings D3-style (interactive) visualizations into Python
- https://demo.bokeh.org/movies
In [21]:
# Bokeh Libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
friends = [ 70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
#friends = [i + 3 * random.random() for i in friends for _ in range(100)]
#minutes = [i + 50 * random.random() for i in minutes for _ in range(100)]
output_notebook()
TOOLS = "hover,crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select"
# Set up a generic figure() object
fig = figure(tools=TOOLS)
fig.scatter(friends, minutes)
# See what it looks like
show(fig)
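The two commented-out lines in the cell above inflate the nine points into a larger cloud, which makes the interactive tools (zoom, lasso select) more interesting to try. A sketch of what they do, with a fixed seed added as an assumption so the result is reproducible:

```python
import random

random.seed(0)  # assumption: the original cell does not set a seed

friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]

# For every original point, generate 100 copies with random noise added:
# up to 3 extra friends and up to 50 extra minutes
jittered_friends = [f + 3 * random.random() for f in friends for _ in range(100)]
jittered_minutes = [m + 50 * random.random() for m in minutes for _ in range(100)]

print(len(jittered_friends), len(jittered_minutes))
```

Both lists are built in the same order (100 copies of the first point, then 100 of the second, and so on), so element i of one list still pairs with element i of the other.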
Complete Example¶
In [22]:
dept_names = ["ME", "EE", "CS", "CE", "IE"]
num_apps = [100, 123, 212, 50, 55]
num_adms = [50, 60, 60, 30, 30]
plt.figure(figsize=(10, 5))
plt.bar(np.array(range(len(dept_names)))-0.2, num_apps, width=0.5, label="# applied")
plt.bar(range(len(dept_names)), num_adms, color="g", width=0.5, label="# admitted")
plt.xticks(range(len(dept_names)), dept_names)
plt.yticks(range(0, 250, 50))
plt.legend(loc=1)
plt.title("comparing # of dept applicants")
plt.xlabel("departments")
plt.ylabel("# applied vs # admitted")
plt.annotate(f"CS competition rate = {num_adms[2]/num_apps[2]:.2}", xy=(-0.1, 180))
Out[22]:
Text(-0.1, 180, 'CS competition rate = 0.28')
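The annotation above shows the admitted / applied ratio for CS only. As a sketch, the same ratio can be computed for every department before deciding what to annotate; the names `rates` and `most_selective` are introduced here for illustration.

```python
dept_names = ["ME", "EE", "CS", "CE", "IE"]
num_apps = [100, 123, 212, 50, 55]
num_adms = [50, 60, 60, 30, 30]

# Admitted / applied ratio per department
rates = {
    name: admitted / applied
    for name, applied, admitted in zip(dept_names, num_apps, num_adms)
}

# Department with the lowest ratio, i.e. the hardest to get into
most_selective = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.2}")
print(f"most selective: {most_selective}")
```

The `:.2` format keeps two significant digits, which is how the annotation text ends up as 0.28 for CS.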