Visualizing Data

Visualizing Data¶

Two primary uses for data visualization:¶

To explore data
To communicate data

Data visualization is a rich field of study that deserves its own book.

In [1]:

import numpy as np
from collections import Counter
import random
import matplotlib.pyplot as plt
%matplotlib inline

matplotlib¶

Widely used
Good for simple bar charts, line charts, and scatterplots
Used here through the matplotlib.pyplot module (imported as plt)
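The pyplot module keeps track of a "current" figure and axes behind the scenes, which is what calls such as plt.plot and plt.title rely on. As a minimal sketch (the data values are illustrative only and not part of the notebook), the same kind of chart can be built by holding the Figure and Axes objects explicitly:

```python
import matplotlib.pyplot as plt

# illustrative data, not taken from the notebook
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]

# plt.subplots() returns a Figure and, by default, a single Axes
fig, ax = plt.subplots()
ax.plot(xs, ys, color="green", marker="o", linestyle="solid")
ax.set_title("Squares")
ax.set_ylabel("x squared")

# pyplot's "current axes" is the one that was just created
same_axes = plt.gca() is ax
```

Either style produces the same picture; the explicit form tends to be easier to follow once a figure holds more than one chart.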

In [2]:

def make_chart_simple_line_chart():
    years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
    gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
    
    # create a line chart, years on x-axis, gdp on y-axis
    plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
    
    # add a title
    plt.title("Nominal GDP")
    
    # add a label to the y-axis
    plt.ylabel("Billions of $")
    
    plt.show()

In [3]:

make_chart_simple_line_chart()

Bar Charts¶

A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items.

In [4]:

def make_chart_simple_bar_chart():
    movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
    num_oscars = [5, 11, 3, 8, 10]
    
    # bars are centered on their x-coordinates by default (matplotlib 2.0+),
    # so the bar positions can double as the tick positions
    xs = range(len(movies))
    
    # plot bars centered at x-coordinates [xs], with heights [num_oscars]
    plt.bar(xs, num_oscars)
    plt.ylabel("# of Academy Awards")
    plt.title("My Favorite Movies")
    
    # label x-axis with movie names at bar centers
    plt.xticks(xs, movies)
    
    plt.show()

In [5]:

make_chart_simple_bar_chart()

A bar chart can also be a good choice for plotting histograms of bucketed numeric values, in order to visually explore how the values are distributed.

In [6]:

def make_chart_histogram():
    grades = [83,95,91,87,70,0,85,82,100,67,73,77,0]
    decile = lambda grade: grade // 10 * 10  # round down to the nearest 10
    histogram = Counter(decile(grade) for grade in grades)
    
    plt.bar(
        list(histogram.keys()),  # bars are centered on x by default, so center each on its decile
        list(histogram.values()),  # give each bar its correct height
        8  # give each bar a width of 8
    )
    plt.axis([-5, 105, 0, 5])  # x-axis from -5 to 105, y-axis from 0 to 5
    plt.xticks([10 * i for i in range(11)])  # x-axis labels at 0, 10, ..., 100
    plt.xlabel("Decile")
    plt.ylabel("# of Students")
    plt.title("Distribution of Exam 1 Grades")
    plt.show()

In [7]:

make_chart_histogram()
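To see what the chart is actually drawing, it can help to look at the bucketed counts on their own. The sketch below repeats the bucketing step from the function above without any plotting; the only change is that decile is written as a named function.

```python
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

def decile(grade):
    """Round a grade down to the nearest multiple of 10."""
    return grade // 10 * 10

# map each grade to its bucket and count how many fall in each
histogram = Counter(decile(grade) for grade in grades)

# print buckets in ascending order with their counts
for bucket in sorted(histogram):
    print(bucket, histogram[bucket])
```

Each key is the lower edge of a bucket and each value is the height of the corresponding bar.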

Misleading bar chart¶

In [8]:

def make_chart_misleading_y_axis(mislead=True):
    mentions = [500, 505]
    years = [2013, 2014]
    
    # bars are centered on their x-coordinates by default, so use the years directly
    plt.bar(years, mentions, 0.8)
    plt.xticks(years)
    plt.ylabel("# of times I heard someone say 'data science'")
    
    # if you don't do this, matplotlib will label the x-axis 0, 1
    # and then add a +2.013e3 off in the corner (bad matplotlib!)
    plt.ticklabel_format(useOffset=False)
    
    if mislead:
        # misleading y-axis only shows the part above 500
        plt.axis([2012.5,2014.5,499,506])
        plt.title("Look at the 'Huge' Increase!")
    else:
        plt.axis([2012.5,2014.5,0,550])
        plt.title("Not So Huge Anymore.")
    
    plt.show()

In [9]:

make_chart_misleading_y_axis()

With more sensible axes, the increase looks far less dramatic:

In [10]:

make_chart_misleading_y_axis(mislead=False)
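One way to put a number on the distortion is to compare the ratio of the drawn bar heights under each y-axis. This is a small sketch; the helper name visible_ratio is hypothetical and only used for illustration.

```python
mentions = [500, 505]

def visible_ratio(values, y_min):
    """Ratio of the drawn bar heights when the y-axis starts at y_min."""
    first, second = values
    return (second - y_min) / (first - y_min)

# y-axis starting at 499, as in the misleading chart
truncated = visible_ratio(mentions, 499)

# y-axis starting at 0, as in the more sensible chart
honest = visible_ratio(mentions, 0)

print(truncated, honest)
```

With the truncated axis the second bar is drawn six times as tall as the first, although the underlying value grew by only 1%.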

Line Charts¶

Line charts using plt.plot()
A good choice for showing trends

In [11]:

def make_chart_several_line_charts():
    variance = [1,2,4,8,16,32,64,128,256]
    bias_squared = [256,128,64,32,16,8,4,2,1]
    total_error = [x + y for x, y in zip(variance, bias_squared)]
    
    xs = range(len(variance))
    
    # we can make multiple calls to plt.plot to show multiple series on the same chart
    plt.plot(xs, variance, 'g-', label='variance')  # green solid line
    plt.plot(xs, bias_squared, 'r-.', label='bias^2')  # red dot-dashed line
    plt.plot(xs, total_error, 'b:', label='total error') # blue dotted line
    
    # because we've assigned labels to each series
    # we can get a legend for free
    # loc=9 means "top center"
    plt.legend(loc=9)
    plt.xlabel("model complexity")
    plt.title("The Bias-Variance Tradeoff")
    plt.show()

In [12]:

make_chart_several_line_charts()
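The point of plotting the three series together is to see where the total error bottoms out. As a rough check on what the chart shows, the same minimum can be found directly from the lists (the variable name best_complexity is an assumption for this sketch):

```python
variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
total_error = [v + b for v, b in zip(variance, bias_squared)]

# index (model complexity) at which the total error is smallest
best_complexity = min(range(len(total_error)), key=total_error.__getitem__)

print(best_complexity, total_error[best_complexity])
```

The minimum sits in the middle of the range, where the variance and bias^2 curves cross.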

Scatter plots¶

A scatterplot is the right choice for visualizing the relationship between two paired sets of data.
Example: the relationship between the number of friends your users have and the number of minutes they spend on the site every day

In [13]:

def make_chart_scatter_plot():
    friends = [ 70, 65, 72, 63, 71, 64, 60, 64, 67]
    minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
    
    plt.scatter(friends, minutes)
    
    # label each point
    for label, friend_count, minute_count in zip(labels, friends, minutes):
        plt.annotate(
            label,
            xy=(friend_count, minute_count),  # put the label with its point
            xytext=(5, -5),  # but slightly offset
            textcoords='offset points'
        )
    
    plt.title("Daily Minutes vs. Number of Friends")
    plt.xlabel("# of friends")
    plt.ylabel("daily minutes spent on the site")
    plt.show()

In [14]:

make_chart_scatter_plot()
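A scatterplot shows the relationship visually; a correlation coefficient is one way to summarize it in a single number. The notebook itself does not compute one, so the sketch below is only a complement, with a hand-written pearson helper (a hypothetical name) applied to the same data.

```python
import math

friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sums of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson(friends, minutes)
print(f"correlation = {r:.2f}")
```

A value close to 1 matches the upward trend visible in the chart.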

If you’re scattering comparable variables, you might get a misleading picture if you let matplotlib choose the scale.

In [15]:

def make_chart_scatterplot_axes(equal_axes=False):
    test_1_grades = [ 99, 90, 85, 97, 80]
    test_2_grades = [100, 85, 60, 90, 70]
    
    plt.scatter(test_1_grades, test_2_grades)
    plt.xlabel("test 1 grade")
    plt.ylabel("test 2 grade")
    
    if equal_axes:
        plt.title("Axes Are Comparable")
        plt.axis("equal")
    else:
        plt.title("Axes Aren't Comparable")
    
    plt.show()

In [16]:

make_chart_scatterplot_axes()

In [17]:

make_chart_scatterplot_axes(equal_axes=True)
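The reason the default scale can mislead is that each axis is fitted to its own data range. A quick sketch of the two ranges for the grades above (the names span_1 and span_2 are assumptions for illustration):

```python
test_1_grades = [99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

# range of values each axis has to cover
span_1 = max(test_1_grades) - min(test_1_grades)
span_2 = max(test_2_grades) - min(test_2_grades)

print(span_1, span_2)
```

Test 2 covers roughly twice the range of test 1, so when both ranges are stretched to fill the plot, the same number of points takes up different lengths on the two axes. Calling plt.axis("equal") removes that difference.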

Always try to read the help page¶

In [18]:

help(plt.axis)

Help on function axis in module matplotlib.pyplot:

axis(*args, emit=True, **kwargs)
    Convenience method to get or set some axis properties.
    
    Call signatures::
    
      xmin, xmax, ymin, ymax = axis()
      xmin, xmax, ymin, ymax = axis([xmin, xmax, ymin, ymax])
      xmin, xmax, ymin, ymax = axis(option)
      xmin, xmax, ymin, ymax = axis(**kwargs)
    
    Parameters
    ----------
    xmin, xmax, ymin, ymax : float, optional
        The axis limits to be set.  This can also be achieved using ::
    
            ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))
    
    option : bool or str
        If a bool, turns axis lines and labels on or off. If a string,
        possible values are:
    
        ======== ==========================================================
        Value    Description
        ======== ==========================================================
        'on'     Turn on axis lines and labels. Same as ``True``.
        'off'    Turn off axis lines and labels. Same as ``False``.
        'equal'  Set equal scaling (i.e., make circles circular) by
                 changing axis limits. This is the same as
                 ``ax.set_aspect('equal', adjustable='datalim')``.
                 Explicit data limits may not be respected in this case.
        'scaled' Set equal scaling (i.e., make circles circular) by
                 changing dimensions of the plot box. This is the same as
                 ``ax.set_aspect('equal', adjustable='box', anchor='C')``.
                 Additionally, further autoscaling will be disabled.
        'tight'  Set limits just large enough to show all data, then
                 disable further autoscaling.
        'auto'   Automatic scaling (fill plot box with data).
        'image'  'scaled' with axis limits equal to data limits.
        'square' Square plot; similar to 'scaled', but initially forcing
                 ``xmax-xmin == ymax-ymin``.
        ======== ==========================================================
    
    emit : bool, default: True
        Whether observers are notified of the axis limit change.
        This option is passed on to `~.Axes.set_xlim` and
        `~.Axes.set_ylim`.
    
    Returns
    -------
    xmin, xmax, ymin, ymax : float
        The axis limits.
    
    See Also
    --------
    matplotlib.axes.Axes.set_xlim
    matplotlib.axes.Axes.set_ylim
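The call signatures in the help text say that plt.axis both sets and returns the limits. A minimal sketch of that behavior, using made-up points:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [10, 20, 30])  # illustrative points only

# set explicit limits; the call returns the limits now in effect
limits = plt.axis([0, 4, 0, 40])

# calling with no arguments only reads the current limits
current = plt.axis()

print(limits)
```

Reading the limits back this way is handy for checking what matplotlib chose before deciding whether to override it.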

Pie Charts¶

A pie chart is a circle divided into slices to illustrate numerical proportions.

In [19]:

def make_chart_pie_chart():
    plt.pie([0.95, 0.05], labels=["Uses pie charts", "Knows better"])
    
    # make sure pie is a circle and not an oval
    plt.axis("equal")
    plt.show()

In [20]:

make_chart_pie_chart()
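Because the slices represent proportions, the inputs do not have to be fractions: pie normalizes whatever values it is given. The sketch below uses hypothetical head-counts (95 and 5) and the autopct parameter to print the resulting percentages on the slices.

```python
import matplotlib.pyplot as plt

counts = [95, 5]  # hypothetical head-counts, normalized by pie() into fractions
labels = ["Uses pie charts", "Knows better"]

fig, ax = plt.subplots()

# autopct formats each slice's share of the total as a percentage label
wedges, texts, autotexts = ax.pie(counts, labels=labels, autopct="%1.1f%%")

# make sure the pie is a circle and not an oval
ax.axis("equal")

percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)
```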

Data scientists move to bokeh¶

Bokeh is a newer library that brings D3-style (interactive) visualizations into Python.
Demo: https://demo.bokeh.org/movies

In [21]:

# Bokeh Libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

friends = [ 70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]

#friends = [i + 3 * random.random() for i in friends for _ in range(100)]
#minutes = [i + 50 * random.random() for i in minutes for _ in range(100)]
output_notebook()
TOOLS = "hover,crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select"

# Set up a generic figure() object
fig = figure(tools=TOOLS)
fig.scatter(friends, minutes)
# See what it looks like
show(fig)

Complete Example¶

In [22]:

dept_names = ["ME", "EE", "CS", "CE", "IE"]
num_apps = [100, 123, 212, 50, 55]
num_adms = [50, 60, 60, 30, 30]

plt.figure(figsize=(10, 5))

# draw the applicant bars slightly to the left so they stay visible behind the admitted bars
plt.bar(np.array(range(len(dept_names)))-0.2, num_apps, width=0.5, label="# applied")
plt.bar(range(len(dept_names)), num_adms, color="g", width=0.5, label="# admitted")

plt.xticks(range(len(dept_names)), dept_names)
plt.yticks(range(0, 250, 50))
plt.legend(loc=1)  # loc=1 means "upper right"

plt.title("comparing # of dept applicants")
plt.xlabel("departments")
plt.ylabel("# applied vs # admitted")

# admitted / applied is the admission rate (60 / 212 for CS)
plt.annotate(f"CS admission rate = {num_adms[2]/num_apps[2]:.2}", xy=(-0.1, 180))

Out[22]:

Text(-0.1, 180, 'CS admission rate = 0.28')
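The annotation above covers only the CS department. As a small sketch, the same admitted / applied ratio can be worked out for every department at once (the names rates and most_selective are assumptions for illustration):

```python
dept_names = ["ME", "EE", "CS", "CE", "IE"]
num_apps = [100, 123, 212, 50, 55]
num_adms = [50, 60, 60, 30, 30]

# admitted / applied for each department
rates = {
    name: adm / app
    for name, app, adm in zip(dept_names, num_apps, num_adms)
}

# department with the lowest share of admitted applicants
most_selective = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

CS has both the most applicants and the lowest ratio, which is why it is the one singled out in the chart.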

License: Attribution, NonCommercial, ShareAlike
