Numpy Tutorial
Introduction to Numpy¶
Numpy, short for Numerical Python, is a Python library for numerical computation. Its basic data structure is a multi-dimensional array object called ndarray, and Numpy provides a suite of functions that manipulate the elements of an ndarray efficiently.
- To see the reference manual, use Help → Numpy Reference, or go to https://docs.scipy.org/doc/numpy/reference/?v=20190305121057
- We will introduce the basic building blocks of numpy.
- For details on a specific function, search the web or the reference manual.
- Expertise in numpy helps you learn Spark RDD computing faster, since the two are based on similar concepts.
import numpy as np
import math
Creating ndarray¶
An ndarray can be created from a list or tuple object.
- numpy.array is just a convenience function to create an ndarray
- numpy.ndarray is a class
Note:
- A tensor is an object that can be represented as a multidimensional array.
- TensorFlow is a Google library for deep learning.
- TensorFlow coding requires proficiency with numpy n-dimensional arrays.
oneDim = np.array([1.0,2,3,4,5]) # a 1-dimensional array (vector)
print(oneDim)
print("#Dimensions =", oneDim.ndim)
print("shape =", oneDim.shape)
print("Size =", oneDim.size)
print("Array type =", oneDim.dtype)
[1. 2. 3. 4. 5.]
#Dimensions = 1
shape = (5,)
Size = 5
Array type = float64
twoDim = np.array([[1,2],[3,4],[5,6],[7,8]]) # a two-dimensional array (matrix)
print(twoDim)
print("#Dimensions =", twoDim.ndim)
print("Dimension =", twoDim.shape)
print("Size =", twoDim.size)
print("Array type =", twoDim.dtype)
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
#Dimensions = 2
Dimension = (4, 2)
Size = 8
Array type = int64
arrFromTuple = np.array([(1,'a',3.0),(2,'b',3.5)]) # create ndarray from a list of tuples; mixed types are converted to strings
print(arrFromTuple)
print("#Dimensions =", arrFromTuple.ndim)
print("shape =", arrFromTuple.shape)
print("Size =", arrFromTuple.size)
[['1' 'a' '3.0']
 ['2' 'b' '3.5']]
#Dimensions = 2
shape = (2, 3)
Size = 6
# Guess what is printed
print(np.array([1]).shape)
print(np.array([1,2]).shape)
print(np.array([[1],[2]]).shape)
print(np.array([[[1,2,3],[1,2,3]]]).shape)
print(np.array([[[[]]]]).shape)
(1,)
(2,)
(2, 1)
(1, 2, 3)
(1, 1, 1, 0)
There are several built-in functions in numpy that can be used to create ndarrays
print(np.random.rand(5)) # random numbers from a uniform distribution over [0, 1)
print(np.random.randn(5)) # random numbers from a standard normal distribution (mean 0, variance 1)
print(np.arange(-10, 10, 2)) # similar to range, but returns ndarray instead of list
print(np.arange(12).reshape(3, 4)) # reshape to a matrix
print(np.linspace(0, 1, 10)) # split interval [0,1] into 10 equally separated values
# create ndarray with values from 10^-3 to 10^3
# logspace returns numbers spaced evenly on a log scale.
print(np.logspace(-3, 3, 7))
[0.1478559 0.71121211 0.3819452 0.4919969 0.87905582]
[ 0.96579878 -0.4737488 -0.45666711 -0.83650159 -0.33204275]
[-10 -8 -6 -4 -2 0 2 4 6 8]
[[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]]
[0. 0.11111111 0.22222222 0.33333333 0.44444444 0.55555556 0.66666667 0.77777778 0.88888889 1. ]
[1.e-03 1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03]
print(np.zeros((2,3))) # a matrix of zeros
print(np.ones((3,2))) # a matrix of ones
print(np.eye(3)) # a 3 x 3 identity matrix
[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Element-wise Operations¶
You can apply standard operators such as addition and multiplication on each element of the ndarray.
x = np.array([1,2,3,4,5])
print(x + 1) # addition
print(x - 1) # subtraction
print(x * 2) # multiplication
print(x // 2) # integer division
print(x ** 2) # square
print(x % 2) # modulo
print(1 / x) # division
[2 3 4 5 6]
[0 1 2 3 4]
[ 2 4 6 8 10]
[0 1 1 2 2]
[ 1 4 9 16 25]
[1 0 1 0 1]
[1. 0.5 0.33333333 0.25 0.2 ]
x = np.array([2,4,6,8,10])
y = np.array([1,2,3,4,5])
print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(x // y)
print(x ** y)
[ 3 6 9 12 15]
[1 2 3 4 5]
[ 2 8 18 32 50]
[2. 2. 2. 2. 2.]
[2 2 2 2 2]
[ 2 16 216 4096 100000]
Indexing and Slicing¶
There are various ways to select certain elements with an ndarray.
x = np.arange(-5,5)
print(x)
y = x[3:5] # y is a slice, i.e., a view of a subarray of x (no data is copied)
print(y)
y[:] = 1000 # modifying the value of y will change x
print(y)
z = x[3:5].copy() # makes a copy of the subarray
print(z)
z[:] = 500 # modifying the value of z will not affect x
print(z)
print(x)
[-5 -4 -3 -2 -1 0 1 2 3 4]
[-2 -1]
[1000 1000]
[1000 1000]
[500 500]
[ -5 -4 -3 1000 1000 0 1 2 3 4]
Remark: slicing a list makes a copy of the sublist, but slicing numpy array does not.¶
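The remark can be checked directly. The following is a minimal sketch, with variable names chosen only for illustration, that modifies a list slice and an array slice and uses np.shares_memory to tell a view from a copy.

```python
import numpy as np

# Slicing a Python list copies the selected elements into a new list.
lst = [0, 1, 2, 3, 4]
lst_slice = lst[1:3]
lst_slice[0] = 99          # only the copy changes; lst is untouched

# Slicing an ndarray returns a view that shares memory with the original.
arr = np.array([0, 1, 2, 3, 4])
arr_slice = arr[1:3]
arr_slice[0] = 99          # writes through to arr

print(lst)                                      # [0, 1, 2, 3, 4]
print(arr)                                      # [ 0 99  2  3  4]
print(np.shares_memory(arr, arr_slice))         # True: a view
print(np.shares_memory(arr, arr[1:3].copy()))   # False: an explicit copy
```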
my2dlist = [[1,2,3,4],[5,6,7,8],[9,10,11,12]] # a 2-dim list
print(my2dlist)
print(my2dlist[2]) # access the third sublist
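print(my2dlist[:][2]) # my2dlist[:] copies the outer list, so this is still the third sublist, not the third element of each sublist
# print(my2dlist[:,2]) # this will raise a TypeError: list indices must be integers or slices, not tuple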
my2darr = np.array(my2dlist)
print(my2darr)
print(my2darr[2][:]) # access the third row
print(my2darr[2,:]) # access the third row
print(my2darr[:][2]) # access the third row (similar to 2d list)
print(my2darr[:,2]) # access the third column
print(my2darr[:2,2:]) # access the first two rows & last two columns
sliced = my2darr[::2, 2:]
print(sliced)
print(type(sliced))
sliced[:, :] = 1000
print(my2darr)
sliced[0, 0] = 2000
print(my2darr)
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
[9, 10, 11, 12]
[9, 10, 11, 12]
[[ 1 2 3 4]
 [ 5 6 7 8]
 [ 9 10 11 12]]
[ 9 10 11 12]
[ 9 10 11 12]
[ 9 10 11 12]
[ 3 7 11]
[[3 4]
 [7 8]]
[[ 3 4]
 [11 12]]
<class 'numpy.ndarray'>
[[ 1 2 1000 1000]
 [ 5 6 7 8]
 [ 9 10 1000 1000]]
[[ 1 2 2000 1000]
 [ 5 6 7 8]
 [ 9 10 1000 1000]]
ndarray also supports boolean indexing (also called masking).¶
# slicing vs masking vs integer array indexing
x = np.array([1, 2, 3])
# slicing
print(x[1:])
print(x[1:][0])
# boolean masking
print(x[[True, False, True]])
# integer array indexing
print(x[[2, 1]])
print(x[[2, 1, 1, 1, 0]])
x[[2, 1, 1, 1, 0]] = 0
print(x)
[2 3]
2
[1 3]
[3 2]
[3 2 2 2 1]
[0 0 0]
y = np.arange(35).reshape(5,7)
print(y[1:5:2,::3])
b = y > 20
print(b)
print(y[b]) # Filtering
[[ 7 10 13]
 [21 24 27]]
[[False False False False False False False]
 [False False False False False False False]
 [False False False False False False False]
 [ True True True True True True True]
 [ True True True True True True True]]
[21 22 23 24 25 26 27 28 29 30 31 32 33 34]
my2darr = np.arange(1,13,1).reshape(3,4)
print(my2darr)
print(my2darr % 3 == 0)
divBy3 = my2darr[my2darr % 3 == 0]
print(divBy3, type(divBy3))
print(my2darr[2,:] % 3 == 0)
divBy3LastRow = my2darr[1:, my2darr[2,:] % 3 == 0]
print(divBy3LastRow)
[[ 1 2 3 4]
 [ 5 6 7 8]
 [ 9 10 11 12]]
[[False False True False]
 [False True False False]
 [ True False False True]]
[ 3 6 9 12] <class 'numpy.ndarray'>
[ True False False True]
[[ 5 8]
 [ 9 12]]
M = np.arange(35).reshape(5, 7)
print(M)
M[M%2==0] = 0
print(M)
[[ 0 1 2 3 4 5 6]
 [ 7 8 9 10 11 12 13]
 [14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27]
 [28 29 30 31 32 33 34]]
[[ 0 1 0 3 0 5 0]
 [ 7 0 9 0 11 0 13]
 [ 0 15 0 17 0 19 0]
 [21 0 23 0 25 0 27]
 [ 0 29 0 31 0 33 0]]
More indexing examples.
my2darr = np.arange(1,13,1).reshape(4,3)
print(my2darr)
indices = [2,1,0,3] # selected row indices
print(my2darr[indices,:])
rowIndex = [0,0,1,2,3] # row index into my2darr
columnIndex = [0,2,0,1,2] # column index into my2darr
print(my2darr[rowIndex,columnIndex])
[[ 1 2 3]
 [ 4 5 6]
 [ 7 8 9]
 [10 11 12]]
[[ 7 8 9]
 [ 4 5 6]
 [ 1 2 3]
 [10 11 12]]
[ 1 3 4 8 12]
Numpy Arithmetic and Statistical Functions¶
There are many built-in mathematical functions available for manipulating elements of nd-array.
y = np.array([-1.4, 0.4, -3.2, 2.5, 3.4]) # a fixed example vector
print(y)
print(np.abs(y)) # convert to absolute values
print(np.sqrt(np.abs(y))) # apply square root to each element
print(np.sign(y)) # get the sign of each element
print(np.exp(y)) # apply exponentiation
print(np.sort(y)) # sort array
print(y) # y does not change
[-1.4 0.4 -3.2 2.5 3.4]
[1.4 0.4 3.2 2.5 3.4]
[1.18321596 0.63245553 1.78885438 1.58113883 1.84390889]
[-1. 1. -1. 1. 1.]
[ 0.24659696 1.4918247 0.0407622 12.18249396 29.96410005]
[-3.2 -1.4 0.4 2.5 3.4]
[-1.4 0.4 -3.2 2.5 3.4]
x = np.arange(-2,3)
y = np.random.randn(5)
print(x)
print(y)
print(np.add(x,y)) # element-wise addition x + y
print(np.subtract(x,y)) # element-wise subtraction x - y
print(np.multiply(x,y)) # element-wise multiplication x * y
print(np.divide(x,y)) # element-wise division x / y
print(np.maximum(x,y)) # element-wise maximum max(x, y)
[-2 -1 0 1 2]
[-0.6937334 -0.36121667 -1.73120616 -1.0393869 -0.44727627]
[-2.6937334 -1.36121667 -1.73120616 -0.0393869 1.55272373]
[-1.3062666 -0.63878333 1.73120616 2.0393869 2.44727627]
[ 1.3874668 0.36121667 -0. -1.0393869 -0.89455254]
[ 2.88295186 2.76842152 -0. -0.96210564 -4.47150928]
[-0.6937334 -0.36121667 0. 1. 2. ]
y = np.array([-3.2, -1.4, 0.4, 2.5, 3.4]) # a fixed example vector
print(y)
print("Min =", np.min(y)) # min
print("Max =", np.max(y)) # max
print("Average =", np.mean(y)) # mean/average
print("Std deviation =", np.std(y)) # standard deviation
print("Sum =", np.sum(y)) # sum
[-3.2 -1.4 0.4 2.5 3.4]
Min = -3.2
Max = 3.4
Average = 0.34000000000000014
Std deviation = 2.432776191925595
Sum = 1.7000000000000006
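One detail worth knowing about the standard deviation above: by default np.std divides by N (the population form), while passing ddof=1 divides by N - 1 (the sample form). The sketch below recomputes both by hand; the variable names are illustrative only.

```python
import numpy as np

y = np.array([-3.2, -1.4, 0.4, 2.5, 3.4])

pop_std = np.std(y)             # divides by N (ddof=0, the default)
sample_std = np.std(y, ddof=1)  # divides by N - 1

# Recompute both by hand to make the difference explicit.
n = y.size
sq_dev = np.sum((y - y.mean()) ** 2)
manual_pop = np.sqrt(sq_dev / n)
manual_sample = np.sqrt(sq_dev / (n - 1))

print(pop_std, manual_pop)
print(sample_std, manual_sample)
```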
More on filtering¶
M = np.arange(25).reshape(5,5)
print(M)
print(M[M%2==1]) # filtering in general
print(np.argwhere(M >= 20)) # indexes satisfying condition
print(np.where(M % 2 == 1, M, 0)) # keep M where the condition holds, else 0; the scalar 0 is broadcast to M's shape
[[ 0 1 2 3 4]
 [ 5 6 7 8 9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
[ 1 3 5 7 9 11 13 15 17 19 21 23]
[[4 0]
 [4 1]
 [4 2]
 [4 3]
 [4 4]]
[[ 0 1 0 3 0]
 [ 5 0 7 0 9]
 [ 0 11 0 13 0]
 [15 0 17 0 19]
 [ 0 21 0 23 0]]
New axis¶
- np.newaxis inserts a new axis of length 1, increasing the dimension of the existing array by one
- shape change, for example: $ n \times \text{(newaxis)} \times m $ → $ n \times 1 \times m $
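As a quick check of the shape rule, here is a minimal sketch on a small 2 x 3 array. It also shows two equivalent spellings, np.expand_dims and reshape; the array a is an arbitrary example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # shape (2, 3)

# np.newaxis is an alias for None; each one inserts an axis of length 1.
print(a[np.newaxis, :, :].shape)    # (1, 2, 3)
print(a[:, np.newaxis, :].shape)    # (2, 1, 3)
print(a[:, :, np.newaxis].shape)    # (2, 3, 1)
print(a[:, None, :].shape)          # (2, 1, 3), same as np.newaxis

# np.expand_dims and reshape give the same result as indexing with newaxis.
b = np.expand_dims(a, axis=1)
c = a.reshape(2, 1, 3)
print(np.array_equal(b, c))
```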
t = np.array([1,2,3])
x = t[:, np.newaxis]
y = t[np.newaxis, :]
x + y
array([[2, 3, 4], [3, 4, 5], [4, 5, 6]])
t = np.array([[1,2,3],[4,5,6]])
t[:, :, np.newaxis]
t[np.newaxis, :, :]
t[:, np.newaxis, :]
# t[np.newaxis, :]
# t[:, np.newaxis, np.newaxis].shape
array([[[1, 2, 3]], [[4, 5, 6]]])
Stacking two arrays¶
A = np.array([[1,1,1],[2,2,2]])
B = np.array([[3,3,3],[4,4,4]])
print('A = \n', A)
print('B = \n', B)
print('Stacks arrays in sequence vertically (row wise)')
print(np.vstack((A,B)))
print('Stacks arrays in sequence horizontally (column wise)')
print(np.hstack((A,B)))
print('Stack the two arrays along axis 0')
print(np.stack((A,B), axis=0))
print('Stack the two arrays along axis 1')
print(np.stack((A,B), axis=1))
A =
 [[1 1 1]
 [2 2 2]]
B =
 [[3 3 3]
 [4 4 4]]
Stacks arrays in sequence vertically (row wise)
[[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]]
Stacks arrays in sequence horizontally (column wise)
[[1 1 1 3 3 3]
 [2 2 2 4 4 4]]
Stack the two arrays along axis 0
[[[1 1 1]
  [2 2 2]]
 [[3 3 3]
  [4 4 4]]]
Stack the two arrays along axis 1
[[[1 1 1]
  [3 3 3]]
 [[2 2 2]
  [4 4 4]]]
- np.stack first stacks the arrays along the new axis; the printed output then lists one 2-D slice (the y-z plane) per index of axis 0
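To make the role of the new axis concrete, the sketch below indexes the stacked results to recover A and B, and contrasts np.stack with np.concatenate, which joins along an existing axis. The names s0, s1, v, and h are illustrative.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2]])
B = np.array([[3, 3, 3], [4, 4, 4]])

s0 = np.stack((A, B), axis=0)   # new axis inserted first: shape (2, 2, 3)
s1 = np.stack((A, B), axis=1)   # new axis inserted second: shape (2, 2, 3)

# Indexing the new axis recovers the original arrays.
print(np.array_equal(s0[0], A), np.array_equal(s0[1], B))
print(np.array_equal(s1[:, 0], A), np.array_equal(s1[:, 1], B))

# vstack and hstack join along an existing axis, so the result stays 2-D.
v = np.concatenate((A, B), axis=0)   # same as np.vstack((A, B)), shape (4, 3)
h = np.concatenate((A, B), axis=1)   # same as np.hstack((A, B)), shape (2, 6)
print(v.shape, h.shape)
```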
Broadcasting¶
Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
Examples:
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
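The shape examples above can be verified mechanically. This sketch assumes NumPy 1.20 or newer, where np.broadcast_shapes is available; it checks each listed case and then shows one pair of shapes that cannot be broadcast.

```python
import numpy as np

# (shape of A, shape of B, expected result), taken from the list above.
cases = [
    ((5, 4), (1,), (5, 4)),
    ((5, 4), (4,), (5, 4)),
    ((15, 3, 5), (15, 1, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 1), (15, 3, 5)),
]

for shape_a, shape_b, expected in cases:
    # Shapes are aligned from the right; a dimension of 1 (or a missing one) stretches.
    result = np.broadcast_shapes(shape_a, shape_b)
    print(shape_a, shape_b, "->", result)
    assert result == expected

# Trailing dimensions that differ and are not 1 cannot be broadcast.
try:
    np.broadcast_shapes((5, 4), (5,))
    compatible = True
except ValueError:
    compatible = False
print("(5, 4) with (5,) compatible:", compatible)
```

The last case fails because the trailing dimensions are 4 and 5; reshaping the second array to (5, 1) would make it compatible.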
np.array([[1,2],[3,4]]) + np.array([[10]])
array([[11, 12], [13, 14]])
np.array([[1,2],[3,4]]) + np.array([[10,100]])
array([[ 11, 102], [ 13, 104]])
A = np.array([[1,2]])
B = np.array([[10],[100]])
print(A.shape, B.shape)
C = A + B
C
(1, 2) (2, 1)
array([[ 11, 12], [101, 102]])
X = np.array([[1]]*3) + np.array([[0]*10])
X
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
a = np.array([[1],[2],[3]])
b = a.T
a + b
array([[2, 3, 4], [3, 4, 5], [4, 5, 6]])
Meshgrid¶
np.meshgrid builds D x N coordinate grids from an N-vector and a D-vector, for vectorized evaluation over all pairs.
v = np.array([10,20,30]) # N
w = np.array([5,6]) # D
X, Y = np.meshgrid(v, w)
X + Y
array([[15, 25, 35], [16, 26, 36]])
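A short sketch of how the grid shapes come about, and how the same result can be obtained with broadcasting alone. The indexing="ij" option shown here is a standard np.meshgrid parameter; the names same, Xi, and Yi are illustrative.

```python
import numpy as np

v = np.array([10, 20, 30])   # N = 3 values
w = np.array([5, 6])         # D = 2 values

# Default indexing='xy': both outputs have shape (D, N) = (2, 3).
X, Y = np.meshgrid(v, w)

# The same sum via broadcasting, without materializing the full grids.
same = w[:, np.newaxis] + v[np.newaxis, :]

# indexing='ij' swaps the layout to (N, D) = (3, 2).
Xi, Yi = np.meshgrid(v, w, indexing="ij")

print(X.shape, Xi.shape)
print(np.array_equal(X + Y, same))
```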
Example: distance matrix¶
pts = np.array([[1,0], [1,1], [0,1]])
print(pts.shape)
u = pts[:, :, np.newaxis]
v = pts.T[np.newaxis, :, :]
(3, 2)
print(v.shape)
print(u.shape)
(1, 2, 3)
(3, 2, 1)
np.sqrt(np.sum((u - v)**2, axis=1))
array([[0. , 1. , 1.41421356], [1. , 0. , 1. ], [1.41421356, 1. , 0. ]])
np.linalg.norm(u - v, axis=1)
array([[0. , 1. , 1.41421356], [1. , 0. , 1. ], [1.41421356, 1. , 0. ]])
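To see what the broadcasting expression computes, it can be compared against a plain double loop over all pairs of points. This is only a cross-check sketch; dist_loop and dist_broadcast are names introduced here.

```python
import numpy as np

pts = np.array([[1, 0], [1, 1], [0, 1]])   # 3 points in 2-D

# Broadcasting version: (3, 2, 1) - (1, 2, 3) -> (3, 2, 3),
# then the coordinate axis (axis 1) is reduced.
u = pts[:, :, np.newaxis]
v = pts.T[np.newaxis, :, :]
dist_broadcast = np.sqrt(np.sum((u - v) ** 2, axis=1))

# Reference version with explicit loops over every pair of points.
n = len(pts)
dist_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist_loop[i, j] = np.sqrt(np.sum((pts[i] - pts[j]) ** 2))

print(np.allclose(dist_broadcast, dist_loop))
```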
Axis ordering¶
- By definition, the axis number of the dimension is the index of that dimension within the array's shape. It is also the position used to access that dimension during indexing.
- For example, if a 2D array a has shape (5,6), then you can access a[0,0] up to a[4,5]. Axis 0 is thus the first dimension (the "rows"), and axis 1 is the second dimension (the "columns"). In higher dimensions, where "row" and "column" stop really making sense, try to think of the axes in terms of the shapes and indices involved.
- If you do .sum(axis=n), for example, then dimension n is collapsed and deleted, with each value in the new matrix equal to the sum of the corresponding collapsed values. For example, if b has shape (5,6,7,8), and you do c = b.sum(axis=2), then axis 2 (dimension with size 7) is collapsed, and the result has shape (5,6,8). Furthermore, c[x,y,z] is equal to the sum of all elements b[x,y,:,z].
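The (5,6,7,8) example from the last bullet can be checked directly. In this sketch the array contents and the indices x, y, z are arbitrary choices for illustration; keepdims is a standard option of sum.

```python
import numpy as np

b = np.arange(5 * 6 * 7 * 8).reshape(5, 6, 7, 8)
c = b.sum(axis=2)                 # axis 2 (size 7) is collapsed
print(c.shape)                    # (5, 6, 8)

# Each entry of c is the sum over the collapsed axis at the same remaining indices.
x, y, z = 1, 2, 3                 # arbitrary indices chosen for illustration
print(c[x, y, z] == b[x, y, :, z].sum())

# keepdims=True keeps the collapsed axis with size 1, which is handy for broadcasting.
k = b.sum(axis=2, keepdims=True)
print(k.shape)                    # (5, 6, 1, 8)
```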
X = np.array([[0,0,0], [1,1,1]])
X.shape
# axis 0 is row; axis 1 is column
(2, 3)
X.sum(axis=0) # dimension 0 is collapsed and deleted; or aggregated over dimension 0
array([1, 1, 1])
X.sum(axis=1) # dimension 1 is collapsed and deleted; or aggregated over dimension 1
array([0, 3])
X = np.array(range(1,24+1)).reshape(2,3,4)
X.shape
(2, 3, 4)
X.sum(axis=0)
array([[14, 16, 18, 20], [22, 24, 26, 28], [30, 32, 34, 36]])
X.sum(axis=1)
array([[15, 18, 21, 24], [51, 54, 57, 60]])
X.sum(axis=2)
array([[10, 26, 42], [58, 74, 90]])
X.sum(axis=(1,2))
array([ 78, 222])
X.sum(axis=(0,1,2))
300
X = np.array(np.arange(2*3, 0, -1).reshape(2,3))
print(X)
print(np.sort(X))
print(np.sort(X, axis=-1))
print(np.sort(X, axis=0))
print(np.sort(X, axis=None))
[[6 5 4]
 [3 2 1]]
[[4 5 6]
 [1 2 3]]
[[4 5 6]
 [1 2 3]]
[[3 2 1]
 [6 5 4]]
[1 2 3 4 5 6]
X = np.array([4,10,1,20,45,100,2,1])
print(np.sort(X))
print(np.partition(X, 3))
print(np.argpartition(X, 3))
print(np.partition(X, -3))
print(np.argpartition(X, -3))
[ 1 1 2 4 10 20 45 100]
[ 2 1 1 4 45 100 10 20]
[6 7 2 0 4 5 1 3]
[ 2 1 1 4 10 20 45 100]
[6 7 2 0 1 3 4 5]
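A common use of np.argpartition is selecting the k largest values without sorting the whole array. The sketch below assumes k = 3. Partitioning only guarantees that the element at the chosen position is in its sorted place, with smaller values before it and larger or equal values after it, so the order within each side may vary.

```python
import numpy as np

X = np.array([4, 10, 1, 20, 45, 100, 2, 1])
k = 3

# After partitioning at position -k, the last k slots hold the k largest values,
# but in no particular order.
idx = np.argpartition(X, -k)[-k:]
top_k_unordered = X[idx]

# Sort only those k values if an ordered result is needed.
top_k = np.sort(top_k_unordered)[::-1]
print(top_k)   # [100  45  20]
```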
Vectorized function¶
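np.vectorize is similar to the built-in map function: it applies a Python function element by element, with broadcasting. It is provided for convenience rather than performance, since it is essentially a Python loop.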
u = np.array([100,2,3,4])
v = np.array([1,2,3,4])
w = np.array([4,3,2,1])
np.vectorize(max)(u, v, w)
array([100, 3, 3, 4])
dist = np.vectorize(lambda x, y: np.sqrt(x**2 + y**2))
dist(v, w)
array([4.12310563, 3.60555128, 3.60555128, 4.12310563])
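For these two examples, native numpy functions give the same results without calling a Python function per element. The sketch below compares them; np.maximum and np.hypot are standard numpy functions, and the variable names are illustrative.

```python
import numpy as np

u = np.array([100, 2, 3, 4])
v = np.array([1, 2, 3, 4])
w = np.array([4, 3, 2, 1])

# np.vectorize calls the Python function once per element.
max_vectorized = np.vectorize(max)(u, v, w)
dist_vectorized = np.vectorize(lambda x, y: np.sqrt(x**2 + y**2))(v, w)

# Native alternatives that run in compiled code.
max_native = np.maximum(np.maximum(u, v), w)   # element-wise max of three arrays
dist_native = np.hypot(v, w)                   # sqrt(v**2 + w**2)

print(np.array_equal(max_vectorized, max_native))
print(np.allclose(dist_vectorized, dist_native))
```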
Numpy linear algebra¶
Numpy provides many functions to support linear algebra operations.
X = np.random.randn(2,3) # create a 2 x 3 random matrix
print(X)
print(X.T) # matrix transpose operation X^T
y = np.random.randn(3) # random vector
print(y)
print(X.dot(y)) # matrix-vector multiplication X * y
print(X.dot(X.T)) # matrix-matrix multiplication X * X^T
print(X.T.dot(X)) # matrix-matrix multiplication X^T * X
[[-0.05665379 -1.32903354 2.8708451 ]
 [-0.20927368 0.84143082 0.15460587]]
[[-0.05665379 -0.20927368]
 [-1.32903354 0.84143082]
 [ 2.8708451 0.15460587]]
[0.12253041 0.35717184 1.31722386]
[3.29991049 0.47854354]
[[10.0112914 -0.66258415]
 [-0.66258415 0.77570427]]
[[ 0.04700512 -0.10079453 -0.1949992 ]
 [-0.10079453 2.47433599 -3.6853593 ]
 [-0.1949992 -3.6853593 8.26565457]]
X = np.random.randn(5,3)
print(X)
C = X.T.dot(X) # C = X^T * X is a square matrix
invC = np.linalg.inv(C) # inverse of a square matrix
print(invC)
detC = np.linalg.det(C) # determinant of a square matrix
print(detC)
S, U = np.linalg.eig(C) # eigenvalues S and eigenvectors U (one per column) of a square matrix
print(S)
print(U)
[[-0.7255551 -0.31396318 -0.34947148]
 [ 1.29196297 -2.72691972 -1.18236585]
 [ 0.01286274 0.67958609 0.1102564 ]
 [ 0.03250624 0.06626556 0.2192918 ]
 [ 0.71476089 0.64241649 0.6680848 ]]
[[ 1.27372998 1.55092348 -2.45296975]
 [ 1.55092348 2.80574251 -4.73050544]
 [-2.45296975 -4.73050544 8.53202824]]
1.7346038816986777
[11.21040131 1.854144 0.0834518 ]
[[ 0.3227218 -0.90971356 -0.26128887]
 [-0.86261029 -0.16907082 -0.47677934]
 [-0.38955631 -0.37925756 0.83929112]]
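The results of inv, det, and eig can be sanity-checked against their defining properties. This is a sketch using a seeded random matrix (the seed is an arbitrary choice for reproducibility); it also shows np.linalg.eigh, which is intended for symmetric matrices such as X^T * X.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, chosen only for reproducibility
X = rng.standard_normal((5, 3))
C = X.T @ X                       # symmetric 3 x 3 matrix

S, U = np.linalg.eig(C)           # eigenvalues S, eigenvectors in the columns of U

# Each column U[:, i] satisfies C @ U[:, i] = S[i] * U[:, i].
print(np.allclose(C @ U, U * S))

# inv(C) @ C should be close to the identity matrix.
print(np.allclose(np.linalg.inv(C) @ C, np.eye(3)))

# The determinant equals the product of the eigenvalues.
print(np.isclose(np.linalg.det(C), np.prod(S)))

# For symmetric matrices, eigh returns real eigenvalues in ascending order.
S_h, U_h = np.linalg.eigh(C)
print(np.allclose(np.sort(S), S_h))
```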
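The Frobenius norm¶
np.linalg.norm returns the 2-norm for a 1-D array by default, as in the examples here; for a 2-D array the default is the Frobenius norm.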
X = np.array([1,2])
print(np.linalg.norm(X)) # 2-norm
print(np.linalg.norm(X, ord=1)) # 1-norm
print(np.linalg.norm(X, ord=np.inf)) # inf-norm
print(np.linalg.norm(X, ord=-np.inf)) # -inf-norm
2.23606797749979
3.0
2.0
1.0
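The examples above use a 1-D array. For a matrix, the Frobenius norm is the square root of the sum of the squared entries; the sketch below computes it three ways on a small example matrix M, which is chosen here only for illustration.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

fro_default = np.linalg.norm(M)              # default for a 2-D array: Frobenius norm
fro_explicit = np.linalg.norm(M, ord="fro")  # same norm, requested by name
fro_manual = np.sqrt(np.sum(M ** 2))         # square root of the sum of squared entries

print(fro_default, fro_explicit, fro_manual)

# The Frobenius norm of a matrix equals the 2-norm of its flattened entries.
print(np.isclose(fro_default, np.linalg.norm(M.ravel())))
```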
x = np.array([1,0])
y = np.array([0,1])
print("cosine =", x.dot(y) / (math.sqrt((x.dot(x))) * math.sqrt((y.dot(y)))))
print("cosine =", x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y)))
cosine = 0.0
cosine = 0.0