Python (0411) Week 6

imports

import numpy as np

NumPy study, step 3: dimensions

2D arrays and systems of linear equations

- Consider the following system of equations.

\(\begin{cases} y+z+w = 3 \\ x+z+w = 3 \\ x+y+w = 3 \\ x+y+z = 3 \end{cases}\)

- Matrix form:

\(\begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 3 \\ 3 \end{bmatrix}\)

- Solution

A = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
A

array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])

b= np.array([3,3,3,3]).reshape(4,1)
b

array([[3],
       [3],
       [3],
       [3]])

np.linalg.inv(A) @ b

array([[1.],
       [1.],
       [1.],
       [1.]])

- Alternative solution

b can also be created as a 1d array, as below.

b=np.array([3,3,3,3])
b

array([3, 3, 3, 3])

b.shape # b.shape is a tuple of length 1

(4,)

np.linalg.inv(A) @ b

array([1., 1., 1., 1.])
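
As a side note to the two solutions above, here is a minimal sketch that solves the same system without forming the inverse explicitly. It uses `np.linalg.solve`, which is not part of these notes and is shown only as an alternative; the variable names `x_inv`, `x_solve`, and `residual` are chosen for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system from the notes
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])
b = np.array([3, 3, 3, 3])

# Approach used in the notes: multiply by the explicit inverse
x_inv = np.linalg.inv(A) @ b

# Alternative (an addition, not from the notes): solve A x = b directly
x_solve = np.linalg.solve(A, b)

# Check the answer by substituting it back into the system
residual = A @ x_solve - b
print(x_inv, x_solve, residual)
```

Both approaches should agree here; substituting the solution back into `A @ x` is a simple way to confirm it.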

Flexibility of `@`

- Strictly speaking, the following matrix products are defined:
  - (2,2) @ (2,1) => (2,1)
  - (1,2) @ (2,2) => (1,2)

A = np.array([1,2,3,4]).reshape(2,2) 
b = np.array([1,2]).reshape(2,1) 
A@b

array([[ 5],
       [11]])

A.shape, b.shape, (A@b).shape

((2, 2), (2, 1), (2, 1))

A = np.array([1,2,3,4]).reshape(2,2) 
b = np.array([1,2]).reshape(1,2) 
b@A

array([[ 7, 10]])

A.shape, b.shape, (b@A).shape

((2, 2), (1, 2), (1, 2))

- Naturally, the following do not work, because the inner dimensions do not match.

A = np.array([1,2,3,4]).reshape(2,2) 
b = np.array([1,2]).reshape(2,1) 
b@A

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)

A = np.array([1,2,3,4]).reshape(2,2) 
b = np.array([1,2]).reshape(1,2) 
A@b

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)

- What about the following? Can they be computed? \(\to\) Both can!
  - (2,) @ (2,2) = (2,)
  - (2,2) @ (2,) = (2,)

A = np.array([1,2,3,4]).reshape(2,2)
b = np.array([1,2]) 
A@b

array([ 5, 11])

A.shape, b.shape, (A@b).shape

((2, 2), (2,), (2,))

It behaves as if b were interpreted as (2,1) for the matrix product and the result were then turned back into shape (2,).

b@A

array([ 7, 10])

A.shape, b.shape, (b@A).shape

((2, 2), (2,), (2,))

Here it behaves as if \(b\) were interpreted as (1,2) for the matrix product and the result were then turned back into shape (2,).

- What about the following?

b1 = np.array([1,2,3,4]) 
b2 = np.array([1,2,3,4]) 
b1@b2 # inner product: 1*1 + 2*2 + 3*3 + 4*4 = 30

b1.shape, b2.shape, (b1@b2).shape

((4,), (4,), ())

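Think of it as (1,4) @ (4,1) = (1,1), with the result then reduced to a scalar of shape ().

- That is, the value above is the same as the matrix product computed with the explicit shapes below.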

b1 = np.array([1,2,3,4]).reshape(1,4) 
b2 = np.array([1,2,3,4]).reshape(4,1) 
b1@b2

array([[30]])

b1.shape, b2.shape, (b1@b2).shape

((1, 4), (4, 1), (1, 1))

- Sometimes we want the result of (4,1) @ (1,4) instead; in that case the dimensions must be stated explicitly.

b1 = np.array([1,2,3,4]).reshape(4,1) 
b2 = np.array([1,2,3,4]).reshape(1,4) 
b1@b2

array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
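
The two interpretations of a pair of 1d arrays (inner product versus (4,1) @ (1,4)) can be put side by side. The sketch below is illustrative only: `np.newaxis` indexing and `np.outer` are not used in these notes and are shown as assumed alternatives to `reshape`.

```python
import numpy as np

b1 = np.array([1, 2, 3, 4])
b2 = np.array([1, 2, 3, 4])

# 1d @ 1d: treated like (1,4) @ (4,1), result has shape ()
inner = b1 @ b2

# Explicit shapes give the (4,1) @ (1,4) product, shape (4,4)
outer_reshape = b1.reshape(4, 1) @ b2.reshape(1, 4)

# Same shapes written with np.newaxis instead of reshape (an addition)
outer_newaxis = b1[:, np.newaxis] @ b2[np.newaxis, :]

# np.outer computes the same table directly from the 1d arrays (an addition)
outer_func = np.outer(b1, b2)

print(inner, np.shape(inner))
print(outer_reshape.shape)
```

With 1d inputs, `@` can only ever produce the inner product, so the (4,4) result always needs either explicit 2d shapes or a dedicated function.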

Dimensions

- The dimensions of a NumPy array can be checked with .shape.

- The following are all subtly different.

a=np.array(3.14) # scalar, 0d array
a, a.shape

(array(3.14), ())

a=np.array([3.14]) # vector, 1d array
a, a.shape

(array([3.14]), (1,))

a=np.array([[3.14]]) # matrix, 2d array
a, a.shape

(array([[3.14]]), (1, 1))

a=np.array([[[3.14]]]) # tensor, 3d array
a, a.shape

(array([[[3.14]]]), (1, 1, 1))
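
A small sketch of how the four arrays above differ: each holds a single value, but the number of axes is different. The `.ndim` and `.size` attributes are not used in these notes; they are shown here as a convenient check, and the list name `examples` is made up for illustration.

```python
import numpy as np

# The same value wrapped in 0, 1, 2, and 3 levels of brackets
examples = [
    np.array(3.14),        # 0d array
    np.array([3.14]),      # 1d array
    np.array([[3.14]]),    # 2d array
    np.array([[[3.14]]]),  # 3d array
]

shapes = [x.shape for x in examples]  # tuples of length 0, 1, 2, 3
ndims = [x.ndim for x in examples]    # number of axes of each array
sizes = [x.size for x in examples]    # number of elements of each array

for s, d, n in zip(shapes, ndims, sizes):
    print(s, d, n)
```

The number of axes equals the length of the shape tuple, which is why the 0d array has the empty shape `()`.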

NumPy study, step 4: axes

np.concatenate

- Basic example

a=np.array([1,2]) 
b=-a

np.concatenate([a,b])

array([ 1,  2, -1, -2])

- Extension

a=np.array([1,2])
b=-a 
c=np.array([3,4,5])

np.concatenate([a,b,c])

array([ 1,  2, -1, -2,  3,  4,  5])

So far np.concatenate does not seem to offer much: with lists, a+b+c would do the same thing.

- Let's apply it to 2d arrays.

a=np.arange(4).reshape(2,2) 
b=-a

np.concatenate([a,b])

array([[ 0,  1],
       [ 2,  3],
       [ 0, -1],
       [-2, -3]])

- What if we want to join them side by side?

np.concatenate([a,b],axis=1)

array([[ 0,  1,  0, -1],
       [ 2,  3, -2, -3]])

- What does axis=1 mean in the code above? What happens with axis=0 or axis=2?

np.concatenate([a,b],axis=0)

array([[ 0,  1],
       [ 2,  3],
       [ 0, -1],
       [-2, -3]])

This is the same as np.concatenate([a,b]).
So np.concatenate([a,b]) is shorthand for np.concatenate([a,b],axis=0), because axis=0 is the default.

np.concatenate([a,b],axis=2)

AxisError: axis 2 is out of bounds for array of dimension 2

A 2d array has no axis 2, so this fails.

- To understand what axis means, let's look at a few more examples.

a=np.array(range(2*3*4)).reshape(2,3,4)
a

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

b=-a
b

array([[[  0,  -1,  -2,  -3],
        [ -4,  -5,  -6,  -7],
        [ -8,  -9, -10, -11]],

       [[-12, -13, -14, -15],
        [-16, -17, -18, -19],
        [-20, -21, -22, -23]]])

np.concatenate([a,b],axis=0)

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [  8,   9,  10,  11]],

       [[ 12,  13,  14,  15],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23]],

       [[  0,  -1,  -2,  -3],
        [ -4,  -5,  -6,  -7],
        [ -8,  -9, -10, -11]],

       [[-12, -13, -14, -15],
        [-16, -17, -18, -19],
        [-20, -21, -22, -23]]])

np.concatenate([a,b],axis=1)

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [  8,   9,  10,  11],
        [  0,  -1,  -2,  -3],
        [ -4,  -5,  -6,  -7],
        [ -8,  -9, -10, -11]],

       [[ 12,  13,  14,  15],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23],
        [-12, -13, -14, -15],
        [-16, -17, -18, -19],
        [-20, -21, -22, -23]]])

np.concatenate([a,b],axis=2)

array([[[  0,   1,   2,   3,   0,  -1,  -2,  -3],
        [  4,   5,   6,   7,  -4,  -5,  -6,  -7],
        [  8,   9,  10,  11,  -8,  -9, -10, -11]],

       [[ 12,  13,  14,  15, -12, -13, -14, -15],
        [ 16,  17,  18,  19, -16, -17, -18, -19],
        [ 20,  21,  22,  23, -20, -21, -22, -23]]])

This time axis=2 works?

np.concatenate([a,b],axis=3)

AxisError: axis 3 is out of bounds for array of dimension 3

But axis=3 does not?

- The arrays are joined according to some rule; what is it?

(Analysis 1) np.concatenate([a,b],axis=0)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

a.shape, b.shape, np.concatenate([a,b],axis=0).shape

((2, 3, 4), (2, 3, 4), (4, 3, 4))

The first dimension changed => the first axis changed => axis=0 (Python counts from 0!)

(Analysis 2) np.concatenate([a,b],axis=1)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

a.shape, b.shape, np.concatenate([a,b],axis=1).shape

((2, 3, 4), (2, 3, 4), (2, 6, 4))

The second dimension changed => the second axis changed => axis=1

(Analysis 3) np.concatenate([a,b],axis=2)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

a.shape, b.shape, np.concatenate([a,b],axis=2).shape

((2, 3, 4), (2, 3, 4), (2, 3, 8))

The third dimension changed => the third axis changed => axis=2

(Analysis 4) np.concatenate([a,b],axis=3)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

a.shape, b.shape, np.concatenate([a,b],axis=3).shape

AxisError: axis 3 is out of bounds for array of dimension 3

There is no fourth dimension => there is no fourth axis => axis=3 raises an error.

(Bonus 1)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

np.concatenate([a,b],axis=-1)

array([[[  0,   1,   2,   3,   0,  -1,  -2,  -3],
        [  4,   5,   6,   7,  -4,  -5,  -6,  -7],
        [  8,   9,  10,  11,  -8,  -9, -10, -11]],

       [[ 12,  13,  14,  15, -12, -13, -14, -15],
        [ 16,  17,  18,  19, -16, -17, -18, -19],
        [ 20,  21,  22,  23, -20, -21, -22, -23]]])

a.shape, b.shape, np.concatenate([a,b],axis=-1).shape

((2, 3, 4), (2, 3, 4), (2, 3, 8))

The last dimension changed => the last axis changed => axis=-1

(Bonus 2)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

np.concatenate([a,b],axis=-2)

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [  8,   9,  10,  11],
        [  0,  -1,  -2,  -3],
        [ -4,  -5,  -6,  -7],
        [ -8,  -9, -10, -11]],

       [[ 12,  13,  14,  15],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23],
        [-12, -13, -14, -15],
        [-16, -17, -18, -19],
        [-20, -21, -22, -23]]])

a.shape, b.shape, np.concatenate([a,b],axis=-2).shape

((2, 3, 4), (2, 3, 4), (2, 6, 4))

The second-to-last dimension changed => the second-to-last axis changed => axis=-2

(Bonus 3)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

np.concatenate([a,b],axis=-3)

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [  8,   9,  10,  11]],

       [[ 12,  13,  14,  15],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23]],

       [[  0,  -1,  -2,  -3],
        [ -4,  -5,  -6,  -7],
        [ -8,  -9, -10, -11]],

       [[-12, -13, -14, -15],
        [-16, -17, -18, -19],
        [-20, -21, -22, -23]]])

a.shape, b.shape, np.concatenate([a,b],axis=-3).shape

((2, 3, 4), (2, 3, 4), (4, 3, 4))

The third-to-last dimension changed => the third-to-last axis changed => axis=-3

(Bonus 4)

a=np.array(range(2*3*4)).reshape(2,3,4) 
b=-a

np.concatenate([a,b],axis=-4)

AxisError: axis -4 is out of bounds for array of dimension 3

There is no fourth-to-last dimension => there is no fourth-to-last axis => axis=-4 raises an error.

- A 0-dimensional array has no axes, so concatenate cannot be used on it.

a= np.array(1)
b= np.array(-1)

a.shape, b.shape

((), ())

np.concatenate([a,b])

ValueError: zero-dimensional arrays cannot be concatenated

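- a and b need not have identical shapes: they must have the same number of dimensions, but the size along the concatenation axis may differ.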

a=np.array(range(4)).reshape(2,2) 
b=np.array(range(2)).reshape(2,1)

np.concatenate([a,b],axis=1)

array([[0, 1, 0],
       [2, 3, 1]])

a.shape, b.shape, np.concatenate([a,b],axis=1).shape

((2, 2), (2, 1), (2, 3))
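
The shape rule observed in the analyses above can be written down as a small function. This is only a sketch: `concat_shape` is a hypothetical helper written for these notes, not a NumPy function, and it predicts the result shape without joining any data.

```python
import numpy as np

def concat_shape(shapes, axis=0):
    """Predict the shape np.concatenate would return (hypothetical helper)."""
    first = shapes[0]
    ndim = len(first)
    if ndim == 0:
        raise ValueError("zero-dimensional arrays cannot be concatenated")
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of bounds for dimension {ndim}")
    axis %= ndim  # turn a negative axis into the equivalent non-negative one
    for s in shapes:
        if len(s) != ndim:
            raise ValueError("all inputs need the same number of dimensions")
        # every axis except the concatenation axis must match exactly
        if s[:axis] + s[axis + 1:] != first[:axis] + first[axis + 1:]:
            raise ValueError("sizes differ on an axis other than `axis`")
    # sizes along the concatenation axis add up; the other axes are kept
    return first[:axis] + (sum(s[axis] for s in shapes),) + first[axis + 1:]

a = np.arange(4).reshape(2, 2)
b = np.arange(2).reshape(2, 1)
predicted = concat_shape([a.shape, b.shape], axis=1)
actual = np.concatenate([a, b], axis=1).shape
print(predicted, actual)
```

For example, two (2,3,4) inputs with axis=-2 give (2,6,4), matching Bonus 2, while axis=3 raises an error, matching Analysis 4.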

np.stack

- Is the following possible?

(3,) joined with (3,) => (3,2)

a=np.array([1,2,3])
b=-a

a,b

(array([1, 2, 3]), array([-1, -2, -3]))

np.concatenate([a,b],axis=1)

AxisError: axis 1 is out of bounds for array of dimension 1

Not possible: a 1d array has no axis 1.

- Reshaping first, as below, solves it.

a=np.array([1,2,3]).reshape(3,1) 
b=-a

a,b

(array([[1],
        [2],
        [3]]),
 array([[-1],
        [-2],
        [-3]]))

np.concatenate([a,b],axis=1)

array([[ 1, -1],
       [ 2, -2],
       [ 3, -3]])

Analysis: (3,) (3,) => (3,1) (3,1) => (3,1) concat (3,1) => (3,2)

- The steps above can be shortened as follows.

a=np.array([1,2,3])
b=-a

np.stack([a,b],axis=1)

array([[ 1, -1],
       [ 2, -2],
       [ 3, -3]])

- The following also works.

np.stack([a,b],axis=0)

array([[ 1,  2,  3],
       [-1, -2, -3]])

- Let's analyze this and remember the rule.

(Analysis 1)

a=np.array([1,2,3])
b=-a

a.shape, b.shape, np.stack([a,b],axis=0).shape

((3,), (3,), (2, 3))

(3,) (3,) => add an axis at the first position (axis=0) => (1,3) (1,3) => (2,3)

(Analysis 2)

a=np.array([1,2,3])
b=-a

a.shape, b.shape, np.stack([a,b],axis=1).shape

((3,), (3,), (3, 2))

(3,) (3,) => add an axis at the second position (axis=1) => (3,1) (3,1) => (3,2)

- Higher-dimensional example

a=np.arange(3*4*5).reshape(3,4,5) 
b=-a

a.shape, b.shape

((3, 4, 5), (3, 4, 5))

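np.stack([a,b],axis=0).shape # (3,4,5) => (1,3,4,5) // an axis is added at the first position, then the arrays are joined

(2, 3, 4, 5)

np.stack([a,b],axis=1).shape # (3,4,5) => (3,1,4,5) // an axis is added at the second position, then the arrays are joined

(3, 2, 4, 5)

np.stack([a,b],axis=2).shape # (3,4,5) => (3,4,1,5) // an axis is added at the third position, then the arrays are joined

(3, 4, 2, 5)

np.stack([a,b],axis=3).shape # (3,4,5) => (3,4,5,1) // an axis is added at the fourth position, then the arrays are joined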

(3, 4, 5, 2)

np.stack([a,b],axis=-1).shape # axis=-1 <=> axis=3

(3, 4, 5, 2)

np.stack([a,b],axis=-2).shape # axis=-2 <=> axis=2

(3, 4, 2, 5)

np.concatenate joins arrays while keeping the total number of axes; np.stack joins arrays while increasing the number of axes by one.
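
The "add an axis, then join" reading of np.stack can be checked directly. The sketch below assumes `np.expand_dims` (not used in these notes) as the way to insert the new length-1 axis; the dictionary name `results` is made up for illustration.

```python
import numpy as np

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)
b = -a

# For every valid axis of the 4d result, compare np.stack with the manual
# two-step version: insert a length-1 axis, then concatenate along it.
results = {}
for axis in range(-4, 4):
    stacked = np.stack([a, b], axis=axis)
    manual = np.concatenate(
        [np.expand_dims(a, axis), np.expand_dims(b, axis)], axis=axis
    )
    results[axis] = np.array_equal(stacked, manual)

print(results)
print(np.stack([a, b], axis=2).shape)
```

Note that the axis argument of np.stack refers to a position in the result, which has one more axis than the inputs; that is why axis=3 and axis=-4 are both valid for 3d inputs.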

sum

- 1d

a = np.array([1,2,3]) 
a

array([1, 2, 3])

a.sum()

a.sum(axis=0)

- 2d

a=np.array(range(6)).reshape(2,3)
a

array([[0, 1, 2],
       [3, 4, 5]])

a.sum() # sum of all elements (0+1+2+3+4+5 = 15)

a.sum(axis=0)

array([3, 5, 7])

a.sum(axis=1)

array([ 3, 12])

- Analysis of the 2d results

a.shape, a.sum(axis=0).shape

((2, 3), (3,))

The first axis is removed => axis=0

a.shape, a.sum(axis=1).shape

((2, 3), (2,))

The second axis is removed => axis=1

- Practice

a=np.array(range(10)).reshape(5,2) 
a

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

(Problem 1) How do we compute the sum of column 1 and the sum of column 2?

(Solution) The shape should go (5,2) => (2,). (So the first axis has to be removed.)

a.sum(axis=0)

array([20, 25])

(Problem 2) How do we compute the sum of row 1, the sum of row 2, …, and the sum of row 5?

(Solution) The shape should go (5,2) => (5,). (So the second axis has to be removed.)

a.sum(axis=1)

array([ 1,  5,  9, 13, 17])

(Problem 3) How do we compute the sum of all elements of a?

(Solution) The shape should go (5,2) => (). (So both the first and the second axis have to be removed.)

a.sum(axis=(0,1))

a.sum() # the default is axis=None, which sums over every axis; for a 2d array this equals a.sum(axis=(0,1))
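
The "which axis is removed" rule for sum can be checked on the (5,2) practice array. This is a minimal sketch; the `keepdims=True` option is not covered in these notes and is included only to show what happens when the summed axis is kept with length 1 instead of being removed.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)

col_sums = a.sum(axis=0)        # first axis removed:  (5,2) => (2,)
row_sums = a.sum(axis=1)        # second axis removed: (5,2) => (5,)
total_both = a.sum(axis=(0, 1)) # both axes removed:   (5,2) => ()
total_none = a.sum(axis=None)   # same as a.sum(): sum over every axis
total_default = a.sum()

# Addition beyond the notes: keep the summed axis with length 1
col_sums_kept = a.sum(axis=0, keepdims=True)  # (5,2) => (1,2)

print(col_sums, row_sums, total_default)
print(col_sums.shape, row_sums.shape, col_sums_kept.shape)
```

Reading the axis argument as "the axis that disappears from the shape" gives the same answers as in Problems 1 to 3.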

mean, std, max, min, prod

- All of these follow the same logic as sum.

a=np.array(range(10)).reshape(5,2)
a

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

a.mean(axis=0), a.std(axis=0), a.max(axis=0), a.min(axis=0), a.prod(axis=0)

(array([4., 5.]),
 array([2.82842712, 2.82842712]),
 array([8, 9]),
 array([0, 1]),
 array([  0, 945]))

a.mean(axis=1), a.std(axis=1), a.max(axis=1), a.min(axis=1), a.prod(axis=1)

(array([0.5, 2.5, 4.5, 6.5, 8.5]),
 array([0.5, 0.5, 0.5, 0.5, 0.5]),
 array([1, 3, 5, 7, 9]),
 array([0, 2, 4, 6, 8]),
 array([ 0,  6, 20, 42, 72]))

- Note that std divides by n by default (the denominator is n).

a=np.array([1,2,3,4])
a.std()

1.118033988749895

np.sqrt(sum((a-a.mean())**2)/4)

1.118033988749895

- What if we want the denominator to be n-1? Use ddof=1.

a=np.array([1,2,3,4])
a.std(ddof=1)

1.2909944487358056

np.sqrt(sum((a-a.mean())**2)/3)

1.2909944487358056
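
The two denominators can be compared in one place. The sketch below recomputes both versions by hand from the sum of squared deviations; the names `ss`, `std_n`, and `std_n_minus_1` are chosen for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
n = a.size

# Sum of squared deviations from the mean
ss = ((a - a.mean()) ** 2).sum()

std_n = np.sqrt(ss / n)               # denominator n, matches a.std()
std_n_minus_1 = np.sqrt(ss / (n - 1)) # denominator n-1, matches a.std(ddof=1)

print(std_n, a.std())
print(std_n_minus_1, a.std(ddof=1))
```

In general the denominator is n - ddof, so the default ddof=0 gives n and ddof=1 gives n-1.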

argmax, argmin

- 1d

a= np.array([1,-2,3,10,4])
a

array([ 1, -2,  3, 10,  4])

a.argmax() # returns the index of the largest element (here 3, since a[3] = 10)

a.argmin() # returns the index of the smallest element (here 1, since a[1] = -2)

- 2d

np.random.seed(43052)
a=np.random.randn(4*5).reshape(4,5)
a

array([[ 0.38342049,  1.0841745 ,  1.14277825,  0.30789368,  0.23778744],
       [ 0.35595116, -1.66307542, -1.38277318, -1.92684484, -1.4862163 ],
       [ 0.00692519, -0.03488725, -0.34357323,  0.70895648, -1.55100608],
       [ 1.34565583, -0.05654272, -0.83017342, -1.46395159, -0.35459593]])

a.argmin(), a.min()

(8, -1.9268448358915802)

a.argmax(), a.max()

(15, 1.3456558341738827)

a.argmin(axis=0), a.argmin(axis=1)

(array([2, 1, 1, 1, 2]), array([4, 3, 4, 3]))

a.argmax(axis=0), a.argmax(axis=1)

(array([3, 0, 0, 2, 0]), array([2, 0, 3, 0]))
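
Without an axis argument, argmax and argmin on a 2d array return a position in the flattened array (which is why a 4x5 array can give an index like 8 or 15). A minimal sketch of converting such a flat index back to a (row, column) pair follows; `np.unravel_index` is not used in these notes and is shown as one assumed way to do it, on a small made-up array.

```python
import numpy as np

# Small fixed example (values are made up for illustration)
a = np.array([[3, 7, 1],
              [9, 2, 8]])

flat_max = a.argmax()  # position of the maximum in the flattened array
flat_min = a.argmin()  # position of the minimum in the flattened array

# Convert the flat positions into (row, column) indices
row_max, col_max = np.unravel_index(flat_max, a.shape)
row_min, col_min = np.unravel_index(flat_min, a.shape)

print(flat_max, (row_max, col_max), a[row_max, col_max])
print(flat_min, (row_min, col_min), a[row_min, col_min])
```

With row-major order the flat index equals row * number_of_columns + column, so the conversion can also be done with integer division and remainder.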

cumsum, cumprod

- 1d

a=np.array([1,2,3,4])
a

array([1, 2, 3, 4])

a.cumsum()

array([ 1,  3,  6, 10])

a.cumprod()

array([ 1,  2,  6, 24])

- 2d

a=np.array(range(3*4)).reshape(3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

a.cumsum(axis=0), a.cumsum(axis=1)

(array([[ 0,  1,  2,  3],
        [ 4,  6,  8, 10],
        [12, 15, 18, 21]]),
 array([[ 0,  1,  3,  6],
        [ 4,  9, 15, 22],
        [ 8, 17, 27, 38]]))

a.cumprod(axis=0), a.cumprod(axis=1)

(array([[  0,   1,   2,   3],
        [  0,   5,  12,  21],
        [  0,  45, 120, 231]]),
 array([[   0,    0,    0,    0],
        [   4,   20,  120,  840],
        [   8,   72,  720, 7920]]))

diff

- First-order difference

a=np.array([1,2,4,6,7])
a

array([1, 2, 4, 6, 7])

np.diff(a)

array([1, 2, 2, 1])

- Second-order difference

np.diff(np.diff(a))

array([ 1,  0, -1])
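
Repeated differencing can also be requested in a single call. The sketch below uses the `n` argument of `np.diff`, which does not appear in these notes, and checks it against the nested call; the variable names are chosen for illustration.

```python
import numpy as np

a = np.array([1, 2, 4, 6, 7])

first = np.diff(a)             # a[i+1] - a[i]
second_nested = np.diff(first) # difference of the differences
second_n = np.diff(a, n=2)     # same thing requested with n=2 (an addition)

print(first)
print(second_nested, second_n)
```

Each round of differencing shortens the array by one element, so the second-order difference of a length-5 array has length 3.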

- prepend, append

a=np.array([1,2,4,6,7])
a

array([1, 2, 4, 6, 7])

np.diff(a,prepend=100)
#np.diff(np.array([100]+a.tolist()) )

array([-99,   1,   2,   2,   1])

[1,2,4,6,7] -> [100,1,2,4,6,7] -> np.diff

np.diff(a,append=100)
#np.diff(np.array(a.tolist()+[100]) )

array([ 1,  2,  2,  1, 93])

(Example) Prepend 1 to a=[1,2,4,6,7] and take the difference.

np.diff(a,prepend=a[0])
#np.diff(a,prepend=1)

array([0, 1, 2, 2, 1])

(Example) Append 7 to a=[1,2,4,6,7] and take the difference.

np.diff(a,append=a[-1])
#np.diff(a,append=7)

array([1, 2, 2, 1, 0])
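
One use of prepend is that it keeps the output the same length as the input, which makes diff and cumsum undo each other. The following is a small sketch of that relationship under the assumption that 0 is prepended; the names `recovered` and `rebuilt` are chosen for illustration.

```python
import numpy as np

a = np.array([1, 2, 4, 6, 7])

# cumsum followed by diff (with a leading 0) gives back the original array
recovered = np.diff(a.cumsum(), prepend=0)

# diff (with a leading 0) followed by cumsum also gives back the original
rebuilt = np.diff(a, prepend=0).cumsum()

print(a.cumsum())
print(recovered, rebuilt)
```

Without prepend=0 the first element would be lost, because plain np.diff returns one element fewer than its input.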

- Differences of a 2d array

a=np.arange(24).reshape(4,6)
a

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

np.diff(a,axis=0)

array([[6, 6, 6, 6, 6, 6],
       [6, 6, 6, 6, 6, 6],
       [6, 6, 6, 6, 6, 6]])

np.diff(a,axis=1)

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
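
What np.diff does along each axis can be spelled out with slicing. This is only a sketch of the equivalent subtraction, with variable names chosen for illustration; it also shows that the differenced axis becomes one shorter.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# axis=0: each row minus the row above it, (4,6) => (3,6)
d0 = np.diff(a, axis=0)
d0_manual = a[1:, :] - a[:-1, :]

# axis=1: each column minus the column to its left, (4,6) => (4,5)
d1 = np.diff(a, axis=1)
d1_manual = a[:, 1:] - a[:, :-1]

print(d0.shape, d1.shape)
```

Unlike sum, the chosen axis is not removed; its length just drops by one.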

(Homework)

a=np.arange(24).reshape(4,6)
a

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

Starting from this array, apply np.diff with axis=1 to obtain a (4,5) array, then add a column of 1s on the left so that the final result is as follows.

array([[1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1]])
