```python
# conda install -c conda-forge python-graphviz -y
```

These lecture notes are based on Professor Guebin Choi's STBDA2022 course materials at Jeonbuk National University.

imports

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.experimental.numpy as tnp
tnp.experimental_enable_numpy_behavior()
```
The optimization problem

- \(loss=(\frac{1}{2}\beta-1)^2\)
- The method we used before (hand-coding the gradient update) has the drawback that you must already know the formula for the derivative; a minimal sketch of that approach follows.
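For contrast, here is a minimal sketch of that earlier approach (not in the original notes), using the hand-derived slope \(loss'(\beta)=\frac{1}{2}\beta-1\):

```python
# Gradient descent with a hand-derived slope: loss'(beta) = beta/2 - 1.
# This is the "you must know the formula" approach the note refers to.
beta = -10.0
alpha = 0.01/6
for epoc in range(10000):
    slope = beta/2 - 1           # derivative of (beta/2 - 1)**2, computed by hand
    beta = beta - alpha*slope    # beta_new = beta_old - alpha * slope
beta                             # approaches 2, the minimizer
```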
Optimization with tf.keras.optimizers

Method 1: using opt.apply_gradients()
```python
alpha = 0.01/6
beta = tf.Variable(-10.0)
opt = tf.keras.optimizers.SGD(alpha)
```

- `tf.keras.optimizers` = `tf.optimizers`: they are all the same thing.
- iter1

```python
with tf.GradientTape() as tape:
    tape.watch(beta)
    loss = (beta/2-1)**2
slope = tape.gradient(loss,beta)
# beta.assign_sub(slope * alpha)
opt.apply_gradients([(slope,beta)])
beta
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-9.99>
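Sanity check: at \(\beta=-10\) the slope is \(-10/2-1=-6\), so the SGD step is \(-\alpha\times(-6)=\frac{0.01}{6}\times 6=0.01\), moving \(\beta\) from \(-10.0\) to \(-9.99\), as shown.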
- iter2

```python
with tf.GradientTape() as tape:
    tape.watch(beta)
    loss = (beta/2-1)**2
slope = tape.gradient(loss,beta)
# beta.assign_sub(slope * alpha)
opt.apply_gradients([(slope,beta)])
beta
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-9.980008>
- Organized as a for loop:

```python
alpha = 0.01/6
beta = tf.Variable(-10.0)
opt = tf.keras.optimizers.SGD(alpha)
for epoc in range(10000):
    with tf.GradientTape() as tape:
        tape.watch(beta)
        loss = (beta/2-1)**2
    slope = tape.gradient(loss,beta)
    # beta.assign_sub(slope * alpha)
    opt.apply_gradients([(slope,beta)])
beta
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=1.9971251>
- The input to `opt.apply_gradients()` is a list of (gradient, variable) pairs.

Method 2: opt.minimize()
```python
alpha = 0.01/6
beta = tf.Variable(-10.0)
opt = tf.keras.optimizers.SGD(alpha)
loss_fn = lambda: (beta/2-1)**2
```

- `lambda x: x**2` ⟺ \(f(x)=x^2\)
- `lambda x,y: x+y` ⟺ \(f(x,y)=x+y\)
- `lambda: y` ⟺ \(f()=y\), a function with no input whose output is always y

```python
loss_fn()  # no input; the output is computed from the current value of beta
```
<tf.Tensor: shape=(), dtype=float32, numpy=36.0>
- iter1: it throws an error..
```python
opt.minimize?
```

    Signature: opt.minimize(loss, var_list, tape=None)
    Docstring:
    Minimize `loss` by updating `var_list`.

    This method simply computes gradient using `tf.GradientTape` and calls
    `apply_gradients()`. If you want to process the gradient before applying
    then call `tf.GradientTape` and `apply_gradients()` explicitly instead of
    using this function.

    Args:
      loss: `Tensor` or callable. If a callable, `loss` should take no
        arguments and return the value to minimize.
      var_list: list or tuple of `Variable` objects to update to minimize
        `loss`, or a callable returning the list or tuple of `Variable`
        objects. Use callable when the variable list would otherwise be
        incomplete before `minimize` since the variables are created at the
        first time `loss` is called.
      tape: (Optional) `tf.GradientTape`.

    Returns:
      None
    File: ~/anaconda3/envs/py38/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py
    Type: method
```python
opt.minimize(loss_fn, beta)
```

TypeError: Cannot iterate over a scalar tensor.

```python
beta
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-10.0>
- iter2

```python
opt.minimize(loss_fn, beta)
beta
```

TypeError: Cannot iterate over a scalar tensor.
- Organized as a for loop:

```python
alpha = 0.01/6
beta = tf.Variable(-10.0)
opt = tf.keras.optimizers.SGD(alpha)
loss_fn = lambda: (beta/2-1)**2
for epoc in range(10000):
    opt.minimize(loss_fn, beta)
beta
```

TypeError: Cannot iterate over a scalar tensor.
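Judging from the docstring above, `var_list` must be a list or tuple of `Variable` objects, so the error presumably comes from passing the scalar Variable `beta` directly. A sketch of the likely fix (wrapping `beta` in a list, as the scalar regression solution later in these notes does):

```python
alpha = 0.01/6
beta = tf.Variable(-10.0)
opt = tf.keras.optimizers.SGD(alpha)
loss_fn = lambda: (beta/2-1)**2
for epoc in range(10000):
    opt.minimize(loss_fn, [beta])   # var_list wrapped in a list
beta  # should now approach 2, matching the apply_gradients result
```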
Regression problem

- \({\bf y} \approx 2.5 + 4.0 {\bf x}\)
```python
tnp.random.seed(43052)
N = 200
x = tnp.linspace(0,1,N)
epsilon = tnp.random.randn(N)*0.5
y = 2.5+4*x + epsilon
y_true = 2.5+4*x
plt.plot(x,y,'.')
plt.plot(x,y_true,'r--')
```
Analytic solutions

Solution 1: scalar version

- Point: \(S_{xx}=\sum_{i=1}^{N}(x_i-\bar{x})^2\), \(S_{xy}=\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})\); \(\hat{\beta}_1=S_{xy}/S_{xx}\), \(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\)
- Solution:
```python
Sxx = sum((x-x.mean())**2)
Sxy = sum((x-x.mean())*(y-y.mean()))
beta1_hat = Sxy/Sxx
beta1_hat
```
<tf.Tensor: shape=(), dtype=float64, numpy=3.9330345167331697>
```python
beta0_hat = y.mean() - x.mean()*beta1_hat
beta0_hat
```
<tf.Tensor: shape=(), dtype=float64, numpy=2.583667211565867>
Solution 2: vector version

- Point: \(\hat{\beta}=(X'X)^{-1}X'y\)
- Solution:
```python
y = y.reshape(N,1)
X = tf.stack([tf.ones(N,dtype=tf.float64),x],axis=1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
tf.linalg.inv(X.T @ X) @ X.T @ y
```
<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[2.58366721],
[3.93303452]])>
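As an aside (a sketch, not in the original notes): `tf.linalg.lstsq` returns the same least-squares solution without forming \((X'X)^{-1}\) explicitly, which is generally more numerically stable:

```python
# Least-squares solve; should agree with the normal-equation result above.
tf.linalg.lstsq(X, y)
```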
Solution 3: vector version, using the derivative of the loss function

- Point: \(loss'(\beta)=-2X'y +2X'X\beta\); \(\beta_{new} = \beta_{old} - \alpha \times loss'(\beta_{old})\)
- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tnp.array([-5,10]).reshape(2,1)
beta_hat
```
<tf.Tensor: shape=(2, 1), dtype=int64, numpy=
array([[-5],
[10]])>
```python
slope = (-2*X.T @ y + 2*X.T @ X @ beta_hat) / N
slope
```
<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[-9.10036894],
[-3.52886113]])>
```python
alpha = 0.1
step = slope*alpha
step
```
<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[-0.91003689],
[-0.35288611]])>
```python
for epoc in range(1000):
    slope = (-2*X.T @ y + 2*X.T @ X @ beta_hat)/N
    beta_hat = beta_hat - alpha*slope
beta_hat
```
<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[2.58366061],
[3.93304684]])>
Using GradientTape

Solution 1: vector version

- Point:

```python
## point code 1: gradient tape
with tf.GradientTape() as tape:
    loss = ...
## point code 2: differentiate
slope = tape.gradient(loss,beta_hat)
## point code 3: update
beta_hat.assign_sub(slope*alpha)
```
- Solution:

```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
alpha = 0.1
for epoc in range(1000):
    with tf.GradientTape() as tape:
        yhat = X@beta_hat
        loss = (y-yhat).T @ (y-yhat) / N
    slope = tape.gradient(loss,beta_hat)
    beta_hat.assign_sub(alpha*slope)
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[2.58366061],
[3.93304684]])>
Solution 2: scalar version

- Point:

```python
## point code: differentiate
slope0,slope1 = tape.gradient(loss,[beta0_hat,beta1_hat])
```

- Solution:
```python
y = y.reshape(-1)
y.shape,x.shape
```
(TensorShape([200]), TensorShape([200]))
```python
beta0_hat = tf.Variable(-5.0)
beta1_hat = tf.Variable(10.0)
alpha = 0.1
for epoc in range(1000):
    with tf.GradientTape() as tape:
        yhat = beta0_hat + x*beta1_hat
        loss = tf.reduce_sum((y-yhat)**2)/N  # loss = sum((y-yhat)**2)/N also works but is a bit slower
    slope0,slope1 = tape.gradient(loss,[beta0_hat,beta1_hat])
    beta0_hat.assign_sub(alpha*slope0)
    beta1_hat.assign_sub(alpha*slope1)
beta0_hat,beta1_hat
```
(<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.58366>,
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=3.933048>)
GradientTape + opt.apply_gradients

Solution 1: vector version

- Point:

```python
## point code: update
opt.apply_gradients([(slope,beta_hat)])  ## the input is a list of pairs
```

- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    with tf.GradientTape() as tape:
        yhat = X@beta_hat
        loss = (y-yhat).T @ (y-yhat) / N
    slope = tape.gradient(loss,beta_hat)
    opt.apply_gradients([(slope,beta_hat)])  # beta_hat.assign_sub(alpha*slope)
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[2.58366061],
[3.93304684]])>
Solution 2: scalar version

- Point:

```python
## point code: update
opt.apply_gradients([(slope0,beta0_hat),(slope1,beta1_hat)])  ## the input is a list of pairs
```

- Solution:
```python
y = y.reshape(-1)
y.shape,x.shape
```
(TensorShape([200]), TensorShape([200]))
```python
beta0_hat = tf.Variable(-5.0)
beta1_hat = tf.Variable(10.0)
alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    with tf.GradientTape() as tape:
        yhat = beta0_hat + beta1_hat*x  # X@beta_hat
        loss = tf.reduce_sum((y-yhat)**2) / N
    slope0,slope1 = tape.gradient(loss,[beta0_hat,beta1_hat])
    opt.apply_gradients([(slope0,beta0_hat),(slope1,beta1_hat)])
beta0_hat,beta1_hat
```
(<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.58366>,
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=3.933048>)
opt.minimize

Solution 1: vector version, user-defined loss function with lambda

- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
loss_fn = lambda: (y-X@beta_hat).T @ (y-X@beta_hat) / N
alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,beta_hat)
```

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'

```python
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
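The same `_unique_id` error recurs in solutions 3 through 6 below. Since the scalar version (solution 2) passes its variables to `opt.minimize` as a list and succeeds, the likely fix here (a sketch, not verified against this exact Keras version) is again to wrap `beta_hat` in a list:

```python
for epoc in range(1000):
    opt.minimize(loss_fn, [beta_hat])   # var_list as a list of Variables
```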
Solution 2: scalar version, user-defined loss function with lambda

- Point:

```python
## point code: differentiate & update = minimize
opt.minimize(loss_fn,[beta0_hat,beta1_hat])
```

- Solution:
```python
y = y.reshape(-1)
y.shape,x.shape
```
(TensorShape([200]), TensorShape([200]))
```python
beta0_hat = tf.Variable(-5.0)
beta1_hat = tf.Variable(10.0)
loss_fn = lambda: tf.reduce_sum((y-beta0_hat-beta1_hat*x)**2) / N
alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,[beta0_hat,beta1_hat])
beta0_hat,beta1_hat
```
(<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.58366>,
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=3.933048>)
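- Note: this scalar version succeeds where the vector version above fails, presumably because the variables are passed to `opt.minimize` as a list, matching the `var_list` requirement in the docstring.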
Solution 3: vector version, user-defined (short) loss function

- Point:

```python
## point code: define the loss function
def loss_fn():
    return ??
```

- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
def loss_fn():
    return (y-X@beta_hat).T @ (y-X@beta_hat) / N

alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,beta_hat)
```

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'

```python
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
Solution 4: vector version, user-defined (long) loss function

- Point:

```python
## point code: define the loss function
def loss_fn():
    ??
    ??
    return ??
```

- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
def loss_fn():
    yhat = X@beta_hat                   # expression 1 handed to the computer
    loss = (y-yhat).T @ (y-yhat) / N    # expression 2 handed to the computer
    return loss                         # the thing that gets differentiated in tape.gradient(loss,beta_hat)

alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,beta_hat)
```

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'

```python
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
Solution 5: vector version, user-defined loss function <- tf.losses.MSE

- Point:

```python
## point code: use a pre-implemented loss function
tf.losses.MSE(y,yhat)
```

- Solution:
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
def loss_fn():
    yhat = X@beta_hat                                            # expression 1 handed to the computer
    loss = tf.keras.losses.MSE(y.reshape(-1),yhat.reshape(-1))   # expression 2 handed to the computer
    return loss                                                  # the thing that gets differentiated

alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,beta_hat)
```

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'

```python
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
Solution 6: vector version, user-defined loss function <- tf.losses.MeanSquaredError

- Point:

```python
## point code: create a loss-function object from the class (a class that stamps out functions)
mse_fn = tf.losses.MeanSquaredError()
mse_fn(y,yhat)
```

- Solution:
```python
mseloss_fn = tf.losses.MeanSquaredError()
```

- You can think of `mseloss_fn` as essentially `tf.keras.losses.MSE`.
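A quick sanity check of that claim (a sketch; it reuses `X`, `y`, and the `beta_hat` from the previous solution): both forms should return the same scalar for 1-D inputs.

```python
# The loss object and the plain function should agree here.
yhat = X @ beta_hat
mseloss_fn(y.reshape(-1), yhat.reshape(-1)), tf.keras.losses.MSE(y.reshape(-1), yhat.reshape(-1))
```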
```python
y = y.reshape(N,1)
y.shape,X.shape
```
(TensorShape([200, 1]), TensorShape([200, 2]))
```python
beta_hat = tf.Variable(tnp.array([-5.0,10.0]).reshape(2,1))
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
```python
def loss_fn():
    yhat = X@beta_hat                                  # expression 1 handed to the computer
    loss = mseloss_fn(y.reshape(-1),yhat.reshape(-1))  # expression 2 handed to the computer
    return loss                                        # the thing that gets differentiated

alpha = 0.1
opt = tf.optimizers.SGD(alpha)
for epoc in range(1000):
    opt.minimize(loss_fn,beta_hat)
```

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'

```python
beta_hat
```
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float64, numpy=
array([[-5.],
[10.]])>
tf.keras.Sequential

- Different representations of \(\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i\):

```python
import graphviz
def gv(s): return graphviz.Source('digraph G{ rankdir="LR"' + s + '; }')
```

```python
gv('''
"1" -> "beta0_hat + x*beta1_hat, bias=False"[label="* beta0_hat"]
"x" -> "beta0_hat + x*beta1_hat, bias=False"[label="* beta1_hat"]
"beta0_hat + x*beta1_hat, bias=False" -> "yhat"[label="identity"] ''')
```

```python
gv('''
"x" -> "x*beta1_hat, bias=True"[label="*beta1_hat"] ;
"x*beta1_hat, bias=True" -> "yhat"[label="identity"] ''')
```

```python
gv('''
"X=[1 x]" -> "X@beta_hat, bias=False"[label="@beta_hat"] ;
"X@beta_hat, bias=False" -> "yhat"[label="identity"] ''')
```
Solution 1: vector version, user-defined loss function

- Point:

```python
## point code 1: create the network
net = tf.keras.Sequential()

## point code 2: design the network architecture
net.add(tf.keras.layers.Dense(1,input_shape=(2,),use_bias=False))

## point code 3: compile the network = architecture + loss function + optimizer
net.compile(opt,loss=loss_fn2)

## point code 4: differentiate & update
net.fit(X,y,epochs=1000,verbose=0,batch_size=N)
```

- Solution:
```python
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(units=1,input_shape=(2,),use_bias=False))  ## defines how yhat is computed = designs the architecture
```

- `units` is the output dimension of the layer: here the dimension of yhat, which is (200,1), so units=1.
- `input_shape` is the input dimension of the layer: here the dimension of X, which is (200,2), so (2,).
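Put differently (a sketch, not in the original notes): with `use_bias=False`, this `Dense(1)` layer computes exactly `X @ W` for its (2,1) kernel `W`, i.e. the vector-version regression model.

```python
# The layer's kernel is its only weight; Dense(1, use_bias=False) is a matmul.
W = net.get_weights()[0]   # numpy array of shape (2,1)
# net.predict(X) should match X @ W up to float32 rounding
```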
```python
def loss_fn2(y,yhat):
    return (y-yhat).T @ (y-yhat) / N

alpha = 0.1
opt = tf.optimizers.SGD(alpha)
```
```python
[np.array([[-5.0],[10.0]],dtype=np.float32)]
```
[array([[-5.],
[10.]], dtype=float32)]
```python
net.set_weights([np.array([[-5.0],[10.0]],dtype=np.float32)])
net.weights
```
[<tf.Variable 'dense/kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[-5.],
[10.]], dtype=float32)>]
```python
net.compile(opt,loss=tf.losses.MSE)  # architecture + loss function + optimizer => combine them all into the network => compile the network
net.fit(X,y,epochs=1000,batch_size=N,verbose=0)  # differentiate + update parameters = net.fit
```
<keras.callbacks.History at 0x7fc7f48f5520>
- With `verbose=0`, the per-epoch training progress is not displayed.
```python
net.weights
```
[<tf.Variable 'dense/kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[2.58366 ],
[3.933048]], dtype=float32)>]
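As a final check (a sketch, not in the original notes), the fitted kernel can be compared with the analytic estimates (about 2.5837 and 3.9330) and the fitted line overlaid on the data:

```python
beta_net = net.get_weights()[0].astype(np.float64)   # fitted (2,1) kernel
plt.plot(x, y, '.')
plt.plot(x, (X @ beta_net).reshape(-1), 'r--')       # fitted regression line
```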