Sung Kim Deep Learning 4 - Multi-variable Linear Regression

하늘이푸른오늘 2017. 11. 16. 12:03

Lec 04 - Multi-variable Linear Regression

https://www.youtube.com/watch?v=kPxpJY6fRkY

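Review

For linear regression, we 1) set up a hypothesis, 2) define a cost (loss) function, and then 3) apply the gradient descent algorithm.
Training is the process of fixing the cost function and finding the W and b that minimize it.
In single-variable regression, the data take the form X=[x1, x2, ..., xn], Y=[y1, y2, ..., yn].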
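
The three review steps can be sketched in plain Python for the single-variable case. This is only a minimal illustration: the toy data (y = 2x + 1), the learning rate, the step count, and the function names are assumptions chosen for the example, not values from the lecture.

```python
# Toy single-variable data (an assumption for illustration): y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def hypothesis(w, b, x):
    """Step 1: the hypothesis H(x) = w * x + b."""
    return w * x + b


def cost(w, b):
    """Step 2: mean squared error between predictions and targets."""
    return sum((hypothesis(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate=0.05, steps=5000):
    """Step 3: gradient descent on the cost, starting from w = 0, b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [hypothesis(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(w, b, cost(w, b))
```

On this toy data the learned w and b should approach 2 and 1, the values used to generate it.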

In multi-variable regression, the data take the form

X=[[x11, x12, ..., x1m], [x21, x22, ..., x2m], ..., [xn1, xn2, ..., xnm]], Y=[y1, y2, ..., yn], that is, n instances with m variables each.

Matrix representation of the hypothesis

The matrix representation stays the same even when there are multiple instances.
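
The equations for this part are not reproduced in these notes, so the following is a reconstruction under assumptions: three input variables as in the lab code, with the bias b kept as a separate term the way the lab's tf.matmul(X, W) + b does. The symbol n for the number of instances is introduced here for illustration.

```latex
\begin{align}
H(x_1, x_2, x_3) &= w_1 x_1 + w_2 x_2 + w_3 x_3 + b \\
&= \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}
   \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} + b \\
H(X) &= XW + b, \qquad X \in \mathbb{R}^{n \times 3},\; W \in \mathbb{R}^{3 \times 1},\; H(X) \in \mathbb{R}^{n \times 1}
\end{align}
```

Stacking more instances only adds rows to X and to H(X); W and the formula itself do not change.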

Lab 04-1 Multi-variable Linear Regression

https://www.youtube.com/watch?v=fZUV3xjoZSM

The test source code is available at: https://github.com/hunkim/DeepLearningZeroToAll

Code without matrices

import tensorflow as tf

x1_data = [73., 93., 89., 96., 73.]
x2_data = [80., 88., 91., 98., 66.]
x3_data = [75., 93., 90., 100., 70.]
y_data = [152., 185., 180., 196., 142.]

# placeholders for a tensor that will be always fed.
x1 = tf.placeholder(tf.float32)
x2 = tf.placeholder(tf.float32)
x3 = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

w1 = tf.Variable(tf.random_normal([1]), name='weight1')
w2 = tf.Variable(tf.random_normal([1]), name='weight2')
w3 = tf.Variable(tf.random_normal([1]), name='weight3')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = x1 * w1 + x2 * w2 + x3 * w3 + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize. Need a very small learning rate for this data set
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Very long loop, because it does not converge well
for step in range(500001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                                   feed_dict={x1: x1_data, x2: x2_data, x3: x3_data, Y: y_data})
    if step % 10000 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)
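
The comment in the listing about needing a very small learning rate can be checked without TensorFlow. The sketch below is a plain-Python stand-in, not the lab's code: it runs batch gradient descent on the same five rows, starting from zero weights (an assumption; the lab starts from random values), and compares learning rates 1e-5 and 1e-3 over 20 steps. The function names are hypothetical.

```python
# Same five training rows as the lab: three scores in, one final score out.
X = [[73., 80., 75.],
     [93., 88., 93.],
     [89., 91., 90.],
     [96., 98., 100.],
     [73., 66., 70.]]
Y = [152., 185., 180., 196., 142.]


def cost(w, b):
    """Mean squared error of the hypothesis x . w + b over all rows."""
    errors = [sum(xi * wi for xi, wi in zip(x, w)) + b - y for x, y in zip(X, Y)]
    return sum(e * e for e in errors) / len(errors)


def run_gradient_descent(learning_rate, steps):
    """Plain batch gradient descent from w = 0, b = 0; returns the final cost."""
    w, b = [0.0, 0.0, 0.0], 0.0
    n = len(X)
    for _ in range(steps):
        errors = [sum(xi * wi for xi, wi in zip(x, w)) + b - y for x, y in zip(X, Y)]
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, X)) for j in range(3)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return cost(w, b)


initial_cost = cost([0.0, 0.0, 0.0], 0.0)
small_lr_cost = run_gradient_descent(1e-5, 20)
large_lr_cost = run_gradient_descent(1e-3, 20)
print(initial_cost, small_lr_cost, large_lr_cost)
```

Because the inputs are unscaled scores in the 60 to 100 range, the gradients are large: with 1e-5 the cost falls, while with 1e-3 each update overshoots and the cost grows instead.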

Code using a matrix

import tensorflow as tf

x_data = [[73., 80., 75.],
          [93., 88., 93.],
          [89., 91., 90.],
          [96., 98., 100.],
          [73., 66., 70.]]

y_data = [[152.], [185.], [180.], [196.], [142.]]

X = tf.placeholder(tf.float32, shape=[None, 3])  # None stands for n: the number of rows may vary.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(500001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                                   feed_dict={X: x_data, Y: y_data})
    if step % 10000 == 0:
        print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)
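
To see that the matrix listing computes the same hypothesis as the term-by-term one, here is a small plain-Python check. The weights and bias are hypothetical fixed values chosen so the arithmetic is exact, and matmul is a hand-written helper standing in for tf.matmul.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
            for row in a]


x_data = [[73., 80., 75.],
          [93., 88., 93.]]
W = [[0.5], [1.0], [0.25]]  # hypothetical weights, shape 3 x 1
b = 2.0                     # hypothetical bias

# Matrix form: (n x 3) times (3 x 1) gives (n x 1), then b is added to every row.
matrix_form = [[value + b for value in row] for row in matmul(x_data, W)]

# Term-by-term form from the first listing: x1 * w1 + x2 * w2 + x3 * w3 + b.
term_form = [[x1 * W[0][0] + x2 * W[1][0] + x3 * W[2][0] + b] for x1, x2, x3 in x_data]

print(matrix_form)
print(term_form)
```

The same W works for any number of rows in x_data, which is what shape=[None, 3] expresses in the placeholder.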

Lab 4-2, Loading data from a file

https://www.youtube.com/watch?v=o2q4QNnoShY

Write the data to a file, then read it back for processing.

import tensorflow as tf
import numpy as np

tf.set_random_seed(777)

xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

X = tf.placeholder(tf.float32, shape=[None, 3])  # None stands for n: the number of rows may vary.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(100001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                                   feed_dict={X: x_data, Y: y_data})
    if step % 10000 == 0:
        print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)

# Ask my score
print ("Your score will be ", sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print ("Our score will be ", sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))
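
The two slices in the listing, xy[:, 0:-1] and xy[:, [-1]], split each row into three inputs and one target. Indexing with the list [-1] rather than the scalar -1 keeps the target column two-dimensional (n x 1), matching the shape=[None, 1] placeholder. The sketch below mimics the same split with plain lists instead of NumPy, using the first three rows of the CSV inlined as a string so it runs without the file.

```python
import csv
import io

# First three rows of data-01-test-score.csv, inlined so the sketch runs without the file.
csv_text = "73,80,75,152\n93,88,93,185\n89,91,90,180\n"

rows = [[float(v) for v in record] for record in csv.reader(io.StringIO(csv_text))]

# Equivalent of xy[:, 0:-1]: every column except the last, one row per instance.
x_data = [row[0:-1] for row in rows]
# Equivalent of xy[:, [-1]]: the last column, kept as a one-element row.
y_data = [row[-1:] for row in rows]

print(x_data)
print(y_data)
```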

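Queue Runners: a technique for when the file is too large to process all at once

Multiple files such as A, B, and C can be read.
Their file names are pushed onto a queue.
A Reader is attached to it; the records it reads are decoded and pushed onto another queue.
Pieces are then sliced off this queue for training.
TensorFlow takes care of this whole process.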
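
The pipeline described above can be pictured with a rough plain-Python analogy. This is not how TensorFlow implements it: the sketch is single-threaded, makes one pass over the data, and the in-memory files A.csv and B.csv as well as the function names are hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory "files" standing in for CSV files A and B on disk.
files = {
    "A.csv": "73,80,75,152\n93,88,93,185\n89,91,90,180\n",
    "B.csv": "96,98,100,196\n73,66,70,142\n",
}


def read_records(filenames):
    """Walk the list of file names, read each file line by line, and decode each line."""
    for name in filenames:
        for record in csv.reader(io.StringIO(files[name])):
            values = [float(v) for v in record]
            yield values[:-1], values[-1:]


def batches(records, batch_size):
    """Cut the stream of decoded records into batches of at most batch_size."""
    while True:
        chunk = list(islice(records, batch_size))
        if not chunk:
            return
        x_batch = [x for x, _ in chunk]
        y_batch = [y for _, y in chunk]
        yield x_batch, y_batch


all_batches = list(batches(read_records(["A.csv", "B.csv"]), batch_size=2))
for x_batch, y_batch in all_batches:
    print(x_batch, y_batch)
```

Only one batch is held in memory at a time, which is the point of the approach when the files are too large to load whole.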

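Usage is simple. There are three steps:

Build filename_queue by listing the file names.
Attach a Reader.
Decode the value. (The data here is CSV, so use decode_csv and specify the default values.)

Then pass the result on as batches.

Instead of x_data and y_data, build x_batch and y_batch and feed those.
The lines involving coord can simply be used as they are.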

import tensorflow as tf

filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values used when a column is empty; they also set the type of the decoded result
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# Collect the decoded CSV records into batches
train_x_batch, train_y_batch = tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

# Placeholders
X = tf.placeholder(tf.float32, shape=[None, 3])  # None stands for n: the number of rows may vary.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

#Start populating the filename queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(100001):
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                                   feed_dict={X: x_batch, Y: y_batch})
    if step % 10000 == 0:
        print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)

coord.request_stop()
coord.join(threads)

print ("Your score will be ", sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print ("Our score will be ", sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

data-01-test-score.csv

73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
69,74,77,149
47,56,60,115
87,79,90,175
79,70,88,164
69,70,73,141
70,65,74,141
93,95,91,184
79,80,73,152
70,73,78,148
93,89,96,192
78,75,68,147
81,90,93,183
88,92,86,177
78,83,77,159
82,86,90,177
86,82,89,175
78,83,85,175
76,83,71,149
96,93,95,192
