김성훈 딥러닝 4 - 다변수(Multi-variable) Linear Regression

기타/WWW

김성훈 딥러닝 4 - 다변수(Multi-variable) Linear Regression

하늘이푸른오늘 2017. 11. 16. 12:03

Lec 04 - 다변수(Multi-variable) Linear Regression

https://www.youtube.com/watch?v=kPxpJY6fRkY

복습

선형 회귀분석을 위해서는 1) 가설(Hypothesis)를 세우고, 2) 비용(Cost/Loss) 함수를 만든 뒤, 3) Gradient descent 알고리듬을 적용한다.
비용함수를 결정하고, 이를 최소로 줄이는 W, b를 찾는 것이 학습을 시키는 과정이다.
단변수 회귀분석에서는, X=[x1, x2, .... , xn], Y=[y1, y2, ... , yn] 의 형태가 됨.

다변수 회귀분석은

X=[[x11, x12, .... , x1m],[x21, x22, .... , x2m], ..., [xn1, xn2, .... , xnm]], Y=[y1, y2, ... , yn] 형태가 됨.

가설(Hypothesis)의 행렬 표현

Instance 가 여러개 있을 경우에도 행렬 표현은 동일

Lab 04-1 다변수 Linear Regression

https://www.youtube.com/watch?v=fZUV3xjoZSM

테스트 소스코드가 있는 곳 : https://github.com/hunkim/DeepLearningZeroToAll

행렬을 사용하지 않을때의 코드

import tensorflow as tf

x1_data = [73., 93., 89., 96., 73.]
x2_data = [80., 88., 91., 98., 66.]
x3_data = [75., 93., 90., 100., 70.]
y_data = [152., 185., 180., 196., 142.]

# placeholders for a tensor that will be always fed.
x1 = tf.placeholder(tf.float32)
x2 = tf.placeholder(tf.float32)
x3 = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

w1 = tf.Variable(tf.random_normal([1]), name='weight1')
w2 = tf.Variable(tf.random_normal([1]), name='weight2')
w3 = tf.Variable(tf.random_normal([1]), name='weight3')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = x1 * w1 + x2 * w2 + x3 * w3 + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize. Need a very small learning rate for this data set
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

#very long routine. because it does not converge well
for step in range(500001):
cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
feed_dict={x1: x1_data, x2: x2_data, x3: x3_data, Y: y_data})
if step % 10000 == 0:
print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)

행렬을 사용할 때의 코드

import tensorflow as tf

x_data = [[73., 80., 75.],
[93., 88., 93.],
[89., 91., 90.],
[96., 98., 100.],
[73., 76., 70.]]

y_data = [[152.], [185.], [180.], [196.], [142.]]

X = tf.placeholder(tf.float32, shape=[None, 3]) #None 이란 n, 즉, 임의로 바뀔 수 있다는 뜻.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(500001) :
cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
feed_dict={X: x_data, Y: y_data})
if step %10000 == 0 :
print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)

Lab 4-2, 파일에서 데이터 불러오기

https://www.youtube.com/watch?v=o2q4QNnoShY

파일에 데이터를 작성하고 읽어서 처리.

import tensorflow as tf
import numpy as np

tf.set_random_seed(777)

xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

X = tf.placeholder(tf.float32, shape=[None, 3]) #None 이란 n, 즉, 임의로 바뀔 수 있다는 뜻.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(100001) :
cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
feed_dict={X: x_data, Y: y_data})
if step %10000 == 0 :
print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)

# Ask my score
print ("Your score will be ", sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print ("Our score will be ", sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

Queue Runners : 파일이 아주 커서 한꺼번에 돌리기 힘들 때 사용하는 방법

A,B,C 와 같은 여러개의 파일을 읽어올 수 있다.
이 파일이름들을 Queue에 쌓는다.
Reader 로 연결해서, 디코딩하여 Queue에 쌓는다.
이 Queue에서 부분씩 잘라내어 학습을 시킨다.
이 과정은 Tensorflow가 책임을 진다.

사용법은 간단. 세가지 단계

파일 이름들을 나열함으로써, filename_queue 를 만든다.
Reader를 연결한다.
디코딩 한다. (여기에서는 csv 이므로... decode_csv. (default 값을 정해줌.

그다음에 배치로 넘겨줌

x_data, y_data 대신에 x_batch, y_batch로 만들어 넘겨주고
coord 있는 부분은 그냥 쓴다고 생각하면 됨.

import tensorflow as tf

filename_queue = tf.train.string_input_producer(
['data-01-test-score.csv'], shuffle = False, name = 'filename_queue')

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# column이 비어 있을 경우의 default 값. decoded 결과의 type도 정해줌
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# csv 의 배치를 모음
train_x_batch, train_y_batch = tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

#placeholder
X = tf.placeholder(tf.float32, shape=[None, 3]) #None 이란 n, 즉, 임의로 바뀔 수 있다는 뜻.
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

#Start populating the filename queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(100001) :
x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                  feed_dict={X: x_batch, Y: y_batch})
if step %10000 == 0 :
      print(step, "Cost : ", cost_val, "\nPrediction :\n", hy_val)

coord.request_stop()
coord.join(threads)

print ("Your score will be ", sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print ("Our score will be ", sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

data-01-test-score.csv

73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
69,74,77,149
47,56,60,115
87,79,90,175
79,70,88,164
69,70,73,141
70,65,74,141
93,95,91,184
79,80,73,152
70,73,78,148
93,89,96,192
78,75,68,147
81,90,93,183
88,92,86,177
78,83,77,159
82,86,90,177
86,82,89,175
78,83,85,175
76,83,71,149
96,93,95,192

현재글김성훈 딥러닝 4 - 다변수(Multi-variable) Linear Regression

Stable Diffusion, 3D 빌딩, Drone, 스트릿뷰, 스테이블 디퓨전, 3D모델, Geocaching, 인공지능 이미지, 3D City, Quadcopter, 지오캐싱, 위성영상, street view, 이미지 생성 AI, GPS, 구글, Google Earth, 드론, google, 구글어스,

Today :
Yesterday :

일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

공간정보와 인터넷지도