• Prepare the training set.
  • Define the model: $\hat{y} = x * w$.
  • Define the loss function: $loss = (\hat{y} - y)^2 = (x * w - y)^2$.
  • A list w_list saves the weights $\omega$;
    a list mse_list saves the cost value for each $\omega$.
  • Compute the cost value for $\omega$ sampled over $[0.0, 4.0]$ with step $0.1$.
  • The value of the cost function is the loss averaged over all training samples (MSE).

Find the $\omega$ with the minimum cost (min_loss).
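The steps above can be sketched as follows (a minimal sketch; the x_data/y_data values come from the y = 2x toy example used throughout these notes, and the sweep range matches the bullet list):

```python
import numpy as np

# Toy training set for y = 2 * x (the example data assumed in these notes)
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward(x, w):
    return x * w                      # model: y_hat = x * w

def loss(x, y, w):
    return (forward(x, w) - y) ** 2   # squared error for one sample

w_list, mse_list = [], []
for w in np.arange(0.0, 4.1, 0.1):    # sweep w over [0.0, 4.0] with step 0.1
    mse = sum(loss(x, y, w) for x, y in zip(x_data, y_data)) / len(x_data)
    w_list.append(w)
    mse_list.append(mse)

best_w = w_list[int(np.argmin(mse_list))]
print('min cost at w =', best_w)      # the minimum sits near w = 2.0
```

Plotting mse_list against w_list gives the usual U-shaped cost curve with its minimum at the true weight.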

Gradient Descent

$\alpha$ is the learning rate.

Take $y = x * w$ as an example:

code and result:

  • Define the cost function: $cost = \frac{1}{N}\sum_{n=1}^{N} (\hat{y}_n - y_n)^2$
  • Define the gradient function:

    $\frac{\partial cost}{\partial \omega} = \frac{1}{N} \sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$

  • Do the update:

    $\omega = \omega - \alpha\frac{\partial cost}{\partial \omega}$
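A direct translation of the cost, gradient, and update formulas above (a minimal sketch; the y = 2x toy data, the learning rate $\alpha = 0.01$, and the 100-epoch count are assumptions):

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0       # initial guess
alpha = 0.01  # learning rate

def cost(xs, ys, w):
    # MSE cost: (1/N) * sum((x*w - y)^2)
    return sum((x * w - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(xs, ys, w):
    # d(cost)/dw = (1/N) * sum(2*x*(x*w - y))
    return sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)

for epoch in range(100):
    w -= alpha * gradient(x_data, y_data, w)   # the update rule above

print('w =', w, 'cost =', cost(x_data, y_data, w))  # w approaches 2.0
```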

Use Stochastic Gradient Descent (SGD) in place of plain gradient descent: the per-sample noise helps escape saddle points (but the sequential updates raise time complexity).

Use Batch or Mini-Batch Gradient Descent to balance time complexity and accuracy.
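For contrast with the full-batch version, a per-sample SGD sketch (the y = 2x toy data and $\alpha = 0.01$ are assumptions; a mini-batch variant would average grad_single over a small batch of samples before each update):

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0
alpha = 0.01

def grad_single(x, y, w):
    # gradient of ONE sample's loss: d/dw (x*w - y)^2 = 2*x*(x*w - y)
    return 2 * x * (x * w - y)

for epoch in range(100):
    for x, y in zip(x_data, y_data):   # update after every single sample
        w -= alpha * grad_single(x, y, w)

print('w =', w)  # converges toward 2.0
```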

Back Propagation(反向传播)

Neural Networks

A complicated network:

Still a simple example:

Derivative

Nonlinear Functions (activation functions) introduce non-linearity into neural networks:

  • Sigmoid
  • ReLU
  • softplus

Backward

In essence: the chain rule of derivatives.

Still a simple example:

in PyTorch

In PyTorch, Tensor is the key component for constructing the dynamic computational graph.

It contains data and grad, which store the value of the node and its gradient w.r.t. the loss, respectively.

If the autograd mechanics are required, the requires_grad attribute of the Tensor has to be True.

Model:

  • Define the linear model and the loss function.
  • Forward, compute the loss.
  • Backward, compute the grad for Tensor whose requires_grad set to True.
  • The grad is utilized to update weight.
  • NOTICE:
    • The grad computed by .backward() will be accumulated.
    • So after each update, remember to set the grad to ZERO!
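The forward/backward/update/zero-grad steps above can be sketched with a bare Tensor (a minimal sketch; the y = 2x toy data, the learning rate 0.01, and the epoch count are assumptions):

```python
import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = torch.tensor([1.0], requires_grad=True)  # enable autograd for w

def forward(x):
    return x * w

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        loss = (forward(x) - y) ** 2    # forward builds the graph
        loss.backward()                 # grads accumulate into w.grad
        with torch.no_grad():
            w -= 0.01 * w.grad          # the update itself must not be tracked
        w.grad.zero_()                  # otherwise the next backward() adds on top

print('w =', w.item())  # approaches 2.0
```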

Linear Regression with PyTorch

  1. Prepare dataset
  2. Design model using Class: inherit from nn.Module
  3. Construct loss and optimizer: using PyTorch API
  4. Training cycle: forward, backward, update

Code:

import torch

# Prepare dataset
x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[2.0], [4.0], [6.0]])

# Design model using Class
class LinearModel(torch.nn.Module):
    def __init__(self):
        super(LinearModel, self).__init__()
        # nn.Linear contains two member Tensors: weight and bias
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

model = LinearModel()

# Construct loss and optimizer
criterion = torch.nn.MSELoss(reduction='sum')  # size_average/reduce are deprecated
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training cycle
for epoch in range(1000):
    y_pred = model(x_data)            # forward
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    optimizer.zero_grad()
    loss.backward()                   # backward
    optimizer.step()                  # update

print('w = ', model.linear.weight.item())
print('b = ', model.linear.bias.item())

x_test = torch.Tensor([[4.0]])
y_test = model(x_test)
print('y_pred = ', y_test.data)

Optimizers in PyTorch:

  • torch.optim.Adagrad
  • torch.optim.Adam
  • torch.optim.Adamax
  • torch.optim.ASGD
  • torch.optim.LBFGS
  • torch.optim.RMSprop
  • torch.optim.Rprop
  • torch.optim.SGD

Logistic Regression

For classification(Yes/No).

torchvision:

import torchvision

train_set = torchvision.datasets.MNIST(root='../dataset/mnist', train=True, download=True)
test_set = torchvision.datasets.MNIST(root='../dataset/mnist', train=False, download=True)

Example datasets:

  • MNIST/Fashion-MNIST
  • CIFAR-10/CIFAR-100
  • ImageNet
  • COCO

$\sigma(x) = \frac{1}{1+e^{-x}}$

Loss function and optimizer:

$loss = J(\theta) = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n\log \hat{y}_n + (1-y_n)\log(1-\hat{y}_n)\right]$

BCELoss (binary cross-entropy):

  • if $y=1$: a larger $\hat{y}$ gives a smaller loss.
  • if $y=0$: a smaller $\hat{y}$ gives a smaller loss.

Code:

import torch
import numpy as np
import matplotlib.pyplot as plt

# Prepare dataset
x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[0], [0], [1]])

# Design model using Class
class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LogisticRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))  # F.sigmoid is deprecated
        return y_pred

model = LogisticRegressionModel()

# Construct loss and optimizer
criterion = torch.nn.BCELoss(reduction='sum')  # size_average/reduce are deprecated
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training cycle
for epoch in range(1000):
    y_pred = model(x_data)            # forward
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    optimizer.zero_grad()
    loss.backward()                   # backward
    optimizer.step()                  # update

x = np.linspace(0, 10, 200)
x_t = torch.Tensor(x).view((200, 1))
y_t = model(x_t)
y = y_t.data.numpy()
plt.plot(x, y)
plt.plot([0, 10], [0.5, 0.5], c='r')
plt.xlabel('Hours')
plt.ylabel('Probability of Pass')
plt.grid()
plt.show()

Multiple Dimension

Each input variable is called a feature.

Take an 8-feature input tensor as an example.

Reducing the dimension layer by layer:

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 2)
        self.linear3 = torch.nn.Linear(2, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

Dataset and DataLoader

# Training cycle
for epoch in range(training_epochs):
    # Loop over all batches
    for i in range(total_batch):
        ...

Epoch: one forward pass and one backward pass over all the training examples.
Batch-Size: the number of training examples in one forward/backward pass.
Iteration: the number of passes, each pass using [batch-size] examples.
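A quick arithmetic check of these terms (the 10,000-sample dataset size and batch size of 1,000 are made-up numbers):

```python
num_samples = 10000   # assumed dataset size
batch_size = 1000     # assumed batch size

# One epoch must see every sample once, so it takes this many iterations:
iterations_per_epoch = num_samples // batch_size
print(iterations_per_epoch)  # 10 iterations make up 1 epoch
```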

Define Dataset:

import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self):
        pass

    def __getitem__(self, index):  # support indexing
        pass

    def __len__(self):
        pass

dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)
# shuffle: randomize the sample order; num_workers: number of parallel loading processes
...
for epoch in range(100):
    for i, data in enumerate(train_loader, 0):
        ...

Example: Diabetes Dataset

import numpy as np
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self, filepath):
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, :-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset('diabetes.csv.gz')
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 2)
        self.linear3 = torch.nn.Linear(2, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

model = Model()

criterion = torch.nn.BCELoss(reduction='mean')  # size_average is deprecated
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for i, data in enumerate(train_loader, 0):
        # 1. prepare data
        inputs, labels = data
        # 2. forward
        y_pred = model(inputs)
        loss = criterion(y_pred, labels)
        print(epoch, i, loss.item())
        # 3. backward
        optimizer.zero_grad()
        loss.backward()
        # 4. update
        optimizer.step()

Focus on Prepare Dataset and Training Cycle.

Another example: the MNIST Dataset in torchvision.

import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets

train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, transform=transforms.ToTensor(), download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)

for epoch in range(100):
    for batch_idx, (inputs, target) in enumerate(train_loader):
        ...

Softmax Classifier

Suppose $Z^l \in \mathbb{R}^K$ is the output of the last linear layer; the Softmax function is:

$P(y=i)=\frac{e^{Z_i}}{\sum^{K-1}_{j=0}e^{Z_j}}, \quad i \in \{0,\dots,K-1\}$

Negative Log Likelihood Loss: $NLLLoss = -Y\log \hat{Y}$
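The softmax and NLL formulas above can be checked numerically (a minimal sketch; the logits z and the one-hot target are made-up values):

```python
import math

z = [0.2, 0.1, -0.1]   # logits from the last linear layer (made-up)
y = [1, 0, 0]           # one-hot target: class 0

# softmax: P(y=i) = exp(z_i) / sum_j exp(z_j)
exp_z = [math.exp(v) for v in z]
p = [v / sum(exp_z) for v in exp_z]

# NLLLoss = -sum(Y * log(Y_hat)); only the target class contributes
nll = -sum(yi * math.log(pi) for yi, pi in zip(y, p))
print('probabilities:', p)  # sum to 1
print('loss:', nll)
```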

import torch

criterion = torch.nn.CrossEntropyLoss()
Y = torch.LongTensor([2, 0, 1])

Y_pred1 = torch.Tensor([[0.1, 0.2, 0.9],
                        [1.1, 0.1, 0.2],
                        [0.2, 2.1, 0.1]])
Y_pred2 = torch.Tensor([[0.8, 0.2, 0.3],
                        [0.2, 0.3, 0.5],
                        [0.2, 0.2, 0.5]])

l1 = criterion(Y_pred1, Y)
l2 = criterion(Y_pred2, Y)
print("Batch Loss1 = ", l1.data, "\nBatch Loss2 = ", l2.data)

CrossEntropyLoss <==> LogSoftmax + NLLLoss
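This equivalence can be verified directly (reusing the same made-up predictions and targets as above):

```python
import torch

Y = torch.LongTensor([2, 0, 1])
Y_pred = torch.Tensor([[0.1, 0.2, 0.9],
                       [1.1, 0.1, 0.2],
                       [0.2, 2.1, 0.1]])

# CrossEntropyLoss takes raw logits...
ce = torch.nn.CrossEntropyLoss()(Y_pred, Y)

# ...and equals LogSoftmax followed by NLLLoss
log_probs = torch.nn.LogSoftmax(dim=1)(Y_pred)
nll = torch.nn.NLLLoss()(log_probs, Y)

print(ce.item(), nll.item())  # the two values agree
```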

MNIST as an example:

import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim

batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),                      # convert the Pillow Image to a Tensor
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std: map toward N(0,1)
])

train_dataset = datasets.MNIST(root='dataset/mnist', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='dataset/mnist', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):       # use ReLU activations
        x = x.view(-1, 784)     # flatten: (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)       # no activation: CrossEntropyLoss expects logits

model = Net()

criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d,%5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()

Converting the Pillow Image to a Tensor: $\mathbb{Z}^{28\times 28}$ with pixel $\in \{0,\dots,255\}$ $\rightarrow$ $\mathbb{R}^{1\times 28\times 28}$ with pixel $\in [0,1]$.