From Linear Regression: Exhaustive Search
Prepare the training set.
Define the model: $\hat{y} = x \cdot \omega$.
Define the loss function: $loss = (\hat{y} - y)^2 = (x \cdot \omega - y)^2$.
List `w_list` saves the weights $\omega$.
List `mse_list` saves the cost value of each $\omega$.
Compute the cost value for each $\omega$ sampled from $0.0$ to $4.0$ with step $0.1$.
The value of the cost function is the loss summed (averaged for MSE) over all training samples.
Find the min_loss.
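A minimal sketch of this exhaustive search, assuming the toy training set $x = [1.0, 2.0, 3.0]$, $y = [2.0, 4.0, 6.0]$ used in the later PyTorch examples:

```python
import numpy as np

# Toy training set (same data as the later PyTorch examples): y = 2 * x
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward(x, w):
    return x * w                      # model: y_hat = x * w

def loss(x, y, w):
    return (forward(x, w) - y) ** 2   # squared error for one sample

w_list, mse_list = [], []
for w in np.arange(0.0, 4.1, 0.1):    # sweep w over [0.0, 4.0] with step 0.1
    cost = sum(loss(x, y, w) for x, y in zip(x_data, y_data)) / len(x_data)
    w_list.append(w)
    mse_list.append(cost)

best_w = w_list[int(np.argmin(mse_list))]
print('best w =', best_w, 'min cost =', min(mse_list))
```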
Gradient Descent
$\alpha$ is the learning rate.
Use $y = x \cdot \omega$ as an example:
Code and result:
Define the cost function: $cost(\omega) = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$
Define the gradient function: $\frac{\partial cost}{\partial \omega} = \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$
Do the update: $\omega = \omega - \alpha\frac{\partial cost}{\partial \omega}$
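A minimal sketch of this gradient descent loop, assuming the same toy data, an initial weight of $1.0$, and $\alpha = 0.01$:

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0          # initial guess (assumption for this sketch)
alpha = 0.01     # learning rate

def forward(x):
    return x * w

def cost(xs, ys):
    # cost(w) = (1/N) * sum((x*w - y)^2)
    return sum((forward(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(xs, ys):
    # d(cost)/dw = (1/N) * sum(2 * x * (x*w - y))
    return sum(2 * x * (forward(x) - y) for x, y in zip(xs, ys)) / len(xs)

for epoch in range(100):
    w -= alpha * gradient(x_data, y_data)
    print('epoch:', epoch, 'w =', round(w, 4), 'cost =', round(cost(x_data, y_data), 4))
```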
Use Stochastic Gradient Descent (SGD) to replace the plain version: updating from one random sample at a time helps escape saddle points, but the time complexity is higher (sketched below).
Use Batch or Mini-Batch gradient descent to balance time complexity and accuracy.
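For comparison, a sketch of the stochastic variant under the same assumptions; the weight is updated from one sample's loss at a time rather than from the full cost:

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0
alpha = 0.01

def forward(x):
    return x * w

def loss(x, y):
    return (forward(x) - y) ** 2      # single-sample loss

def gradient(x, y):
    return 2 * x * (forward(x) - y)   # d(loss)/dw for one sample

for epoch in range(100):
    for x, y in zip(x_data, y_data):  # update once per sample
        w -= alpha * gradient(x, y)
    print('epoch:', epoch, 'w =', round(w, 4))
```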
Back Propagation
Neural Networks
A complicated network:
Still a simple example:
Derivative
Nonlinear Function (activation function): introduces non-linearity into the neural network:
Backward
Actually: the chain rule of derivatives.
Still a simple example:
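As a worked example (my own, matching the linear model above, not taken from the original figures): with $\hat{y} = x \cdot \omega$ and $loss = (\hat{y} - y)^2$, the chain rule gives

$$\frac{\partial loss}{\partial \omega} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial \omega} = 2(\hat{y} - y) \cdot x.$$

For instance, at $x = 1$, $y = 2$, $\omega = 1$: $\hat{y} = 1$, $loss = 1$, and $\frac{\partial loss}{\partial \omega} = 2 \cdot (1 - 2) \cdot 1 = -2$.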
in PyTorch
In PyTorch, `Tensor` is the key component for constructing the dynamic computational graph.
It contains `data` and `grad`, which store the value of the node and its gradient w.r.t. the loss, respectively.
If the autograd mechanism is required, the attribute `requires_grad` of the Tensor has to be set to `True`.
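A tiny illustration (the value $1.0$ is just a placeholder):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)  # track gradients for w
print(w.data)           # the stored value: tensor([1.])
print(w.grad)           # no backward pass yet, so this is None
print(w.requires_grad)  # True
```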
Model:
Define the linear model and the loss function.
Forward: compute the loss.
Backward: compute the grad for every Tensor whose `requires_grad` is set to `True`.
The grad is used to update the weight.
NOTICE:
The grad computed by `.backward()` is accumulated.
So after the update, remember to set the grad to ZERO (sketched below)!
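A minimal sketch of this forward/backward/update cycle with raw autograd; the toy data, learning rate, and epoch count are assumptions:

```python
import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = torch.tensor([1.0], requires_grad=True)    # weight tracked by autograd

def forward(x):
    return x * w                               # builds the computational graph

def loss(x, y):
    return (forward(x) - y) ** 2

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)                          # forward: compute the loss
        l.backward()                            # backward: fill w.grad
        w.data = w.data - 0.01 * w.grad.data    # update via .data so no graph is built
        w.grad.data.zero_()                     # IMPORTANT: clear the accumulated grad
    print('epoch:', epoch, 'w =', w.item(), 'loss =', l.item())
```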
Linear Regression with PyTorch
Prepare dataset
Design model using Class: inherit from nn.Module
Construct loss and optimizer: using PyTorch API
Training cycle: forward, backward, update
Code:
```python
import torch

x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[2.0], [4.0], [6.0]])

class LinearModel(torch.nn.Module):
    def __init__(self):
        super(LinearModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

model = LinearModel()

# reduction='sum' replaces the deprecated size_average=False, reduce=True
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    y_pred = model(x_data)            # forward
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    optimizer.zero_grad()             # clear accumulated grads
    loss.backward()                   # backward
    optimizer.step()                  # update

print('w = ', model.linear.weight.item())
print('b = ', model.linear.bias.item())

x_test = torch.Tensor([[4.0]])
y_test = model(x_test)
print('y_pred = ', y_test.data)
```
Optimizers in PyTorch:
torch.optim.Adagrad
torch.optim.Adam
torch.optim.Adamax
torch.optim.ASGD
torch.optim.LBFGS
torch.optim.RMSprop
torch.optim.Rprop
torch.optim.SGD
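Any of these can replace SGD in the training cycle above; a sketch (the learning rate here is only illustrative):

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # instead of torch.optim.SGD
```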
Logistic Regression
For classification (Yes/No).
torchvision:
```python
import torchvision

train_set = torchvision.datasets.MNIST(root='../dataset.mnist', train=True, download=True)
test_set = torchvision.datasets.MNIST(root='../dataset.mnist', train=False, download=True)
```
Example datasets:
MNIST/Fashion-MNIST
CIFAR-10/CIFAR-100
ImageNet
COCO
Map the linear output to a probability with the sigmoid function: $\sigma(x) = \frac{1}{1+e^{-x}}$
Loss function and optimizer:
$loss = J(\theta) = -\frac{1}{N}\sum\left[y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right]$
BCE loss (binary cross-entropy):
if $y=1$: the loss is smaller when $\hat{y}$ is larger (closer to 1).
if $y=0$: the loss is smaller when $\hat{y}$ is smaller (closer to 0).
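As a quick worked example (my own numbers): for a single sample with $y = 1$, a prediction of $\hat{y} = 0.8$ gives $loss = -\log(0.8) \approx 0.22$, while $\hat{y} = 0.2$ gives $loss = -\log(0.2) \approx 1.61$, so predictions closer to the true label are penalized less.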
Code:
```python
import torch
import numpy as np
import matplotlib.pyplot as plt

x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[0], [0], [1]])

class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LogisticRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # torch.sigmoid replaces the deprecated F.sigmoid (and F was never imported)
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

model = LogisticRegressionModel()

# reduction='sum' replaces the deprecated size_average=False, reduce=True
criterion = torch.nn.BCELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Plot the learned probability curve
x = np.linspace(0, 10, 200)
x_t = torch.Tensor(x).view((200, 1))
y_t = model(x_t)
y = y_t.data.numpy()
plt.plot(x, y)
plt.plot([0, 10], [0.5, 0.5], c='r')
plt.xlabel('Hours')
plt.ylabel('Probability of Pass')
plt.grid()
plt.show()
```
Multiple Dimension Input
Each input variable is called a feature.
Take an 8-dimensional input (8 features per sample) as an example.
Reduce the dimensionality layer by layer (8 → 6 → 2 → 1):
```python
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 2)
        self.linear3 = torch.nn.Linear(2, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x
```
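A usage sketch continuing from the `Model` class above; the random input and labels are placeholders, not a real dataset:

```python
model = Model()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(10, 8)                     # 10 samples, 8 features each (placeholder data)
y = torch.randint(0, 2, (10, 1)).float()   # placeholder binary labels

for epoch in range(100):
    y_pred = model(x)                      # (10, 8) -> (10, 6) -> (10, 2) -> (10, 1)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```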