APS non-priority review course: Intelligent Computing Systems
An Intelligent Computing System combines deep learning, parallel programming, computer organization, and computer architecture.
Neural Network Basics
Loss function
Gradient Descent
Activation Function
Back Propagation: Chain Rule
Neural Network structure: input layer, hidden layer, output layer
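The items above (loss function, gradient descent, activation function, chain rule) can be tied together in a minimal sketch. It assumes a single sigmoid neuron, a mean squared error loss, and a made-up four-point dataset; the names `forward`, `gradients`, and `train` and the learning rate are illustrative choices, not taken from the course.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """One neuron: linear step z = w*x + b followed by the activation."""
    return sigmoid(w * x + b)

def loss(w, b, data):
    """Loss function: mean squared error over (x, target) pairs."""
    return sum((forward(w, b, x) - t) ** 2 for x, t in data) / len(data)

def gradients(w, b, data):
    """Back propagation via the chain rule: dL/dw = dL/dy * dy/dz * dz/dw."""
    gw = gb = 0.0
    for x, t in data:
        y = forward(w, b, x)
        dL_dy = 2.0 * (y - t) / len(data)
        dy_dz = y * (1.0 - y)        # derivative of the sigmoid
        gw += dL_dy * dy_dz * x      # dz/dw = x
        gb += dL_dy * dy_dz          # dz/db = 1
    return gw, gb

def train(data, lr=0.5, steps=200):
    """Gradient descent: repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    history = [loss(w, b, data)]
    for _ in range(steps):
        gw, gb = gradients(w, b, data)
        w -= lr * gw
        b -= lr * gb
        history.append(loss(w, b, data))
    return w, b, history

# Hypothetical toy data: negative inputs map to 0, positive inputs to 1.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b, history = train(data)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}, w = {w:.3f}")
```

A real network stacks many such neurons into layers, but the gradient of each layer is still obtained by multiplying local derivatives backwards, exactly as in `gradients`.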
CNN
convolution layer
pooling
fully connected + softmax
e.g. AlexNet, VGG, Inception, ResNet
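The CNN pipeline listed above (convolution, pooling, fully connected, softmax) can be sketched in plain Python. This is a toy illustration only: single channel, stride 1, no padding, and the image, kernel, and dense weights are invented for the example.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def dense(inputs, weights, bias):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]                       # responds to left-to-right edges

features = conv2d(image, kernel)         # 4x4 feature map
pooled = max_pool2d(features)            # 2x2 after pooling
flat = [v for row in pooled for v in row]
logits = dense(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
probs = softmax(logits)
print(features[0], pooled, probs)
```

Networks such as AlexNet or VGG repeat the convolution and pooling stages many times with learned kernels before the fully connected layers.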
How to evaluate a CNN?
IoU (intersection over union), aka Jaccard index
If IoU > 0.5, the predicted location is accepted.
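As a small sketch of the IoU rule above, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples (the box format and the function names are assumptions for illustration):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def location_accepted(predicted, ground_truth, threshold=0.5):
    """Apply the IoU > 0.5 acceptance rule from the notes."""
    return iou(predicted, ground_truth) > threshold

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))                 # overlap 1, union 7
print(location_accepted((0, 0, 4, 3), (0, 0, 4, 4)))   # IoU = 12 / 16
```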
mAP aka mean average precision
mAP
recall
precision
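The three metrics above can be sketched as follows. Precision is TP / (TP + FP) and recall is TP / (TP + FN). For average precision this sketch uses the simple non-interpolated form (mean of the precision values at each rank where a correct detection occurs); detection benchmarks often use interpolated variants instead, so treat it as one possible definition. The input format, a list of hit flags sorted by descending confidence, is an assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(ranked_hits, num_ground_truth):
    """Non-interpolated AP for one class.

    ranked_hits: detections sorted by descending confidence; True means the
    detection matched a ground-truth object (for example, IoU > 0.5).
    """
    tp = 0
    precision_sum = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precision_sum += tp / rank   # precision at this recall point
    return precision_sum / num_ground_truth if num_ground_truth else 0.0

def mean_average_precision(per_class):
    """mAP: the mean of the per-class AP values."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)

# Hypothetical detections for two classes: (ranked hits, ground-truth count).
detections = {
    "cat": ([True, False, True], 2),
    "dog": ([True, True], 2),
}
print(precision_recall(tp=8, fp=2, fn=4))
print(mean_average_precision(detections))
```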
Object detection
R-CNN, YOLO
RNN
sequence, recurrent, memory
LSTM
GRU
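To make "recurrent" and "memory" concrete, here is a scalar sketch of a vanilla RNN and of one GRU step. The weights are arbitrary illustrative values, and the GRU update uses one common convention, h_new = (1 - z) * h + z * h_candidate; some references swap the roles of z and 1 - z. An LSTM follows the same gating idea with a separate cell state and is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_forward(inputs, w_x=1.0, w_h=0.5, b=0.0):
    """Vanilla RNN: the hidden state h carries memory from step to step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)   # same weights reused at every step
        states.append(h)
    return states

def gru_step(x, h, p):
    """One scalar GRU step; p is a dict of (hypothetical) weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])      # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])      # reset gate
    h_candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_candidate

# The same inputs in a different order give a different final state.
print(rnn_forward([1.0, 0.0, 0.0])[-1], rnn_forward([0.0, 0.0, 1.0])[-1])

params = dict(wz=1.0, uz=1.0, bz=-100.0,   # update gate forced shut
              wr=1.0, ur=1.0, br=0.0,
              wh=1.0, uh=1.0, bh=0.0)
print(gru_step(1.0, 0.3, params))          # stays close to the old state 0.3
```

With the update gate closed the GRU keeps its previous state, which is how gated cells preserve information over long sequences.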
GAN
generator, discriminator
CGAN (Conditional GAN)
Deep Learning Framework
TensorFlow
Computations are expressed as stateful dataflow graphs.
All data is modelled as tensors.
Operations run inside a Session.
Queues allow asynchronous execution of the stateful dataflow graph.
Automatic differentiation
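The idea behind automatic differentiation on a dataflow graph can be sketched with a toy reverse-mode implementation. This is not TensorFlow's implementation or API; the `Node` class is a hypothetical stand-in that records each operation as a graph node and then propagates gradients backwards with the chain rule.

```python
class Node:
    """A value in a tiny dataflow graph that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents   # tuple of (parent_node, local_gradient)

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    def backward(self):
        """Reverse-mode autodiff: visit nodes from output back to inputs."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)       # parents are appended before children

        visit(self)
        self.grad = 1.0
        for node in reversed(order):     # output first, inputs last
            for parent, local in node.parents:
                parent.grad += local * node.grad   # chain rule

x = Node(3.0)
y = Node(4.0)
f = x * y + x * x        # f = xy + x^2
f.backward()
print(f.value, x.grad, y.grad)   # df/dx = y + 2x, df/dy = x
```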
PyTorch
- Flexible
- Python and C++
- Widely used in research
MXNet
- Language bindings include R, Julia, Go
- Efficiency & flexibility
Caffe
- The earliest
- Lacks flexibility
- No longer maintained
Deep Learning Processor
aka deep learning accelerator
A DLP is an electronic circuit designed for deep learning algorithms, usually with separate data memory and a dedicated instruction set architecture.
Aims to optimize:
- Data-level parallelism
- Vectorized operations
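A rough way to picture data-level parallelism is to compare a scalar dot product with one that processes several elements per instruction. The sketch below only emulates this in Python: the lane width of 4 and the operation counters are hypothetical, and no real DLP instruction set is modelled.

```python
LANES = 4   # hypothetical vector width (elements handled per instruction)

def scalar_dot(a, b):
    """One multiply-accumulate per element."""
    total, ops = 0.0, 0
    for x, y in zip(a, b):
        total += x * y
        ops += 1
    return total, ops

def vector_dot(a, b, lanes=LANES):
    """Emulated vector unit: one multiply-accumulate per group of `lanes` elements."""
    acc = [0.0] * lanes              # one partial sum per lane
    ops = 0
    for start in range(0, len(a), lanes):
        va = a[start:start + lanes]
        vb = b[start:start + lanes]
        # On real hardware the lanes below would all run in the same cycle.
        for lane, (x, y) in enumerate(zip(va, vb)):
            acc[lane] += x * y
        ops += 1                     # counted as a single vector instruction
    return sum(acc), ops             # final reduction across lanes

a = [float(i) for i in range(1, 9)]
b = [1.0] * 8
print(scalar_dot(a, b))   # 8 scalar operations
print(vector_dot(a, b))   # same result with 2 emulated vector operations
```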
DLP Instruction Set
Other accelerators
GPU
FPGA
Deep Learning Language
Heterogeneous computing
- Task division
- Data distribution
- Data communication
- Parallelism and synchronization
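The four concerns above can be sketched with a host program that offloads work to several workers. Threads stand in for devices here, which is an assumption made purely for illustration; a real heterogeneous program would target an accelerator through its own runtime, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Data distribution: cut the input into nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def device_kernel(chunk):
    """Task run on one 'device' (a thread here): a partial sum of squares."""
    return sum(v * v for v in chunk)

def host_program(data, num_devices=4):
    # Task division and data distribution: one chunk per device.
    chunks = split(data, num_devices)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # Data communication: chunks go out, partial results come back.
        # Parallelism: the kernels run concurrently on the workers.
        partials = list(pool.map(device_kernel, chunks))
    # Synchronization: all workers have finished once the results are collected.
    return sum(partials)

result = host_program(list(range(10)))
print(result)   # sum of squares of 0..9
```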
How to develop a new operator?