Intelligent Computing Systems combine Deep Learning, Parallel Programming, Computer Organization, and Computer Architecture.

Neural Network Basics

Loss function (for a linear hypothesis H(x)=w^Tx): L(w)=\frac{1}{2}\sum_i(H(x_i)-y_i)^2=\frac{1}{2}\sum_i(w^Tx_i-y_i)^2

Gradient descent: w=w-\alpha\frac{\partial L(w)}{\partial w}
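
A minimal numpy sketch of this update applied to the squared loss above, assuming the linear hypothesis H(x)=w^Tx; the learning rate, step count, and data are illustrative.

```python
import numpy as np

def gradient_descent(X, y, alpha=1e-3, steps=2000):
    """Minimize L(w) = 1/2 * sum_i (w^T x_i - y_i)^2 by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y      # H(x_i) - y_i for every sample
        grad = X.T @ residual     # dL/dw = sum_i (w^T x_i - y_i) x_i
        w -= alpha * grad         # w <- w - alpha * dL/dw
    return w

# Toy usage: recover w = [2, -1] from noiseless linear data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
print(gradient_descent(X, X @ np.array([2.0, -1.0])))
```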

Activation Function

Backpropagation (chain rule): \frac{\partial L}{\partial w}=\frac{\partial L}{\partial y}\frac{\partial y}{\partial z}\frac{\partial z}{\partial w}
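
A scalar sketch of this chain rule for a single sigmoid neuron; the setup z = w·x, y = σ(z), L = (y − t)²/2 and the concrete values are illustrative.

```python
import numpy as np

# Forward pass: z = w*x, y = sigmoid(z), L = (y - t)^2 / 2
w, x, t = 0.5, 2.0, 1.0
z = w * x
y = 1.0 / (1.0 + np.exp(-z))
L = 0.5 * (y - t) ** 2

# Backward pass: chain rule dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = y - t          # from L = (y - t)^2 / 2
dy_dz = y * (1 - y)    # sigmoid derivative
dz_dw = x              # from z = w * x
print(dL_dy * dy_dz * dz_dw)
```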

Neural network structure: input layer, hidden layer(s), output layer

CNN

convolution layer

pooling

fully connected + softmax: f(z_j)=\frac{e^{z_j}}{\sum_i e^{z_i}}
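
A minimal numpy sketch of the softmax formula above; subtracting max(z) is a standard numerical-stability trick, not part of the definition.

```python
import numpy as np

def softmax(z):
    """exp(z_j) / sum_i exp(z_i), shifted by max(z) for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # entries sum to 1
```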

e.g. AlexNet, VGG, Inception, ResNet

How to evaluate a CNN?

  • IoU, aka Jaccard index (intersection over union); see the sketch after this list

    IoU=\frac{|A\cap B|}{|A\cup B|}

    If IoU > 0.5, the predicted location is accepted.

  • mAP aka mean average precision

    mAP=\frac{\sum_{q=1}^Q AveP(q)}{Q}

    recall=\frac{TP}{TP+FN}

    precision=\frac{TP}{TP+FP}
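
A minimal sketch of these metrics for axis-aligned boxes; the (x1, y1, x2, y2) box format and the helper names are my choice.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~ 0.14, rejected at the 0.5 threshold
print(precision_recall(tp=8, fp=2, fn=4))
```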

Object Detection

R-CNN, YOLO

RNN

sequence, recurrent, memory
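
A minimal numpy sketch of the vanilla recurrence these three keywords describe; the weight names W_xh, W_hh and the sizes are my notation.

```python
import numpy as np

def rnn_forward(xs, h, W_xh, W_hh, b):
    """Vanilla RNN: the hidden state h carries memory across the sequence."""
    hs = []
    for x in xs:                              # one step per sequence element
        h = np.tanh(W_xh @ x + W_hh @ h + b)  # h_t = tanh(Wx x_t + Wh h_{t-1} + b)
        hs.append(h)
    return hs

# Toy usage: input size 3, hidden size 4, sequence length 5.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
hs = rnn_forward(xs, np.zeros(4), rng.standard_normal((4, 3)),
                 rng.standard_normal((4, 4)), np.zeros(4))
print(hs[-1])
```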

LSTM

GRU
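
A short PyTorch sketch contrasting the two gated cells (sizes are illustrative): both mitigate vanishing gradients; the GRU merges gates and drops the LSTM's separate cell state.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 1, 3)              # (seq_len, batch, input_size)
lstm, gru = nn.LSTM(3, 4), nn.GRU(3, 4)
out_lstm, (h, c) = lstm(x)            # LSTM returns hidden state h and cell state c
out_gru, h_gru = gru(x)               # GRU has no separate cell state
print(out_lstm.shape, out_gru.shape)  # both: torch.Size([5, 1, 4])
```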

GAN

generator, discriminator
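
The adversarial game between the two networks is the standard minimax objective (the usual GAN formulation, stated here for completeness):

\min_G\max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]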

CGAN (Conditional GAN)

Deep Learning Framework

TensorFlow

Computations are expressed as stateful dataflow graphs.

All data is modeled as tensors.

Operations run inside a Session.

Stateful dataflow graphs can be executed asynchronously through queues.

Automatic differentiation
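
A minimal sketch of this graph-and-Session model, assuming the TensorFlow 1.x-style API exposed through tf.compat.v1 (modern TensorFlow executes eagerly by default); the toy graph is illustrative.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Build a stateful dataflow graph: nothing computes until the Session runs it.
x = tf.placeholder(tf.float32, shape=[None, 2])
w = tf.Variable(tf.zeros([2, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))
grad = tf.gradients(loss, w)[0]  # automatic differentiation

with tf.Session() as sess:       # operations run inside a Session
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad, feed_dict={x: [[1.0, 2.0]]}))
```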

PyTorch

  • Flexible define-by-run execution
  • Python and C++ front ends
  • Widely used in research
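
A minimal autograd sketch of the define-by-run flexibility these bullets refer to; the values are illustrative.

```python
import torch

# The graph is built as ordinary Python executes, then differentiated backwards.
w = torch.tensor([0.5, -0.5], requires_grad=True)
x = torch.tensor([1.0, 2.0])
loss = (w @ x - 1.0) ** 2
loss.backward()   # autograd fills in w.grad
print(w.grad)     # dloss/dw = 2 * (w.x - 1) * x
```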

MXNet

  • Bindings for R, Julia, Go, and other languages
  • Balances efficiency and flexibility

Caffe

  • The earliest of these frameworks
  • Lacks flexibility
  • No longer maintained

Deep Learning Processor

aka deep learning accelerator

A DLP is an electronic circuit designed for deep learning algorithms, usually with separate data memory and a dedicated instruction set architecture.

Aims to optimize:

  • Data-level parallelism
  • Vectorized operations
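
A numpy sketch of what these two targets mean in software terms: the vectorized form exposes the data-level parallelism that a DLP (or any SIMD unit) exploits in hardware.

```python
import numpy as np

x = np.random.randn(100_000)
y = np.random.randn(100_000)

# Scalar loop: one element at a time.
z = np.empty_like(x)
for i in range(len(x)):
    z[i] = x[i] * y[i]

# Vectorized: one operation over the whole array; the element-wise
# multiplies are independent and can execute in parallel.
z_vec = x * y
assert np.allclose(z, z_vec)
```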

DLP Instruction Set

Other accelerators

  • GPU

  • FPGA

Deep Learning Language

Heterogeneous computing

  • Task division
  • Data distribution
  • Data communication
  • Parallelism and synchronization
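
A PyTorch sketch mapping these four concerns onto a concrete host/GPU round trip (assumes a CUDA device is available; PyTorch stands in for any heterogeneous programming model):

```python
import torch

device = torch.device("cuda")

a_host = torch.randn(1024, 1024)  # data lives on the host first
a_dev = a_host.to(device)        # data distribution: host -> device
b_dev = a_dev @ a_dev            # task division: the matmul runs on the device
torch.cuda.synchronize()         # synchronization: wait for the device to finish
b_host = b_dev.cpu()             # data communication: device -> host
print(b_host.shape)
```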

How to develop a new operator?
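
One common route in the frameworks above is to define the operator's forward computation and hand-write the matching backward; a minimal sketch with PyTorch's torch.autograd.Function (the operator name SquareOp and its x² math are purely illustrative):

```python
import torch

class SquareOp(torch.autograd.Function):
    """A toy new operator: forward computes x^2, backward supplies its gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # stash inputs needed by backward
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x    # d(x^2)/dx = 2x, times the upstream gradient

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
SquareOp.apply(x).sum().backward()
print(x.grad)                      # tensor([2., 4., 6.])
```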