Course 2, Week 3

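Suggested tuning priority (most important first)
- Learning rate α
- β (momentum), number of hidden units, mini-batch size
- Number of layers, learning-rate decay
- β1, β2, ϵ (Adam)
Traditional ML usually picks hyperparameters with a grid search. In deep learning, sample points at random instead (you don't know in advance which hyperparameters matter), and search coarse to fine: once a promising region shows up, shrink the range around it and sample more densely there.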
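A minimal sketch of coarse-to-fine random search, to make the idea concrete. Everything here is illustrative: `toy_score` is a hypothetical stand-in for "train a model and return dev-set performance", the hyperparameter names `x` and `y` are placeholders, and the shrink factor of 0.25 is an arbitrary assumption.

```python
import random

def random_search(score, bounds, n_samples, rng):
    """Sample n_samples points uniformly inside bounds and return the best one."""
    best_point, best_score = None, float("-inf")
    for _ in range(n_samples):
        point = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        s = score(point)
        if s > best_score:
            best_point, best_score = point, s
    return best_point, best_score

def shrink_bounds(bounds, center, factor=0.25):
    """Build a smaller box around center, clipped to the original bounds."""
    new_bounds = {}
    for name, (lo, hi) in bounds.items():
        half = (hi - lo) * factor / 2
        new_bounds[name] = (max(lo, center[name] - half),
                            min(hi, center[name] + half))
    return new_bounds

# Hypothetical objective: x matters a lot, y barely matters.
def toy_score(p):
    return -((p["x"] - 0.3) ** 2) - 0.01 * (p["y"] - 0.7) ** 2

rng = random.Random(0)
bounds = {"x": (0.0, 1.0), "y": (0.0, 1.0)}

# Coarse pass over the whole range, then a fine pass around the best point.
coarse_best, coarse_score = random_search(toy_score, bounds, 25, rng)
fine_bounds = shrink_bounds(bounds, coarse_best)
fine_best, fine_score = random_search(toy_score, fine_bounds, 25, rng)
```

With random sampling, 25 trials give 25 distinct values of `x`; a 5×5 grid would only try 5, which is the reason random search is preferred when it is unclear which hyperparameter matters.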

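For example, the number of hidden layers can be sampled uniformly on a linear scale: 2, 3, 4, …
Log scale: to sample from $[10^a, 10^b]$, draw $r$ uniformly from $[a, b]$ and use $10^r$. For example, for $\beta \in [0.9, 0.999]$, sample $1-\beta \in [0.001, 0.1]$, i.e. $[10^{-3}, 10^{-1}]$, on a log scale.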
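A short sketch of log-scale sampling. The helper name `sample_log_uniform` and the learning-rate range $[10^{-4}, 10^{0}]$ are assumptions chosen for illustration; the $\beta$ range is the one from the note above.

```python
import random

def sample_log_uniform(a, b, rng):
    """Return 10**r with r drawn uniformly from [a, b]."""
    r = rng.uniform(a, b)
    return 10 ** r

rng = random.Random(0)

# Learning rate in [1e-4, 1]: every decade is equally likely to be picked.
alpha = sample_log_uniform(-4, 0, rng)

# beta in [0.9, 0.999]: sample 1 - beta in [1e-3, 1e-1] on a log scale.
beta = 1 - sample_log_uniform(-3, -1, rng)
```

Sampling `alpha` uniformly on a linear scale instead would spend about 90% of the trials in $[0.1, 1]$ and almost none near $10^{-4}$.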
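Judge each hyperparameter's sensitivity from the expression it appears in. For example, $\frac{1}{1-\beta}$ becomes very sensitive as $\beta$ approaches 1, so that end of the range should be sampled more densely.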
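A quick numeric check of that sensitivity. The function name `effective_window` is an assumption; it reflects the usual reading of $\frac{1}{1-\beta}$ as roughly the number of values an exponentially weighted average covers.

```python
def effective_window(beta):
    """Approximate number of values averaged for a given beta: 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)

# The same step of 0.0005 in beta, taken at two different places in the range.
for beta in (0.9, 0.9005, 0.999, 0.9995):
    print(f"beta={beta}: 1/(1-beta) = {effective_window(beta):.1f}")
```

Near 0.9 the step barely changes the result (about 10 either way), while near 0.999 it roughly doubles it (about 1000 to about 2000).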

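Softmax differs from the earlier Sigmoid and ReLU activations because it takes a whole n-dimensional vector as input and outputs an n-dimensional vector (here n = C) instead of acting element-wise; the outputs are the probabilities of each of the C classes and sum to 1.
Computation: $t = e^{z^{[L]}}$ (element-wise), then $a^{[L]}_i = \frac{t_i}{\sum_{j=1}^{C} t_j}$
Intuition: with no hidden layer, softmax partitions the input space using several linear decision boundaries.
Compared with hardmax: hardmax takes a vector and outputs a vector of 0s and 1s, with a single 1 at the position of the largest element.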
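A small sketch contrasting softmax and hardmax on one score vector. The input `z` is an illustrative choice, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Map a list of scores to probabilities that sum to 1."""
    m = max(z)  # shift by the max for numerical stability; output is unchanged
    t = [math.exp(v - m) for v in z]
    total = sum(t)
    return [v / total for v in t]

def hardmax(z):
    """Return a 0/1 vector with a single 1 at the position of the largest score."""
    k = z.index(max(z))
    return [1 if i == k else 0 for i in range(len(z))]

z = [5.0, 2.0, -1.0, 3.0]
probs = softmax(z)    # soft assignment: every class keeps some probability
onehot = hardmax(z)   # hard assignment: winner takes all
```

Here `probs` is roughly `[0.842, 0.042, 0.002, 0.114]` and `onehot` is `[1, 0, 0, 0]`.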
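Loss function: $L(\hat{y}, y) = -\sum_{j=1}^{C} y_j \log \hat{y}_j$. Because $y$ is one-hot, this reduces to $L(\hat{y}, y) = -\log \hat{y}_c$, where $c$ is the true class in the training set; minimizing the loss pushes the probability of that class as high as possible.
In a neural network: simply use softmax as the activation of the output layer to obtain $a^{[L]}$, i.e. $\hat{y}$.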
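A minimal sketch of the loss for a single example. The values of `y_hat` and `y` are made up for illustration, and class indices are 0-based here.

```python
import math

def cross_entropy(y_hat, y):
    """L(y_hat, y) = -sum_j y_j * log(y_hat_j) for one example."""
    # Terms with y_j == 0 contribute nothing, so they are skipped.
    return -sum(yj * math.log(pj) for yj, pj in zip(y, y_hat) if yj != 0)

y_hat = [0.3, 0.2, 0.1, 0.4]   # hypothetical softmax output
y = [0, 1, 0, 0]               # one-hot label, true class c = 1

loss = cross_entropy(y_hat, y)  # equals -log(0.2)
```

Only the predicted probability of the true class enters the loss, so the only way to lower it is to raise that probability.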

TensorFlow is a programming framework used in deep learning.
The two main object classes in TensorFlow are Tensors and Operations.
When you code in TensorFlow you have to take the following steps:
- Create a graph containing Tensors (Variables, Placeholders …) and Operations (tf.matmul, tf.add, …)
- Create a session
- Initialize the session
- Run the session to execute the graph
You can execute the graph multiple times as you’ve seen in model()
Backpropagation and optimization are done automatically when you run the session on the “optimizer” object.
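The "build a graph first, run it later" workflow can be illustrated with a toy, framework-free sketch. This is not TensorFlow code: `Node`, `const`, `feed`, `add`, `mul` and `run` are hypothetical names invented for this example, and the sketch covers only the forward pass (no variables, initialization, or optimizer).

```python
class Node:
    """One node of a toy computation graph; nothing is computed at build time."""
    def __init__(self, kind, inputs=(), value=None):
        self.kind = kind      # "const", "feed", "add" or "mul"
        self.inputs = inputs  # upstream nodes
        self.value = value    # constant value, or the name of a fed input

def const(v):
    return Node("const", value=v)

def feed(name):
    # Stand-in for an input whose value is supplied only at run time.
    return Node("feed", value=name)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node, feed_dict=None):
    """Evaluate a node by recursively evaluating the nodes it depends on."""
    feed_dict = feed_dict or {}
    if node.kind == "const":
        return node.value
    if node.kind == "feed":
        return feed_dict[node.value]
    args = [run(n, feed_dict) for n in node.inputs]
    if node.kind == "add":
        return args[0] + args[1]
    if node.kind == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown node kind: {node.kind}")

# Step 1: build the graph for y = 3 * x + 2 (no arithmetic happens yet).
x = feed("x")
y = add(mul(const(3), x), const(2))

# Step 2: execute the same graph several times with different inputs.
r1 = run(y, {"x": 1})
r2 = run(y, {"x": 4})
```

The graph is defined once and can then be executed as many times as needed with different inputs, which mirrors the "run the session multiple times" point above.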

Writing and running programs in TensorFlow has the following steps:

computation graph