Coursera – Machine Learning [Note]

https://github.com/ngavrish/coursera-machine-learning-1

Week 2

https://github.com/DragonflyStats/Coursera-ML/blob/master/QuizNotebook/Week02Quiz.tex

https://github.com/ngavrish/coursera-machine-learning-1/blob/master/quiz/week2-quiz4.md

Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Regression vs. classification

Linear regression with one variable

Cost function
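For one-variable linear regression the course uses the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the squared-error cost $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$, where $m$ is the number of training examples. Below is a minimal Python sketch of that cost; the function names and the toy dataset (three points lying exactly on y = 1 + 2x) are illustrative assumptions, not course code.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
print(cost(1.0, 2.0, xs, ys))  # 0.0, because the line fits every point
print(cost(0.0, 0.0, xs, ys))  # (9 + 25 + 49) / 6, roughly 13.83
```

The factor 1/2 has no effect on where the minimum is; it only cancels the 2 that appears when the square is differentiated.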

Gradient descent
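Batch gradient descent repeatedly applies the simultaneous updates $\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)$ and $\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x^{(i)}$, where $\alpha$ is the learning rate. The sketch below assumes a fixed number of iterations instead of a convergence test, and the defaults (alpha = 0.1, 2000 iterations) are hypothetical values picked for the toy data only.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Runs a fixed number of iterations (an assumption; the course phrasing
    is "repeat until convergence") starting from theta0 = theta1 = 0.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error h(x) - y for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # close to 1.0 and 2.0
```

If alpha is too large the updates overshoot and J grows instead of shrinking; if it is too small, many more iterations are needed.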

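The Coursera Machine Learning course compares the pros and cons of gradient descent and the normal equation.
Gradient descent requires choosing a learning rate and a number of iterations, and it needs feature scaling to converge quickly.
The normal equation needs no feature scaling and solves for the optimal θ directly, with no iterations, which is simple and efficient. However, it requires inverting the (n+1)×(n+1) matrix $X^T X$, which costs roughly $O(n^3)$, so it becomes slow when the number of features n is very large (for example, more than about 10,000). So: use the normal equation when n is small, and gradient descent when n is large.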
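The normal equation is $\theta = (X^T X)^{-1} X^T y$, where each row of X is one training example with a leading 1 for $\theta_0$. The pure-Python sketch below solves the equivalent linear system $(X^T X)\theta = X^T y$ by Gaussian elimination with partial pivoting instead of forming the inverse explicitly; that swap is a common numerical choice, not something the course prescribes. The function name and the data are hypothetical, and the code assumes $X^T X$ is invertible.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gaussian elimination.

    X is a list of rows, each already starting with 1 for theta0.
    Assumes X^T X is invertible (no redundant features, m >= n + 1).
    """
    n = len(X[0])
    # Build the n x n matrix X^T X and the vector X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - s) / A[i][i]
    return theta


# Hypothetical toy data lying exactly on y = 1 + 2x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
print(normal_equation(X, y))  # [1.0, 2.0]
```

Building $X^T X$ costs about $O(m n^2)$ and solving the system about $O(n^3)$, which is why the feature count n, not the sample size m, is what makes this approach slow.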

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:

$-1 \le x_i \le 1$

or

$-0.5 \le x_i \le 0.5$

These aren’t exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.

Two techniques help with this: feature scaling and mean normalization. Feature scaling divides the input values by the range of the input variable (i.e. the maximum value minus the minimum value), so the new range is just 1. Mean normalization subtracts the average value of an input variable from each of its values, so the new average of that variable is zero. To apply both techniques, adjust your input values with this formula:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the average of all the values of feature i, and $s_i$ is either the range of values (max − min) or the standard deviation.
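Here is a minimal Python sketch of $x_i := (x_i - \mu_i)/s_i$ for a single feature. The function name `mean_normalize` and its `use_std` flag are hypothetical; using the population standard deviation (`statistics.pstdev`) rather than the sample one (`statistics.stdev`) is an assumption, since either is a reasonable choice for $s_i$. The house sizes are illustrative values only.

```python
import statistics


def mean_normalize(values, use_std=False):
    """Return (scaled_values, mu, s) where each x becomes (x - mu) / s.

    s is the range (max - min) by default, or the population standard
    deviation when use_std is True (an assumption; the sample standard
    deviation would also work).
    """
    mu = statistics.mean(values)
    s = statistics.pstdev(values) if use_std else max(values) - min(values)
    if s == 0:
        raise ValueError("feature is constant; cannot scale it")
    return [(x - mu) / s for x in values], mu, s


# Illustrative house sizes in square feet.
sizes = [2104.0, 1416.0, 1534.0, 852.0]
scaled, mu, s = mean_normalize(sizes)
print(mu, s)   # 1476.5 1252.0
print(scaled)  # values with mean 0 and range 1, roughly within [-0.5, 0.5]
```

Keep the computed mu and s: any new example must be scaled with the same values before making a prediction, otherwise it lands in a different range from the training data.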
