
## Gradient Descent

Convexity is valuable in connection with the **Gradient Method** of minimizing $E(w)$ with respect to $w$.

**Definition.** The *gradient* of the error measure $E$ is defined as the column vector

$$ g(w) = \nabla E(w) = \left( \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right)^{T}. $$

**Definition.** The *Gradient Method* (Cauchy) is defined as follows. Given $w_0$, compute

$$ w_{k+1} = w_k - \alpha_k \, g_k, \qquad k = 0, 1, 2, \ldots, $$

where $g_k = \nabla E(w_k)$ is the *gradient* of $E$ at $w_k$, and $\alpha_k$ is chosen as the smallest nonnegative local minimizer of

$$ \varphi_k(\alpha) = E(w_k - \alpha \, g_k). $$

Cauchy originally proposed to find the value of $\alpha$ which gave a global minimum of $E$ along the line of steepest descent. This, however, is not always feasible in practice. Some general results regarding the Gradient Method are given below.
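As an illustration, Cauchy's iteration can be sketched in a few lines of Python on a toy quadratic error measure. The function `E`, its gradient, and the coarse grid line search below are illustrative assumptions introduced here, not part of the original text; a practical implementation would use a proper one-dimensional minimizer for the step length.

```python
# Sketch of Cauchy's gradient method on an assumed toy quadratic
# error measure E(w) = (w1 - 1)^2 + 2*(w2 + 0.5)^2, minimized at (1, -0.5).

def E(w):
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2

def grad_E(w):
    # gradient g(w) = (dE/dw1, dE/dw2)
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 0.5)]

def line_search(w, g, alphas):
    # pick the alpha minimizing phi(alpha) = E(w - alpha*g) over a coarse grid
    # (a stand-in for Cauchy's exact line minimization)
    return min(alphas, key=lambda a: E([w[i] - a * g[i] for i in range(len(w))]))

def gradient_method(w, steps=50):
    for _ in range(steps):
        g = grad_E(w)
        a = line_search(w, g, [k / 100.0 for k in range(101)])
        w = [w[i] - a * g[i] for i in range(len(w))]
    return w
```

Starting from `[0.0, 0.0]`, fifty iterations drive the iterate close to the minimizer `(1, -0.5)`, each step decreasing `E` as the descent theorem below guarantees.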

**Theorem.** If $w^*$ is a local minimizer of $E$, and $\nabla E(w^*)$ exists, then $\nabla E(w^*) = 0$.

**Theorem.** The gradient method is a *descent* method, i.e., $E(w_{k+1}) \leq E(w_k)$.

**Definition.** $E$, defined on $D \subseteq \mathbb{R}^n$, is said to be in the class $C^k(D)$ if all $k$th-order partial derivatives of $E$ with respect to the components of $w$ are continuous on $D$.

**Definition.** The *Hessian* of $E$ at $w$ is defined as the $n \times n$ matrix of second-order partial derivatives,

$$ [H(w)]_{ij} = \frac{\partial^2 E}{\partial w_i \, \partial w_j}, $$

where $w_i$ denotes the $i$th component of $w$, $1 \leq i, j \leq n$, and $[\,\cdot\,]_{ij}$ denotes the matrix entry at the $i$th row and $j$th column. The Hessian of every element of $C^2(D)$ is a symmetric matrix [Williamson et al. 1972]. This is because continuous second-order partials satisfy

$$ \frac{\partial^2 E}{\partial w_i \, \partial w_j} = \frac{\partial^2 E}{\partial w_j \, \partial w_i}. $$
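The equality of mixed partials can be observed numerically. The smooth function `E` and the central-difference scheme below are illustrative assumptions chosen for this sketch; any $C^2$ function would do.

```python
# Approximate the mixed second-order partials of an assumed smooth
# function E by central differences, and check that the order of
# differentiation does not matter (Hessian symmetry).

def E(w):
    return w[0] ** 2 * w[1] + w[1] ** 3  # illustrative C^2 function

def hessian_entry(f, w, i, j, h=1e-4):
    # central-difference approximation of d^2 f / (dw_i dw_j)
    def shift(w, k, d):
        v = list(w)
        v[k] += d
        return v
    return (f(shift(shift(w, i, h), j, h)) - f(shift(shift(w, i, h), j, -h))
            - f(shift(shift(w, i, -h), j, h)) + f(shift(shift(w, i, -h), j, -h))) / (4 * h * h)
```

At `w = [1.0, 2.0]` both `hessian_entry(E, w, 0, 1)` and `hessian_entry(E, w, 1, 0)` agree with the analytic mixed partial $\partial^2 E / \partial w_1 \partial w_2 = 2 w_1 = 2$ to within the discretization error.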

**Theorem.** If $E \in C^1$, then any cluster point $\bar{w}$ of the gradient sequence $\{w_k\}$ is necessarily a *stationary point*, i.e., $\nabla E(\bar{w}) = 0$.

**Theorem.** Let $\Omega$ denote the convex hull of $\{ w : E(w) \leq E(w_0) \}$. If $E \in C^2(\Omega)$, and there exist positive constants $m$ and $M$ such that

$$ m \, \|y\|^2 \;\leq\; y^T H(w) \, y \;\leq\; M \, \|y\|^2 \tag{5} $$

for all $y \in \mathbb{R}^n$ and for all $w \in \Omega$, then the gradient method beginning with any point in $\Omega$ converges to a point $w^* \in \Omega$. Moreover, $w^*$ is the unique global minimizer of $E$ in $\Omega$.

By the norm equivalence theorem [Ortega 1972], Eq. (5) is satisfied whenever $\left( y^T H(w) \, y \right)^{1/2}$ is a norm on $\mathbb{R}^n$ for each $w \in \Omega$. Since $E$ belongs to $C^2(\Omega)$, $H(w)$ is a symmetric matrix. It is also bounded, since it is continuous over a compact set. Thus a sufficient requirement is that $H(w)$ be *positive definite* on $\Omega$. Positive definiteness of $H(w)$ can be viewed as "positive curvature" of $E$ at each point of $\Omega$, which corresponds to *strict convexity* of $E$ on $\Omega$.
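For a symmetric Hessian, the constants $m$ and $M$ in the theorem can be taken as its smallest and largest eigenvalues, and positive definiteness amounts to $m > 0$. The two-dimensional quadratic below is an illustrative assumption; its constant Hessian is $\mathrm{diag}(2, 4)$.

```python
import math

def sym2x2_eigs(a, b, c):
    # eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]],
    # returned as (smallest, largest)
    t = a + c
    d = math.sqrt((a - c) ** 2 + 4.0 * b ** 2)
    return (t - d) / 2.0, (t + d) / 2.0

# Hessian of the illustrative quadratic E(w) = (w1 - 1)^2 + 2*(w2 + 0.5)^2
m, M = sym2x2_eigs(2.0, 0.0, 4.0)
```

Here `m = 2 > 0`, so the Hessian is positive definite and the bounds $m\|y\|^2 \leq y^T H y \leq M\|y\|^2$ hold with $m = 2$, $M = 4$; the convergence theorem then applies on any such region.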