Optimization in ML/DL - 1 (Gradient Descent and its Variants)

Aman Jaiswal
Apr 30, 2022

In this blog post, we’ll talk about recognizing, solving, and analyzing optimization problems in machine learning and deep learning.

So let's start…

Recognizing an optimization problem in machine learning or deep learning generally means identifying a function to minimize or maximize, typically a loss or objective function defined over the model's parameters. Once the problem has been recognized, the next step is to solve it with an optimization algorithm such as gradient descent, stochastic gradient descent, or conjugate gradient. Finally, once a solution has been found, it is important to analyze it to judge its quality. This can be done by looking at the final objective function value, the number of iterations, and the time it took to find the solution.

Let's understand the difference between gradient descent, stochastic gradient descent, and conjugate gradient.

  • Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of the steepest descent as defined by the negative of the gradient.
  • Stochastic gradient descent is a variation of gradient descent that estimates the gradient of the objective function from a single training example (or a small mini-batch) and updates the parameters after every such example, rather than after a full pass over the data.
  • Conjugate gradient is an optimization algorithm for problems of the form min f(x), where f is a twice-differentiable function. It is similar to gradient descent, but instead of always stepping along the negative gradient, each new search direction is chosen to be conjugate to the previous ones (with respect to the Hessian of f).

In machine learning and deep learning, optimization problems are most often solved by gradient descent. This is an iterative algorithm that searches for an optimum by repeatedly moving in the direction of the steepest descent of the objective function, i.e., along the negative gradient. The objective function is a mathematical function that quantifies the goal of the optimization problem. Gradient descent takes small steps in this direction until it reaches a minimum; for convex objectives this is the global optimum, while for non-convex ones (such as deep networks) it is generally a local optimum or stationary point. The size of each step is controlled by the learning rate, a hyperparameter of the algorithm. A toy example of this update loop is sketched below.
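To make the effect of the learning rate concrete, here is a small sketch (not from the original post; the function, step count, and learning rates are chosen purely for illustration) that minimizes f(x) = x² with plain gradient descent:

# Toy example: minimize f(x) = x^2, whose gradient is f'(x) = 2x.
# Each step moves x a little way down the slope; the learning rate
# alpha controls how big that step is.
def toy_gradient_descent(x0, alpha=0.1, n_steps=50):
    x = x0
    for _ in range(n_steps):
        grad = 2 * x          # gradient of x^2 at the current point
        x = x - alpha * grad  # step in the direction of steepest descent
    return x

print(toy_gradient_descent(x0=5.0, alpha=0.1))   # ends up very close to 0
print(toy_gradient_descent(x0=5.0, alpha=1.5))   # too large a step: diverges

With alpha = 0.1 each step multiplies x by 0.8, so the iterates shrink toward the minimum at 0; with alpha = 1.5 they overshoot and blow up, which is why the learning rate has to be tuned.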

The following are the mathematical and programming aspects of gradient descent and its variants:

cost function: J(theta) = (1/(2m)) * sum_{i=1..m} (h(x^(i)) - y^(i))²

hypothesis: h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + ...

parameter update: theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)

For a model with two parameters, the updates are applied simultaneously and repeated until convergence, with each sum running over all m training examples:

theta_0 := theta_0 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i))

theta_1 := theta_1 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_1^(i)

Gradient descent implementation in Python:
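The original post embedded this code as an image. Below is a minimal NumPy sketch of batch gradient descent for the linear hypothesis and cost function above; the function name, the synthetic data, and the hyperparameter values are my own choices, not the author's original code.

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    # Batch gradient descent for linear regression.
    # X is an (m, n) feature matrix, y an (m,) target vector.
    m = len(y)
    X_b = np.c_[np.ones(m), X]          # prepend a bias column so theta_0 is the intercept
    theta = np.zeros(X_b.shape[1])      # initialize all parameters to zero
    for _ in range(n_iters):
        errors = X_b @ theta - y        # h(x^(i)) - y^(i) for every example at once
        gradient = (1 / m) * X_b.T @ errors
        theta -= alpha * gradient       # simultaneous update of all theta_j
    return theta

# Tiny synthetic dataset: y = 4 + 3x exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([4.0, 7.0, 10.0, 13.0])
print(gradient_descent(X, y))           # approximately [4., 3.]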

Stochastic gradient descent, on the other hand, may be implemented as follows:

SGD implementation in Python:
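As above, the original SGD code was shared as an image; a possible sketch for the same linear model, with one parameter update per training example, looks like this (again, names and hyperparameters are illustrative):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.1, n_epochs=100):
    # One parameter update per training example, visiting the examples
    # in a random order each epoch.
    m = len(y)
    X_b = np.c_[np.ones(m), X]              # bias column so theta_0 is the intercept
    theta = np.zeros(X_b.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(m):
            error = X_b[i] @ theta - y[i]   # h(x^(i)) - y^(i) for this single example
            theta -= alpha * error * X_b[i] # immediate update from this example alone
    return theta

# Same tiny dataset as before: y = 4 + 3x exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([4.0, 7.0, 10.0, 13.0])
print(stochastic_gradient_descent(X, y))    # approximately [4., 3.]

Because the gradient is estimated from a single example, the updates are noisier than in batch gradient descent, but each one is much cheaper, which is what makes SGD practical on large datasets.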

Conjugate gradient descent may also be implemented in Python, as sketched below:

Conjugate gradient descent implementation in Python (originally shown in two parts):
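The original conjugate gradient code was shared as two images. Below is one possible sketch of nonlinear conjugate gradient descent (the Fletcher-Reeves variant with a simple backtracking line search); the choice of variant, the function names, and the toy quadratic are my own assumptions, not the author's original code.

import numpy as np

def conjugate_gradient_descent(f, grad, x0, n_iters=200, tol=1e-6):
    # Fletcher-Reeves nonlinear conjugate gradient with backtracking line search.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # first direction: steepest descent
    for _ in range(n_iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                        # safeguard: fall back to steepest descent
            d = -g
        alpha = 1.0                           # backtracking (Armijo) line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves coefficient
        d = -g_new + beta * d                 # new direction, built from the previous one
        g = g_new
    return x

# Usage on a simple quadratic with minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
print(conjugate_gradient_descent(f, grad, x0=[0.0, 0.0]))   # approximately [1., -2.]

On a quadratic objective with exact line searches, the directions generated this way are mutually conjugate with respect to the Hessian, which is where the method gets its name.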

There is no dedicated gradient descent function in the SciPy library. However, scipy.optimize provides a minimize function that can be used to find the minimum of a function. minimize supports a number of different algorithms, including gradient-based methods such as conjugate gradient (method='CG') and BFGS; see the SciPy documentation for the full list. A short example is shown below.
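For instance, minimize can run the conjugate gradient method on the same toy objective used above (the objective and gradient here are my own illustration, not from the original post):

import numpy as np
from scipy.optimize import minimize

# Same toy quadratic as above, minimized with SciPy's conjugate gradient method.
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])

result = minimize(f, x0=np.zeros(2), jac=grad, method="CG")
print(result.x)        # approximately [1., -2.]
print(result.success)  # True if the optimizer converged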

Thanks for reading!

If you have any questions, please do not hesitate to contact me.
