Optimization in ML/DL - 1 (Gradient Descent and its Variants)

Aman Jaiswal
Apr 30, 2022

In this blog post, we’ll talk about recognizing, solving, and analyzing optimization problems in machine learning and deep learning.

So let's start…

Recognizing an optimization problem in machine learning or deep learning generally means identifying a function to minimize or maximize, typically a loss or objective function defined over the model's parameters. Once the problem has been recognized, the next step is to solve it with an optimization algorithm such as gradient descent, stochastic gradient descent, or conjugate gradient. Finally, once a solution has been found, it is important to analyze it to judge its quality. This can be done by looking at the final objective function value, the number of iterations, and the time it took to find the solution.

Let's understand the difference between gradient descent, stochastic gradient descent, and conjugate gradient.

  • Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of the steepest descent as defined by the negative of the gradient.
  • Stochastic gradient descent is a variation of gradient descent that estimates the gradient of the objective function from a single training example (or a small mini-batch) and updates the parameters after every such example, rather than after a full pass over the data.
  • Conjugate gradient is an optimization algorithm for problems of the form min f(x), where f is a twice-differentiable function. It is similar to gradient descent, but instead of always stepping along the negative gradient, each new search direction is chosen to be conjugate to the previous ones (with respect to the Hessian of f).

In machine learning and deep learning, optimization problems are most often solved by gradient descent. This is an iterative algorithm that searches for an optimum by repeatedly moving in the direction of the steepest descent of the objective function, i.e., along the negative gradient. The objective function is a mathematical function that quantifies the goal of the optimization problem. Gradient descent takes small steps in this direction until it reaches a minimum; for convex objectives this is the global optimum, while for non-convex ones (such as deep networks) it is generally a local optimum or stationary point. The size of each step is controlled by the learning rate, a hyperparameter of the algorithm. A toy example of this update loop is sketched below.
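To make the effect of the learning rate concrete, here is a small sketch (not from the original post; the function, step count, and learning rates are chosen purely for illustration) that minimizes f(x) = x² with plain gradient descent:

# Toy example: minimize f(x) = x^2, whose gradient is f'(x) = 2x.
# Each step moves x a little way down the slope; the learning rate
# alpha controls how big that step is.
def toy_gradient_descent(x0, alpha=0.1, n_steps=50):
    x = x0
    for _ in range(n_steps):
        grad = 2 * x          # gradient of x^2 at the current point
        x = x - alpha * grad  # step in the direction of steepest descent
    return x

print(toy_gradient_descent(x0=5.0, alpha=0.1))   # ends up very close to 0
print(toy_gradient_descent(x0=5.0, alpha=1.5))   # too large a step: diverges

With alpha = 0.1 each step multiplies x by 0.8, so the iterates shrink toward the minimum at 0; with alpha = 1.5 they overshoot and blow up, which is why the learning rate has to be tuned.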

The following are the mathematical and programming aspects of gradient descent and its variants:

cost function: J(theta) = (1/(2m)) * sum_{i=1..m} (h(x^(i)) - y^(i))²

hypothesis: h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + ...

parameter update: theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)

For a model with two parameters, the updates are applied simultaneously and repeated until convergence, with each sum running over all m training examples:

theta_0 := theta_0 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i))

theta_1 := theta_1 - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_1^(i)

Gradient descent implementation in Python:
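The original post embedded this code as an image. Below is a minimal NumPy sketch of batch gradient descent for the linear hypothesis and cost function above; the function name, the synthetic data, and the hyperparameter values are my own choices, not the author's original code.

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    # Batch gradient descent for linear regression.
    # X is an (m, n) feature matrix, y an (m,) target vector.
    m = len(y)
    X_b = np.c_[np.ones(m), X]          # prepend a bias column so theta_0 is the intercept
    theta = np.zeros(X_b.shape[1])      # initialize all parameters to zero
    for _ in range(n_iters):
        errors = X_b @ theta - y        # h(x^(i)) - y^(i) for every example at once
        gradient = (1 / m) * X_b.T @ errors
        theta -= alpha * gradient       # simultaneous update of all theta_j
    return theta

# Tiny synthetic dataset: y = 4 + 3x exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([4.0, 7.0, 10.0, 13.0])
print(gradient_descent(X, y))           # approximately [4., 3.]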

Stochastic gradient descent, on the other hand, may be implemented as follows:

SGD implementation in Python:
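As above, the original SGD code was shared as an image; a possible sketch for the same linear model, with one parameter update per training example, looks like this (again, names and hyperparameters are illustrative):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.1, n_epochs=100):
    # One parameter update per training example, visiting the examples
    # in a random order each epoch.
    m = len(y)
    X_b = np.c_[np.ones(m), X]              # bias column so theta_0 is the intercept
    theta = np.zeros(X_b.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(m):
            error = X_b[i] @ theta - y[i]   # h(x^(i)) - y^(i) for this single example
            theta -= alpha * error * X_b[i] # immediate update from this example alone
    return theta

# Same tiny dataset as before: y = 4 + 3x exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([4.0, 7.0, 10.0, 13.0])
print(stochastic_gradient_descent(X, y))    # approximately [4., 3.]

Because the gradient is estimated from a single example, the updates are noisier than in batch gradient descent, but each one is much cheaper, which is what makes SGD practical on large datasets.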

Conjugate gradient descent may also be implemented in Python, as sketched below:

Conjugate gradient descent implementation in Python (originally shown in two parts):
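The original conjugate gradient code was shared as two images. Below is one possible sketch of nonlinear conjugate gradient descent (the Fletcher-Reeves variant with a simple backtracking line search); the choice of variant, the function names, and the toy quadratic are my own assumptions, not the author's original code.

import numpy as np

def conjugate_gradient_descent(f, grad, x0, n_iters=200, tol=1e-6):
    # Fletcher-Reeves nonlinear conjugate gradient with backtracking line search.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # first direction: steepest descent
    for _ in range(n_iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                        # safeguard: fall back to steepest descent
            d = -g
        alpha = 1.0                           # backtracking (Armijo) line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves coefficient
        d = -g_new + beta * d                 # new direction, built from the previous one
        g = g_new
    return x

# Usage on a simple quadratic with minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
print(conjugate_gradient_descent(f, grad, x0=[0.0, 0.0]))   # approximately [1., -2.]

On a quadratic objective with exact line searches, the directions generated this way are mutually conjugate with respect to the Hessian, which is where the method gets its name.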

There is no dedicated gradient descent function in the SciPy library. However, scipy.optimize provides a minimize function that can be used to find the minimum of a function. minimize supports a number of different algorithms, including gradient-based methods such as conjugate gradient (method='CG') and BFGS; see the SciPy documentation for the full list. A short example is shown below.
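For instance, minimize can run the conjugate gradient method on the same toy objective used above (the objective and gradient here are my own illustration, not from the original post):

import numpy as np
from scipy.optimize import minimize

# Same toy quadratic as above, minimized with SciPy's conjugate gradient method.
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])

result = minimize(f, x0=np.zeros(2), jac=grad, method="CG")
print(result.x)        # approximately [1., -2.]
print(result.success)  # True if the optimizer converged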

Thanks for reading!

If you have any questions, please do not hesitate to contact me.
