• Researchers have found that the theory behind gradient descent is not fully understood.
• Gradient descent searches for the lowest point on the graph of a cost function.
• Gradient descent algorithms pick a point, calculate the slope of the curve around it, and step in the downhill direction.
• Conventional wisdom is to take small steps, but automated testing techniques have shown that large steps can be optimal.
• Research has shown that the optimal step sizes depend on the number of steps in the sequence.
• The cyclic approach to gradient descent may be more efficient, but it will not change how gradient descent is currently used.
• The results of the study raise a theoretical puzzle about the underlying structure that governs the best choice of step sizes.
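
The points above can be illustrated with a minimal sketch. It runs gradient descent on a toy two-variable cost function, once with a constant step size and once with a repeating cycle of step sizes that includes an occasional long step. Everything here is an assumption made for illustration: the cost function, the `CURVATURES` values, the helper names, and the step sizes are not taken from the study.

```python
from itertools import cycle, islice

# Toy cost: f(x) = 0.5 * sum(c_i * x_i**2), with its minimum value 0 at the origin.
# The curvatures are assumed for illustration; the largest one is L = 1.
CURVATURES = [1.0, 0.1]


def cost(x):
    """Value of the toy cost function at point x."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def gradient(x):
    """Slope of the toy cost function at point x, one entry per variable."""
    return [c * xi for c, xi in zip(CURVATURES, x)]


def gradient_descent(start, step_sizes, num_steps):
    """Take num_steps downhill steps, cycling through step_sizes in order."""
    x = list(start)
    for h in islice(cycle(step_sizes), num_steps):
        g = gradient(x)
        # Move against the slope; h controls how far this step goes.
        x = [xi - h * gi for xi, gi in zip(x, g)]
    return x


start = [1.0, 1.0]

# Constant schedule: every step has size 1/L = 1.
constant = gradient_descent(start, [1.0], 30)

# Cyclic schedule (illustrative values): two short steps, then one long step.
cyclic = gradient_descent(start, [0.5, 0.5, 2.5], 30)

print("start cost:   ", cost(start))
print("constant cost:", cost(constant))
print("cyclic cost:  ", cost(cyclic))
```

On this toy problem the cyclic schedule ends with a lower cost than the constant one after 30 steps, even though its long step of 2.5 would make the iterates diverge if it were used at every step. That outcome depends on the numbers chosen here and should not be read as a general result.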