• Renneder@sh.itjust.worksOPM
    link
    fedilink
    arrow-up
    2
    ·
    1 year ago
    •  Researchers have discovered that gradient descent has a theory that is not fully understood. 
    
    •  Gradient descent uses the cost function to find the lowest value on the graph. 
    
    •  Gradient descent algorithms pick a point and calculate the slope of the curve around it. 
    
    •  Conventional wisdom is to take small steps, but automated testing techniques have shown that large steps can be optimal. 
    
    •  Research has shown that the optimal stride length depends on the number of steps in the sequence. 
    
    •  The cyclic approach to gradient descent may be more efficient, but will not change its current use. 
    
    •  The results of the study raise a theoretical puzzle about the structure that governs the best decisions.