Distributing Frank-Wolfe via Map-Reduce
Stratis Ioannidis
12 October 2017, 10:30 - 12:00 Room/Building: 455/PCRI-N
Contact:
Research activities: Web data management
Abstract:
Large-scale optimization problems abound in data mining and machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank-Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: we solve problems with 10 million variables using 350 cores in 44 minutes; the same operation takes 133 hours when executed serially.
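
As a rough illustration of the idea, and not the speaker's Spark implementation, the sketch below runs FW on one of the problems named above: projecting a point b onto the convex hull of a set of points. It assumes the formulation min 0.5*||Ax - b||^2 over the probability simplex, where the columns of A are the hull points, and the standard step size 2/(k+2). Python's built-in map and functools.reduce stand in for a distributed engine, and all function names (dot, local_min, fw_hull_projection) are hypothetical. The property it relies on is that each partition needs only its own columns plus a small shared residual r = Ax - b to find its best coordinate.

```python
from functools import reduce


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def local_min(partition, r):
    """Map step: smallest gradient entry (value, column index) in one partition.

    The gradient of 0.5*||Ax - b||^2 at coordinate j is a_j . r, with r = Ax - b.
    """
    return min((dot(col, r), j) for j, col in partition)


def fw_hull_projection(columns, b, n_parts=2, iters=500):
    """Frank-Wolfe for min 0.5*||Ax - b||^2 over the simplex (illustrative only).

    `columns` holds the columns of A; they are split round-robin into
    `n_parts` partitions, mimicking data spread over workers.
    """
    n = len(columns)
    indexed = list(enumerate(columns))
    partitions = [indexed[p::n_parts] for p in range(n_parts)]
    partitions = [part for part in partitions if part]  # drop empty partitions

    x = [0.0] * n
    x[0] = 1.0                                     # start at a simplex vertex
    r = [a - bi for a, bi in zip(columns[0], b)]   # shared residual Ax - b

    for k in range(iters):
        # Map: each partition scans only its own columns.
        # Reduce: combine the local minima into the global argmin.
        _, i = reduce(min, map(lambda part: local_min(part, r), partitions))

        gamma = 2.0 / (k + 2.0)                    # standard FW step size
        # Move toward vertex e_i: x <- (1 - gamma) * x + gamma * e_i
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
        # The residual update needs only the selected column a_i.
        r = [(1.0 - gamma) * rj + gamma * (ai - bi)
             for rj, ai, bi in zip(r, columns[i], b)]
    return x
```

For example, fw_hull_projection([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.25, 0.25]) returns simplex weights whose weighted combination of the three points approximates [0.25, 0.25]. In a real deployment the partitions would live on separate workers and only r and the winning index would be communicated each iteration; that communication pattern is an assumption of this sketch rather than a detail taken from the abstract.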