文摘
Convolutional neural networks (CNNs) are state-of-the-art machine learning algorithm in low-resolution vision tasks and are widely applied in many applications. However, the training process of them is very time-consuming. As a result, many approaches have been proposed in which parallelization is one of the most effective. In this article, we parallelized a classic CNN on a new platform of Intel \(^{{\textregistered }}\) Xeon Phi \(^{{{\text {TM}}}}\) Coprocessor with OpenMP. Our implementation acquired 131 \(\times \) speedup against the serial version running on the coprocessor itself and 8.3 \(\times \) speedup against the serial baseline on the Xeon \(^{{\textregistered }}\) E5-2697 CPU.