A Generic TCP/IP and File-based Communication Library for Heterogeneous Parallel Computers

Hiroya Matsuba (Information Technology Center, The University of Tokyo)

Integrating simulation, data analytics, and machine learning requires communication between different types of parallel computers, such as systems with a massive number of nodes and systems equipped with accelerators. Parallel computers of different types are usually installed as independent systems. MPI handles communication within each system, but it cannot handle communication across systems. WaitIO is a new communication library that provides a communication infrastructure for applications that use multiple parallel computers. While offering an API similar to MPI's, it supports communication between parallel computers where TCP/IP or a shared file system is the only available means of communication. This talk introduces the basic design of WaitIO and reports its performance and scalability.