Python hybrid multiprocessing / MPI with shared memory in the same node
I have a Python application that needs to load the same large array (~4 GB) and run a parallel function on chunks of that array. The array starts off saved to disk.
I typically run this application on a cluster computer with, say, 10 nodes, each of which has 8 compute cores and a total of around 32 GB of RAM.
The easiest approach (which doesn't work) is to run n=80 processes with mpi4py. It doesn't work because each MPI process loads its own copy of the 4 GB array, and 8 copies per node exhaust the 32 GB of RAM, resulting in a MemoryError.
An alternative is for the rank=0 process to load the 4 GB array and farm out chunks of it to the rest of the MPI processes, but this approach is slow because of network bandwidth limits.
The best approach would be for only 1 core on each node to load the 4 GB array, and for that array to be made available as shared memory (through multiprocessing?) to the remaining 7 cores on the node.
How can I achieve this? How can I make MPI aware of the nodes and have it coordinate with multiprocessing?
The multiprocessing module does not give you shared memory by default: arguments passed to worker processes are pickled and copied.
You could look at the way joblib shares large NumPy arrays, using memory-mapped views. You can also use manual memory mapping to avoid duplicating the data.
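As a rough illustration of the manual route, here is a minimal sketch using `numpy.memmap` with a plain `multiprocessing.Pool`. It assumes the array is stored as raw `float64` binary (e.g. written with `ndarray.tofile`); the file name, the `chunk_sum` function and the summing itself are made up for the example, and the 4 GB array is replaced by a small stand-in. Each worker maps the file read-only, so the pages come from the OS file cache rather than from a private copy per process.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np

DTYPE = np.float64  # assumption: element type of the raw array on disk


def chunk_sum(task):
    """Map the file read-only and reduce one chunk of it (hypothetical work)."""
    path, n_items, start, stop = task
    # mode="r" maps the file; only the pages actually touched are read
    data = np.memmap(path, dtype=DTYPE, mode="r", shape=(n_items,))
    return float(data[start:stop].sum())


def parallel_sum(path, n_items, n_workers=4):
    """Split [0, n_items) into n_workers chunks and sum them in parallel."""
    bounds = np.linspace(0, n_items, n_workers + 1, dtype=int)
    tasks = [
        (path, n_items, int(a), int(b))
        for a, b in zip(bounds[:-1], bounds[1:])
    ]
    with Pool(n_workers) as pool:
        return sum(pool.map(chunk_sum, tasks))


if __name__ == "__main__":
    # Small stand-in for the real array, written as raw binary
    n_items = 1_000_000
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    np.arange(n_items, dtype=DTYPE).tofile(path)
    print(parallel_sum(path, n_items))
```

Only the path and the chunk bounds are sent to the workers, so nothing large is pickled.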
To pass the data only once on each node, I would launch one MPI process per node and then use joblib for the rest of the computation, since it automatically uses memmapping for large NumPy array inputs.
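A sketch of how that layout might look, assuming mpi4py and joblib are installed, the job is launched with one MPI rank per node, and the array was saved with `np.save` to a path every node can read. The file name `array.npy` and the functions `split_range`, `work_on_chunk` and `process_node_share` are hypothetical names for this example, and the per-chunk work is just a sum. Treat it as a starting point, not a tested cluster recipe.

```python
import numpy as np


def split_range(n_items, n_parts, part):
    """Return (start, stop) of the part-th of n_parts near-equal chunks."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def work_on_chunk(data, start, stop):
    """Hypothetical per-chunk function; replace with the real computation."""
    return float(np.asarray(data[start:stop]).sum())


def process_node_share(path, node_rank, n_nodes, n_jobs=8):
    """Process this node's share of the array with n_jobs joblib workers."""
    from joblib import Parallel, delayed  # imported here; assumed installed

    # mmap_mode="r" returns a memmap, so the file is not read eagerly
    data = np.load(path, mmap_mode="r")
    lo, hi = split_range(data.shape[0], n_nodes, node_rank)
    pieces = [split_range(hi - lo, n_jobs, j) for j in range(n_jobs)]
    results = Parallel(n_jobs=n_jobs)(
        delayed(work_on_chunk)(data, lo + a, lo + b) for a, b in pieces
    )
    return sum(results)


def main(path="array.npy"):
    """Entry point for each MPI rank (assumed: one rank per node)."""
    from mpi4py import MPI  # imported here; assumed installed

    comm = MPI.COMM_WORLD
    local = process_node_share(path, comm.Get_rank(), comm.Get_size())
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if comm.Get_rank() == 0:
        print("total:", total)
```

Calling `main()` at the end of the script and starting it through the MPI launcher with one process per node gives each node a single memmapped view that its 8 joblib workers read from. How to request one rank per node depends on the MPI implementation and the scheduler.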