A systolic algorithm rhythmically computes and passes data through a network of processors . We investigate the performance of systolic algorithms for implementing the gravitational N -body problem on distributed-memory computers . Systolic algorithms minimize memory requirements by distributing the particles between processors . We show that the performance of systolic routines can be greatly enhanced by the use of non-blocking communication , which allows particle coordinates to be communicated at the same time that force calculations are being carried out . The performance enhancement is particularly great when block sizes are small , i.e . when only a small fraction of the N particles need their forces computed in each time step . Hyper-systolic algorithms reduce the communication complexity from O ( Np ) , with p the processor number , to O ( N \sqrt { p } ) , at the expense of increased memory demands . We describe a hyper-systolic algorithm that will work with a block time step algorithm and analyze its performance . As an example of an application requiring large N , we use the systolic algorithm to carry out direct-summation simulations using 10 ^ { 6 } particles of the Brownian motion of the supermassive black hole at the center of the Milky Way galaxy . We predict a 3D random velocity of \sim 0.4 km s ^ { -1 } for the black hole .