Accelerating MPI collective communications through hierarchical algorithms with flexible inter-node communication and imbalance awareness