Maintaining High Performance Across All Problem Sizes and Parallel Scales Using Microkernel-based Linear Algebra