I ran a few tests following your suggestions and I think I now understand what was going on.
I am not a computer hardware expert. Please forgive me if I say anything wrong.
Our server's CPU is an Intel Xeon E5-2695 v2, and the server has 48 processors in total (presumably logical processors, i.e. hardware threads, rather than physical cores). We have been running simulations with the number of processes set to 48, which means that a single simulation uses all of the processors. If another simulation with 48 processes is run at the same time, 96 processes compete for 48 processors, so each processor has to serve both simulations. However, one processor can only handle one task at a time, so it has to switch between the two simulations. The switching may be slow, which causes both simulations to slow down significantly.
I then tried running two simulations at the same time, each limited to 24 processes. The speed per process (in Mnode/s) turned out to be not significantly lower than for a single simulation with 48 processes.
I think our best strategy moving forward is to limit each simulation to 24 processes.
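In case it helps anyone else, here is a rough Python sketch of the arithmetic behind that choice. The function names are my own and purely illustrative, not part of any simulation software. It assumes the goal is simply to keep the total number of processes at or below the number of logical processors.

```python
import os


def processes_per_simulation(logical_cpus: int, concurrent_sims: int) -> int:
    """Largest per-simulation process count that keeps the total
    number of processes at or below the number of logical processors."""
    if logical_cpus < 1 or concurrent_sims < 1:
        raise ValueError("both arguments must be at least 1")
    # Integer division; always give each simulation at least one process.
    return max(1, logical_cpus // concurrent_sims)


def oversubscription(logical_cpus: int, concurrent_sims: int, procs_per_sim: int) -> float:
    """Ratio of requested processes to logical processors.
    A value above 1.0 means processors must switch between tasks."""
    return concurrent_sims * procs_per_sim / logical_cpus


if __name__ == "__main__":
    # os.cpu_count() reports logical processors and may return None.
    cpus = os.cpu_count() or 1
    sims = 2  # hypothetical: number of simulations run at the same time
    limit = processes_per_simulation(cpus, sims)
    print(f"{cpus} logical processors, {sims} simulations: "
          f"use at most {limit} processes each")
```

With our numbers, two simulations at 48 processes each give a ratio of 2.0 (every processor is shared), while two simulations at 24 processes each give 1.0.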
Thanks again for your help.