When we parallelise the model we split the load over two (or more) nodes. Splitting so that each partition holds an equal number of cells is likely to give a good speed-up. Now, for a pipe, if we split across the axis the partition interface has the fewest cells; if we split along the axis the interface has the most cells. The former results in less data transfer between nodes, so it is the more efficient.
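To put rough numbers on this, here is a minimal sketch that treats the pipe as a structured block of cells, with x as the pipe axis, and counts the cell faces shared between two equal partitions. The block dimensions and the function names (`partition_ids`, `interface_faces`, `cells_per_partition`) are made up for illustration; this is not any solver's API, and a real pipe mesh will not be a simple block.

```python
from itertools import product


def partition_ids(nx, ny, nz, cut):
    """Assign each cell (i, j, k) to partition 0 or 1.

    x (index i) is taken as the pipe axis, an assumption of this sketch.
    'across_axis' cuts with a plane perpendicular to the axis,
    'along_axis' cuts with a plane that runs the length of the pipe.
    """
    ids = {}
    for i, j, k in product(range(nx), range(ny), range(nz)):
        if cut == "across_axis":
            ids[(i, j, k)] = 0 if i < nx // 2 else 1
        elif cut == "along_axis":
            ids[(i, j, k)] = 0 if j < ny // 2 else 1
        else:
            raise ValueError(f"unknown cut: {cut}")
    return ids


def interface_faces(ids):
    """Count faces whose two neighbouring cells sit in different partitions."""
    count = 0
    for (i, j, k), part in ids.items():
        # Look only in the +i, +j, +k directions so each face is counted once.
        for neighbour in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
            if neighbour in ids and ids[neighbour] != part:
                count += 1
    return count


def cells_per_partition(ids):
    """Return {partition id: number of cells} to check the load balance."""
    sizes = {}
    for part in ids.values():
        sizes[part] = sizes.get(part, 0) + 1
    return sizes


# Hypothetical long, thin block: 40 cells along the axis, 8 x 8 in cross-section.
for cut in ("across_axis", "along_axis"):
    ids = partition_ids(40, 8, 8, cut)
    print(cut, cells_per_partition(ids), interface_faces(ids))
```

Both cuts give each partition the same number of cells, but for this block the cut across the axis shares 64 faces while the cut along the axis shares 320, so the second has five times as much interface to exchange data over.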
However, to test your code you want the partition to cut the boundary of interest; otherwise it's not a very good test. Hence the case for using the manual partition and reviewing the options other than Metis.
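As a rough illustration of what "cutting the boundary of interest" means, the sketch below again uses a structured block as a stand-in for the pipe (x as the axis) and checks whether the cells on a named boundary end up in more than one partition. The names `partitions_on_boundary` and `boundary_is_cut`, the "inlet"/"outlet" labels and the block sizes are all hypothetical; in practice you would inspect the partition through whatever your solver provides.

```python
def partitions_on_boundary(nx, ny, nz, cut, boundary):
    """Return the set of partition ids that own cells on the given boundary.

    Assumptions of this sketch: index i runs along the pipe axis, the
    'inlet' is the face i == 0 and the 'outlet' is the face i == nx - 1.
    """
    def part(i, j, k):
        if cut == "across_axis":   # cutting plane perpendicular to the axis
            return 0 if i < nx // 2 else 1
        if cut == "along_axis":    # cutting plane running along the axis
            return 0 if j < ny // 2 else 1
        raise ValueError(f"unknown cut: {cut}")

    if boundary == "inlet":
        i_face = 0
    elif boundary == "outlet":
        i_face = nx - 1
    else:
        raise ValueError(f"unknown boundary: {boundary}")

    return {part(i_face, j, k) for j in range(ny) for k in range(nz)}


def boundary_is_cut(nx, ny, nz, cut, boundary):
    """True if the boundary is shared between two or more partitions."""
    return len(partitions_on_boundary(nx, ny, nz, cut, boundary)) > 1


# Hypothetical block: 40 cells along the axis, 8 x 8 in cross-section.
for cut in ("across_axis", "along_axis"):
    print(cut, "cuts inlet:", boundary_is_cut(40, 8, 8, cut, "inlet"))
```

In this toy case the efficient cut across the axis leaves the whole inlet on one node, so code attached to the inlet never sees a partition boundary. Only the cut along the axis splits the inlet between nodes, which is the situation you want for a meaningful parallel test.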