If the penetration you see is acceptable and the contact patch is not in an area of concern for stress or positional accuracy, then it is fine to allow higher penetration to fix a convergence problem. The example where the penetration tolerance was too large was an interference fit problem: a post in a hole whose diameter is smaller than that of the post. There are textbook equations to predict the stress for that geometry. The default normal stiffness may leave too much penetration, resulting in a stress error of a few percent compared with the theoretical result. The solution is either to reduce the penetration tolerance to a specific value or to increase the normal stiffness. I expect that under the hood, the solver is simply increasing the normal stiffness and checking whether the penetration tolerance is met.
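As a rough check on the "few percent" figure: for a solid post pressed into a hub of the same material, the textbook thick-walled cylinder (Lamé) solution, in the plane-stress form found in Shigley, makes the contact pressure linear in the radial interference δ. If the contact is left with a residual penetration g, the parts only deform to close δ − g, so to first order the stress comes out low by g/δ. This is a minimal sketch under that simplified picture; the function names, dimensions, and material values are made up for illustration.

```python
def contact_pressure(E, R, c, delta):
    """Interface pressure for a solid post pressed into a hub of the same
    material (plane stress, thick-walled cylinder / Lame solution).

    E     -- Young's modulus (Pa), same for post and hub
    R     -- nominal interface radius (m)
    c     -- hub outer radius (m)
    delta -- radial interference (m)
    """
    return E * delta * (c**2 - R**2) / (2.0 * R * c**2)


def hub_hoop_stress(p, R, c):
    """Tangential (hoop) stress at the hub bore for interface pressure p."""
    return p * (c**2 + R**2) / (c**2 - R**2)


def relative_stress_error(delta, penetration):
    """First-order stress error when a residual contact penetration is left.

    The parts only deform to close (delta - penetration), and the pressure is
    linear in the interference, so the stress is low by penetration / delta.
    """
    return penetration / delta


if __name__ == "__main__":
    E, R, c = 200e9, 0.025, 0.050  # steel-like modulus, 50 mm post, 100 mm hub OD
    delta = 25e-6                  # 25 um radial interference
    g = 1e-6                       # 1 um penetration left by the contact

    p_theory = contact_pressure(E, R, c, delta)
    p_fe = contact_pressure(E, R, c, delta - g)
    print(f"theory: p = {p_theory / 1e6:.1f} MPa, "
          f"hoop = {hub_hoop_stress(p_theory, R, c) / 1e6:.1f} MPa")
    print(f"with penetration: p = {p_fe / 1e6:.1f} MPa, "
          f"error = {relative_stress_error(delta, g):.1%}")
```

With these numbers the theoretical pressure is 75 MPa (125 MPa hoop stress at the bore), and leaving 1 µm of penetration on a 25 µm radial interference drops the pressure to 72 MPa, a 4 % error. That is the kind of error that reducing the penetration tolerance or raising the normal stiffness removes.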
St. Venant's principle is useful for stress, but it is not useful for rigid body rotations. Say you have a beam 2 m long with a hinge at one end and a rotation stop only 0.02 m from the hinge. That rotation stop is a frictional contact. Because the tip of the beam moves 100 times as far as the contact point at the rotation stop (2 m / 0.02 m = 100), a small penetration at the stop is magnified by a factor of 100 at the tip. Since the penetration is not real but an artifact of the solution, it is important in that model (as in the interference fit) to ensure that the penetration is very small. Yes, that will cause extra iterations to reach this small value, but that is necessary in this example. In other cases, the penetration may not matter.
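The lever-arm arithmetic is a small-angle estimate: a penetration g at a stop a distance d from the hinge lets the beam rotate by roughly g/d, which moves a tip at distance L by g·L/d. Turning that around gives the largest stop penetration you can accept for a given tip accuracy. The function names and the 0.1 mm accuracy target are hypothetical.

```python
def tip_error(stop_penetration, beam_length, stop_distance):
    """Tip displacement caused by a spurious penetration at a rotation stop.

    Small-angle estimate: the penetration lets the beam rotate by
    stop_penetration / stop_distance (rad) about the hinge, and the tip at
    beam_length from the hinge moves by that angle times beam_length.
    """
    rotation = stop_penetration / stop_distance
    return rotation * beam_length


def max_stop_penetration(tip_tolerance, beam_length, stop_distance):
    """Largest stop penetration that keeps the tip error within tip_tolerance."""
    return tip_tolerance * stop_distance / beam_length


if __name__ == "__main__":
    L, d = 2.0, 0.02  # beam length and hinge-to-stop distance from the example (m)
    print(f"magnification: {L / d:.0f}x")
    print(f"10 um at the stop -> {tip_error(10e-6, L, d) * 1e3:.2f} mm at the tip")
    print(f"0.1 mm tip accuracy needs <= "
          f"{max_stop_penetration(1e-4, L, d) * 1e6:.1f} um at the stop")
```

So 10 µm of penetration at the stop shows up as 1 mm at the tip, and holding the tip to 0.1 mm means keeping the stop penetration to about 1 µm.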
I don't know what the program-controlled default value for penetration tolerance is, but it's not going to be a fixed value, because that wouldn't work when the geometry given to the solver can be measured in meters or in microns. It is likely a value computed from the local stiffness. Just as with Normal Stiffness, you can enter a Factor to increase or decrease the automatically computed value, or you can enter a value in distance units.
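To make the guess about what happens under the hood concrete, here is a hypothetical sketch, not Ansys's actual algorithm. It models a penalty contact between a structure of local stiffness k_s and a rigid target with an initial interference δ. The normal stiffness starts at a factor times k_s and is raised by a fixed growth ratio until the penetration meets the tolerance. The tolerance is taken either as an absolute distance or, as an assumption, as a factor times a characteristic element length, since a factor needs some length to scale. All names and default values here are made up.

```python
from dataclasses import dataclass


@dataclass
class ContactResult:
    normal_stiffness: float  # final penalty stiffness k_n (N/m)
    penetration: float       # remaining penetration g (m)
    contact_force: float     # k_n * g (N)
    stiffness_updates: int   # how many k_n values were tried


def resolve_tolerance(value=None, factor=None, element_length=None):
    """Penetration tolerance from either an absolute distance or a factor.

    Assumption: a factor needs a length to scale, so here it multiplies a
    characteristic element length. That is a guess, not documented behavior.
    """
    if value is not None:
        return value
    if factor is not None and element_length is not None:
        return factor * element_length
    raise ValueError("give a value, or a factor plus an element length")


def penalty_contact(k_struct, interference, tolerance,
                    stiffness_factor=1.0, growth=10.0, max_updates=20):
    """Raise the penalty stiffness until the penetration meets the tolerance.

    Series-spring model: the structure deflects u and the contact is left
    with penetration g, with u + g = interference and k_struct * u = k_n * g.
    """
    k_n = stiffness_factor * k_struct  # start from a multiple of the local stiffness
    for update in range(1, max_updates + 1):
        g = interference * k_struct / (k_struct + k_n)
        if g <= tolerance:
            return ContactResult(k_n, g, k_n * g, update)
        k_n *= growth  # too much penetration: stiffen the contact and re-solve
    raise RuntimeError("penetration tolerance not met; check the model")


if __name__ == "__main__":
    k_s, delta = 1e8, 25e-6                      # local stiffness (N/m), interference (m)
    tol = resolve_tolerance(value=0.01 * delta)  # 1 % of the interference
    r = penalty_contact(k_s, delta, tol)
    print(r)
    print(f"force error vs. exact k_s*delta: {1 - r.contact_force / (k_s * delta):.2%}")
```

In this example the tolerance is met on the third stiffness value (k_n = 1e10 N/m), with the contact force about 1 % below k_s·δ. In this picture, each stiffness increase stands in for another round of equilibrium iterations, which is the cost of asking for a very small penetration.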