I have worked with nonlinear models for about 10 years without needing to pay attention to the magnitude of the numbers in the Newton-Raphson (NR) residual plots or the force convergence plots.
I have successfully resolved hundreds of convergence problems without knowing why the value of a residual was high relative to the applied load.
Instead, I paid attention to what had to be done to resolve the convergence failure and reach a fully converged solution, when that was possible.
From the images of the NR plots, I can tell that the elements are too large and the contact stiffness is too high.
I would first try to soften the contact, which will allow more nodes on the large elements to share the contact load, at the cost of larger penetration.
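As a rough illustration of that trade-off, here is a minimal sketch that assumes a pure penalty formulation, where contact force equals normal stiffness times penetration. The function name and all numbers are made up for the sketch and are not taken from the model in question.

```python
def penetration(contact_force, normal_stiffness):
    """Penetration for a pure penalty contact: force = stiffness * penetration."""
    return contact_force / normal_stiffness


force = 1000.0      # N, hypothetical contact force
stiffness = 1.0e6   # N/mm, hypothetical baseline contact stiffness

# Scale the stiffness down and see how the penetration grows in proportion.
for factor in (1.0, 0.1, 0.01):
    pen = penetration(force, factor * stiffness)
    print(f"stiffness factor {factor:g}: penetration {pen:g} mm")
```

Under that assumption, each tenfold reduction in stiffness costs a tenfold increase in penetration for the same contact force, which is why the penetration has to be checked after softening.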
If softer contact fixes the convergence error with acceptable penetration, I would declare success.
If softer contact doesn't fix the convergence error, or the resulting penetration is unacceptable, I would try smaller, better-shaped elements with a more modest reduction in contact stiffness.
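The order of attack above can be sketched as a small decision routine. This is only an outline of the reasoning, not a real solver interface: `run_solve` is a hypothetical callable standing in for one solve attempt, and `fake_solve` is an invented stand-in so the sketch can run on its own.

```python
def choose_fix(run_solve, stiffness_factors, max_penetration):
    """Try progressively softer contact; fall back to mesh refinement.

    run_solve is a hypothetical callable: given a contact stiffness factor,
    it returns (converged, penetration) from one solve attempt.
    """
    for factor in stiffness_factors:
        converged, pen = run_solve(factor)
        if converged and pen <= max_penetration:
            return f"accept stiffness factor {factor:g}"
    # No softened setting both converged and kept penetration acceptable.
    return "refine mesh with a milder stiffness reduction"


def fake_solve(factor):
    # Invented stand-in for a real solve: converges only when contact is
    # soft enough, and penetration grows as the stiffness factor shrinks.
    converged = factor <= 0.1
    pen = 0.001 / factor
    return converged, pen


print(choose_fix(fake_solve, [1.0, 0.1, 0.01], max_penetration=0.05))
```

Tightening `max_penetration` in the call makes the routine fall through to the mesh refinement branch, which mirrors the second case described above.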