OK, that model is very, very small as far as GPU acceleration of the solve is concerned. Without knowing anything about the cluster other than that the compute nodes have some model of Intel i7 CPU, I'd suggest running a test to compare performance with one GPU.

First, change the model so that it is not solving all 3050 sub-steps. We only need to solve a few to compare compute performance, so change the load setup to solve maybe 10 sub-steps, or maybe just the first 2-3 load steps.

Next, I usually start with 50,000 degrees of freedom (DOF) per CPU core as a baseline test. If the CPU were a leading-edge model, I'd take that down to around 30,000. With 50k DOF per core, I'd start by solving on 8 CPU cores (I also prefer even numbers!). Then run these three solves:

1. Solve on 8 CPU cores. When done, make a copy of the output and pcs files.
2. Solve again on 8 CPU cores plus 1 of the GPUs. Make a copy of the resulting output and pcs files.
3. Lastly, solve on 4 CPU cores plus 1 GPU, and save the files.

Then report back the total CPU time and the total elapsed time for each solution.
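The DOF-per-core rule of thumb and the three-run comparison above boil down to simple arithmetic. Here is a minimal Python sketch, assuming you know the model's total DOF count and read the CPU and elapsed times off the output/pcs files by hand. The function names, the `Run` record, and all numbers in the usage lines are hypothetical placeholders, not values from this model or any solver API.

```python
import math
from dataclasses import dataclass


def suggested_cores(total_dof: int, dof_per_core: int = 50_000) -> int:
    """Cores needed so each handles at most ~dof_per_core DOF, rounded up to an even count."""
    cores = max(1, math.ceil(total_dof / dof_per_core))
    if cores % 2:  # preference for even core counts
        cores += 1
    return cores


@dataclass
class Run:
    label: str          # hypothetical label, e.g. "8 CPU"
    cpu_cores: int
    gpus: int
    cpu_time_s: float   # total CPU time, copied from the output/pcs file
    elapsed_s: float    # total elapsed (wall-clock) time


def elapsed_speedups(runs: list[Run], baseline_label: str) -> dict[str, float]:
    """Speedup of each run's elapsed time relative to the baseline run (>1.0 means faster)."""
    base = next(r for r in runs if r.label == baseline_label)
    return {r.label: base.elapsed_s / r.elapsed_s for r in runs}


if __name__ == "__main__":
    # Hypothetical 400,000-DOF model: 8 cores at 50k DOF/core, 14 at 30k DOF/core.
    print(suggested_cores(400_000), suggested_cores(400_000, 30_000))

    # Placeholder timings only; substitute the numbers from your own three solves.
    runs = [
        Run("8 CPU", 8, 0, cpu_time_s=750.0, elapsed_s=100.0),
        Run("8 CPU + 1 GPU", 8, 1, cpu_time_s=400.0, elapsed_s=50.0),
        Run("4 CPU + 1 GPU", 4, 1, cpu_time_s=220.0, elapsed_s=60.0),
    ]
    for label, s in elapsed_speedups(runs, "8 CPU").items():
        print(f"{label}: {s:.2f}x")
```

Comparing elapsed time (not CPU time) is what tells you whether the GPU actually shortens the solve; CPU time is still worth recording because it shows how much work was offloaded from the cores.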