General Mechanical

Out of memory on HPC cluster

    • helen.durand
      Subscriber

      Hello, I am running transient structural simulations on an HPC cluster (using input .dat files) and have been running into 'out of memory' issues. I am aware that the mesh can be coarsened to alleviate this issue (especially away from areas of interest), but I was wondering if there are other techniques to address this problem.

      I am also implementing APDL command blocks in the simulation. Could reducing the calculations in these blocks (by rewriting the code in a more efficient way, for example) help?

      Could adjusting 'Analysis Settings' > 'Output Controls' or 'Analysis Settings' > 'Analysis Data Management' help?

      Other ideas or feedback would be appreciated. Thank you!

    • Mike Rife
      Ansys Employee

      Hi Helen

      Can you post the specific warning/error message(s) that you have been receiving? Also, is there a message about pivoting having been activated?

      Mike

    • helen.durand
      Subscriber

      Here is the error. I do not think I have gotten any errors about pivoting.

      /XXX/el7/pre-compiled/ansys/2020r1/v201/ansys/bin/ansysdis201: line 77:  2367 Killed                  /XXXel7/pre-compiled/ansys/2020r1/v201/ansys/bin/linx64/ansys.e -dis -mpi INTELMPI -j “file” -s read -b -i “./v19_1_struct.dat” -o “./outputfile.out”
      srun: error: YYY8: task 0: Out Of Memory
      slurmstepd: error: Detected 484 oom-kill event(s) in StepId=11323733.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
      [mpiexec@YYY8] HYDT_bscu_wait_for_completion (../../tools/bootstrap/utils/bscu_wait.c:151): one of the processes terminated badly; aborting
      [mpiexec@YYY8] HYDT_bsci_wait_for_completion (../../tools/bootstrap/src/bsci_wait.c:36): launcher returned error waiting for completion
      [mpiexec@YYY8] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:540): launcher returned error waiting for completion
      [mpiexec@YYY8] main (../../ui/mpich/mpiexec.c:1149): process manager error waiting for completion
      slurmstepd: error: Detected 484 oom-kill event(s) in StepId=11323733.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

      Also, these are the warnings and errors from the Solver Output. I do not suspect these are the cause of any issues.

      /COM,ANSYS RELEASE 2020 R1           BUILD 20.1      UP20191203       13:33:04

       *** WARNING ***                         CP =     180.521   TIME= 13:36:34
       No *DO trips needed, enter *ENDDO .                                     

       *** WARNING ***                         CP =  700635.438   TIME= 05:21:47
       Element shape checking is currently inactive.  Issue SHPP,ON or         
       SHPP,WARN to reactivate, if desired.    

    • Mike Rife
      Ansys Employee

      Hi Helen

      Well, the cgroup is outside of Ansys' control. Are there any messages in the MAPDL output file regarding an increase in database size? At what step was the job when it was killed, i.e. had it reached the point of solving?

      Mike
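
The questions above (any messages about memory, database size, or pivoting, and how far the job got before it was killed) can be checked by scanning the solver output file. The following is a minimal sketch in Python, not an Ansys tool. The keyword list is an assumption, since the exact wording of MAPDL messages is not given in this thread, and the function names are hypothetical. The file name `outputfile.out` is taken from the `-o` argument in the command line posted above.

```python
import re
from typing import Iterable, List, Tuple

# Assumed search terms; adjust them to the wording your solver output actually uses.
KEYWORDS = ("memory", "database", "pivot")


def find_memory_lines(lines: Iterable[str], keywords=KEYWORDS) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines mentioning any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip()))
    return hits


def last_nonblank_lines(lines: Iterable[str], count: int = 5) -> List[str]:
    """Return the last `count` non-blank lines, a rough hint of how far the job got."""
    nonblank = [line.rstrip() for line in lines if line.strip()]
    return nonblank[-count:]


def scan_file(path: str = "outputfile.out") -> None:
    """Print keyword hits and the tail of the given output file."""
    with open(path, errors="replace") as handle:
        content = handle.readlines()
    for number, text in find_memory_lines(content):
        print(f"{number}: {text}")
    print("--- last lines written before the job ended ---")
    for text in last_nonblank_lines(content):
        print(text)
```

Calling `scan_file("outputfile.out")` from the job directory prints the matching lines with their line numbers, followed by the last few lines the solver wrote. Those last lines give a rough indication of whether the job had reached the solve phase when it was killed.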
