Fluids

Fluent MPI Error

    • soloviev
      Subscriber

      Hello,


      I am running a model that a support person at Ansys helped create and ran successfully on their own computer. When I try to run it, it does not get past 13 time steps before crashing with the following error:


       *** Error in `/cm/shared/apps/ansys_inc/v194/fluent/fluent19.4.0/lnamd64/3ddp_node/fluent_mpi.19.4.0': double free or corruption (!prev): 0x00000000058dd3e0 ***


      MPI was killed with signal 9.


       


      I had IT look at our workstation and HPC and everything was fine. 


       


      I also already tried switching the MPI type in the launcher, which did not change anything. 
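
      For reference, the MPI implementation can also be selected when launching Fluent from the command line rather than from the launcher. A sketch, assuming a Linux cluster install where the standard launcher flags apply; the host file, journal name, and core count below are placeholders:

```shell
# Hypothetical batch launch; -mpi= selects the MPI implementation to try,
# -g suppresses the GUI, -i reads a journal file.
fluent 3ddp -t240 -cnf=hostfile.txt -mpi=intel -g -i run.jou > transcript.out 2>&1
```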


       


      What could be causing this and what could be a solution?


       


      Thanks,


      Alex

    • Rob
      Ansys Employee

      If it's run for some steps, it's usually either diverged (the MPI errors are triggered as the node(s) fail) or the hardware has done something interesting. Can you check where it's saving monitors etc. to make sure they're OK and that there is still disk space? Also try running on +/- one node in case it's a parallel issue.


      Does anything happen 12-14 time steps into the calculation? E.g. mesh motion.
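
      The monitor, disk-space, and memory checks suggested above can be scripted roughly as follows (a sketch; `RUN_DIR` is a placeholder for wherever the case writes its files):

```shell
# Pre-flight checks before re-running; RUN_DIR is a placeholder for the
# directory the case writes its monitor and data files to.
RUN_DIR=${RUN_DIR:-.}
df -h "$RUN_DIR"             # is there still free disk space on that filesystem?
ls -lt "$RUN_DIR" | head     # have the monitor/report files been updated recently?
free -g 2>/dev/null || true  # memory headroom (Linux; run on each compute node)
```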

    • soloviev
      Subscriber

      Mesh adaption, periodic boundaries, and evaporation were all turned on at the 100 time step mark, and this error occurs 13 time steps after that. The Ansys contact said it ran past this point on their personal computer. The files I am loading are exactly the same as those he ran.


      I have two error files from two runs, which have different error outputs. Please see below:

      [error file attachments not shown]


      Thanks,


      Alex

    • Rob
      Ansys Employee

      Turn off (dynamic?) adaption and run the model. If you adapt after 13 time steps, how localised would the increase in cell count be? How much RAM have you got per node (and will it be enough)?

    • soloviev
      Subscriber

      We have 192 GB of RAM per node, and are currently running on 12 nodes.


      I turned off dynamic mesh adaption and the model ran past 13 time steps.


       


      Thanks,
      Alex

    • DrAmine
      Ansys Employee
      Which adaptive mesh method is used? How many levels of refinement? I would highly recommend switching to a more recent release when it comes to dynamic grid adaption and VOF-to-DPM, as mentioned in other posts.
    • soloviev
      Subscriber

      I am running on 2019R2. 


       


      These are the current adaption settings:

      Calculation activities:

      Execute commands: 3 defined commands:
      - Command-1: every 50 time steps: par part meth metis
      - Command-2: every 50 time steps: par part reorder-partitions-to-arch
      - Command-3: every 50 time steps: par part use-stored-part

      Mesh adaption:
      - Refinement criterion: dynamic adapt refine (field value more than 1e-08 water vof)
      - Coarsening criterion: dynamic adapt coarsen (field value less than 1e-14 water vof)
      - Minimum cell volume: 1e-14
      - Maximum refinement level: 3
      - Dynamic adaption: frequency 4
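
      The three execute commands are TUI abbreviations; written out in a journal file they would look roughly like the following (assumed full TUI paths, so verify against your own release, since prompts differ between versions):

```
; expanded forms of "par part meth metis" etc. (assumption)
/parallel/partition/method metis
/parallel/partition/reorder-partitions-to-architecture
/parallel/partition/use-stored-partitions
```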



       


      Thanks,


      Alex


       
    • soloviev
      Subscriber

      Hello,


       


      Is there any update to this issue?


       


      Thanks,
      Alex

    • puh69
      Subscriber
      Alex,

      Did you get anywhere with this? I'm having a similar MPI problem when running my overset mesh.
    • Rob
      Ansys Employee
      Check the error right at the top of the message. It may be memory related, or it could be the solver: the MPI error can just mean the node crashed as a result of something else happening, which is hopefully captured in the first line or so of the message.
    • puh69
      Subscriber
      Thanks for the response on this thread. I actually started my own discussion on the matter: https://forum.ansys.com/discussion/22932/hpc-failure-fluent#latest. I attached the entire output message file in that comment. My 15 million cell case runs with 16 GB per process (4N/20PPN), which is crazy, but won't run with anything less.

      Thanks,

      Pierce
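
      A quick sanity check on those numbers (a sketch; the per-process reading of "16Gb of PPN" and the rule-of-thumb figure are assumptions):

```python
# Rough memory arithmetic for the 15M-cell overset case described above.
nodes = 4                 # "4N"
procs_per_node = 20       # "20PPN"
gb_per_proc = 16          # assumed: 16 GB per MPI process
cells_millions = 15

total_gb = nodes * procs_per_node * gb_per_proc
gb_per_million_cells = total_gb / cells_millions

print(f"total allocated: {total_gb} GB")                    # 1280 GB
print(f"per million cells: {gb_per_million_cells:.1f} GB")  # 85.3 GB

# Rules of thumb for a 3ddp solve are on the order of 1-2 GB per million cells,
# so ~85 GB per million cells suggests the base mesh is not what dominates
# memory here (e.g. overset connectivity or another model is the driver).
```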
    • Rob
      Ansys Employee
      OK, thanks. I'll leave that thread open and leave this one for Alex to comment.