Fluids

Topics related to Fluent, CFX, TurboGrid and more

Discrepancy Between Wall Clock Time per Iteration and Actual Time per Iteration

    • rohant
      Subscriber

      I am working on a two-phase flow simulation on an HPC cluster. The simplest case uses a single node with 24 cores and 8 GB of RAM per core (I reserved the entire node). When I start the simulation, I see a wall-clock time per iteration of around 0.017. Visually, the console shows quick residual printouts per iteration that match this speed. However, over time the visual printout in the console slows down. When I check the Parallel -> Usage tab, I see roughly the same wall-clock time per iteration, but when I time it with a stopwatch I get a completely different (much higher) value. If I pause and re-start the simulation, the console printouts appear to speed up again.

      What's the issue here? I originally had monitors running every 100 iterations; I ran a test without monitors for some time and saw the same phenomenon. On the workstation I normally use, I did not see any noticeable slowdown in the residual printouts in the console. Is there something to look out for when running on HPC that could cause this? Any insight would be helpful, thank you.

    • DrAmine
      Ansys Employee

      Can you please add screenshots describing the issue you are currently facing? 

      What do you mean by pause and re-start: are you restarting in a new Fluent session?

      Which release are you using?

    • rohant
      Subscriber

      It is tough to show exactly what I mean, since the time/iteration and the Parallel -> Timer -> Usage "Average wall-clock time per iteration" are roughly the same throughout the simulation. However, when I watch the console in real time, it is clear that there is a slowdown.

      "Slowed Down" Speed: [screenshot]

      Initial Speed: [screenshot]

      Pause and re-start, to me, means hitting "stop at the end of time-step," waiting for the simulation to stop, and then pressing Calculate again. I am sure we would see the same speed increase by saving the case and data and re-opening another instance of Fluent.

      I am using 2021 R2.

      I am not sure if this is a Linux cache issue as well; have you seen anything similar on Linux HPC clusters?

    • DrAmine
      Ansys Employee

      When referring to slowing down: do you mean the time/iter column?

      Please do not rely on that if so. The ultimate test is to run the case via a journal in batch mode and introduce time stamps so you can compare certain parts of the journal. There you can also rely on the benchmark command, or create Scheme variables that store the time before and after a block and print the accumulated time. A slowdown can have several causes: software, OS and hardware!

    • rohant
      Subscriber

      I attempted to output /parallel/timer/usage and (benchmark '(iterate 10)) every 1000 time steps to see where the issue could be occurring (one way to script this pattern is sketched at the end of this post). I also compared running in batch mode vs. running in the cluster Open OnDemand GUI. Typically I have been running interactive Fluent in the cluster GUI, since it makes it easier to watch simulation progress over time. It seems that the GUI shows a similar parallel usage time but higher benchmark values. The following outputs were taken after the same number of time steps (1999):

      GUI: [screenshot]

      Batch: [screenshot]

      I am not sure what the benchmark values (cpu-time, solver, elapsed) mean; can you please let me know? I haven't found much documentation on this online.
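
      For reference, one way to script this pattern is through Fluent's Scheme interface, where ti-menu-load-string issues TUI commands from inside a loop; the block count and iteration numbers below are placeholders rather than the values actually used:

        ; Scheme sketch: alternate blocks of time steps with timing output
        (do ((i 0 (+ i 1)))
            ((= i 5))                                         ; five blocks of 1000 time steps
          (ti-menu-load-string "/solve/dual-time-iterate 1000 20")
          (ti-menu-load-string "/parallel/timer/usage")
          (benchmark '(iterate 10)))                          ; time 10 iterations per block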

    • DrAmine
      Ansys Employee

      That is the time including all I/O and solver time -> the elapsed time. It seems that the batch run requires much less time than the GUI one for the 10 iterations. The other output is not really helpful, as it was taken at a different number of iterations (or your time steps required completely different numbers of outer iterations), but it does not show huge differences.

      You might use a standard case which you can then benchmark on your resources to check whether the timings you are getting are appropriate or not.

      Back to your first question about starting, then slowing down, then stopping and continuing: it is hard for me to debug that from here.
