Fluids

SIGSEGV error occurs in DPM calculation

    • azhao
      Subscriber

      I encountered the following error while running a spray cooling simulation with the DPM model in ANSYS Fluent. No UDF is used in the computation, and the error also occurs when the case is run with only one thread:

      -----------------------------------------------------------------------------------------------------------------------------------------------------------------
      iter continuity x-velocity y-velocity z-velocity energy k epsilon time/iter
      133 2.6313e-02 2.0725e-03 2.1301e-03 2.9644e-03 8.5001e-07 2.9960e-03 5.3946e-03 89:00:31 19867
      134 2.6324e-02 1.9835e-03 2.0422e-03 2.6846e-03 7.7135e-07 2.7004e-03 4.4066e-03 87:45:30 19866
      135 2.6077e-02 1.9195e-03 1.9805e-03 2.4792e-03 7.2051e-07 2.5194e-03 3.7644e-03 87:51:39 19865
      136 2.5807e-02 1.8758e-03 1.9460e-03 2.3321e-03 6.8302e-07 2.3802e-03 3.3358e-03 86:50:19 19864
      137 2.5589e-02 1.8433e-03 1.9237e-03 2.2167e-03 6.5023e-07 2.2649e-03 3.0162e-03 86:01:11 19863
      138 2.5368e-02 1.8132e-03 1.8980e-03 2.1193e-03 6.1784e-07 2.1476e-03 2.7179e-03 85:21:51 19862
      139 2.5120e-02 1.7800e-03 1.8689e-03 2.0325e-03 5.8637e-07 2.0338e-03 2.4348e-03 84:50:19 19861

      Advancing DPM injections ....

      ==============================================================================

      Node 0: Process 3148: Received signal SIGSEGV.

      ==============================================================================

      999999: mpt_accept: error: accept failed: No such file or directory
      ---------------------------------------------------------------------------------------------------------------------------------------------

      But the residuals, monitored wall temperature, and liquid film thickness all seem fine as follows:

    • Rob
      Ansys Employee
      It's not what I'd describe as well converged. Try dropping the time step, as I suspect the film heat transfer has diverged.
    • azhao
      Subscriber
      Hi Rob, I decreased the time step from 2e-5 to 1e-6. This time the solution stopped at 1449 iterations instead of 139. The following error shows up in the Fluent console when the solution suddenly stops; no sign of divergence is observed before that:


      Warning: Injection injection-1: LOST 20 out of 20 injection locations, probably outside the domain. This will inject 50% LESS MASS.

      Injecting 40 particle parcels with mass 5.736e-08
      number tracked = 9678452, trapped = 3, incomplete = 6, splashed = 36

      Warning: 0.0053% of the total discrete phase mass was not tracked for the expected residence time:
      1.6e-05 s less on a mass-weighted average (which is 0.7497% of their total age or 79.8320% of the last time step).

      1440 1.5359e-02 2.3080e-04 2.3941e-04 2.6095e-04 4.4453e-09 1.0656e-03 1.4061e-03 169:18:59 18560
      1441 1.5082e-02 4.5490e-04 4.5574e-04 7.5568e-04 2.1468e-08 2.5880e-03 6.5880e-03 142:57:30 18559

      iter continuity x-velocity y-velocity z-velocity energy k epsilon time/iter
      1442 1.7895e-02 3.8043e-04 3.8361e-04 6.0055e-04 2.3125e-08 2.1936e-03 5.2801e-03 121:58:52 18558
      1443 1.8441e-02 3.3093e-04 3.3607e-04 4.8733e-04 1.4269e-08 1.7974e-03 4.0082e-03 105:10:59 18557
      1444 1.7894e-02 2.9664e-04 3.0237e-04 4.0901e-04 9.7675e-09 1.5475e-03 3.2187e-03 91:44:38 18556
      1445 1.7010e-02 2.7161e-04 2.7733e-04 3.5385e-04 7.2876e-09 1.4972e-03 2.8270e-03 80:58:03 18555
      1446 1.6129e-02 2.5263e-04 2.5850e-04 3.1497e-04 5.9950e-09 1.4424e-03 2.4597e-03 72:34:48 18554
      1447 1.5399e-02 2.3815e-04 2.4289e-04 2.8693e-04 5.1359e-09 1.2345e-03 1.9432e-03 65:38:04 18553
      1448 1.4812e-02 2.2689e-04 2.3153e-04 2.6714e-04 4.5982e-09 1.1365e-03 1.6922e-03 60:23:22 18552
      1449 1.4393e-02 2.1860e-04 2.2246e-04 2.5256e-04 4.2619e-09 1.0698e-03 1.4896e-03 55:57:51 18551

      Advancing DPM injections ....

      ===================================================================================
      = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
      = PID 26680 RUNNING AT kth-7894.ug.kth.se
      = EXIT CODE: 9
      = CLEANING UP REMAINING PROCESSES
      = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
      ===================================================================================
      *** Error in `/home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/lnamd64/3ddp_host/fluent.21.2.0': corrupted double-linked list: 0x0000000005b261c0 ***
      ======= Backtrace: =========
      /lib64/libc.so.6(+0x7f474)[0x7f3d6ae8d474]
      /lib64/libc.so.6(+0x813bd)[0x7f3d6ae8f3bd]
      /home/wmflw/ansys_inc/v212/fluent/lib/lnamd64/libSysC.SystemCouplingParticipant.so(_ZNSt5stackIPKcSt5dequeIS1_SaIS1_EEED2Ev+0x2d)[0x7f3d6b6700fd]
      /lib64/libc.so.6(+0x39ce9)[0x7f3d6ae47ce9]
      /lib64/libc.so.6(+0x39d37)[0x7f3d6ae47d37]
      /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/multiport/lnamd64/net/shared/libmport.so(+0x7b981)[0x7f3d7b4ab981]
      /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/multiport/lnamd64/net/shared/libmport.so(+0x7ba9c)[0x7f3d7b4aba9c]
      /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/multiport/lnamd64/net/shared/libmport.so(+0x83b12)[0x7f3d7b4b3b12]
      /lib64/libpthread.so.0(+0x7ea5)[0x7f3d78578ea5]
      /lib64/libc.so.6(clone+0x6d)[0x7f3d6af0cb0d]
      ======= Memory map: ========
      00400000-028cf000 r-xp 00000000 fd:02 287049513 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/lnamd64/3ddp_host/fluent.21.2.0
      02acf000-02ad2000 r--p 024cf000 fd:02 287049513 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/lnamd64/3ddp_host/fluent.21.2.0
      02ad2000-02e2a000 rw-p 024d2000 fd:02 287049513 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/lnamd64/3ddp_host/fluent.21.2.0
      02e2a000-046cc000 rw-p 00000000 00:00 0
      057af000-05faf000 rw-p 00000000 00:00 0 [heap]
      7f3d50000000-7f3d50021000 rw-p 00000000 00:00 0
      7f3d50021000-7f3d54000000 ---p 00000000 00:00 0
      7f3d551f6000-7f3d5540f000 rw-p 00000000 00:00 0
      7f3d5540f000-7f3d5541b000 r-xp 00000000 fd:02 176185492 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Profiler.Probe.so
      7f3d5541b000-7f3d5561a000 ---p 0000c000 fd:02 176185492 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Profiler.Probe.so
      7f3d5561a000-7f3d5561b000 r--p 0000b000 fd:02 176185492 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Profiler.Probe.so
      7f3d5561b000-7f3d5561c000 rw-p 0000c000 fd:02 176185492 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Profiler.Probe.so
      7f3d5561c000-7f3d55621000 r-xp 00000000 fd:02 176185497 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.so
      7f3d55621000-7f3d55821000 ---p 00005000 fd:02 176185497 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.so
      7f3d55821000-7f3d55822000 r--p 00005000 fd:02 176185497 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.so
      7f3d55822000-7f3d55823000 rw-p 00006000 fd:02 176185497 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.so
      7f3d55823000-7f3d55842000 r-xp 00000000 fd:02 176185494 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Timer.so
      7f3d55842000-7f3d55a41000 ---p 0001f000 fd:02 176185494 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Timer.so
      7f3d55a41000-7f3d55a42000 r--p 0001e000 fd:02 176185494 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Timer.so
      7f3d55a42000-7f3d55a43000 rw-p 0001f000 fd:02 176185494 /home/wmflw/ansys_inc/v212/fluent/fluent21.2.0/addons/afd/lnamd64/libAFD.Kernel.Timer.so
      7f3d55a43000-7f3d55a7f000 r-xp 00000000 fd:02 286305859 /home/wmflw/ansys_inc/v212/fluent/lib/lnamd64/libboost_serialization.so.1.63.0
      7f3d55a7f000-7f3d55c7e000 ---p 0003c000 fd:02 286305859 /home/wmflw/ansys_inc/v212/fluent/lib/lnamd64/libboost_serialization.so.1.63.0
      7f3d55c7e000-7f3d55c81000 r--p 0003b000 fd:02 286305859 /home/wmflw/ansys_inc/v212/fluent/lib/lnamd64/libboost_serialization.so.1.63.0


    • Rob
      Ansys Employee
      Can you check the RAM usage? Over 9M particles may be a little excessive!
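To put the 9M figure in perspective, the two numbers printed in the console output above ("Injecting 40 particle parcels" and "number tracked = 9678452") can be compared directly. The sketch below is plain arithmetic under a loudly stated assumption: every tracked parcel came from an injection event of 40 parcels and none were removed, split, or created by splashing. The function name `injection_events_needed` is hypothetical and nothing here reflects how Fluent counts parcels internally.

```python
import math


def injection_events_needed(parcels_tracked, parcels_per_event):
    """Number of injection events needed to reach a parcel count.

    Assumption (not from Fluent): parcels only enter through injection
    events of a fixed size and never leave, split, or splash.
    """
    return math.ceil(parcels_tracked / parcels_per_event)


# Values copied from the console output quoted in the thread.
events = injection_events_needed(parcels_tracked=9678452, parcels_per_event=40)
print(f"Injection events needed under the stated assumption: {events}")
```

If the number of injection events actually performed in the run is much smaller than this, the parcel count is probably growing through some other mechanism, which is worth checking in the model setup.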
    • azhao
      Subscriber
      Hi Rob, sure. I have also run this case on an HPC cluster. The following message shows the memory consumption when I used 128 cores distributed over 4 nodes.
      ---------------------------------------------------------------------------------------------------------------------------------
      Job ID: 18844882
      Cluster: tetralith
      User/Group: x_anzha/x_anzha
      State: OUT_OF_MEMORY (exit code 0)
      Nodes: 4
      Cores per node: 32
      CPU Utilized: 09:49:37
      CPU Efficiency: 47.73% of 20:35:12 core-walltime
      Job Wall-clock time: 00:09:39
      Memory Utilized: 355.42 GB (estimated maximum)
      Memory Efficiency: 97.91% of 363.00 GB (2.84 GB/core)
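
The figures in this job report can be cross-checked with a few lines of arithmetic. The sketch below only re-derives the percentages from the values printed above; the helper names `hms_to_seconds` and `job_summary` are hypothetical and are not part of any scheduler tool.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_summary(nodes, cores_per_node, cpu_used, wall_clock, mem_used_gb, mem_requested_gb):
    """Re-derive the efficiency figures of the job report from its raw values."""
    cores = nodes * cores_per_node
    core_walltime_s = hms_to_seconds(wall_clock) * cores
    return {
        "cores": cores,
        "core_walltime_s": core_walltime_s,
        "cpu_efficiency_pct": 100.0 * hms_to_seconds(cpu_used) / core_walltime_s,
        "mem_efficiency_pct": 100.0 * mem_used_gb / mem_requested_gb,
        "mem_requested_gb_per_core": mem_requested_gb / cores,
        "mem_headroom_gb": mem_requested_gb - mem_used_gb,
    }


# Values copied from the job report quoted in the post.
summary = job_summary(
    nodes=4,
    cores_per_node=32,
    cpu_used="09:49:37",
    wall_clock="00:09:39",
    mem_used_gb=355.42,
    mem_requested_gb=363.00,
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The point of the check is the headroom: less than 8 GB out of 363 GB was left across the whole job, so the run was at the memory limit after under ten minutes of wall-clock time.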

    • Rob
      Ansys Employee
      That should cope, unless all the particles ended up on very few cores, which then can't see enough RAM.
    • azhao
      Subscriber
      But the solution did stop and reported an OUT_OF_MEMORY error. Is there a way to check how the tracked particles are distributed across the cores?
    • Rob
      Ansys Employee
      They're stored on the cores where the cells are: we generally divide the mesh up and the physics follows. With DPM and VOF that's not always the case. I'd missed the OUT_OF_MEMORY (I need a holiday). Do you need that many particles, and can you reduce the injection or trap any that aren't doing anything useful?
    • azhao
      Subscriber
      :-)
      Yes, the DPM boundary condition on all walls is set to "trap". So do you mean that even when the DPM particles are trapped on the wall, they still occupy memory? These particles are certainly not doing anything useful anymore. Is there a way to simply eliminate them?

    • Rob
      Ansys Employee
      Trapped are removed (all three of them), so you're losing some to trap and STILL have over 9M..... ! Assuming this is the same model as the other thread there's something really odd going on. Assuming your supervisor is familiar with the software go through the whole model with them, step by step. It'll be something simple that's causing all of this, but as I can't access the files I'm reliant on pictures which makes it hard to spot.
    • azhao
      Subscriber
      Unfortunately, I am now in the EECS department, and my supervisor is an expert on electric machines rather than CFD.
      I agree with you that 9M seems too many. It is even stranger given that I assigned only 2 particle streams to the spray injector.
      P.S. May I ask a basic question about the setup for a steady Eulerian-phase simulation with transient Lagrangian particle tracking? The DPM settings are as shown in the figure below. As I understand it, every 10 iterations of the Eulerian phase the Lagrangian particles are injected once, and the injected particles are tracked for 10 time steps of 2e-5 s each. But I am not sure what the maximum number of steps and the step length factor mean here. I know they are two important parameters in steady Lagrangian particle tracking, according to this thread: https://www.cfd-online.com/Forums/fluent/226617-dpm-iteration-fluent.html. But why are they needed in transient Lagrangian particle tracking if we already have the particle time step size and the number of time steps?
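
The reading of the settings described in this post can be written out as simple bookkeeping. This is only a sketch of that interpretation, which the thread does not confirm, and it says nothing about Fluent's internals. The function name `dpm_schedule` and its parameter names are hypothetical; the default values (DPM iteration interval of 10, 10 particle time steps per update, step size of 2e-5 s) are the ones quoted in the post.

```python
def dpm_schedule(flow_iterations, dpm_interval=10, steps_per_update=10, particle_dt=2e-5):
    """Bookkeeping for the interpretation described in the post.

    Assumption: after every `dpm_interval` flow iterations the particles
    are advanced by `steps_per_update` steps of size `particle_dt` seconds.
    Returns (dpm_updates, particle_steps, particle_time_s).
    """
    dpm_updates = flow_iterations // dpm_interval
    particle_steps = dpm_updates * steps_per_update
    particle_time_s = particle_steps * particle_dt
    return dpm_updates, particle_steps, particle_time_s


# Example: the iteration count at which the second run stopped.
updates, steps, elapsed = dpm_schedule(flow_iterations=1449)
print(f"DPM updates: {updates}, particle steps: {steps}, particle time: {elapsed:.4e} s")
```

Under this reading the particle clock advances by only 2e-4 s per DPM update, so it can take many flow iterations before the spray reaches the pseudo steady state that the results depend on.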


    • Rob
      Ansys Employee
      Please don't use the Eulerian multiphase with DPM as the particles won't see the secondary phase(s). DDPM does that, but you probably don't need that.
      For a steady flow and transient particle we assume the flow is altered by the particles (if interaction is on as you have set) but we move the particles with time. It's to save computational time, and you need to check carefully to ensure the particles reach a pseudo steady state. Read the theory guide - the maths will either make you sleep or explain everything!
    • azhao
      Subscriber
      Hi, Rob
      Sorry for my imprecise description. My continuous phase is single-phase air; I only used the word Eulerian to distinguish it from the Lagrangian particle tracking.
      I have been reading the theory guide and its original references for months, though I have not finished yet. It is well written and helpful, as I have not found any textbook that explains DPM CFD simulation as thoroughly...
    • DrAmine
      Ansys Employee
      Why not use the same time step as the flow? Or inject less frequently? Or even reduce the number of particles by concatenation (that is advanced and requires some hidden information, so start with the other suggestions). You have a lot of particles suspended in the domain which have not reached a final fate.
    • DrAmine
      Ansys Employee
      I see the flow is steady, so why use expensive transient particle tracking? I assume it is because of the wall film. If so, perhaps first assume trap walls or work with the EWF film model.
    • azhao
      Subscriber
      Hi DrAmine, by injecting less frequently, do you mean increasing the DPM iteration interval or decreasing the particle time step size?
      It is exactly as you said: I used transient particle tracking only because I wanted to turn on the wall film model. I did not use the EWF model because I found it even more difficult to converge. Rob told me in another thread that EWF is very sensitive to rapid changes in mesh size. I think this model has the same issue here because of the complicated geometry, which is shown in the figure below. The blue surface is the heating surface, and I am not sure whether EWF can still work on such a geometrically complicated surface. The ANSYS Fluent theory guide and the corresponding original references seem to target engine walls and do not appear to have surfaces with such high curvature and sharp turns in mind...
      I will try the Lagrangian trap wall first.
    • azhao
      Subscriber
      If the DPM trap wall condition is used, this SIGSEGV error does not occur.