March 12, 2021 at 11:57 am
heisenmech
Subscriber

Hi all,

I've been experiencing MPI problems on clusters, leading to failure of simulations. It is odd because it is very inconsistent: it sometimes runs for days then fails, and sometimes it fails almost instantly. The cases run on the cluster were tested on a different cluster and it was all fine.

I usually have a mesh with 30+ million nodes, structured, for LES. I was curious if anyone else was also experiencing such parallelisation problems with Fluent (v20.1).

Please see below for the error.

Best,
Oguzha

fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: 863: IBV connection to 96 (pid 20275) on channel 0 is broken. ibv_poll_cq(): bad status 12
fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: self cnode1033 peer cnode1034 (rank: 96)
fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: error message: transport retry exceeded error
fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: Internal MPI error
srun: forcing job termination
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 2498796.0 ON cnode1005 CANCELLED AT 2021-03-06T06:56:26 ***
srun: error: cnode1101: tasks 147,156,158: Killed
srun: Terminating job step 2498796.0
srun: error: cnode1101: task 146: Killed
srun: error: cnode1005: tasks 2,5-6,13: Killed
srun: error: cnode1100: task 140: Killed
srun: error: cnode1034: task 102: Killed
srun: error: cnode1101: tasks 150,157: Killed
srun: error: cnode1005: tasks 3,7,9-11,14: Killed
srun: error: cnode1006: tasks 17-19,22,24,27,31: Killed
srun: error: cnode1100: tasks 133,135,141-143: Killed
srun: error: cnode1024: task 45: Killed
srun: error: cnode1032: tasks 65-66,74,77: Killed
srun: error: cnode1033: task 84: Exited with exit code 16
srun: error: cnode1033: tasks 87,89,92,94-95: Killed
srun: error: cnode1101: task 152: Killed
srun: error: cnode1025: tasks 48,51,59-60: Killed
srun: error: cnode1005: tasks 1,4: Killed
srun: error: cnode1034: tasks 98,108,110: Killed
srun: error: cnode1060: task 116: Killed
srun: error: cnode1032: tasks 67-68,73,75: Killed
srun: error: cnode1033: tasks 85,91: Killed
srun: error: cnode1101: task 155: Killed
srun: error: cnode1025: tasks 52,61: Killed
srun: error: cnode1034: task 101: Killed
srun: error: cnode1100: tasks 131,134,136,138: Killed
srun: error: cnode1032: task 79: Killed
The fluent process could not be started.

real    1476m53.643s
user    5m24.318s
sys     2m54.340s

March 12, 2021 at 2:14 pm
Rob
Ansys Employee

If it's random, check on the system side. You're looking for RAM leaks (I'm not aware of any issues) and random acts of IT. Is the head node also on the cluster?

March 13, 2021 at 9:52 pm
heisenmech
Subscriber

We've been testing different solvers as well, and no issues with them at all. IT people wanted me to check with ANSYS if it's some sort of bug with parallelisation. Yes, the head node is on the cluster.
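
For anyone following the "check on the system side" suggestion, one possible starting point is to pull the node pair named in the MPI error out of each failed job's log and see whether the same nodes keep turning up. The Python below is only a rough sketch: the regular expressions are written against the single log posted in this thread, so the line format is an assumption and may differ for other Fluent or MPI versions, and the function names (broken_links, node_counts) are made up for illustration.

```python
import re
from collections import Counter

# Assumed format, copied from the one log in this thread:
#   "... Rank 0:84: MPI_Bcast: self cnode1033 peer cnode1034 (rank: 96)"
LINK_RE = re.compile(
    r"Rank \d+:(\d+): (\w+): self (\S+) peer (\S+) \(rank: (\d+)\)"
)
# Assumed format: "ibv_poll_cq(): bad status 12"
STATUS_RE = re.compile(r"ibv_poll_cq\(\): bad status (\d+)")


def broken_links(log_text):
    """Return one dict per 'self ... peer ...' line found in the log text."""
    links = []
    for match in LINK_RE.finditer(log_text):
        rank, call, self_node, peer_node, peer_rank = match.groups()
        links.append({
            "rank": int(rank),            # rank that reported the error
            "call": call,                 # MPI call that was in progress
            "self_node": self_node,       # node of the reporting rank
            "peer_node": peer_node,       # node at the other end of the link
            "peer_rank": int(peer_rank),  # rank at the other end of the link
        })
    return links


def bad_statuses(log_text):
    """Return the numeric ibv_poll_cq status codes reported in the log text."""
    return [int(code) for code in STATUS_RE.findall(log_text)]


def node_counts(links):
    """Count how often each node appears at either end of a broken link."""
    counts = Counter()
    for link in links:
        counts[link["self_node"]] += 1
        counts[link["peer_node"]] += 1
    return counts


if __name__ == "__main__":
    # Two lines taken from the log in the first post.
    sample = (
        "fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: 863: IBV connection to 96 "
        "(pid 20275) on channel 0 is broken. ibv_poll_cq(): bad status 12\n"
        "fluent_mpi.20.1.0: Rank 0:84: MPI_Bcast: self cnode1033 peer "
        "cnode1034 (rank: 96)\n"
    )
    links = broken_links(sample)
    print(links)
    print(bad_statuses(sample))
    print(node_counts(links).most_common())
```

Run over the logs of several failed jobs (concatenated or one at a time), this gives a list of node pairs that could be passed to IT. If the same node or pair keeps appearing, that may point towards a specific host or link rather than the solver, but a single log like the one above is not enough to say either way.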
© 2023 Copyright ANSYS, Inc. All rights reserved.