July 9, 2021 at 11:13 pm matthe18 Subscriber
I receive this error when my case reaches the solution stage. I suspect memory allocation is the root cause, but no other information is given, so I cannot be certain. The only hint I have found so far is an older post suggesting an increase to the catalogue memory allocation.
My recent case submissions used the following memory modification flags, and all returned code 9:
-size 2.5 -single -large
-size 2.9 -single -large
-size-cat 10.0x -size-nr 2.0x -size-ni 2.0x -single -large
-size-cat 10.0x -size-nr 2.5x -size-ni 2.5x -single -large
-size-cat 10.0x -size-nr 3.0x -size-ni 3.0x -single -large
Any advice or experience with this return code would be very helpful to me. Thank you for considering my issue.
July 12, 2021 at 10:56 pm Surya Deb Ansys Employee
Hello,
Can you check the out file to see if there are any other details printed out?
Can you embed an image of the out file with relevant errors/warnings?
July 13, 2021 at 2:46 pm matthe18 Subscriber
Hi SD,
Thank you for your response. I have attached images of the out files from a few of my most recent runs. The memory allocation flags for each run, hopefully in the same order as the images, were:
run15: -size 2.9 -single -large
run18: -size-cat 10.0x -size-nr 2.5x -size-ni 2.5x -single -large
run19: -size-cat 20.0x -size-nr 2.5x -size-ni 2.5x -single -large
run20: -size-cat 30.0x -size-nr 2.5x -size-ni 2.5x -single -large
I have certainly been throwing a lot at the wall to see what sticks. I have noticed that the -size modifier allowed the solution to reach the coefficient loop iteration phase on a few occasions (run15 is posted as an example).
July 13, 2021 at 2:56 pm
July 14, 2021 at 3:24 pm Surya Deb Ansys Employee
Hello,
Since this happens immediately after the calculation starts, I would also suspect the initial conditions and/or the setup.
Could you double-check your initial conditions and the setup? Also, could you test this by running on a different core count?
July 15, 2021 at 1:44 pm matthe18 Subscriber
Hi SD,
Thank you for the suggestion. I already had a job with a different core count in the queue (a 25% increase in requested nodes and allocatable memory) and was waiting on the results. That job has also exited with code 9. I also reached out to someone in the research computing department, who mentioned that signal 9 is most commonly associated with an out-of-memory watchdog. In fact, after digging into it a bit more, they were able to tell me that run20 from above had actually returned a different exit code, one that our job scheduler associates with an out-of-memory state and that was not reported in either the CFX out file or the console* out file.
According to the CFX out file, run20 had a maximum per-node memory usage of 75%, with an overall total of 70% across all nodes. The scheduler's job information, however, reports a maximum node memory usage of 242.78G (of 256G available per node), which is close to 95% of the node memory. I was told that this maximum is the last valid (within bounds) value recorded before the job exceeded the available memory. Research computing suspects that CFX is somehow requesting more memory than was allocated, although the out file does tell me that memory usage may be less than the reported allocation.
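To make the gap between the two reports concrete, here is a small arithmetic sketch in Python. It uses only the figures quoted above (256G per node, a 242.78G scheduler peak, and the 75% per-node maximum from the CFX out file); the function and variable names are made up for illustration, and nothing here queries CFX or the scheduler.

```python
# Figures quoted in this post; the names are illustrative only.
NODE_MEMORY_GB = 256.0        # memory available per node
SCHEDULER_PEAK_GB = 242.78    # last in-bounds peak reported by the scheduler
CFX_REPORTED_FRACTION = 0.75  # maximum per-node usage from the CFX out file


def memory_summary(node_gb, peak_gb, cfx_fraction):
    """Compare the scheduler's peak usage with the usage implied by the CFX out file."""
    cfx_gb = cfx_fraction * node_gb
    return {
        "scheduler_fraction": peak_gb / node_gb,  # share of the node the scheduler saw in use
        "cfx_gb": cfx_gb,                         # usage implied by the CFX percentage
        "gap_gb": peak_gb - cfx_gb,               # memory the CFX report does not account for
        "headroom_gb": node_gb - peak_gb,         # what was left before the node limit
    }


summary = memory_summary(NODE_MEMORY_GB, SCHEDULER_PEAK_GB, CFX_REPORTED_FRACTION)
print(f"scheduler peak: {summary['scheduler_fraction']:.1%} of node memory")
print(f"CFX-implied usage: {summary['cfx_gb']:.2f}G")
print(f"unaccounted gap: {summary['gap_gb']:.2f}G")
print(f"headroom at peak: {summary['headroom_gb']:.2f}G")
```

With these numbers the CFX figure corresponds to 192G, so roughly 51G per node is in use that the out file does not report, leaving only about 13G of headroom at the last recorded peak.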
As far as my setup goes, I believe it should be valid. I ran the same model as steady-state to initialize this transient model without similar issues. I will see if there are any glaring errors in the setup.
With everything above, my plan moving forward is to try and limit CFX memory to some valid floor limit unless another reason for the exit errors becomes apparent.
Any further advice or recommendations are, of course, welcome.
Thank you again,
matthe18
*Edit: Looking at the console output again, I missed a line informing me "slurmstepd: error: Detected 939 oom-kill event(s) in step 3979731.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler."
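Since that line was easy to miss by eye, a short Python sketch for scanning console output for it may help. The regular expression is written against the exact message quoted above; other Slurm versions or site configurations may word it differently, and the function name is hypothetical.

```python
import re

# Pattern based on the single slurmstepd message quoted in this post.
# Assumption: other oom-kill messages follow the same wording.
OOM_PATTERN = re.compile(
    r"slurmstepd: error: Detected (\d+) oom-kill event\(s\) in step (\S+) cgroup"
)


def find_oom_kills(console_text):
    """Return a list of (event_count, step_id) tuples found in the console text."""
    return [
        (int(match.group(1)), match.group(2))
        for match in OOM_PATTERN.finditer(console_text)
    ]


sample = (
    "slurmstepd: error: Detected 939 oom-kill event(s) in step 3979731.batch "
    "cgroup. Some of your processes may have been killed by the cgroup "
    "out-of-memory handler."
)
print(find_oom_kills(sample))  # [(939, '3979731.batch')]
```

Running this over the console file of each failed job would show whether the earlier code 9 exits were also accompanied by oom-kill events.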