December 9, 2019 at 4:53 pm, shampton (Subscriber)
We run AFS on RHEL7 as a shared file system on our HPC clusters. One of our groups uses Fluent heavily. When a job exits abnormally, there is a chance that a directory in the Fluent executable's search path becomes inaccessible to all users. Any attempt to run Fluent after this fails because the directory is no longer visible. The only fix is for root to manually clear the system cache. We cannot reproduce this reliably, but it happens often enough that the group is continually frustrated. Do you have any insight into which file system calls might be involved? In particular, are you aware of any parallel I/O operations in Fluent? AFS does not support MPI I/O.
December 10, 2019 at 5:59 pm, JakeC (Ansys Employee)
Unfortunately, I don't know anything about the AFS file system.
When you refer to the "system cache", what exactly do you mean? What exactly does root need to do to get things working again?
Is it safe to assume that if you run the job on a traditional file system and Fluent crashes, everything in the file structure stays the same and is still accessible?
In other words, are you only seeing this when using the AFS file system?
I am fairly certain that the MPI processes themselves write out to the disk, but I don't know if multiple processes write out to the same file at the same time.
December 10, 2019 at 6:28 pm, Mangesh Bhide (Ansys Employee)
- Check whether it works when the FLUENT installation is mounted read-only, so that nothing can write to that path, or install FLUENT to a path on NFS rather than on the cached file system.
- Is this OpenAFS? If so, check whether the "Forcing the Update of Cached Data" section of the OpenAFS manual helps, or is that what is already being done? If it is some other AFS implementation, please refer to its manual for flushing the cache. In any case, a program exiting abnormally should not affect the operating system's file cache for that program. What is the exact error message when FLUENT is run afterwards, and what is the exact command line used to start FLUENT?
- Refer to this and its 3 subtopics for information on parallel I/O.
See if writing regular dat files instead of parallel dat files helps get around the issue.
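For the OpenAFS case, the cache refresh mentioned above could be scripted along these lines. This is only a minimal sketch: it assumes the OpenAFS `fs` client is installed, uses its `checkvolumes` and `flushvolume` subcommands, and the installation path is a hypothetical placeholder. It defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess

# Hypothetical placeholder; replace with the real installation path on AFS.
ANSYS_ROOT = "/afs/example.org/apps/ansys"


def build_flush_commands(path):
    """Return OpenAFS commands that refresh cached data for `path`."""
    return [
        # Ask the cache manager to re-check its volume name/ID mappings.
        ["fs", "checkvolumes"],
        # Discard cached data for the volume that contains `path`.
        ["fs", "flushvolume", "-path", path],
    ]


def flush_afs_cache(path, dry_run=True):
    """Print the commands (dry run) or execute them if `fs` is available."""
    commands = build_flush_commands(path)
    for cmd in commands:
        if dry_run or shutil.which("fs") is None:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    flush_afs_cache(ANSYS_ROOT, dry_run=True)
```

Running it with `dry_run=False` on an affected compute node would show whether a per-volume flush is enough, or whether root's manual cleanup is doing something more.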
December 10, 2019 at 7:30 pm, shampton (Subscriber)
Thanks for the feedback. I should clarify that the running jobs write primarily to our scratch file system, not to AFS. What happens is that a directory goes missing completely on a compute node, and that directory is always within the Ansys hierarchy. This is obviously an AFS issue; we just don't understand what Ansys does to trigger it, as no other program causes this. We'll try moving Ansys to a read-only volume and see if that alleviates the issue. Thanks again.
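To narrow down which directory disappears and on which node, a small check like the following could be run on each compute node after a job ends, for example from a scheduler epilogue. It is a sketch only: the root path and the subdirectory names are hypothetical placeholders, not the actual Ansys layout.

```python
import os

# Hypothetical placeholders; substitute the real install root and the
# subdirectories that have been seen to disappear.
ANSYS_ROOT = "/afs/example.org/apps/ansys"
EXPECTED_SUBDIRS = ["fluent", "fluent/bin", "fluent/lib"]


def find_missing_dirs(root, expected_subdirs):
    """Return the expected directories under `root` that cannot be listed."""
    missing = []
    for sub in expected_subdirs:
        path = os.path.join(root, sub)
        try:
            os.listdir(path)  # fails if the directory is absent or unreadable
        except OSError:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in find_missing_dirs(ANSYS_ROOT, EXPECTED_SUBDIRS):
        print(f"{os.uname().nodename}: not accessible: {path}")
```

Logging the node name together with the missing path makes it easier to correlate the failures with specific aborted jobs.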
December 11, 2019 at 10:13 pm, Mangesh Bhide (Ansys Employee)
Thank you for the update. I hope moving ANSYS to a different mount helps. I doubt that just running a program could cause an issue with the file system (especially if it is mounted read-only).
If flushing the file system cache or rebooting helps, then it would appear that the file system is being remounted or re-cached correctly afterwards.
If only portions of the path go "missing" after an abort, then I wonder whether this is a function of cache size together with the number of files, the size of the files, or a combination of both.
Thank you for your understanding. I hope moving the FLUENT installation to a different location resolves the issue with this file system. Do let us know.
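One way to test the cache-size idea on OpenAFS is to record how full the client cache is when a directory goes missing. The sketch below parses a line in the form printed by `fs getcacheparms`. The sample line and its numbers are made up for illustration, and the exact output format is an assumption that should be checked against the installed OpenAFS version.

```python
import re

# Made-up sample in the form printed by `fs getcacheparms` (assumed format).
SAMPLE = "AFS using 48213 of the cache's available 50000 1K byte blocks."


def parse_cache_usage(text):
    """Return (used_blocks, total_blocks, fraction_used), or None if no match."""
    match = re.search(r"using\s+(\d+)\s+of the cache's available\s+(\d+)", text)
    if match is None:
        return None
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, used / total


if __name__ == "__main__":
    usage = parse_cache_usage(SAMPLE)
    if usage is not None:
        used, total, fraction = usage
        print(f"cache blocks used: {used}/{total} ({fraction:.1%})")
```

In practice the text would come from running `fs getcacheparms` on the affected node, for example through `subprocess.run(..., capture_output=True, text=True)`. A cache that is consistently near full at failure time would support the hypothesis.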