TAGGED: batch-hpc, batch-script
March 22, 2023 at 11:43 am, Gloria (Subscriber)
I am submitting Fluent batch jobs to an HPC cluster that uses SLURM as its job scheduler. When I set up a simulation with 16 processes and request 16 ntasks in my job submission script (.sh file), the run is faster than on my local machine. However, when I increase the number of nodes, the time required to solve the simulation does not decrease. I am setting 32 processes in the Fluent Launcher and requesting 2 nodes and 16 ntasks per node in the .sh file.
Does anybody know how I can use multiple nodes efficiently and reduce the time required to complete the simulation? Thank you in advance!
March 31, 2023 at 9:58 am, Nilay Pedram (Subscriber)
Increasing the number of processes and nodes alone won't help; the benefit also depends on the mesh count. As a rule of thumb, each core should have at least about 10k cells; with fewer cells per core than that, you won't see much difference in the time required to complete the simulation. So for a fixed mesh, beyond a certain core count you won't see a large reduction in time even if you use multiple nodes. Also, if the core count is increased too much, passing messages between cores takes significant time, and the run may end up taking longer.
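To make the cells-per-core rule of thumb concrete, here is a minimal Python sketch of the arithmetic. It reads the 10k figure as a rough lower bound on cells per core, which is what the "if it is less than that" clause suggests. The mesh size of 400,000 cells and both function names are assumptions made up for illustration; they are not taken from the thread or from Fluent.

```python
def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of mesh cells each parallel process would own."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_cells / cores


def largest_useful_core_count(total_cells: int, min_cells_per_core: int = 10_000) -> int:
    """Largest core count that still leaves min_cells_per_core cells on each core."""
    return max(1, total_cells // min_cells_per_core)


# Hypothetical mesh size, chosen only to illustrate the arithmetic.
mesh_cells = 400_000

for cores in (16, 32, 64):
    print(f"{cores:>3} cores -> {cells_per_core(mesh_cells, cores):,.0f} cells per core")

print("Cores before dropping under 10k cells each:",
      largest_useful_core_count(mesh_cells))
```

For this assumed mesh, 32 cores would still leave 12,500 cells per core, while 64 cores would drop to 6,250, which is where the rule of thumb suggests communication overhead starts to dominate. The threshold itself is only a guideline and will vary with the model and the interconnect.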
For more information, please refer to Fluent User's Guide - 40.1. Introduction to Parallel Processing (ansys.com)
If you are not able to access the above link, please follow this forum discussion - How to access the ANSYS Online Help
Hope this helps you. Thank you!
April 21, 2023 at 8:38 am, Gloria (Subscriber)
Thank you for your reply, Nilay!
I think that my bottleneck right now is getting the selected processes assigned to the desired number of nodes. In my job submission script (.sh file), I set #SBATCH --ntasks=32 and #SBATCH --ntasks-per-node=16. Using the “squeue” command, I can verify that my job is allocated two nodes; however, Fluent’s output file shows that all the processes are placed on a single node (image attached).
How can I solve this problem? Has anyone faced this issue before? Thank you in advance for your help!
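One common reason for this symptom is that Fluent is started without being told which hosts Slurm allocated, so every process lands on the node where the script runs. The sketch below shows one possible way to pass the allocation on: expand the Slurm node list, write a hosts file with one line per process, and give it to Fluent with `-cnf=`. The host names `node01` and `node02`, the file names `hosts.txt` and `run.jou`, the `3ddp` solver mode and all function names are assumptions for illustration. The hosts-file layout (one host name per process) is also an assumption that should be checked against the Fluent documentation for the installed version.

```python
from __future__ import annotations

import subprocess


def expand_nodelist(nodelist: str) -> list[str]:
    """Expand a compressed Slurm node list such as 'node[01-02]' into host names.

    Uses `scontrol show hostnames`, so it only works inside a Slurm environment.
    """
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def hosts_file_lines(hosts: list[str], tasks_per_node: int) -> list[str]:
    """One line per process: each host name repeated tasks_per_node times."""
    return [host for host in hosts for _ in range(tasks_per_node)]


def fluent_command(total_tasks: int, hosts_path: str, journal_path: str,
                   solver: str = "3ddp") -> list[str]:
    """Batch-mode Fluent command line that points at the hosts file."""
    return ["fluent", solver, "-g", f"-t{total_tasks}",
            f"-cnf={hosts_path}", "-i", journal_path]


# Hypothetical host names standing in for a real two-node allocation.
example_hosts = ["node01", "node02"]
example_lines = hosts_file_lines(example_hosts, tasks_per_node=16)
print(len(example_lines), "processes across", len(example_hosts), "nodes")
print(" ".join(fluent_command(32, "hosts.txt", "run.jou")))
```

Inside a real job, `expand_nodelist(os.environ["SLURM_JOB_NODELIST"])` would supply the host list, and the lines would be written to the hosts file before Fluent is launched. Note that launching across nodes this way usually still relies on Fluent being able to start remote processes, which depends on how the cluster is configured.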
May 5, 2023 at 8:55 am, Gloria (Subscriber)
Hello! I have received some feedback from my cluster support, and it seems that I am unable to launch my multinode simulations because ssh between nodes is not enabled. Is there any other way to connect the nodes together? The cluster uses InfiniBand to communicate data between nodes, which is the standard practice there for sending data, and GPFS is mounted on all nodes so that data can be seen from any point in the cluster.
Has anybody faced this issue before, or does anyone have an idea how to approach it? Any reply will be very much appreciated!
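One direction that may be worth raising with cluster support, offered here as an untested sketch rather than a confirmed fix: Intel MPI can be told to start its remote processes through Slurm instead of ssh by setting the `I_MPI_HYDRA_BOOTSTRAP` environment variable to `slurm`. The Python below only assembles a command line and environment along those lines. It assumes Fluent is run with Intel MPI (`-mpi=intel`), and the function name, the file names `hosts.txt` and `run.jou`, and the `3ddp` solver mode are hypothetical. Whether Fluent's own launcher still needs ssh on a given installation is something only the site can confirm.

```python
from __future__ import annotations

import os


def no_ssh_launch(total_tasks: int, hosts_path: str, journal_path: str,
                  solver: str = "3ddp") -> tuple[list[str], dict[str, str]]:
    """Return a Fluent command and an environment that asks Intel MPI to
    bootstrap remote processes via Slurm rather than ssh.

    Assumption: the Fluent installation uses Intel MPI and honours this variable.
    """
    env = dict(os.environ)
    env["I_MPI_HYDRA_BOOTSTRAP"] = "slurm"  # Intel MPI setting, not a Fluent option
    command = ["fluent", solver, "-g", f"-t{total_tasks}", "-mpi=intel",
               f"-cnf={hosts_path}", "-i", journal_path]
    return command, env


command, env = no_ssh_launch(32, "hosts.txt", "run.jou")
print("I_MPI_HYDRA_BOOTSTRAP =", env["I_MPI_HYDRA_BOOTSTRAP"])
print(" ".join(command))
```

Inside a job script, the pair could be handed to `subprocess.run(command, env=env, check=True)`. The Fluent documentation also describes scheduler-aware launch options for Slurm, which may be a cleaner route if the installed version supports them.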