Fluids

Topics related to Fluent, CFX, TurboGrid, and more

Using multiple nodes in HPC cluster

    • Gloria
      Subscriber

      I am submitting Fluent batch jobs to an HPC cluster that uses SLURM as its job scheduler. When I set up a simulation with 16 processes and request 16 ntasks in my job submission script (.sh file), the run is faster than on my local machine. However, when I increase the number of nodes, the time required to solve the simulation does not decrease. I am setting 32 processes in the Fluent Launcher and requesting 2 nodes with 16 ntasks per node in the .sh file.

      Does anybody know how I can use multiple nodes efficiently and reduce the time required to complete the simulation? Thank you in advance!

    • Nilay Pedram
      Subscriber

      Hello,

      Increasing the number of processes and nodes alone won't help; the benefit also depends on the mesh count. As a rule of thumb, each core should have at least about 10k cells; with fewer cells per core than that, you won't see much difference in the time required to complete the simulation. So for a fixed mesh, beyond a certain core count you won't see a large reduction in time even if you use multiple nodes. Also, if the core count is increased too much, passing messages between the cores takes significant time, and the run may end up taking longer.
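
      As a rough illustration of the cells-per-core rule of thumb above, the sketch below divides a mesh size by the process count. The mesh sizes are made-up examples, and cells_per_core is just a helper name chosen here, not anything provided by Fluent.

```shell
#!/bin/bash
# Rough partition-size check. The ~10k cells per core figure is only the
# rule of thumb from this thread, not an official limit.
cells_per_core() {
    # $1 = total mesh cells, $2 = number of solver processes
    echo $(( $1 / $2 ))
}

# Hypothetical mesh sizes, for illustration only.
for cells in 200000 2000000; do
    for procs in 16 32; do
        echo "$cells cells on $procs processes: $(cells_per_core "$cells" "$procs") cells per process"
    done
done
```

      If the smaller number is already close to or below the threshold at 16 processes, going to 32 is unlikely to pay off for that mesh.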

      For more information, please refer to Fluent User's Guide - 40.1. Introduction to Parallel Processing (ansys.com)

      If you are not able to access the above link, please follow this forum discussion - How to access the ANSYS Online Help

      Hope this helps you. Thank you!

    • Gloria
      Subscriber

       

      Thank you for your reply, Nilay!

      I think my current bottleneck is distributing the selected processes correctly across the desired number of nodes. In my job submission script (.sh file), I set #SBATCH --ntasks=32 and #SBATCH --ntasks-per-node=16. Using the "squeue" command, I can verify that my job is allocated two nodes; however, Fluent's output file shows that all the processes are placed on a single node (image attached).

      How can I solve this problem? Has anyone faced this issue before? Thank you in advance for your help!
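
      One common approach in this situation is to hand Fluent an explicit host file built from the SLURM allocation, via the -t and -cnf launch options, so it does not fall back to the local node. The following is only a sketch under assumptions: the solver mode (3ddp), the journal name (run.jou), and the helper names are placeholders, and the host file is written with one line per process, which may be more verbose than a given Fluent version needs.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks=32

# Read hostnames on stdin and print each one $1 times (one line per process).
expand_hosts() {
    while read -r host; do
        i=0
        while [ "$i" -lt "$1" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Print the allocated node names, or the local host when run outside SLURM
# (so the script can be dry-run on a workstation).
allocated_nodes() {
    if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
        scontrol show hostnames "$SLURM_JOB_NODELIST"
    else
        uname -n
    fi
}

per_node="${SLURM_NTASKS_PER_NODE:-16}"
hostfile="hosts.${SLURM_JOB_ID:-local}"
allocated_nodes | expand_hosts "$per_node" > "$hostfile"
nprocs=$(( $(wc -l < "$hostfile") ))

# 3ddp and run.jou are placeholders for the solver mode and journal file.
cmd="fluent 3ddp -g -t$nprocs -cnf=$hostfile -i run.jou"
echo "Would run: $cmd"
# On the cluster, replace the echo with the real launch, for example:
# $cmd > fluent.log 2>&1
```

      Checking the generated host file before launching shows quickly whether both node names are present. Note that Fluent still needs some way to start processes on the second node, so this alone does not help if remote access between nodes is blocked.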

       

    • Gloria
      Subscriber

       

      Hello! I have received some feedback from my cluster support, and it seems that I cannot launch my multi-node simulations because ssh between nodes is not enabled. Is there any other way to connect the nodes together? The cluster uses InfiniBand to communicate data between nodes, which is the standard practice there for sending data, and GPFS is mounted on all nodes so that data can be seen from any point in the cluster.

      Has anybody faced this issue before, or does anyone have an idea how to approach it? Any reply will be very much appreciated!
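
      A possible first step is to confirm from inside the job whether passwordless ssh really fails between the allocated nodes, and if so to ask the MPI layer to start remote processes through SLURM instead. The sketch below is hedged: I_MPI_HYDRA_BOOTSTRAP is an Intel MPI setting, and whether the MPI bundled with a given Fluent release honours it is an assumption to verify with the Fluent documentation for that version, which may also describe its own Slurm-aware launch options. The helper names are made up for this example.

```shell
#!/bin/bash
# Returns 0 if passwordless ssh to host $1 works, non-zero otherwise.
ssh_ok() {
    command -v ssh >/dev/null 2>&1 &&
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Pick a process-launch mechanism: "ssh" if it works, otherwise "slurm".
bootstrap_for() {
    # $1 = "yes" if ssh between nodes works, anything else otherwise
    if [ "$1" = "yes" ]; then echo ssh; else echo slurm; fi
}

# Only probe the nodes when running inside a SLURM allocation.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    works=yes
    for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        ssh_ok "$host" || works=no
    done
    # Assumption: the MPI used by Fluent reads this Intel MPI variable.
    export I_MPI_HYDRA_BOOTSTRAP="$(bootstrap_for "$works")"
    echo "ssh between nodes: $works, bootstrap: $I_MPI_HYDRA_BOOTSTRAP"
fi
```

      Running this at the top of the job script, before the Fluent launch line, at least records in the job log which mechanism was available on that allocation.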

       
