-
November 17, 2018 at 2:02 am
Xingchun Wang
Subscriber
Hi,
Does anyone have experience running Fluent on a cluster with Torque as the job scheduler?
I know Fluent is integrated with the PBS Pro job scheduler, but we have Torque. The two are very similar, yet when we start Fluent with the [-pbs] flag, the system returns errors. If we start Fluent from a PBS script instead, we can use only one compute node, which has 36 processors.
We now want to use more compute nodes. How should we do that?
Is there any way to manually control which nodes the processes are spawned on?
-
November 27, 2018 at 2:08 am
tsiriaks
Ansys Employee
Hey Xingchun,
Sorry for the delay; I rarely look at this section. I recommend submitting future questions about cluster/RSM setup in the 'Installation and Licensing' section. This 'systems' section is for physics systems; see
https://www.ansys.com/products/systems
Now for your question, let me ask around about this and I will let you know what I find out tomorrow.
Thanks,
Win
-
November 28, 2018 at 5:23 pm
Xingchun Wang
Subscriber
Hi Tsiriaks,
Thank you for your suggestions and help. I have just moved the thread to that section.
Let me know if you need any other information.
All the best
Xingchun
-
November 28, 2018 at 5:29 pm
tsiriaks
Ansys Employee
Hi Xingchun,
Thank you for moving it here.
I'm still asking around. Someone will help you soon.
One piece of information that might be useful at some point: what is the OS of the cluster? If I remember correctly from past SRs, your systems usually run Ubuntu (which is not supported, as we discussed).
Thanks,
Win
-
November 28, 2018 at 5:52 pm
JakeC
Ansys Employee
Hi Xingchun,
Can you paste in the command you are using to launch with the -pbs flag, and post exactly what errors you are getting?
Also what does the PBS script do that you mention? Can you paste in the contents of that script? If you request more than 36 cores what happens? Can you also paste in the final submission command that is called from the PBS script?
Depending on how that script works, you may need to provide a machine list to fluent to use, as opposed to setting the number of cores.
Thank you,
Jake
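A minimal sketch of the machine-list approach mentioned above, assuming the job runs under Torque, which writes one hostname per allocated core to the file named by $PBS_NODEFILE. The function name fluent_cmd_from_nodefile and the journal file run.jou are hypothetical. The options [-t], [-cnf=], [-g], [-i], [-mpi=] and [-ssh] are Fluent launcher options, but they should be checked against the documentation for the installed release. The function only prints the command so that it can be inspected before anything is launched.

```shell
#!/bin/bash
# Build (but do not run) a Fluent launch command from a Torque node file.
# Assumption: the node file lists one hostname per allocated core,
# which is how Torque fills $PBS_NODEFILE for "nodes=N:ppn=M" requests.
fluent_cmd_from_nodefile() {
    local nodefile="$1"   # path to the node file (normally $PBS_NODEFILE)
    local journal="$2"    # hypothetical journal file, e.g. run.jou
    local nprocs
    # One non-empty line per core, so the line count is the process count.
    nprocs=$(grep -c . "$nodefile")
    echo "fluent 3ddp -g -t${nprocs} -cnf=${nodefile} -mpi=openmpi -ssh -i ${journal}"
}

# Inside a job script this could be used as:
#   cmd=$(fluent_cmd_from_nodefile "$PBS_NODEFILE" run.jou)
#   echo "$cmd"   # inspect first
#   $cmd          # then launch
```

Printing the command first makes it easy to confirm that the process count and the host file agree with what the scheduler actually allocated.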
-
November 28, 2018 at 9:35 pm
Xingchun Wang
Subscriber
No, it's running CentOS 7.3.
-
November 28, 2018 at 10:09 pm
Xingchun Wang
Subscriber
Hi Jake,
Thanks for replying.
The PBS script is: [fluent -g -t72 3ddp -mpi=openmpi -pdefault -pbs]. When I use this command, it automatically submits a job to the scheduler, and then the process does not continue. An alternative is to provide the machine list, as you mentioned; please see the following image.
This gives the same result as using the -pbs flag; I have also attached a screenshot showing the result.
If we request more than 36 cores, which means more than one compute node, the cluster spawns all the processes on a single compute node. For example, if we request 72 cores, all 72 Fluent processes are spawned on one node, which in fact slows down the computation.
I hope this information helps.
All the best
Xingchun Wang
-
November 29, 2018 at 8:13 pm
JakeC
Ansys Employee
Hi Xingchun,
Can you print out the contents of pnodes.txt and ncpus in that script, and post the results?
Also can you confirm that passwordless ssh is set up for your user? Meaning you can ssh between compute nodes without it asking you for a password.
Lastly what type of interconnect do you have between the compute nodes? Right now it is trying to use ethernet, but do you have infiniband or something similar?
Thank you,
Jake
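The passwordless ssh check can be scripted. The following is a sketch only, assuming a node file with one hostname per line (for example the file named by $PBS_NODEFILE); the function name check_passwordless_ssh is hypothetical. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail rather than prompt for a password.

```shell
#!/bin/bash
# Report which hosts in a node file accept non-interactive SSH logins.
check_passwordless_ssh() {
    local nodefile="$1"
    local host failed=0
    # sort -u collapses the per-core repeats into one line per host.
    for host in $(sort -u "$nodefile"); do
        # BatchMode=yes: fail instead of asking for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
               </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached.
    return "$failed"
}

# Typical use inside a job:
#   check_passwordless_ssh "$PBS_NODEFILE"
```

Running this from inside a job, on the first allocated node, tests the same path that the MPI launcher would use to reach the other nodes.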
-
November 29, 2018 at 9:06 pm
Xingchun Wang
Subscriber
Hi Jake,
No problem!
For the first question, pnodes is a file that is generated every time the script runs. For example, it looks like:
As for ncpus (number of CPUs), I copied that part of the code from the user manual. Originally it was exported as a system variable and passed to -t as [-t$ncpus], but I didn't adopt that method.
And yes, I can ssh between compute nodes without a password. As for the interconnect, I'm not sure what we have, but I tried all of the options and none of them seemed to help, so I kept the default.
All the best
Xingchun
-
November 30, 2018 at 3:12 pm
JakeC
Ansys Employee
Hi Xingchun,
That all looks correct to me.
Have you been able to run other types of workloads on this cluster and distribute across nodes?
Do the compute nodes have Hyperthreading enabled?
Lastly can you try IBM mpi instead of OpenMPI?
Thank you,
Jake
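One way to answer the Hyperthreading question on a Linux compute node is to read the "Thread(s) per core" line reported by lscpu. The helper below is a sketch with a hypothetical name, threads_per_core; it reads lscpu output on standard input so that it can also be run on saved output.

```shell
#!/bin/bash
# Read `lscpu` output on stdin and print the "Thread(s) per core" value.
# A value of 2 or more usually means Hyperthreading (SMT) is enabled.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Typical use on a compute node:
#   lscpu | threads_per_core
```

If the value is 2, the 36 processors reported per node may be logical rather than physical cores, which is worth knowing when choosing the process count.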
-
November 30, 2018 at 4:29 pm
tsiriaks
Ansys Employee
Hi Xingchun,
Aside from what Jake has asked, I have heard from a Fluent cluster setup expert that
"Torque, specified version, with MOAB is supported only via RSM; that is well documented. Additionally, there is an issue with Torque in that PBS Torque does not accept core allocation using '-l select=n'; it needs to be changed to use the -l ppn format.
qsub -q batch -l select=32:ncpus=1:mpiprocs=1 -V -o stdout.out -e stderr.out."
Please try the ppn format. If that doesn't help, you would need to either use RSM to submit jobs via Torque/MOAB or hire a third-party Ansys channel partner to assist with customization.
Thank you,
Win
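For illustration, a Torque-style request in the ppn format can be assembled as follows. This is a sketch under assumptions: the helper names torque_request and total_procs and the script name job.sh are hypothetical, and the example values (two nodes, 36 processors each) come from the cluster described earlier in this thread.

```shell
#!/bin/bash
# Print a Torque-style resource request in the nodes/ppn format.
torque_request() {
    local nodes="$1" ppn="$2" walltime="$3"
    echo "-l nodes=${nodes}:ppn=${ppn},walltime=${walltime}"
}

# Print the total number of processes implied by the request.
total_procs() {
    local nodes="$1" ppn="$2"
    echo $(( nodes * ppn ))
}

# Example: two 36-core nodes for four hours.
#   qsub $(torque_request 2 36 4:00:00) job.sh   # job.sh is hypothetical
#   total_procs 2 36                             # value to pass to fluent -t
```

Keeping the node count and the per-node count separate makes it harder for the value passed to [-t] to drift from what was requested from the scheduler.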
-
November 30, 2018 at 4:35 pm
Xingchun Wang
Subscriber
Hi Jake and Win,
Thank you for your replies. Please allow me some time to try your solution; I have to discuss the details with our cluster administrator.
I hope it works this time. I will report back ASAP.
All the best
Xingchun
-
December 3, 2018 at 2:46 pm
Xingchun Wang
Subscriber
Hi Jake,
I just confirmed that we have Hyperthreading enabled, and we don't have IBM MPI on the cluster.
All the best
Xingchun
-
December 3, 2018 at 2:52 pm
Xingchun Wang
Subscriber
Hi Win,
I'm a little confused by what you said. It sounds like [qsub -q batch -l select=32:ncpus=1:mpiprocs=1 -V -o stdout.out -e stderr.out] won't work on Torque and you want me to try the ppn format instead.
From my understanding, we are already using the ppn format. For normal submission we use [qsub -l nodes=2:ppn=36,walltime=4:00:00], which gives 2 nodes with 36 processors on each node and 4 hours to use them. Is this the ppn format you are talking about?
Also, I checked with our administrator, and he suggested that I start Fluent with a lower-level command, using mpirun to start the processes.
I hope this information is helpful.
All the best
Xingchun
-
December 3, 2018 at 5:17 pm
tsiriaks
Ansys Employee
Hi Xingchun,
Yes. Have you tried that command with the ppn format instead?
What is the script that you specify when using qsub? Please provide the full, actual command that you tried.
As mentioned, if this doesn't work, you would need to set up RSM for it.
Thanks,
Win
-
December 3, 2018 at 8:28 pm
-
December 3, 2018 at 8:35 pm
tsiriaks
Ansys Employee
Hi Xingchun,
Ah yes, I missed that. Sorry.
In that case, I think you would need to follow one of the two options that the Fluent cluster setup expert mentioned:
1. Set up RSM to submit jobs via Torque/MOAB.
2. Hire a third-party Ansys channel partner to assist with customization to work with your current setup (which is not supported).
Thanks,
Win
-
December 4, 2018 at 12:12 am
Xingchun Wang
Subscriber
Hi Win,
I checked with the administrator. We don't actually have MOAB; we use Maui instead. So the question is, can RSM be configured with this combination?
And again, is there a way to start the Fluent processes with a lower-level command such as mpirun?
All the best
Xingchun
-
December 4, 2018 at 8:55 pm
tsiriaks
Ansys Employee
Hi Xingchun,
I got the answer:
"you shouldn't need to use the mpiexec or mpirun with Fluent"
As for Maui with RSM, this is not supported, so it would need some kind of customization.
Thanks,
Win
-
December 4, 2018 at 9:07 pm
Xingchun Wang
Subscriber
OK, I see. Thank you for the information.
All the best
Xingchun