Facing issues while setting up Distributed memory simulations in ANSYS EDT HFSS 2020R1
Hello
I have two machines connected through a router, both with ANSYS EDT 2020R1 installed, and I want to run distributed-memory simulations on them. For this I need MPI software (Intel or IBM) installed on both machines. I installed Intel MPI first, but it didn't work: the simulation stopped with the progress window showing
[project name] - HFSSDesign1 - Setup1: Determining memory availability on distributed machines on [target machine name]
I don't know why it didn't work. When I searched for this issue, I came across some interesting threads:
https://forum.ansys.com/discussion/14155/hpc-setup-for-ansys-2020r1
https://forum.ansys.com/discussion/7313/hfss-hpc-setup-issues
All of these describe a six-step procedure that says to use IBM Platform MPI. So I removed the Intel MPI libraries from the PC and installed the IBM MPI which comes with the installation.
To check whether this helps in setting up distributed simulation, I ran the test mentioned in one of the threads above:
"%MPI_ROOT%\bin\mpirun" -hostlist localhost:2,<the_other_node>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"
But this didn't work for me either, and it threw some more errors that I didn't see in the forum threads:
C:\Program Files (x86)\IBM\Platform-MPI\bin>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,<other-pc>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"
Password for MPI runs:
mpirun: Drive is not a network mapped - using local drive.
mpid: PATH=C:\Program Files (x86)\IBM\Platform MPI\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;D:\Matlab install\runtime\win64;D:\Matlab install\bin;C:\Users\HP\AppData\Local\Microsoft\WindowsApps;
mpid: PWD=C:\Program Files (x86)\IBM\Platform-MPI\bin
mpid: CreateProcess failed: Cannot execute C:\Program Files (x86)\IBM\Platform-MPI\bin\%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe
mpirun: Unknown error
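A note on the CreateProcess line above: the path it reports still contains the literal text %ANSYSEM_ROOT201%, which is what cmd.exe leaves behind when a variable is not defined in the current session. A minimal Python sketch that lists which %NAME% references in a command line have no value in the environment is given below. The helper name `undefined_vars` is made up for illustration and is not part of any ANSYS or MPI tool.

```python
import os
import re

# Matches Windows-style %NAME% references inside a command line.
_VAR = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def undefined_vars(command, env=None):
    """Return the %NAME% references in `command` that have no value in `env`.

    Hypothetical helper for illustration. `env` defaults to os.environ.
    The lookup is case-insensitive, as environment variables are on Windows.
    """
    env = os.environ if env is None else env
    defined = {key.upper() for key in env}
    missing = []
    for name in _VAR.findall(command):
        if name.upper() not in defined and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The test command from the post, copied as typed.
    cmd = (r'"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,<other-pc>:2 '
           r'"%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"')
    for name in undefined_vars(cmd):
        print(f"{name} is not defined in this session")
```

If the script reports ANSYSEM_ROOT201 on the machine where the test fails, the variable would need to be defined there (or the full path to pcmpi_test.exe typed out) before the MPI test itself can say anything about connectivity.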
I want to know if I have to perform any user registration for the Platform MPI to work on my machines. If yes please let me know how to do it.
If someone knows the solution please reply to this question.
Thanks
Mahesh
Answers
Hello @tsiriaks, @rounaksingh, @learner, @apr37, @MaxL
An update on my question. I will call the two machines A (DESKTOP-CLH2LM1) and B (DESKTOP-B4I9FQ7).
When I run the test with A as localhost and B as the other machine, the MPI test command prints the "Hello world!" output, indicating a good connection between A and B.
C:\Users\Mahesh>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,DESKTOP-B4I9FQ7:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"
Password for MPI runs:
mpirun: Drive is not a network mapped - using local drive.
Hello world! I'm rank 0 of 4 running on DESKTOP-CLH2LM1
Hello world! I'm rank 1 of 4 running on DESKTOP-CLH2LM1
Hello world! I'm rank 2 of 4 running on DESKTOP-B4I9FQ7
Hello world! I'm rank 3 of 4 running on DESKTOP-B4I9FQ7
But when I ran the same MPI test command with B as localhost and A as the other machine, I got the following output in the command prompt window:
C:\Users\HP>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,DESKTOP-CLH2LM1:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"
Password for MPI runs:
mpirun: Drive is not a network mapped - using local drive.
ERR-Client: InitializeSecurityContext failed (0x80090308)
ERR - Client Authorization of socket failed.
Command sent to service failed.
mpirun: ERR: Error adding task to job (-1).
mpirun: mpirun_mpid_start: thread 19792 exited with code -1
mpirun: mpirun_winstart: unable to start all mpid processes.
mpirun: Unable to contact remote service or mpid
mpirun: An mpid process may still be running on DESKTOP-CLH2LM1
I want to know why the output looks like this and which settings I have to change to get the same output as in the first test above.
To test the distributed simulation feature, I started a simulation of Helical_Antenna (available in the examples; the ANSYS 2020 R1 Help advises using it as a test case) on Machine A. I set up an analysis configuration consisting of two machines, with Machine B first in the list followed by localhost.
But simulation steps such as meshing and solving were performed only on Machine B and didn't use any of the hardware available on Machine A. Why did this happen?
Which settings do I need to modify to use both machines in the simulation?
P.S.: Machine A runs Windows 10 Pro, while Machine B runs Windows 10 Home. There is also a one-generation difference between the processors of the two machines. I have disabled the firewalls completely on both machines. They are in the workgroup named "WorkGroup".
Thanks
Mahesh
@mahesh2444 Hello Mahesh, do you have the same username and password on each machine? Please note that MPI requires the machines to be on a domain. It does not support a Workgroup environment, so this may not work exactly as expected.
Regarding using B first and localhost second in the list, the first listed machine will be responsible for meshing and adaptive passes prior to distribution to the other machines listed.
Thanks,
Matt
Hi @mmadore
Yes, all the machines have same username & password.
May I know what the Workgroup name should be?
I have also observed that the sweep frequencies are being solved locally rather than distributed. Isn't this feature available?
Thanks
Mahesh
@mahesh2444 There is no special requirement for the workgroup name. Can you share a screenshot of the HPC and Analysis Settings you are currently using? Please also click on each of the machines listed in the settings and select "Test Machines" and share the output.
Thank you,
Matt
Hi @mmadore
"Can you share a screenshot of the HPC and Analysis Settings you are currently using?"
"Please also click on each of the machines listed in the settings and select "Test Machines" and share the output."
Checking Machine DESKTOP-B4I9FQ7
Pinging machine DESKTOP-B4I9FQ7
32 bytes from 192.168.0.115:icmp_seq=0 time=1 ms TTL=192
32 bytes from 192.168.0.115:icmp_seq=1 time=2 ms TTL=192
32 bytes from 192.168.0.115:icmp_seq=2 time=2 ms TTL=192
32 bytes from 192.168.0.115:icmp_seq=3 time=2 ms TTL=192
4 packets transmitted,4 packets received, 0%packet loss. Round-trip (ms) min/avg/max = 1/1/2
Checking AnsoftRSMService availability on machine DESKTOP-B4I9FQ7
AnsoftRSMService is alive
AnsoftRSMService is listening at 169.254.236.1 : 32958
AnsoftRSMService is configured to allow remote solves
Checking list of registered engines on machine DESKTOP-B4I9FQ7
enscomengine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ENSCOMENGINE.exe
EXTRACTOR2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/EXTRACTOR2DCOMENGINE.exe
HFIECOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFIECOMENGINE.exe
HFSSCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFSSCOMENGINE.exe
ICEPAKCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ICEPAKCOMENGINE.exe
MAXWELL2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELL2DCOMENGINE.exe
MAXWELLCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELLCOMENGINE.exe
MECHANICALCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MechanicalComEngine.exe
Q3DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/Q3DCOMENGINE.exe
RMXPRTCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/RMXPRTCOMENGINE.exe
SimplorerCOMEngine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/SimplorerCOMEngine.exe
desktopjob:ElectronicsDesktop2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/desktopjob.exe
nexxim:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/nexxim.exe
spexporter:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/spexporter.exe
Checking for jobs running registered engines on machine DESKTOP-B4I9FQ7
There are no jobs running registered product engines on this machine
Checking Machine localhost
Pinging machine localhost
32 bytes from 127.0.0.1:icmp_seq=0 time=0 ms TTL=127
32 bytes from 127.0.0.1:icmp_seq=1 time=0 ms TTL=127
32 bytes from 127.0.0.1:icmp_seq=2 time=0 ms TTL=127
32 bytes from 127.0.0.1:icmp_seq=3 time=0 ms TTL=127
4 packets transmitted,4 packets received, 0%packet loss. Round-trip (ms) min/avg/max = 0/0/0
Checking AnsoftRSMService availability on machine localhost
AnsoftRSMService is alive
AnsoftRSMService is listening at 192.168.0.108 : 32958
AnsoftRSMService is configured to allow remote solves
Checking list of registered engines on machine localhost
enscomengine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ENSCOMENGINE.exe
EXTRACTOR2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/EXTRACTOR2DCOMENGINE.exe
HFIECOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFIECOMENGINE.exe
HFSSCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFSSCOMENGINE.exe
ICEPAKCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ICEPAKCOMENGINE.exe
MAXWELL2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELL2DCOMENGINE.exe
MAXWELLCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELLCOMENGINE.exe
MECHANICALCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MechanicalComEngine.exe
Q3DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/Q3DCOMENGINE.exe
RMXPRTCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/RMXPRTCOMENGINE.exe
SimplorerCOMEngine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/SimplorerCOMEngine.exe
desktopjob:ElectronicsDesktop2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/desktopjob.exe
nexxim:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/nexxim.exe
spexporter:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/spexporter.exe
Checking for jobs running registered engines on machine localhost
There are no jobs running registered product engines on this machine
Testing Completed.
Success! All tests successful.
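One detail in the output above may be worth a second look: the ping to DESKTOP-B4I9FQ7 resolves to 192.168.0.115, while AnsoftRSMService on that machine reports that it is listening at 169.254.236.1. Addresses in 169.254.0.0/16 are link-local (self-assigned) addresses. Whether this has anything to do with the distribution problem is an assumption, not something confirmed in this thread. A small Python sketch using the standard `ipaddress` module to flag such addresses (the function name `classify` is made up for illustration):

```python
import ipaddress


def classify(addr):
    """Return a coarse label for an IPv4/IPv6 address string.

    Loopback and link-local are checked before "private" because
    Python's is_private is also True for both of those ranges.
    """
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"


if __name__ == "__main__":
    # Addresses copied from the "Test Machines" output in this post.
    for label, addr in [
        ("ping target for DESKTOP-B4I9FQ7", "192.168.0.115"),
        ("RSM listener on DESKTOP-B4I9FQ7", "169.254.236.1"),
        ("RSM listener on localhost", "192.168.0.108"),
    ]:
        print(f"{label}: {addr} is {classify(addr)}")
```

If the service is bound to an adapter that is not on the router's subnet, comparing the adapters listed by `ipconfig /all` on that machine would be a reasonable next check.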
I hope this helps you answer my question about the sweep frequencies being solved locally rather than distributed.
I would like to hear from you so that the problem can be solved as soon as possible.
Thanks
Mahesh
Hi @mmadore
I would like to know whether it is possible to solve a single sweep frequency in a distributed manner on two machines simultaneously. I will try to convey my need through the following scenario.
I am trying to simulate an array antenna at 25 GHz with dimensions of 70 x 20 mm. I unchecked the automatic settings in HPC and Analysis Settings and set one task for each of the machines shown in the image above. During adaptive meshing it used both machines and computed the mesh passes according to the convergence criteria (total memory used by 2 distributed processes: 9.2 GB). But before starting the sweep frequencies it stopped the simulation with a message similar to the following:
sweep frequencies require 5.9GB memory per task and requires 11GB memory in total.
But I have a total of 12.6 GB of memory available on the two machines combined. When I re-ran the simulation of the above design, it completed, consuming 6.27 GB of memory per sweep frequency and stating "switching to mixed precision to save memory". During the re-run only one machine (the first one in the list) was used for solving the sweep frequencies.
Why wasn't the second machine in the list used for solving the sweep frequencies?
When the automatic settings were enabled, the simulation never completed and reported that more memory was needed.
So my other question is whether HFSS can solve a sweep frequency that requires 12 GB of memory in a distributed manner, just as it did with the adaptive meshing process.
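The numbers in the message can be checked with simple arithmetic. The sketch below assumes (this is not confirmed anywhere in the thread) that each sweep task has to fit in the free memory of the machine it runs on, so that the combined total is not the quantity that matters. The 5.9 GB per task and the 12.6 GB combined figure come from the post. The per-machine split is a made-up placeholder, since the post does not give it, and the function name is hypothetical.

```python
def tasks_that_fit(free_gb_per_machine, gb_per_task):
    """Map each machine to the number of tasks that fit in its own free memory.

    Assumption for illustration: a task cannot borrow memory from
    another machine, so each machine is checked on its own.
    """
    return {
        name: int(free_gb // gb_per_task)
        for name, free_gb in free_gb_per_machine.items()
    }


if __name__ == "__main__":
    gb_per_task = 5.9  # from the solver message in the post

    # PLACEHOLDER split of the 12.6 GB combined figure; the real
    # per-machine values are not stated in the post.
    free_gb = {"machine_A": 8.0, "machine_B": 4.6}

    print(f"combined free memory: {sum(free_gb.values()):.1f} GB")
    print(f"needed for 2 tasks:   {2 * gb_per_task:.1f} GB")
    for name, count in tasks_that_fit(free_gb, gb_per_task).items():
        print(f"{name}: room for {count} task(s) of {gb_per_task} GB")
```

Under that assumption, a combined total above the requirement is not enough on its own: with an uneven split, one machine can have no room for even a single task.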
Thanks
Mahesh
@mahesh2444 Can you try solving C:\Program Files\AnsysEM\AnsysEM20.1\Win64\schedulers\diagnostics\Projects\HFSS\OptimTee-DiscreteSweep.aedt to test your setup? This will confirm if the Sweep in Setup1 will distribute across both machines.
Thanks,
Matt
Thanks @mmadore, it's working with the direct solver. Would this distribution work the same way with the domain solver too?
Thanks
Mahesh
@mahesh2444 Yes, it should.
Hi @mmadore
I am running reflectarray simulations using the domain solver. The array and the horn are separated by enclosing each of them in its own FE-BI box. For a clear picture of my simulation setup, see this YouTube video.
Video summary
"HFSS using Hybrid technique to implemented here. The entire simulation domain is divided into two FEBI region and we can avoid mesh between horn antenna and reflectarray. Hence reducing simulation time and memory consumption."
When I tried to simulate the setup described above, only the first machine in the list was used for adaptive meshing, and the second machine remained idle. Eventually this caused an "Out of memory" error and the simulation terminated abruptly.
How can I get both machines to be used for adaptive meshing, as happened with the direct solver? Please help me.
Thanks
Mahesh
Hi @mmadore
Could you please look at my other question
Issue with Domain solver still persists in case of MPI computing.
Thanks
Mahesh
Hi @mmadore
My analysis setup is shown below, along with the message displayed in the analysis configuration.
The job distribution preview shows that the analysis will be run locally, and the same happens with the simulation of "almond_DDM" found in C:\Program Files\AnsysEM\AnsysEM20.1\Win64\schedulers\diagnostics\Projects\HFSS. How can I fix this problem?
I am waiting for your reply, Matt. Please respond as soon as possible.
Thanks
Mahesh
@mahesh2444 I've asked a colleague to review and comment further on this.
Thanks,
Matt
@mahesh2444 I have received this feedback. In short, that is the way HFSS works. DDM allows the whole problem to be "divided" after an initial mesh is created, which is why we see meshing on only one compute node. Once meshing is complete, HFSS knows where to divide the objects for further analysis and solving. At a very high level, the objects are "divided" where the mesh is minimal. The objects are not divided by geometry parts but electrically, through the mesh. The mesh is generated by determining the electric field, so this initial mesh is necessary on one node before it can be split.
Please let me know if this helps to explain the difference.
Thanks
Matt
I will check this and get back to you, Matt.
Hi @mmadore
Could you please look at my other question
Thanks
Mahesh