Facing issues while setting up Distributed memory simulations in ANSYS EDT HFSS 2020R1

Hello

I have two machines, connected through a router, with ANSYS EDT 2020 R1 installed on both. I want to perform distributed memory simulations, which requires installing MPI software (Intel or IBM) on both machines. I installed Intel MPI first, but it did not work for me: the simulation stopped with the progress window indicating

[project name] - HFSSDesign1 - Setup1: Determining memory availability on distributed machines on [target machine name]

I don't know why it did not work for me. When I googled this issue, I came across some interesting threads:

https://forum.ansys.com/discussion/5534/best-way-to-create-a-cluster-of-4-computers-for-ansys-electronics-desktopto-share-memory-and-cores

https://forum.ansys.com/discussion/14155/hpc-setup-for-ansys-2020r1

https://forum.ansys.com/discussion/10353/mpi-authentication-in-hpc-using-multiple-nodes-in-ansys-electronics

https://forum.ansys.com/discussion/7313/hfss-hpc-setup-issues

All of these contain a magical six-step procedure that says to use IBM Platform Computing MPI. So I removed the Intel MPI libraries from the PC and installed the IBM MPI that comes with the ANSYS installation.

To check whether this helps in setting up distributed simulations, I followed the test mentioned in one of the threads above:

"%MPI_ROOT%\bin\mpirun" -hostlist localhost:2,<the_other_node>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"

But this method did not work for me either, and it threw some errors that I had not seen in the forum:

C:\Program Files (x86)\IBM\Platform-MPI\bin>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,<other-pc>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"                              

Password for MPI runs:                                                                                                            

mpirun: Drive is not a network mapped - using local drive.                                                                                          

mpid: PATH=C:\Program Files (x86)\IBM\Platform MPI\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;D:\Matlab install\runtime\win64;D:\Matlab install\bin;C:\Users\HP\AppData\Local\Microsoft\WindowsApps;       

mpid: PWD=C:\Program Files (x86)\IBM\Platform-MPI\bin                                                                                            

mpid: CreateProcess failed: Cannot execute C:\Program Files (x86)\IBM\Platform-MPI\bin\%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe                                             

mpirun: Unknown error  
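Looking at the last error above, it seems %ANSYSEM_ROOT201% was not expanded at all, so mpid treated it as a path relative to its working directory (the Platform-MPI bin folder). Possibly the variable is simply not defined in the command prompt I was using. As a workaround (assuming the default installation directory C:\Program Files\AnsysEM\AnsysEM20.1\Win64), I could point to the test executable with its full path instead:

"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,<other-pc>:2 "C:\Program Files\AnsysEM\AnsysEM20.1\Win64\schedulers\diagnostics\Utils\pcmpi_test.exe"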

I want to know whether I have to perform any user registration for Platform MPI to work on my machines. If yes, please let me know how to do it.
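(From my reading of the Platform MPI documentation, mpirun also has a -cache option that prompts for the password once and stores it, so that -pass is not needed on every run. I am assuming this is what is meant by registration, for example:

"%MPI_ROOT%\bin\mpirun" -cache -hostlist localhost:2,<other-pc>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"

Please correct me if that is not the right step.)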

If someone knows the solution, please reply to this question.

Thanks

Mahesh

Answers

  • Hello @tsiriaks, @rounaksingh, @learner, @apr37, @MaxL

    An update on my question. I will refer to the two machines as DESKTOP-CLH2LM1 (A) and DESKTOP-B4I9FQ7 (B).

    When I run the test with A as localhost and B as the other machine, the MPI test command produces the Hello world! output, indicating a good connection between A and B:

    C:\Users\Mahesh>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,DESKTOP-B4I9FQ7:2

    "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"

    Password for MPI runs:

    mpirun: Drive is not a network mapped - using local drive.

    Hello world! I'm rank 0 of 4 running on DESKTOP-CLH2LM1

    Hello world! I'm rank 1 of 4 running on DESKTOP-CLH2LM1

    Hello world! I'm rank 2 of 4 running on DESKTOP-B4I9FQ7

    Hello world! I'm rank 3 of 4 running on DESKTOP-B4I9FQ7

    But when I run the same MPI test command with B as localhost and A as the other machine, the following output is obtained in the command prompt window:

    C:\Users\HP>"%MPI_ROOT%\bin\mpirun" -pass -hostlist localhost:2,DESKTOP-CLH2LM1:2

    "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"                                                

    Password for MPI runs:                                                 

    mpirun: Drive is not a network mapped - using local drive.                               

    ERR-Client: InitializeSecurityContext failed (0x80090308)                                

    ERR - Client Authorization of socket failed.                                      

    Command sent to service failed.                                             

    mpirun: ERR: Error adding task to job (-1).                                       

    mpirun: mpirun_mpid_start: thread 19792 exited with code -1                              

    mpirun: mpirun_winstart: unable to start all mpid processes.                              

    mpirun: Unable to contact remote service or mpid                                    

    mpirun: An mpid process may still be running on DESKTOP-CLH2LM1  

    I want to know why the output looks like this and what settings I have to change to get the same output as described earlier in this comment.
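    As a side note, error code 0x80090308 appears to correspond to the Windows SSPI error SEC_E_INVALID_TOKEN, so this looks like an authentication failure between B and A rather than a networking problem. A basic credential check I can run from B (a plain Windows command, nothing ANSYS-specific; <username> is a placeholder for the account on A):

    net use \\DESKTOP-CLH2LM1\IPC$ /user:<username> *

    If that prompts for the password and connects, the Windows credentials themselves should be fine and the problem is on the Platform MPI / mpid side.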

    To test the distributed simulation feature I started a simulation of the Helical_Antenna example on Machine A (the ANSYS 2020 R1 Help advises using this example as a test case). I set up an analysis configuration consisting of the two machines, with Machine B first in the list followed by localhost.

    But simulation steps such as meshing and solving were performed only on Machine B and did not use any of the hardware available on Machine A. Why did this occur?

    What settings do I need to modify so that both machines are used in the simulation?

    P.S.: Machine A runs Windows 10 Pro while Machine B runs Windows 10 Home. There is also one generation of difference between the processors in the two machines. I have completely disabled the firewalls on both machines, and both are in the workgroup "WorkGroup" (they are not joined to a domain).

    Thanks

    Mahesh

  • mmadore Forum Coordinator

    @mahesh2444 Hello Mahesh, do you have the same username and password on each machine? Please note that MPI requires the machines to be on a domain; it does not support a workgroup environment, so this may not work exactly as expected.

    Regarding using B first and localhost second in the list, the first listed machine will be responsible for meshing and adaptive passes prior to distribution to the other machines listed.


    Thanks,

    Matt

  • mahesh2444 Member
    edited December 2020

    Hi @mmadore

    Yes, all the machines have the same username and password.

    May I know what the workgroup name should be?

    Also, I have observed that the sweep frequencies are being solved locally rather than distributed. Is this feature not available?

    Thanks

    Mahesh

  • mmadore Forum Coordinator

    @mahesh2444 There is no special requirement for the workgroup name. Can you share a screenshot of the HPC and Analysis Settings you are currently using? Please also click on each of the machines listed in the settings and select "Test Machines" and share the output.


    Thank you,

    Matt

  • Hi @mmadore

    "Can you share a screenshot of the HPC and Analysis Settings you are currently using?"


    "Please also click on each of the machines listed in the settings and select "Test Machines" and share the output."

    Checking Machine DESKTOP-B4I9FQ7

    Pinging machine DESKTOP-B4I9FQ7

    32 bytes from 192.168.0.115:icmp_seq=0 time=1 ms TTL=192

    32 bytes from 192.168.0.115:icmp_seq=1 time=2 ms TTL=192

    32 bytes from 192.168.0.115:icmp_seq=2 time=2 ms TTL=192

    32 bytes from 192.168.0.115:icmp_seq=3 time=2 ms TTL=192

    4 packets transmitted,4 packets received, 0%packet loss. Round-trip (ms) min/avg/max = 1/1/2

    Checking AnsoftRSMService availability on machine DESKTOP-B4I9FQ7

    AnsoftRSMService is alive

    AnsoftRSMService is listening at 169.254.236.1 : 32958

    AnsoftRSMService is configured to allow remote solves

    Checking list of registered engines on machine DESKTOP-B4I9FQ7

    enscomengine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ENSCOMENGINE.exe

    EXTRACTOR2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/EXTRACTOR2DCOMENGINE.exe

    HFIECOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFIECOMENGINE.exe

    HFSSCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFSSCOMENGINE.exe

    ICEPAKCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ICEPAKCOMENGINE.exe

    MAXWELL2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELL2DCOMENGINE.exe

    MAXWELLCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELLCOMENGINE.exe

    MECHANICALCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MechanicalComEngine.exe

    Q3DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/Q3DCOMENGINE.exe

    RMXPRTCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/RMXPRTCOMENGINE.exe

    SimplorerCOMEngine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/SimplorerCOMEngine.exe

    desktopjob:ElectronicsDesktop2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/desktopjob.exe

    nexxim:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/nexxim.exe

    spexporter:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/spexporter.exe

    Checking for jobs running registered engines on machine DESKTOP-B4I9FQ7

    There are no jobs running registered product engines on this machine


    Checking Machine localhost

    Pinging machine localhost

    32 bytes from 127.0.0.1:icmp_seq=0 time=0 ms TTL=127

    32 bytes from 127.0.0.1:icmp_seq=1 time=0 ms TTL=127

    32 bytes from 127.0.0.1:icmp_seq=2 time=0 ms TTL=127

    32 bytes from 127.0.0.1:icmp_seq=3 time=0 ms TTL=127

    4 packets transmitted,4 packets received, 0%packet loss. Round-trip (ms) min/avg/max = 0/0/0

    Checking AnsoftRSMService availability on machine localhost

    AnsoftRSMService is alive

    AnsoftRSMService is listening at 192.168.0.108 : 32958

    AnsoftRSMService is configured to allow remote solves

    Checking list of registered engines on machine localhost

    enscomengine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ENSCOMENGINE.exe

    EXTRACTOR2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/EXTRACTOR2DCOMENGINE.exe

    HFIECOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFIECOMENGINE.exe

    HFSSCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/HFSSCOMENGINE.exe

    ICEPAKCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/ICEPAKCOMENGINE.exe

    MAXWELL2DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELL2DCOMENGINE.exe

    MAXWELLCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MAXWELLCOMENGINE.exe

    MECHANICALCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/MechanicalComEngine.exe

    Q3DCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/Q3DCOMENGINE.exe

    RMXPRTCOMENGINE:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/RMXPRTCOMENGINE.exe

    SimplorerCOMEngine:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/SimplorerCOMEngine.exe

    desktopjob:ElectronicsDesktop2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/desktopjob.exe

    nexxim:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/nexxim.exe

    spexporter:2020.1 is registered and available from C:/Program Files/AnsysEM/AnsysEM20.1/Win64/spexporter.exe

    Checking for jobs running registered engines on machine localhost

    There are no jobs running registered product engines on this machine

    Testing Completed.

    Success! All tests successful.

    I hope this helps you answer my question about the sweep frequencies being solved locally rather than distributed.

    I would like to hear from you so that I can resolve this as soon as possible.

    Thanks

    Mahesh

  • mahesh2444 Member
    edited December 2020

    Hi @mmadore

    I would like to know whether it is possible to solve a single sweep frequency in a distributed manner on two machines simultaneously. I will try to convey my need through the following scenario.

    I am trying to simulate an array antenna at 25 GHz with dimensions of 70 x 20 mm. I unchecked the automatic settings in the HPC and Analysis options and set one task per machine, as shown in the image above. During adaptive meshing it used both machines and computed the mesh passes according to the convergence criteria (total memory used by 2 distributed processes: 9.2 GB). But before starting the sweep frequencies it stopped the simulation with a message similar to the following:

    sweep frequencies require 5.9 GB of memory per task and 11 GB of memory in total.

    But I have a total of 12.6 GB of memory available across both machines. When I re-ran the design, the simulation completed, consuming 6.27 GB per sweep frequency and stating that it was "switching to mixed precision to save memory". During the re-run, only one machine (the first one in the list) was used for solving the sweep frequencies.

    Why was the second machine in the list not used for solving the sweep frequencies?

    When the automatic settings were enabled, the simulation never completed, stating that more memory was needed.

    So my other question is whether it is possible for HFSS to solve a sweep frequency that requires 12 GB of memory in a distributed manner, just as happened with the adaptive meshing process.

    Thanks

    Mahesh

  • mmadore Forum Coordinator

    @mahesh2444 Can you try solving C:\Program Files\AnsysEM\AnsysEM20.1\Win64\schedulers\diagnostics\Projects\HFSS\OptimTee-DiscreteSweep.aedt to test your setup? This will confirm if the Sweep in Setup1 will distribute across both machines.
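    If it is more convenient, the same project can also be solved non-graphically from a command prompt. Treat the line below as a rough sketch and verify the exact -machinelist syntax against the 2020 R1 batch options documentation; also copy the project to a writable folder first, since the original sits under Program Files (the destination path here is just an example):

    "C:\Program Files\AnsysEM\AnsysEM20.1\Win64\ansysedt.exe" -ng -batchsolve -distributed -machinelist list="DESKTOP-B4I9FQ7,DESKTOP-CLH2LM1" "C:\Users\<you>\Documents\OptimTee-DiscreteSweep.aedt"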


    Thanks,

    Matt

  • Thanks @mmadore, it's working with the direct solver. Would this distribution work the same way with the domain solver too?

    Thanks

    Mahesh

  • mmadore Forum Coordinator
  • mahesh2444 Member
    edited January 1

    Hi @mmadore

    I am performing reflectarray simulations using the domain solver. The array and the horn are separated by creating FE-BI boxes surrounding each of them. For a clear picture of my simulation setup, see this YouTube video.

    Video summary:
    "HFSS's hybrid technique is implemented here. The entire simulation domain is divided into two FE-BI regions so that the mesh between the horn antenna and the reflectarray can be avoided, reducing simulation time and memory consumption."

    When I tried to simulate the setup described above, only the first machine in the list was used for adaptive meshing and the second machine remained idle. Eventually this caused an "out of memory" error, leading to abrupt termination of the simulation.

    How can I make both of my machines be used for adaptive meshing, as happened in the case of the direct solver? Please help me.

    Thanks

    Mahesh

  • mahesh2444 Member
    edited January 2

    Hi @mmadore

    Could you please look at my other question?

    The issue with the domain solver still persists in the case of MPI computing.

    Thanks

    Mahesh

  • Hi @mmadore

    My analysis setup is shown below, along with the message displayed in the analysis configuration.


    In the job distribution preview it shows that the analysis will be performed locally, and the same happens with the "almond_DDM" simulation found in C:\Program Files\AnsysEM\AnsysEM20.1\Win64\schedulers\diagnostics\Projects\HFSS. How can I fix this problem?

    I am waiting for your reply, Matt. Please respond as soon as possible.

    Thanks

    Mahesh

  • mmadore Forum Coordinator

    @mahesh2444 I've asked a colleague to review and comment further on this.


    Thanks,

    Matt

  • mmadore Forum Coordinator

    @mahesh2444 I have received this feedback. In short, that is the way HFSS works. DDM allows the whole problem to be "divided" only after an initial mesh is created, which is why we see meshing on only one compute node. After meshing is complete, HFSS knows where to divide the objects for further analysis and solving. At a very high level, the objects are "divided" where the mesh is minimal; they are not divided by geometry parts but electrically, through the mesh. The mesh is generated by determining the electric field, so this initial mesh is necessary on one node before the problem can be split.


    Please let me know if this helps to explain the difference.


    Thanks

    Matt

  • I will check this and get back to you, Matt.

  • Hi @mmadore

    Could you please look at my other question?

    Thanks

    Mahesh
