Ansys Products

HPC setup for ANSYS 2020R1

    • apr37
      Subscriber

      Hello ANSYS,

      New to the community, here.

      I have installed ANSYS ETD 2020R1 on a few locally networked machines. I would like to set up my HPC options so that analysis of an HFSS model on one computer is assisted by all the other machines that have ANSYS ETD installed. I can successfully outsource the processing to any of my machines, but only one at a time (the head node runs Windows 10, the compute nodes run Windows 7). If I try to outsource the processing to multiple machines, ANSYS hangs with the following message displayed in the Progress pane:

      [project name] - HFSSDesign1 - Setup1: Determining memory availability on distributed machines on [target machine name]

      While stepping through setup, I referred to these posts for inspiration/guidance:
      https://studentcommunity.ansys.com/thread/hfss-hpc-setup-issues/
      https://studentcommunity.ansys.com/thread/mpi-authentication-in-hpc-using-multiple-nodes-in-ansys-electronics/
      https://studentcommunity.ansys.com/thread/hpc-problem/?order=all#comment-1b85f8c4-d5ee-4cc6-a1fe-aa6d007fc442

      I've also found some help pages,
      https://ansyshelp.ansys.com/Views/Secured/Electronics/v201/home.htm#../Subsystems/EMIT/Content/HPC/HighPerformanceComputingHPCIntegration.htm?Highlight=hpc
      https://ansyshelp.ansys.com/Views/Secured/Electronics/v201/Subsystems/EMIT/Content/HPC/ConfiguringANSYSEMInstallationForHPC.htm

      ...but I don't know how to systematically move forward in troubleshooting my problem.

      Step-by-step instructions updated for version 2020R1 would be appreciated as a starting point, to see if there is anything I've obviously done incorrectly in my setup process.

      Thank you!

      Alex

    • tsiriaks
      Ansys Employee

      Hi Alex,


      This is an MPI issue. MPI handles job and memory distribution when solving on more than one machine.


      Below are the general step-by-step instructions for setting up AEDT with Ansoft RSM and MPI:


       


      1. Install the same version of AEDT on each machine. Make sure it is installed at the same path on all machines, or install it on one machine and use a shared directory. (Note: this requirement applies only if you are solving on multiple machines.)


      https://www.youtube.com/embed/nDYclZgt-Ts


       


      2. Install AnsoftRSM on each solving machine


      https://www.youtube.com/embed/OqOJ3W91sSU


       


       


      3. Register AEDT with AnsoftRSM on each solving machine (if you are setting up multiple versions of AEDT, you have to register each version you want to use)


      This is covered in the AnsoftRSM installation video above.


       


       4. Install IBM Platform MPI on each solving machine (Note: the installer is included in the Electronics installation package. Please accept all defaults during the installation.)


       


      5. If you are setting up to solve on multiple machines but don't include the local machine as a solving machine, you must set this in the AEDT GUI on the local machine:


      Make sure a username/password is specified in Tools > Options > General Options > Remote Analysis


       


       6. The temp directory must have the same path on all solving machines but be local to each machine. You can set the path in the configuration file, which by default is C:\Program Files\AnsysEM\AnsysEM###\Win64\config\default.cfg


      Specify a path that can be accessed by all users, such as tempdirectory='C: /Temp' Note: even on Windows, please use a forward slash, not a backslash.
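
A minimal sketch of how the setting from step 6 could be produced from an ordinary Windows path. It assumes the value is meant to be written without a space after the colon, and that the single-quoted `tempdirectory='...'` form quoted in the post is what default.cfg expects. The helper name `tempdirectory_line` is hypothetical.

```python
from pathlib import PureWindowsPath


def tempdirectory_line(path: str) -> str:
    """Return a tempdirectory setting that uses forward slashes."""
    # as_posix() rewrites backslashes as forward slashes (C:\Temp -> C:/Temp).
    return f"tempdirectory='{PureWindowsPath(path).as_posix()}'"


# Assumed setting name and quoting, copied from the post above.
print(tempdirectory_line(r"C:\Temp"))
```

Using the same call on every solving machine is one way to keep the configured path identical across machines.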


       


      Then, if you are still having issues, the cause is very likely a firewall. If you have a third-party firewall (e.g., from your security software), you must add exceptions there as well (in addition to Windows Firewall). Please exclude the entire C:\Program Files\AnsysEM directory. If you can't control this, please ask your IT department to do it.


       


      Hope this helps.


      Thanks,


      Win


       


       

    • apr37
      Subscriber

      Hi Win,

      Thanks for the feedback. I have some questions. Can you explain steps 5 and 6 in a little more detail?

      For step 5: I have navigated to the 'Remote Analysis' portion of the 'general options', but I don't know what I am supposed to do now that I am here.

      For step 6: Do I go to 'general options', 'Directories' to change this option?
      To change the 'Temp:' directory, I need to check the 'override' box.
      When I type in 'C: /Temp', I get the error...

      "C: /Temp/"
      Directory specified could not be created.

      Is the space supposed to be there?
      I tried it without the space. I don't get an error, but I do see that when I click 'OK', it changes the text I entered in the field from
      "C:/Temp"
      to
      "C:\Temp"
      before the window closes.
      (I did this on all computers)

      Is that supposed to happen? What is the importance of the Unix-style slash? (Just curious for my own understanding)

      For firewall settings: Do I need to add this exception to the local (head) machine, or the compute nodes, or both?

      As a status update, after turning off the firewalls on all involved computers and trying again, I still get the same error:

      [project name] - HFSSDesign1 - Setup1: Determining memory availability on distributed machines on [target machine name]

      Thanks again for your time,

      Alex

    • apr37
      Subscriber

      For the sake of amplifying information and search engine hits, here is the subsequent error message I get in Message manager if I let the program timeout after being stuck on "Determining memory availability...":

      Could not start the memory inquiry solver: check distributed installations, MPI availability, MPI authentication and firewall settings.

      (Suggesting more or less the path we are already going down.)

      -Alex

    • tsiriaks
      Ansys Employee

      Hi Alex,


      For step 5, once you are in Remote Analysis, select the option to specify a username/password, then enter the Windows credentials you use to log in to the remote machine that you connect to via Ansoft RSM. You may need to specify the domain as well. You can find it in Windows by right-clicking 'This PC' -> Properties.


      For step 6, I'd recommend not doing it from the GUI, because that applies only to your own user account. If you modify it in the file C:\Program Files\AnsysEM\AnsysEM###\Win64\config\default.cfg, it applies to all users. But if you are the only one using this, then the GUI is fine too. Yes, I had to add the space, otherwise this forum turns it into a smiley. The version without the space is correct. I'm not sure what is done in the back end that requires the path to be Unix style.


      Yes, this is a generic MPI error message. MPI communication always takes place between compute (solving) nodes, so you will need to open ports between these nodes/machines. There is an MPI test command that you can try. You can start by including just two compute nodes. Run the following command locally from one of the solving machines:


      "%MPI_ROOT%\bin\mpirun" -hostlist localhost:2,<other_node>:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"


      Note: replace <other_node> by the hostname (or IP) of the other compute node.
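
A minimal sketch of assembling the test command above for an arbitrary list of machines. The environment variables `MPI_ROOT` and `ANSYSEM_ROOT201` and the `-hostlist` option come from the command in this thread. The function name `mpi_test_command` and the host name `node2` are hypothetical placeholders, and the sketch only builds the argument list without running anything.

```python
import os


def mpi_test_command(hosts, ranks_per_host=2, env=None):
    """Build the argument list for the MPI test quoted in the thread."""
    env = os.environ if env is None else env
    # Fall back to the literal %VAR% form when a variable is not set.
    mpirun = env.get("MPI_ROOT", "%MPI_ROOT%") + r"\bin\mpirun"
    test_exe = (
        env.get("ANSYSEM_ROOT201", "%ANSYSEM_ROOT201%")
        + r"\schedulers\diagnostics\Utils\pcmpi_test.exe"
    )
    # -hostlist takes comma-separated host:ranks entries, e.g. localhost:2,node2:2
    hostlist = ",".join(f"{host}:{ranks_per_host}" for host in hosts)
    return [mpirun, "-hostlist", hostlist, test_exe]


# "node2" is a made-up host name; an empty env keeps the %VAR% placeholders.
print(" ".join(mpi_test_command(["localhost", "node2"], env={})))
```

On a solving machine the resulting list could be passed to `subprocess.run`, provided both environment variables are defined there.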


      Thanks,


      Win


       

    • apr37
      Subscriber

      Ah yes, I see the smiley now.

      Thank you for all of your detailed responses.

      I followed your steps and I was able to successfully distribute multiple variations to multiple remote machines in parallel, but I do have some follow-up questions.

      First, one detail for the benefit of future readers: At first selecting 'specified user' (and specifying a user) stopped the program from hanging on the "...Determining memory availability on distributed machines..." error, but variations kept getting assigned to the same machine one at a time. To solve multiple variations in parallel on multiple machines, I needed to go to my HPC analysis configuration and change "Num variations to distribute" to something higher than 1 (but also not more than the number of solving machines, or else ANSYS would get stuck on the "...Determining memory availability on distributed machines..." error again).
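
The constraint described in the paragraph above can be written down as a small check. This is only a sketch of what was observed in this particular setup (more than 1, but no more than the number of solving machines), not a documented Ansys rule, and the function name `valid_variation_counts` is hypothetical.

```python
def valid_variation_counts(num_solving_machines: int) -> range:
    """Values of "Num variations to distribute" that worked in this setup."""
    # Higher than 1, but not more than the number of solving machines.
    return range(2, num_solving_machines + 1)


print(list(valid_variation_counts(3)))  # [2, 3]
```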

      Another question about step 5 above: What do I do if I want to utilize multiple compute nodes which do not have the same username and password? Is there a way to specify multiple username-password pairs, or do I need my network to be set up such that all the username-password pairs are the same for all compute nodes?

    • apr37
      Subscriber

      I ran your diagnostic command on one of the remote compute nodes and it returned:

      mpirun: Drive is not a network mapped - using local drive.
      WARNING: No cached password or password provided.
          use '-pass' or '-cache' to provide password
      Hello world! I'm rank 0 of 4 running on
      Hello world! I'm rank 1 of 4 running on
      Hello world! I'm rank 2 of 4 running on
      Hello world! I'm rank 3 of 4 running on

      Is this the expected, error-free response?


      A problem I am still having is that when I try to add local host to the list of compute nodes, ANSYS goes back to hanging on "...Determining memory availability on distributed machines...". Do I need to change something about my setup in order to have local host be one of multiple solving machines?

    • tsiriaks
      Ansys Employee

      For


      "Another question about step 5 above: What do I do if I want to utilize multiple compute nodes which do not have the same username and password? Is there a way to specify multiple username-password pairs, or do I need my network to be set up such that all the username-password pairs are the same for all compute nodes?"


      You can't do it that way. It's a requirement from MPI to have same credentials on all solving machines.


       


      For


      "A problem I am still having is that when I try to add local host to the list of compute nodes, ANSYS goes back to hanging on "...Determining memory availability on distributed machines...". Do I need to change something about my setup in order to have local host be one of multiple solving machines?"


      Let's make sure that in your AEDT solving machines list, you only list two (local host and another one) for now. Then, try MPI test with this pair and with this order. It seems your test above used localhost as second machine and seems fine. So, you could try specifying the localhost as the second in your AEDT solving machines list and try again.
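
Since the suggestion above depends on which machine comes first, one way to test systematically is to enumerate every ordered pair of machines and run the MPI test once per ordering. The sketch below only generates the `-hostlist` values. The host names `nodeA` and `nodeB` and the function name `ordered_hostlists` are hypothetical.

```python
from itertools import permutations


def ordered_hostlists(hosts, ranks_per_host=2):
    """Return one -hostlist value per ordered pair of hosts."""
    return [
        ",".join(f"{host}:{ranks_per_host}" for host in pair)
        for pair in permutations(hosts, 2)
    ]


# Made-up host names; each printed value covers one ordering of one pair.
for hostlist in ordered_hostlists(["nodeA", "nodeB"]):
    print(hostlist)
```

A pair that passes in one order but fails in the other narrows the problem down to one direction of the connection.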


       


      Thanks,


      Win

    • apr37
      Subscriber

      Thanks for that clarification, Win.

      Now that I understand the login credential matching requirements for MPI, I think I'm almost on my way.

      I've gotten parallel solving to work on various combinations of machines, but I've only been able to do it by turning off firewalls of the remote compute nodes.

      You mentioned in your initial response that I need to exclude the entire directory C:\Program Files\AnsysEM from the firewall, but I haven't figured out how to exclude a directory from the firewall. All the websites I have found only tell me how to make exceptions for individual processes. I would consider going this route, but it is not clear to me which individual processes are associated with C:\Program Files\AnsysEM.

      Can you refer me to a procedure for excluding an entire directory from Windows 10's firewall? (I am not running any 3rd-party software.)


      Almost there! Thanks so much for your help.


      -Alex

    • tsiriaks
      Ansys Employee

      Hi Alex,


      Sounds good. You can check this out to add Windows Firewall exclusion to entire AnsysEM folder.


      https://support.microsoft.com/en-us/help/4028485/windows-10-add-an-exclusion-to-windows-security
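
Windows Defender Firewall rules are defined per program rather than per folder, so one possible approach to the question above is to generate an inbound allow rule for every executable under the AnsysEM folder. The sketch below only builds `netsh advfirewall firewall add rule` command strings and does not run them. The rule-name prefix and the function name `firewall_rule_commands` are hypothetical, and whether allowing every executable is appropriate is an assumption to confirm with your IT department.

```python
from pathlib import Path


def firewall_rule_commands(install_dir, rule_prefix="AnsysEM"):
    """Return one netsh command per .exe found under install_dir."""
    commands = []
    for exe in sorted(Path(install_dir).rglob("*.exe")):
        commands.append(
            f'netsh advfirewall firewall add rule name="{rule_prefix} {exe.name}" '
            f'dir=in action=allow program="{exe}" enable=yes'
        )
    return commands


# Install folder named in the thread; nothing is printed if it does not exist.
root = Path(r"C:\Program Files\AnsysEM")
if root.is_dir():
    for command in firewall_rule_commands(root):
        print(command)
```

The printed commands would need to be run from an elevated (administrator) command prompt on each machine.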


      Thanks,


      Win


       

    • mahesh2444
      Subscriber
      I would like to know how you configured IBM MPI for successful simulations. I have two PCs, and I described my issue at
      https://forum.ansys.com/discussion/22480/facing-issues-while-setting-up-distributed-memory-simulations-in-ansys-edt-hfss-2020r1#latest

      Do I need to do anything after installing IBM MPI apart from the things mentioned above? Do I need to register my credentials anywhere else for mpirun to work?

      I have done the credential registration as a specified user under Remote Analysis in the AEDT GUI.

      Can you please help me solve this problem?

      Thanks

      Mahesh

    • mahesh2444
      Subscriber
      Hello,

      An update on my question. I have two machines: DESKTOP-CLH2LM1 (A) and DESKTOP-B4I9FQ7 (B).

      When I run the test with A as localhost and B as the other machine, the MPI test command prints the "Hello world!" output, indicating a good connection between A and B.

      C:\Users\Mahesh>%MPI_ROOT%\bin\mpirun -pass -hostlist localhost:2,DESKTOP-B4I9FQ7:2 %ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe
      Password for MPI runs:
      mpirun: Drive is not a network mapped - using local drive.
      Hello world! I'm rank 0 of 4 running on DESKTOP-CLH2LM1
      Hello world! I'm rank 1 of 4 running on DESKTOP-CLH2LM1
      Hello world! I'm rank 2 of 4 running on DESKTOP-B4I9FQ7
      Hello world! I'm rank 3 of 4 running on DESKTOP-B4I9FQ7

      But when I ran the same MPI test command with B as localhost and A as the other machine, I got the following output in the command prompt window.

      C:\Users\HP>%MPI_ROOT%\bin\mpirun -pass -hostlist localhost:2,DESKTOP-CLH2LM1:2 %ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe
      Password for MPI runs:
      mpirun: Drive is not a network mapped - using local drive.
      ERR-Client: InitializeSecurityContext failed (0x80090308)
      ERR - Client Authorization of socket failed.
      Command sent to service failed.
      mpirun: ERR: Error adding task to job (-1).
      mpirun: mpirun_mpid_start: thread 19792 exited with code -1
      mpirun: mpirun_winstart: unable to start all mpid processes.
      mpirun: Unable to contact remote service or mpid
      mpirun: An mpid process may still be running on DESKTOP-CLH2LM1

      I want to know why the output is like this and what settings I have to change to get the same output as the first run above.

      To test the distributed simulation feature, I started a simulation of Helical_Antenna (available in the examples; the ANSYS 2020 R1 Help advises using it as a test case) on Machine A. I set up an analysis configuration consisting of two machines, with Machine B first in the list, followed by localhost.

      But the simulation steps such as meshing and solving were performed only on Machine B and did not use any of the hardware available on Machine A. Why did this occur?

      What settings do I need to modify to use both machines in the simulation?

      P.S.: Machine A runs Windows 10 Pro, while Machine B runs Windows 10 Home. There is also a one-generation difference between the processors in the two machines. I have disabled the firewalls completely on both machines.

      Thanks

      Mahesh
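
When comparing runs like the two above, it can help to summarize the test output automatically. The following is a minimal sketch that counts the "Hello world!" ranks per host and collects lines that look like errors. The line format is taken from the output quoted in this thread. The function name `summarize_mpi_test` and the error keywords are assumptions rather than part of any Ansys or MPI tool.

```python
import re
from collections import Counter

# Matches lines such as: Hello world! I'm rank 0 of 4 running on DESKTOP-CLH2LM1
RANK_LINE = re.compile(r"Hello world! I'm rank (\d+) of (\d+) running on (\S+)")


def summarize_mpi_test(output: str):
    """Return (ranks per host, error-looking lines) for pcmpi_test output."""
    ranks_per_host = Counter()
    errors = []
    for raw in output.splitlines():
        line = raw.strip()
        match = RANK_LINE.search(line)
        if match:
            ranks_per_host[match.group(3)] += 1
        elif "ERR" in line or "unable to" in line.lower():
            # Assumed markers, based on the failing run quoted above.
            errors.append(line)
    return dict(ranks_per_host), errors


sample = """Hello world! I'm rank 0 of 4 running on DESKTOP-CLH2LM1
Hello world! I'm rank 2 of 4 running on DESKTOP-B4I9FQ7"""
print(summarize_mpi_test(sample))
```

A healthy run should list every machine from the hostlist with the requested number of ranks and an empty error list.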