Troubleshooting RSM problems (for Intel MPI) — Ansys Learning Forum

apr37 Member Posts: 30

Hello,

I have ANSYS EDT 2021 R1 installed on a few computers, along with the RSM service and Intel MPI. Last week I successfully ran a simulation across multiple machines in an HPC configuration, but this week some of the compute nodes are not cooperating. While an analysis is running, the Message Manager reports:

[error] Unable to locate or start COM engine on '[compute node]' : Unable to reach AnsoftRSMService.

And when I go to 'Test Machines' in the analysis configuration, the problem machines report ANS_CANNOT_CONNECTTO_ANSOFTRSMSERVICE.

I checked for proper MPI behavior on the compute nodes using the following commands in a command prompt:

C:\Windows\system32>hydra_service -status

Response: hydra service running on [compute node]

and

mpiexec -validate

Response: SUCCESS

Firewalls on all machines have been turned off.
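
The two local checks above can also be scripted, which is handy when several compute nodes need the same check. The following is a minimal Python sketch, not an Ansys or Intel tool: the function names are hypothetical, it assumes `hydra_service` and `mpiexec` are on the PATH, and it treats only the two responses quoted above as healthy, since the failure messages are not shown in this thread.

```python
import subprocess


def interpret_checks(hydra_output: str, validate_output: str) -> dict:
    """Classify the outputs of the two commands quoted in the post."""
    return {
        # Healthy response quoted above: "hydra service running on <node>".
        "hydra_service": "hydra service running on" in hydra_output.lower(),
        # Healthy response quoted above: "SUCCESS".
        "mpiexec_validate": "SUCCESS" in validate_output,
    }


def run_local_checks() -> dict:
    """Run both commands on this machine and classify their output.

    Raises FileNotFoundError if either command is not on the PATH.
    """
    def capture(args):
        done = subprocess.run(args, capture_output=True, text=True)
        return done.stdout + done.stderr

    return interpret_checks(
        capture(["hydra_service", "-status"]),
        capture(["mpiexec", "-validate"]),
    )
```

Passing both checks only shows that Intel MPI is healthy on that machine. As this thread shows, AnsoftRSMService can still be unreachable.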

What troubleshooting steps can I take from here to narrow down the problem? When I used IBM MPI, I could run the following command to check MPI communication between two machines:

"%MPI_ROOT%\bin\mpirun" -hostlist localhost:2,[compute node DNS name]:2 "%ANSYSEM_ROOT201%\schedulers\diagnostics\Utils\pcmpi_test.exe"

Is there a similar command for use with the Intel MPI?

Any other suggestions are also appreciated.

Thank you!

-Alex

Best Answer

  • mmadore Ansys Employee Posts: 823
    Accepted Answer

    @apr37 Please make sure you have installed Ansoft RSM on all machines and registered the engines.

    http://storage.ansys.com/doclinks/videos.html?code=InsElecRSMonWindows-VLU-K0a

    To test Intel MPI:

    Try the command below, first with the existing computer only and then with the new one added. To add or remove a computer, append or delete a segment of the form ': -n 2 -host computername "C:\Program Files\AnsysEM\AnsysEM21.1\Win64\schedulers\diagnostics\Utils\intelmpi_test.exe"' for each computer.

    "C:\Program Files\AnsysEM\AnsysEM21.1\Win64\common\fluent_mpi\multiport\mpi\win64\intel\bin\mpiexec" -n 2 -host computer1 "C:\Program Files\AnsysEM\AnsysEM21.1\Win64\schedulers\diagnostics\Utils\intelmpi_test.exe" : -n 2 -host computer2 "C:\Program Files\AnsysEM\AnsysEM21.1\Win64\schedulers\diagnostics\Utils\intelmpi_test.exe"


    Output should be something like:

    Intel MPI

    Hello world! I'm rank 0 of 4 running on computer1

    Hello world! I'm rank 1 of 4 running on computer1

    Hello world! I'm rank 2 of 4 running on computer2

    Hello world! I'm rank 3 of 4 running on computer2


    Thank you,

    Matt
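
The per-computer segment pattern in the answer above lends itself to a small helper. The following is a hedged Python sketch, not part of the Ansys installation: the function names are hypothetical, the two paths are copied from the command above and will differ on other installations, and the output format is assumed to match the "Hello world!" lines quoted above.

```python
import re
from collections import Counter

# Paths copied from the answer above; adjust to the local installation.
MPIEXEC = r"C:\Program Files\AnsysEM\AnsysEM21.1\Win64\common\fluent_mpi\multiport\mpi\win64\intel\bin\mpiexec"
TEST_EXE = r"C:\Program Files\AnsysEM\AnsysEM21.1\Win64\schedulers\diagnostics\Utils\intelmpi_test.exe"

# Matches lines such as: Hello world! I'm rank 2 of 4 running on computer2
RANK_LINE = re.compile(r"Hello world! I'm rank (\d+) of (\d+) running on (\S+)")


def build_test_command(hosts, ranks_per_host=2):
    """Return the mpiexec command line with one ':'-separated segment per host."""
    if not hosts:
        raise ValueError("at least one host is required")
    segments = [f'-n {ranks_per_host} -host {host} "{TEST_EXE}"' for host in hosts]
    return f'"{MPIEXEC}" ' + " : ".join(segments)


def hosts_with_missing_ranks(output, hosts, ranks_per_host=2):
    """Return the hosts that did not report the expected number of ranks."""
    # Host names are compared case-insensitively.
    seen = Counter(m.group(3).lower() for m in RANK_LINE.finditer(output))
    return [h for h in hosts if seen[h.lower()] != ranks_per_host]
```

Calling `build_test_command(["computer1", "computer2"])` reproduces the command shown in the answer. Starting with one known-good host and adding the others one at a time makes it easier to see which machine breaks the run.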

Answers

  • apr37 Member Posts: 30

    Alright! It looks like even though I had installed and registered RSM on all the computers, something happened along the way that required me to do it again. At first I tried to go straight to registering the engines with RSM, but on the affected computers several errors were returned along the lines of

    "C:/Program Files/AnsysEM/AnsysEM21.1/Win64/RXPRTCOMENGINE.exe: Error obtaining status

    > Please make sure that Remote Simulation Manager is installed and running"

    I did a repair installation of RSM, re-registered successfully, and now I am once again able to do distributed solves with those machines.

    Also, your suggested Intel MPI test command worked as expected for me. Thanks for that, too.
