Installing Ansys 2019R1 in a Cray/CLE environment

Hi,

We have configured the arcmaster on the login node and the arcnode on a compute node. When we run the RSM configuration test, it fails with errors saying the job cannot be submitted and the arcnode failed to start.
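
Before re-running the RSM test, a minimal sanity check on the login node might look like the following (a sketch only; the 11193 port number is taken from the RSM test output further down, and the exact process name shown by ps is an assumption):

    # Is the arcmaster process up on the login node?
    ps -ef | grep -i arcmaster | grep -v grep
    # Is anything listening on 11193, the port the RSM test reports as login:11193?
    ss -tlnp | grep 11193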



Thank you

Answers

  • RSM Version: 19.3.328.0, Build Date: 11/18/2018 14:06:25

    Job Name: RSM Queue Test Job

      Type: SERVERTEST

      Client Directory: /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/2r2bm361.x9s

      Client Machine: login

      Queue: RSM Queue [localhost, default]

      Template: SERVERTEST

    Cluster Configuration: localhost [localhost]

      Cluster Type: ARC

      Custom Keyword: blank

      Transfer Option: None

      Staging Directory: blank

      Local Scratch Directory: /workarea/workarea/ansys

      Platform: Linux

      Using SSH for inter-node communication on cluster

      Cluster Submit Options: blank

      Normal Inputs: [*,commands.xml,*.in]

      Cancel Inputs: [-]

      Excluded Inputs: [-]

      Normal Outputs: [*]

      Failure Outputs: [-]

      Cancel Outputs: [-]

      Excluded Outputs: [-]

      Inquire Files:

       normal: [*]

       inquire: [*.out]

    Submission in progress...

    Runtime Settings:

      Job Owner: hpcadmin

      Submit Time: Thursday, 17 December 2020 15:26

      Directory: /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/2r2bm361.x9s

    2.67 KB, .05 sec (55.81 KB/sec)

    Submission in progress...

    JobType is: SERVERTEST

    Final command platform: Linux

    RSM_PYTHON_HOME=/workarea/athenasoftwares/ansys2019R1/ansys_inc/v193/commonfiles/CPython/2_7_15/linx64/Release/python

    RSM_HPC_JOBNAME=RSMTest

    Distributed mode requested: True

    RSM_HPC_DISTRIBUTED=TRUE

    Running 5 commands

    Job working directory: /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/2r2bm361.x9s

    Number of CPU requested: 1

    AWP_ROOT193=/workarea/athenasoftwares/ansys2019R1/ansys_inc/v193

    Testing writability of working directory...

    /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/2r2bm361.x9s

    If you can read this, file was written successfully to working directory

    Writability test complete

    Checking queue default exists ...

    Job will run locally on each node in: /workarea/workarea/ansys/ack7tpu4.7hx

    JobId was parsed as: 1

    External operation: 'queryStatus' has failed. This may or may not become a fatal error

    Status parsing failed to parse the primary command output: '

    External operation: 'parseStatus' has failed. This may or may not become a fatal error

    Parser could not parse the job status. Checking for completed job exitcode...

    Status Failed

    Problem during Status. The parser was unable to parse the output and did not output the variable: RSM_HPC_OUTPUT_STATUS.

    Error: Please check that the master service is started and that there is no firewall blocking access on ports 11193, 12193, or 13193

    Output:

    The status command failed to get single job status to the master service on: login:11193.

    External operation: 'queryStatus' has failed. This may or may not become a fatal error

    Status parsing failed to parse the primary command output: '

    External operation: 'parseStatus' has failed. This may or may not become a fatal error

    Parser could not parse the job status. Checking for completed job exitcode...

    Status Failed

    Problem during Status. The parser was unable to parse the output and did not output the variable: RSM_HPC_OUTPUT_STATUS.

    Error: Please check that the master service is started and that there is no firewall blocking access on ports 11193, 12193, or 13193

    Output:

    The status command failed to get single job status to the master service on: login:11193.
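
    One generic check suggested by the error above (not an ANSYS-specific tool; it assumes nc is installed on the RSM client and compute nodes): verify that the three ARC ports are actually reachable on the login node.

        # Run from the RSM client machine and again from a compute node
        for p in 11193 12193 13193; do nc -zv login $p; done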

  • RSM_PYTHON_HOME=/workarea/athenasoftwares/ansys2019R1/ansys_inc/v193/commonfiles/CPython/2_7_15/linx64/Release/python

    RSM_HPC_JOBNAME=RSMTest

    Distributed mode requested: True

    RSM_HPC_DISTRIBUTED=TRUE

    Running 5 commands

    Job working directory: /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/gdqlpi3q.1d8

    Number of CPU requested: 1

    AWP_ROOT193=/workarea/athenasoftwares/ansys2019R1/ansys_inc/v193

    Testing writability of working directory...

    /hpchome/hpcadmin/.local/share/Temp/RsmConfigTest/gdqlpi3q.1d8

    If you can read this, file was written successfully to working directory

    Writability test complete

    Checking queue local exists ...

    Submit parsing failed to parse the primary command output: 'ArcMaster process running as rsmadmin

    External operation: 'parseSubmit' has failed. This may or may not become a fatal error

    ArcMaster process running as rsmadmin

    ARCNode Process could not be reached

    Skipping autostart because Master processes is started as a service.

    Job not submitted. Error: Job is too big to fit in the queue local with the currently assigned machines.

    Failed to submit job to cluster

    Submit Failed

    Problem during Submit. The parser was unable to parse the output and did not output the variable: RSM_HPC_OUTPUT_JOBID.

    Error:

    Output: ArcMaster process running as rsmadmin

    ARCNode Process could not be reached

    Skipping autostart because Master processes is started as a service.

    Job not submitted. Error: Job is too big to fit in the queue local with the currently assigned machines.
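
    Since the submit log reports that the ARCNode process could not be reached, a quick check on the compute nodes may help (a sketch; nid00033 is taken from the node listing below, and whether the node service appears under exactly this name in ps, or which of the three ARC ports it listens on, is an assumption):

        # Is the arcnode process running on a compute node?
        ssh nid00033 'ps -ef | grep -i arcnode | grep -v grep'
        # Is it listening on any of the ARC ports (11193/12193/13193)?
        ssh nid00033 'ss -tln | grep -E "1[123]193"'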

  • Exec Node Name   Associated Master   State     Service User   Avail  Max   Avail  Max   Avail  Max
    ==================================================================================================

       nid00033      login               Running   root              72   50       *    *       *    *
       nid00034      login               Running   root              72   50       *    *       *    *

      * Indicates that resources have not been set up. Any resource request will be accepted.


    Updating Users and Groups...


    Groups matching *

     rsmadmins


    Users matching *

     root

     rsmadmin


    Updating queues...



        Name      Status   Priority   Start Time   End Time   Max Jobs   Allowed Machines          Allowed Users
    =============================================================================================================

        default   Active   0          00:00:00     23:59:59   *          login:nid00033:nid00034   all
        local     Active   0          00:00:00     23:59:59   *          login                     all

      * Indicates that resources have not been set up. Any resource request will be accepted.

  • MangeshANSYS (Forum Coordinator)

    Please refer to the RSM documentation and search for "Example: Setting Up a Multi-Node ANSYS RSM Cluster (ARC)". Please double-check that all of the steps in Step #1 were followed; the arcmaster node needs to be able to communicate with the arcnode (a basic connectivity check is sketched below).
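
    A minimal connectivity check between the two (a sketch; the hostnames are the ones shown in the node listing above, and name resolution plus ping is only a first step, not a full ARC verification):

        # On the login (arcmaster) node
        getent hosts nid00033 nid00034
        ping -c 2 nid00033

        # On a compute (arcnode) node
        getent hosts login
        ping -c 2 login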

  • I have set up the ARC master and client nodes, and they can communicate with each other. I can see the execution nodes, and I have set up the queues using arcconfigui.

    I have configured RSM and I can see the ARC queue in it. However, once I submit a job from RSM, the arcmaster service stops (why would a submit stop the service?) and the process reports that it cannot communicate with the arcmaster.
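
    To confirm whether the arcmaster process really goes down during a submit, something like this can be left running on the login node while a job is submitted from RSM (a sketch; the process name comes from the RSM log above, and how it appears in ps is an assumption):

        # Print a timestamp and the ArcMaster process line every 2 seconds;
        # if the service dies during the submit, the process line disappears.
        while true; do date; ps -ef | grep -i arcmaster | grep -v grep; sleep 2; done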
