Host key verification failed - Submitting Fluent Jobs in LSF server

Alan, Member
edited November 16 in Ansys Products

Dear all,

I hope you are having a nice weekend!

I was trying to submit a Fluent job on an LSF server using:

"fluent 3d -g -t4 -i ###.jou -scheduler_tight_coupling"

The job started successfully but stopped very soon because of:

"Host key verification failed. Error: It seems ssh is trying to verify authenticity of c38b06. Please resolve it and try again! Client interrupted."

I checked online, and some people set "export SSH_SPAWN=0" before the Fluent submission command to avoid SSH verification. But then I came across another problem:

"mpirun: rsh: Command not found"

Any inputs and suggestions are greatly appreciated!

thanks in advance!

Alan

Answers

  • Rob, UK, Forum Coordinator

    Sounds like permissions on the cluster, i.e. the host can't see/connect to the nodes.

  • Alan, Member

    Thank you Rob. I will contact the cluster administrator to resolve the problem then.

  • Alan, Member

    Dear Rob,

    I just contacted the admin team. They said that access to the compute nodes is handled by the LSF scheduler and direct ssh access to the compute nodes is not allowed. If my simulation requires ssh type of access, the 'blaunch' (https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_command_ref/blaunch.8.html) command should provide the same type of functionality.

    I was trying to use blaunch for Fluent but cannot find any guides on the ANSYS Learning Forum. Do you by chance have any information about integrating blaunch with Fluent?


    thanks!

    Alan

  • Rob, UK, Forum Coordinator

    One for @mmadore then: I just break clusters....

  • mmadore, Forum Coordinator
    edited November 17

    @Alan For blaunch to work without a password, please set the following two system environment variables:

    In Bash Shell:

    export FLUENT_SSH=blaunch

    export SCHEDULER_RSH=1


    In C Shell:

    setenv FLUENT_SSH blaunch

    setenv SCHEDULER_RSH 1


    Also, you have to use -scheduler_tight_coupling in your command line.


    Thanks

    Matt
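
The settings from this reply can be collected in a small wrapper function. The following is only a sketch under stated assumptions: the function name `launch_fluent`, the `FLUENT_BIN` override, and the journal name `run.jou` are hypothetical and not part of Fluent or LSF; the two exported variables and the Fluent flags are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: export the two variables suggested in this thread,
# then start Fluent with tight LSF coupling.
launch_fluent() {
    journal="$1"        # journal file to run (placeholder, e.g. run.jou)
    ncores="${2:-4}"    # number of processes; 4 matches the -t4 used in the thread

    # Ask Fluent to use blaunch instead of ssh/rsh for remote spawning.
    export FLUENT_SSH=blaunch
    export SCHEDULER_RSH=1

    # FLUENT_BIN is an assumption that allows a dry run (FLUENT_BIN=echo);
    # by default the real "fluent" launcher on PATH is used.
    "${FLUENT_BIN:-fluent}" 3d -g -t"${ncores}" -i "${journal}" -scheduler_tight_coupling
}

# Example call inside an LSF job script (run.jou is a hypothetical name):
# launch_fluent run.jou 4
```

Setting `FLUENT_BIN=echo` before calling the function prints the command line instead of starting the solver, which can help when checking the script outside a job. The variables have to be exported in the same shell that starts Fluent, so they belong in the job script itself rather than in an interactive session on the login node.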

  • Alan, Member

    Thanks, Matt.

    I tried your commands (In Bash Shell). I used the following command to launch Fluent:

    fluent 3d -g -t4 -i ###.jou -scheduler_tight_coupling

    It does use blaunch, as the output shows "using remote shell blaunch", but it still fails with this error message:

    Host key verification failed.

    Error: It seems ssh is trying to verify authenticity of c39b12. Please resolve it and try again!

    Client interrupted.

    Any ideas to resolve this problem?


    thanks in advance!

    Alan

  • mmadore, Forum Coordinator

    Could you please try adding

    -mpi=openmpi


    Thanks

    Matt

  • Alan, Member

    -mpi=openmpi does not work. It says:

    --------------------------------------------------------------------------

    A sequential map was requested, but not enough node entries

    were given to support the requested number of processes:


     Num procs: 4

     Num nodes: 1


    We cannot continue - please adjust either the number of processes

    or provide more node locations in the file.

    --------------------------------------------------------------------------
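
The message reports one node entry for four processes, which suggests the host list handed to Open MPI had fewer entries than processes. One way to see what LSF actually allocated is to inspect `LSB_MCPU_HOSTS`, which LSF sets inside a running job as "host1 n1 host2 n2 ...". The helper names below are hypothetical and this is only a diagnostic sketch; it does not change how Fluent builds its host file.

```shell
#!/bin/sh
# Hypothetical helpers that summarise an LSB_MCPU_HOSTS-style string.
# With no argument they read the LSB_MCPU_HOSTS environment variable.

# Total number of slots: the sum of every second field.
count_lsf_slots() {
    echo "${1:-$LSB_MCPU_HOSTS}" |
        awk '{ for (i = 2; i <= NF; i += 2) total += $i } END { print total + 0 }'
}

# Number of distinct host entries: half the number of fields.
count_lsf_hosts() {
    echo "${1:-$LSB_MCPU_HOSTS}" |
        awk '{ hosts += int(NF / 2) } END { print hosts + 0 }'
}

# Example with a made-up allocation of 4 slots on one host:
count_lsf_slots "c40b06 4"   # prints 4
count_lsf_hosts "c40b06 4"   # prints 1
```

If the slot total matches the `-t` value but the host count is 1, the allocation itself is consistent with the request, and the mismatch is more likely in how the host list is passed to the MPI launcher.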


    However, -mpi=intel works, although it seems slower than the default MPI (-mpi=ibmmpi) in the mpitest results:

    For -mpi=intel with -scheduler_tight_coupling:

    Ping pong latency test ...

    ping..pong..latency(usec)...count..host

    -------------------------------------------------------------

    0.....1.....0.300575........10000..0:c40b06

    1.....2.....0.302362........10000..1:c40b06

    2.....3.....0.307225........10000..2:c40b06

    3.....0.....0.305513........10000..3:c40b06


    Ping pong bandwidth test ...

    ping..pong..bandwidth(MB)...count.msg-size(MB)..host

    -------------------------------------------------------------

    0.....1.....8916.7..........10....4.............0:c40b06

    1.....2.....8597.53.........10....4.............1:c40b06

    2.....3.....8345.63.........10....4.............2:c40b06

    3.....0.....8744.05.........10....4.............3:c40b06


    Global reduction test ...

    MPI-function...time-per-msg(usec)..count...total-time(sec)..

    -------------------------------------------------------------

    Bcast..........0.89159.............5000....0.00445795.......

    Reduce.........1.11198.............5000....0.00555992.......

    Barier.........0.34256.............5000....0.0017128........



    For -mpi=ibmmpi (which cannot run Fluent because of the rsh problem):

    Ping pong latency test ...

    ping..pong..latency(usec)...count..host

    -------------------------------------------------------------

    0.....1.....0.2452..........10000..0:c40b06

    1.....2.....0.2407..........10000..1:c40b06

    2.....3.....0.23835.........10000..2:c40b06

    3.....0.....0.248725........10000..3:c40b06


    Ping pong bandwidth test ...

    ping..pong..bandwidth(MB)...count.msg-size(MB)..host

    -------------------------------------------------------------

    0.....1.....7351.01.........10....4.............0:c40b06

    1.....2.....7340.88.........10....4.............1:c40b06

    2.....3.....7287.95.........10....4.............2:c40b06

    3.....0.....7446.61.........10....4.............3:c40b06


    Global reduction test ...

    MPI-function...time-per-msg(usec)..count...total-time(sec)..

    -------------------------------------------------------------

    Bcast..........0.49243.............5000....0.00246215.......

    Reduce.........0.57559.............5000....0.00287795.......

    Barier.........0.510406............5000....0.00255203.......


    Is -mpi=intel a good option for running Fluent, or would another -mpi option be better?
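
To compare the two mpitest runs on a single number per test, the ping-pong rows can be averaged. The sketch below assumes the dot-separated row format shown in this thread (fields separated by two or more dots); the function name `mean_col3` is hypothetical. Note that every rank in both runs is on the same host (c40b06), so these figures reflect intra-node communication only.

```shell
#!/bin/sh
# Hypothetical helper: average the third field (latency or bandwidth)
# of mpitest ping-pong rows read from stdin.
mean_col3() {
    awk -F'[.][.]+' '
        # Data rows start with two rank numbers; headers and rulers do not.
        $1 ~ /^[ \t]*[0-9]+$/ && $2 ~ /^[0-9]+$/ { sum += $3; n++ }
        END { if (n > 0) printf "%.4f\n", sum / n }
    '
}

# Latency rows from the -mpi=intel run above:
mean_col3 <<'EOF'
0.....1.....0.300575........10000..0:c40b06
1.....2.....0.302362........10000..1:c40b06
2.....3.....0.307225........10000..2:c40b06
3.....0.....0.305513........10000..3:c40b06
EOF
# prints 0.3039

# Latency rows from the -mpi=ibmmpi run above:
mean_col3 <<'EOF'
0.....1.....0.2452..........10000..0:c40b06
1.....2.....0.2407..........10000..1:c40b06
2.....3.....0.23835.........10000..2:c40b06
3.....0.....0.248725........10000..3:c40b06
EOF
# prints 0.2432
```

Applied to the bandwidth rows, the same helper gives a higher mean for the -mpi=intel run than for -mpi=ibmmpi, so in the posted numbers Intel MPI shows higher latency but also higher bandwidth.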
