Failing to run System Coupling 2020 R1 and R2 on an HPC cluster

Hello Everyone,

I am a student at NAU. I used to run System Coupling (version 2019 R2) without any issue on our university HPC cluster. The university has since upgraded the system from EL6 (Enterprise Linux 6) to EL8 and installed new versions of Ansys (2020 R1 and R2). The issue is that I am no longer able to run System Coupling with the new packages. It appears to be a problem with EL8 versus EL6, since my job runs fine on EL6 but not on EL8, or possibly with the new Ansys version (note that my setup was done on 2019 R2 and I have not tried the 2020 versions on EL6 to see whether they work, but I checked the 2020 tutorial and found that the command lines are not much different). I would really appreciate it if anyone who knows about this issue could help me solve it.

This is the error I got (the relevant lines are quoted in Steve's reply further down):

And these are the contents of my input file:

ImportSystemCouplingInputFile(FilePath = 'scinput.sci')

# Point each participant at its initial input file and working directory
# ('Solution' is the Fluent participant, 'Solution 1' is the MAPDL participant)
execCon = DatamodelRoot().CouplingParticipant
execCon['Solution'].ExecutionControl.InitialInput = 'fluidflow.cas'
execCon['Solution'].ExecutionControl.WorkingDirectory = 'Fluid_Run'
execCon['Solution 1'].ExecutionControl.InitialInput = 'mapdl.dat'
execCon['Solution 1'].ExecutionControl.WorkingDirectory = 'Structural_Run'
execCon['Solution 1'].ExecutionControl.AdditionalArguments = '-smp'   # shared-memory MAPDL

# Echo both participants' execution settings, then run the coupled solve
execCon['Solution'].ExecutionControl.PrintState()
execCon['Solution 1'].ExecutionControl.PrintState()
Solve()


Thanks,

Maryam

Comments

  • Hi Maryam


Did you solve the problem? I ran into this kind of problem before, though I'm not sure whether it matches yours; it is probably caused by the new directory path of the System Coupling installation on your school's system. Please let me know whether you solve it; I'm interested.


    Best,

    Jirong

  • Hi Jirong,

Thank you for your advice. Unfortunately I have not been able to resolve the issue yet. I will let you know if I do.

Steve (Forum Coordinator)
edited December 2020

    Hi Jirong,

Please copy and paste the Slurm script that you're using into the comments. System Coupling supports Slurm, so it's best to use the PartitionParticipants command in your run.py to specify how the cores are allocated to each solver. In the System Coupling Help, see: System Coupling Settings and Commands Reference > PartitionParticipants. However, I don't think that is the problem here; my guess is that System Coupling isn't installed on all nodes.
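For reference, a minimal sketch of how that could look in the run script, placed before Solve(); the algorithm name and fractions are illustrative only, so verify them against the Help entry above (the participant names match the input file posted earlier in this thread):

PartitionParticipants(
    AlgorithmName = 'SharedAllocateMachines',                        # illustrative choice; see the Help for the available algorithms
    NamesAndFractions = [('Solution', 0.75), ('Solution 1', 0.25)])  # example split: 75% Fluent, 25% MAPDL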

    Steve

Steve (Forum Coordinator)

@maryam, my mistake, I was referring to you.

    Steve

  • Steve,

This is the content of my job script. As mentioned before, I successfully ran this script on the old system.


#!/bin/bash
#SBATCH --job-name=fluent_test
#SBATCH --output=/scratch/user/Files/rest_simulation/output.txt
#SBATCH --chdir=/scratch/user/Files/rest_simulation/
#SBATCH --mem=16000               # 16 GB of memory (the value is in MB)
#SBATCH --partition=arzani
#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --time=300:00:00

### format the Slurm node list so the solvers can handle it
scontrol show hostname $SLURM_NODELIST > hosts.txt

### load the Ansys module
module load ansys

### run System Coupling without srun
systemcoupling -R inputfile.in


    Maryam

Steve (Forum Coordinator)

Hi @maryam, the submission script looks fine. I recommend double-checking with your IT team that Ansys is installed in the expected location. The first error you sent says "line 14: module: command not found" and "line 17: system coupling: command not found". "module: command not found" means the environment-modules initialization isn't available in the batch shell, so the module load line fails and systemcoupling never gets onto the PATH; in other words, the Slurm script can't find the Ansys install.
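One way to act on that (a sketch: the modules init path below is an assumption that varies by site, and the install root is taken from the Ansys output quoted further down):

# If 'module' is unknown in batch shells, initialize environment modules first
source /etc/profile.d/modules.sh    # assumed location; your IT team can confirm it
module load ansys/20r2

# Or bypass modules entirely and call the launcher by its absolute path
/packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling -R inputfile.in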

maryam (Member)

The output from my Ansys job is below. It seems like the important information is:


    *************** caught exception in method doExecute in BaseValidator can only concatenate str (not "NoneType") to str


Is it possibly an issue with the version of Python being used, or some other incompatibility between EL8 and Ansys 2020 R2?
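That exact TypeError is what Python 3 raises whenever a str is concatenated with None, regardless of interpreter version, so it more likely means the validator received an empty value from the imported .sci file than that the Python version is wrong. A minimal illustration (the variable names are hypothetical):

# Python 3 raises the same TypeError seen in the log for str + None
location = None                  # e.g. an attribute the validator expected to find
try:
    message = "variable location: " + location
except TypeError as err:
    print(err)                   # can only concatenate str (not "NoneType") to str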


Here is the Slurm job script:


#!/bin/bash
#SBATCH --job-name=fluent_test
#SBATCH --output=/scratch/ma3367/el8test/output.txt
#SBATCH --chdir=/scratch/ma3367/el8test
#SBATCH --mem=16000               # 16 GB of memory (the value is in MB)
#SBATCH --partition=arzani
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=30:00

### format the Slurm node list so the solvers can handle it
scontrol show hostname $SLURM_NODELIST > hosts.txt

### load the Ansys module
module load ansys/20r2

### run System Coupling without srun, passing the host list generated above
systemcoupling -R inputfile.in --cnf hosts.txt


    =========


    And my input file:


ImportSystemCouplingInputFile(FilePath = 'scinput.sci')

execCon = DatamodelRoot().CouplingParticipant
execCon['Solution'].ExecutionControl.InitialInput = 'fluidflow.cas'
execCon['Solution'].ExecutionControl.WorkingDirectory = 'Fluid_Run'
execCon['Solution 1'].ExecutionControl.InitialInput = 'mapdl.dat'
execCon['Solution 1'].ExecutionControl.WorkingDirectory = 'Structural_Run'
execCon['Solution 1'].ExecutionControl.AdditionalArguments = '-smp'
execCon['Solution'].ExecutionControl.PrintState()
execCon['Solution 1'].ExecutionControl.PrintState()
Solve()


    =========


    Ansys output:



ANSYS(R) System Coupling

Executing from: /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling

2020 R2

Point Releases and Patches installed:

ANSYS, Inc. Products 2020 R2
Autodyn 2020 R2
LS-DYNA 2020 R2
CFD-Post only 2020 R2
CFX (includes CFD-Post) 2020 R2
Chemkin 2020 R2
EnSight 2020 R2
FENSAP-ICE 2020 R2
Fluent (includes CFD-Post) 2020 R2
Forte (includes EnSight) 2020 R2
Polyflow (includes CFD-Post) 2020 R2
TurboGrid 2020 R2
ICEM CFD 2020 R2
Aqwa 2020 R2
Customization Files for User Programmable Features 2020 R2
Mechanical Products 2020 R2
Icepak (includes CFD-Post) 2020 R2
Remote Solve Manager Standalone Services 2020 R2
ACIS Geometry Interface 2020 R2
Catia, Version 5 Geometry Interface 2020 R2
NX Geometry Interface 2020 R2
Parasolid Geometry Interface 2020 R2
ANSYS, Inc. License Manager 2020 R2

(c) 2014-2020 ANSYS, Inc. All rights reserved. Unauthorized use, distribution
or duplication is prohibited. This product is subject to U.S. laws governing
export and re-export. For full Legal Notice, see documentation.


executing script 'inputfile.in'

Traceback (most recent call last):
  File "PyLib/kernel/statevalidation/core/BaseValidator.py", line 34, in doExecute
  File "PyLib/kernel/statevalidation/core/BaseValidator.py", line 42, in doValidate
  File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 22, in doValidateImpl
  File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 297, in validateTransferAttributes
  File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 646, in __getMappingType
  File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 641, in isExtensive
TypeError: can only concatenate str (not "NoneType") to str


    *************** caught exception in method doExecute in BaseValidator can only concatenate str (not "NoneType") to str


+-----------------------------------------------------------------------------+
| The variable locations of force and heatflow variables for all Fluent       |
| participants are set to 'Element'. Make sure that this is consistent with   |
| the setup within Fluent. When importing the SCI file (this includes running |
| inside Workbench), you can control the locations of these variables by      |
| using the 'SYC_FLUENT_CONSNODAL' environment variable. For more             |
| information, refer to System Coupling User's Guide.                         |
+-----------------------------------------------------------------------------+

    Starting /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/bin/mpirun --rsh=/usr/bin/ssh -f /tmp/fluent-appfile.cbc.734361 -genv I_MPI_FABRICS shm:tcp -genv I_MPI_FALLBACK_DEVICE disable -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_PIN disable -genv I_MPI_ADJUST_REDUCE 2 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_ADJUST_BCAST 1 -genv I_MPI_ADJUST_BARRIER 2 -genv I_MPI_ADJUST_ALLGATHER 2 -genv I_MPI_ADJUST_GATHER 2 -genv I_MPI_ADJUST_ALLTOALL 1 -genv I_MPI_ADJUST_SCATTER 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python -genv FLUENT_PROD_DIR /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0 -genv TMI_CONFIG /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/etc/tmi.conf -n 2 -host cn108 /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python/bin/python3 /packages/ansys/20r2/v202/SystemCoupling/PyLib/kernel/Engine/ComputeNode.py -mpiw intel -mport 172.16.2.88:172.16.2.88:41651:0


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 734395 RUNNING AT cn108
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Jirong (Member)

    Hi Guys,

    Thanks all for your information and help. I appreciate it.

Here is my Slurm script, which produced this error:

    Exception encountered when the coupling service requested
    ChartableData->GetRootLevelName: Communication socket
    unexpectedly disconnected.

-----------------------------------------------------------------------------

#!/bin/bash -e
#SBATCH --job-name     ANSYS_FSI
#SBATCH --time         02:00:00        # Walltime
#SBATCH --ntasks       3
#SBATCH --mem-per-cpu  10gb            # Memory per CPU
#SBATCH --hint         nomultithread   # No hyperthreading

cd /panfs/roc/groups/14/tranquil/li000096/ansys/dhp/Slurm/New_large/Newfolder
module load ansys/19.2

# One task is reserved for the coupler; the rest are split between the solvers
COMP_CPUS=$((SLURM_NTASKS-1))
MECHANICAL_CPUS=1
FLUID_CPUS=$((COMP_CPUS-MECHANICAL_CPUS))
export SLURM_EXCLUSIVE="" # don't share CPUs
echo "CPUs: Coupler:1 Struct:$MECHANICAL_CPUS Fluid:$FLUID_CPUS"

echo "STARTING SYSTEM COUPLER"
ANSYSDIR=/panfs/roc/msisoft/ansys/19.2/v192

# Launch the coupling service in the background; cancel the job if it fails
srun -N1 -n1 $ANSYSDIR/aisol/.workbench -cmd ansys.services.systemcoupling.exe -inputFile coupler.sci || scancel $SLURM_JOBID &

SERVERFILE=scServer.scs

### Input files
###export fluent_journal=fluidFlow.jou
###export structural=structural.dat
###export coupling=coupler.sci

### Output files
###export structural_restart=ANSYSRestart1.out
###export fluent_restart=FLUENTRestart1.out
###export scrresult=scResult_01_000500.scr

# Wait till $SERVERFILE is created by the coupling service
while [[ ! -f "$SERVERFILE" ]] ; do
  sleep 1 # waiting for SC to start
done
sleep 1

# Parse the data in $SERVERFILE (a sketch of the expected layout follows the script)
cat $SERVERFILE
(
read hostport     # line 1: port@host of the coupling service
read count        # line 2: number of participants
read ansys_sol    # participant name passed to MAPDL below
read tmp1
read fluent_sol   # participant name passed to Fluent below
read tmp2
set `echo $hostport | sed 's/@/ /'`   # split port@host at the '@'
echo $1 > out.port
echo $2 > out.host
echo $ansys_sol > out.ansys
echo $fluent_sol > out.fluent
) < $SERVERFILE
read host < out.host
read port < out.port
read ansys_sol < out.ansys
read fluent_sol < out.fluent

echo "Port number: $port"
echo "Host name: $host"
echo "Fluent name: $fluent_sol"
echo "Mechanical name: $ansys_sol"

echo "STARTING ANSYS"
# Run MAPDL in the background, connected to the coupling service
mapdl -b -np $MECHANICAL_CPUS -scport $port -schost $host -scname "$ansys_sol" -i structural.dat > struct.out || scancel $SLURM_JOBID &

sleep 2
echo "STARTING FLUENT"
# Run Fluent in the background, connected to the coupling service
fluent 3ddp -g -t$FLUID_CPUS -scport=$port -schost=$host -scname="$fluent_sol" -i fluidFlow.jou > fluent.out || scancel $SLURM_JOBID &
wait
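For reference, the parsing above implies that scServer.scs has roughly the following layout (the values are hypothetical and the field meanings are inferred from the read statements, so treat this as a sketch):

12345@cn042     <- port@host of the coupling service
2               <- number of participants
Solution        <- first participant name (read into ansys_sol)
ANSYS           <- line skipped as tmp1, assumed to identify the participant type
Solution 1      <- second participant name (read into fluent_sol)
FLUENT          <- line skipped as tmp2, assumed to identify the participant type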

-----------------------------------------------------------------------------

I'm not sure whether the error comes from some issue in my Slurm script that I'm not aware of.


    Thanks,

    Jirong
