Platform

Fail to run system coupling for 2020 r1 and r2 on HPC cluster

    • maryam
      Subscriber

      Hello Everyone,

      I am a student at NAU. I used to run System Coupling (version 2019 R2) without any issue on our university HPC cluster. The university has now upgraded the system from EL6 to EL8 (Enterprise Linux 8) and installed new versions of Ansys (2020 R1 and R2). The issue is that I am not able to run System Coupling on the new packages. It appears to be an issue either with EL8 versus EL6, since my job runs fine on EL6 but not on EL8, or with the new version of Ansys. (Note that I did the setup on 2019 R2 and have not tried the 2020 version on EL6 to see if it works, but I checked the 2020 tutorial and found that the command lines are not much different.) I would really appreciate it if anyone who knows about this issue could help me solve it.

      This is the error I got:

    • Jirong
      Subscriber
      Hi Maryam,

      Did you solve the problem? I met this kind of problem before, but I'm not sure if this one fits yours; it's probably because of the new directory path of the System Coupling files installed on your school's system. Please let me know whether you solve it or not; I'm interested.

      Best,
      Jirong
    • maryam
      Subscriber

      Hi Jirong,
      Thank you for your advice. Unfortunately I have not been able to resolve the issue yet. I will let you know if I do.
    • Steve
      Ansys Employee
      Hi Jirong,
      Please copy and paste the slurm script that you're using in the comments. System Coupling supports Slurm, so it's best to use the PartitionParticipants command in your run.py to specify how the cores are allocated to each solver. In the System Coupling Help, see: System Coupling Settings and Commands Reference > PartitionParticipants. However, I don't think this could be the problem. My guess is that System Coupling isn't installed on all nodes.
      Steve
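      For reference, a minimal sketch of how such a run can be wired into a Slurm submission (the job name, module name, file names, and core count here are assumptions, and the --cnf host-list flag is simply reused from the script posted later in this thread; the PartitionParticipants call itself would go inside run.py, as described in the Help):

      #!/bin/bash
      #SBATCH --job-name=fsi_coupled
      #SBATCH --nodes=1
      #SBATCH --ntasks=28
      #SBATCH --time=24:00:00

      # Write the Slurm node list in a form the solvers can read
      scontrol show hostname $SLURM_NODELIST > hosts.txt

      # Load the site-specific Ansys module
      module load ansys

      # Run System Coupling in batch mode with a Python run script
      systemcoupling -R run.py --cnf hosts.txt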
    • maryam
      Subscriber

      Hi Steve,
      Is your comment referring to me or Jirong?
      Maryam
    • Steve
      Ansys Employee
      My mistake, I was referring to you.
      Steve
    • maryam
      Subscriber
      Steve,
      This is the content of my job script. As mentioned before, I successfully ran this job script on the old system.

      #!/bin/bash
      #SBATCH --job-name=fluent_test
      #SBATCH --output=/scratch/user/Files/rest_simulation/output.txt
      #SBATCH --chdir=/scratch/user/Files/rest_simulation/
      #SBATCH --mem=16000 # 16 GB of memory
      #SBATCH --partition=arzani
      #SBATCH --nodes=1
      #SBATCH --ntasks=28
      #SBATCH --time=300:00:00
      ### format the slurm node list so fluent can handle it
      scontrol show hostname $SLURM_NODELIST > hosts.txt
      ### load a module, for example
      module load ansys
      ### run fluent without srun, specifying a list of hosts provided by slurm
      systemcoupling -R inputfile.in

      Maryam
    • maryam
      Subscriber

      Steve,
      I copied the job script as you asked.

      Maryam
    • Steve
      Ansys Employee
      Hi Maryam, the submission script looks fine. I recommend double-checking with your IT team to see if Ansys is installed in the expected location. In the first error you sent, it says "line 14: module: command not found" and "line 17: system coupling: command not found". This suggests that the Slurm script can't find the Ansys install.
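      A quick way to check that from an interactive shell on one of the compute nodes is sketched below; the module name and install path are only examples (the path is the one that appears in the job output further down), so substitute whatever your site actually provides:

      module avail ansys        # list the Ansys modules the cluster knows about
      module load ansys/20r2    # load the 2020 R2 module (name is site-specific)
      which systemcoupling      # should resolve to the System Coupling launcher, e.g.
                                # /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling
      ls /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling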
    • maryam
      Subscriber
      The output from my Ansys job is below. It seems like the important information is:

      *************** caught exception in method doExecute in BaseValidator can only concatenate str (not NoneType) to str

      Is it possibly an issue with the version of Python being used? Or some other incompatibility between EL8 and Ansys 2020 R2?

      Here is the slurm jobscript:

      #!/bin/bash
      #SBATCH --job-name=fluent_test
      #SBATCH --output=/scratch/ma3367/el8test/output.txt
      #SBATCH --chdir=/scratch/ma3367/el8test
      #SBATCH --mem=16000
      #SBATCH --partition=arzani
      #SBATCH --nodes=1
      #SBATCH --ntasks=2
      #SBATCH --time=30:00
      ### format the slurm node list so fluent can handle it

      scontrol show hostname $SLURM_NODELIST > hosts.txt

      ### load a module, for example
      module load ansys/20r2

      ### run fluent without srun, specifying a list of hosts provided by slurm
      systemcoupling -R inputfile.in --cnf hosts.txt

      =========

      And my input file:

      ImportSystemCouplingInputFile(FilePath = 'scinput.sci')
      execCon = DatamodelRoot().CouplingParticipant
      execCon['Solution'].ExecutionControl.InitialInput = 'fluidflow.cas'
      execCon.ExecutionControl.WorkingDirectory = 'Fluid_Run'
      execCon.ExecutionControl.InitialInput = 'mapdl.dat'
      execCon.ExecutionControl.WorkingDirectory = 'Structural_Run'
      execCon.ExecutionControl.AdditionalArguments = '-smp'
      execCon.ExecutionControl.PrintState()
      execCon.ExecutionControl.PrintState()
      Solve()

      =========

      Ansys output:

      ANSYS(R) System Coupling

      Executing from: /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling

      2020 R2

      Point Releases and Patches installed:

      ANSYS, Inc. Products 2020 R2
      Autodyn 2020 R2
      LS-DYNA 2020 R2
      CFD-Post only 2020 R2
      CFX (includes CFD-Post) 2020 R2
      Chemkin 2020 R2
      EnSight 2020 R2
      FENSAP-ICE 2020 R2
      Fluent (includes CFD-Post) 2020 R2
      Forte (includes EnSight) 2020 R2
      Polyflow (includes CFD-Post) 2020 R2
      TurboGrid 2020 R2
      ICEM CFD 2020 R2
      Aqwa 2020 R2
      Customization Files for User Programmable Features 2020 R2
      Mechanical Products 2020 R2
      Icepak (includes CFD-Post) 2020 R2
      Remote Solve Manager Standalone Services 2020 R2
      ACIS Geometry Interface 2020 R2
      Catia, Version 5 Geometry Interface 2020 R2
      NX Geometry Interface 2020 R2
      Parasolid Geometry Interface 2020 R2
      ANSYS, Inc. License Manager 2020 R2

      (c) 2014-2020 ANSYS, Inc. All rights reserved. Unauthorized use, distribution
      or duplication is prohibited. This product is subject to U.S. laws governing
      export and re-export. For full Legal Notice, see documentation.

      executing script 'inputfile.in'
      Traceback (most recent call last):
        File PyLib/kernel/statevalidation/core/BaseValidator.py, line 34, in doExecute
        File PyLib/kernel/statevalidation/core/BaseValidator.py, line 42, in doValidate
        File PyLib/kernel/physics/ModelListeners/InterfaceValidator.py, line 22, in doValidateImpl
        File PyLib/kernel/physics/ModelListeners/InterfaceValidator.py, line 297, in validateTransferAttributes
        File PyLib/kernel/physics/ModelListeners/InterfaceValidator.py, line 646, in __getMappingType
        File PyLib/kernel/physics/ModelListeners/InterfaceValidator.py, line 641, in isExtensive
      TypeError: can only concatenate str (not NoneType) to str

      *************** caught exception in method doExecute in BaseValidator can only concatenate str (not NoneType) to str

      +------------------------------------------------------------------------------+
      | The variable locations of force and heatflow variables for all Fluent        |
      | participants are set to 'Element'. Make sure that this is consistent with    |
      | the setup within Fluent. When importing the SCI file (this includes running  |
      | inside Workbench), you can control the locations of these variables by       |
      | using the 'SYC_FLUENT_CONSNODAL' environment variable. For more              |
      | information, refer to System Coupling User's Guide.                          |
      +------------------------------------------------------------------------------+

      Starting /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/bin/mpirun --rsh=/usr/bin/ssh -f /tmp/fluent-appfile.cbc.734361 -genv I_MPI_FABRICS shm:tcp -genv I_MPI_FALLBACK_DEVICE disable -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_PIN disable -genv I_MPI_ADJUST_REDUCE 2 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_ADJUST_BCAST 1 -genv I_MPI_ADJUST_BARRIER 2 -genv I_MPI_ADJUST_ALLGATHER 2 -genv I_MPI_ADJUST_GATHER 2 -genv I_MPI_ADJUST_ALLTOALL 1 -genv I_MPI_ADJUST_SCATTER 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python -genv FLUENT_PROD_DIR /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0 -genv TMI_CONFIG /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/etc/tmi.conf -n 2 -host cn108 /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python/bin/python3 /packages/ansys/20r2/v202/SystemCoupling/PyLib/kernel/Engine/ComputeNode.py -mpiw intel -mport 172.16.2.88:172.16.2.88:41651:0

      ===================================================================================
      =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
      =   PID 734395 RUNNING AT cn108
      =   EXIT CODE: 11
      =   CLEANING UP REMAINING PROCESSES
      =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
      ===================================================================================
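      One way to narrow this down might be to reproduce the failure outside of Slurm. A rough sketch, assuming the install path and working directory shown in the output above:

      # Run the same input interactively on a node, using the launcher path
      # reported in the job output above
      cd /scratch/ma3367/el8test
      module load ansys/20r2
      /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling -R inputfile.in

      # Or open the System Coupling command console (no arguments) and enter the
      # commands from inputfile.in one at a time, stopping before Solve(), to see
      # exactly which interface/transfer setting the validator is complaining about
      /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling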
    • Jirong
      Subscriber
      Hi Guys,
      Thanks all for your information and help. I appreciate it.
      Here is my slurm script, which caused this error:

      Exception encountered when the coupling service requested
      ChartableData->GetRootLevelName: Communication socket
      unexpectedly disconnected.

      -----------------------------------------------------------------------------
      #!/bin/bash -e
      #SBATCH --job-name   ANSYS_FSI
      #SBATCH --time     02:00:00     # Walltime
      #SBATCH --ntasks    3
      #SBATCH --mem-per-cpu  10gb        # Memory per CPU
      #SBATCH --hint     nomultithread   # No hyperthreading

      cd /panfs/roc/groups/14/tranquil/li000096/ansys/dhp/Slurm/New_large/Newfolder
      module load ansys/19.2

      COMP_CPUS=$((SLURM_NTASKS-1))
      MECHANICAL_CPUS=1
      FLUID_CPUS=$((COMP_CPUS-MECHANICAL_CPUS))
      export SLURM_EXCLUSIVE= # don't share CPUs
      echo CPUs: Coupler:1 Struct:$MECHANICAL_CPUS Fluid:$FLUID_CPUS

      echo STARTING SYSTEM COUPLER
      ANSYSDIR=/panfs/roc/msisoft/ansys/19.2/v192

      srun -N1 -n1 $ANSYSDIR/aisol/.workbench -cmd ansys.services.systemcoupling.exe -inputFile coupler.sci || scancel $SLURM_JOBID &

      SERVERFILE=scServer.scs

      ### Input files
      ###export fluent_journal=fluidFlow.jou
      ###export structural=structural.dat
      ###export coupling=coupler.sci

      ### Output files
      ###export structural_restart=ANSYSRestart1.out
      ###export fluent_restart=FLUENTRestart1.out
      ###export scrresult=scResult_01_000500.scr

      # Wait till $SERVERFILE is created
      while [[ ! -f $SERVERFILE ]] ; do
        sleep 1 # waiting for SC to start
      done
      sleep 1

      # Parse the data in $SERVERFILE
      cat $SERVERFILE
      (
      read hostport
      read count
      read ansys_sol
      read tmp1
      read fluent_sol
      read tmp2
      set $(echo $hostport | sed 's/@/ /')
      echo $1 > out.port
      echo $2 > out.host
      echo $ansys_sol > out.ansys
      echo $fluent_sol > out.fluent
      ) < $SERVERFILE
      read host < out.host
      read port < out.port
      read ansys_sol < out.ansys
      read fluent_sol < out.fluent

      echo Port number: $port
      echo Host name: $host
      echo Fluent name: $fluent_sol
      echo Mechanical name: $ansys_sol

      echo STARTING ANSYS
      # Run Ansys
      mapdl -b -np $MECHANICAL_CPUS -scport $port -schost $host -scname $ansys_sol -i structural.dat > struct.out || scancel $SLURM_JOBID &

      sleep 2
      echo STARTING FLUENT
      # Run Fluent
      fluent 3ddp -g -t$FLUID_CPUS -scport=$port -schost=$host -scname=$fluent_sol -i fluidFlow.jou > fluent.out || scancel $SLURM_JOBID &
      wait

      I'm not sure if the error is because of some issue in my slurm script that I'm not aware of.
      Thanks,
      Jirong
  • The topic ‘Fail to run system coupling for 2020 r1 and r2 on HPC cluster’ is closed to new replies.