Unable to run System Coupling 2020 R1 and R2 on HPC cluster
Hello Everyone,
I am a student at NAU. I used to run System Coupling (version 2019 R2) without any issues on our university HPC cluster. The university has now upgraded the system from EL6 to EL8 and installed new versions of Ansys (2020 R1 and R2). The issue is that I am not able to run System Coupling with the new packages. It appears to be a problem with EL8 (Enterprise Linux 8) versus EL6, since my job runs fine on EL6 but not on EL8, or possibly with the new version of Ansys (note that I did the setup on 2019 R2 and have not tried the 2020 versions on EL6 to see if they work, but I checked the 2020 tutorial and found that the command lines are not much different). I would really appreciate it if anyone familiar with this issue could help me solve it.
This is the error I got:
And this is the content of my input file:
ImportSystemCouplingInputFile(FilePath = 'scinput.sci')
execCon = DatamodelRoot().CouplingParticipant
execCon['Solution'].ExecutionControl.InitialInput = 'fluidflow.cas'
execCon['Solution'].ExecutionControl.WorkingDirectory = 'Fluid_Run'
execCon['Solution 1'].ExecutionControl.InitialInput = 'mapdl.dat'
execCon['Solution 1'].ExecutionControl.WorkingDirectory = 'Structural_Run'
execCon['Solution 1'].ExecutionControl.AdditionalArguments = '-smp'
execCon['Solution'].ExecutionControl.PrintState()
execCon['Solution 1'].ExecutionControl.PrintState()
Solve()
Thanks,
Comments
Hi Maryam
Did you solve the problem? I ran into this kind of problem before; I'm not sure if it fits your case, but it is probably because of the new directory path of the System Coupling files installed on your school's system. Please let me know whether you solve it; I'm interested.
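If it helps, one quick way to check that guess from a login node is something like the following (the module name and the path pattern are only examples of the idea; your install may live somewhere else):
module show ansys                                  # see where the ansys module actually points
ls -d /packages/ansys/*/v*/SystemCoupling/bin      # example path only; adjust to your cluster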
Best,
Jirong
Hi Jirong,
Thank you for your advice. Unfortunately I have not been able to resolve the issue yet. I will let you know if I do.
Hi Jirong,
Please copy and paste the Slurm script that you're using in the comments. System Coupling supports Slurm, so it's best to use the PartitionParticipants command in your run.py to specify how the cores are allocated to each solver. In the System Coupling Help, see: System Coupling Settings and Commands Reference > PartitionParticipants. However, I don't think this is the problem. My guess is that System Coupling isn't installed on all nodes.
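For reference, a minimal sketch of what that could look like in the run script, placed before Solve() (the participant names 'Solution' and 'Solution 1' come from your input file; the algorithm name and fractions here are only an example, so check the command reference for your release):
# Example only: give roughly 3/4 of the allocated cores to Fluent ('Solution')
# and the rest to MAPDL ('Solution 1'); adjust the fractions to your case.
PartitionParticipants(
    AlgorithmName = 'SharedAllocateMachines',
    NamesAndFractions = [('Solution', 0.75), ('Solution 1', 0.25)])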
Steve
Hi Steve,
Is your comment referring to me or Jirong?
Maryam
@maryam, my mistake, I was referring to you.
Steve
Steve,
This is the content of my job script. As mentioned before, I successfully ran this job script on the old system.
#!/bin/bash
#SBATCH --job-name=fluent_test
#SBATCH --output=/scratch/user/Files/rest_simulation/output.txt
#SBATCH --chdir=/scratch/user/Files/rest_simulation/
#SBATCH --mem=16000 # 16 GB of memory
#SBATCH --partition=arzani
#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --time=300:00:00
### format the Slurm node list so System Coupling can handle it
scontrol show hostname $SLURM_NODELIST > hosts.txt
### load a module, for example
module load ansys
### run System Coupling without srun
systemcoupling -R inputfile.in
Maryam
Steve,
I copied the job script as you asked.
Maryam
Hi @maryam , the submission script looks fine. I recommend double checking with your IT team to see if Ansys is installed in the expected location. In the first error you sent, it says "line 14: module: command not found" and "line 17: system coupling: command not found". This suggests that the Slurm script can't find the Ansys install.
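One quick way to check this from the cluster (the partition and module names below are taken from your script, but treat this only as a sketch of the idea) is to open an interactive shell on a compute node and see whether the module and the executable resolve:
srun --partition=arzani --nodes=1 --ntasks=1 --pty bash -l
module avail ansys
module load ansys
which systemcoupling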
The output from my Ansys job is below. It seems like the important information is:
*************** caught exception in method doExecute in BaseValidator can only concatenate str (not "NoneType") to str
Is it possibly an issue with the version of Python being used? Or some other incompatibility between EL8 and Ansys 2020 R2?
Here is the slurm jobscript:
#!/bin/bash
#SBATCH --job-name=fluent_test
#SBATCH --output=/scratch/ma3367/el8test/output.txt
#SBATCH --chdir=/scratch/ma3367/el8test
#SBATCH --mem=16000
#SBATCH --partition=arzani
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=30:00
### format the Slurm node list so System Coupling can handle it
scontrol show hostname $SLURM_NODELIST > hosts.txt
### load a module, for example
module load ansys/20r2
### run System Coupling without srun, specifying the list of hosts provided by Slurm
systemcoupling -R inputfile.in --cnf hosts.txt
=========
And my input file:
ImportSystemCouplingInputFile(FilePath = 'scinput.sci')
execCon = DatamodelRoot().CouplingParticipant
execCon['Solution'].ExecutionControl.InitialInput = 'fluidflow.cas'
execCon['Solution'].ExecutionControl.WorkingDirectory = 'Fluid_Run'
execCon['Solution 1'].ExecutionControl.InitialInput = 'mapdl.dat'
execCon['Solution 1'].ExecutionControl.WorkingDirectory = 'Structural_Run'
execCon['Solution 1'].ExecutionControl.AdditionalArguments = '-smp'
execCon['Solution'].ExecutionControl.PrintState()
execCon['Solution 1'].ExecutionControl.PrintState()
Solve()
=========
Ansys output:
ANSYS(R) System Coupling
Executing from: /packages/ansys/20r2/v202/SystemCoupling/bin/systemcoupling
2020 R2
Point Releases and Patches installed:
ANSYS, Inc. Products 2020 R2
Autodyn 2020 R2
LS-DYNA 2020 R2
CFD-Post only 2020 R2
CFX (includes CFD-Post) 2020 R2
Chemkin 2020 R2
EnSight 2020 R2
FENSAP-ICE 2020 R2
Fluent (includes CFD-Post) 2020 R2
Forte (includes EnSight) 2020 R2
Polyflow (includes CFD-Post) 2020 R2
TurboGrid 2020 R2
ICEM CFD 2020 R2
Aqwa 2020 R2
Customization Files for User Programmable Features 2020 R2
Mechanical Products 2020 R2
Icepak (includes CFD-Post) 2020 R2
Remote Solve Manager Standalone Services 2020 R2
ACIS Geometry Interface 2020 R2
Catia, Version 5 Geometry Interface 2020 R2
NX Geometry Interface 2020 R2
Parasolid Geometry Interface 2020 R2
ANSYS, Inc. License Manager 2020 R2
(c) 2014-2020 ANSYS, Inc. All rights reserved. Unauthorized use, distribution
or duplication is prohibited. This product is subject to U.S. laws governing
export and re-export. For full Legal Notice, see documentation.
executing script 'inputfile.in'
Traceback (most recent call last):
File "PyLib/kernel/statevalidation/core/BaseValidator.py", line 34, in doExecute
File "PyLib/kernel/statevalidation/core/BaseValidator.py", line 42, in doValidate
File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 22, in doValidateImpl
File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 297, in validateTransferAttributes
File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 646, in __getMappingType
File "PyLib/kernel/physics/ModelListeners/InterfaceValidator.py", line 641, in isExtensive
TypeError: can only concatenate str (not "NoneType") to str
*************** caught exception in method doExecute in BaseValidator can only concatenate str (not "NoneType") to str
+-----------------------------------------------------------------------------+
| The variable locations of force and heatflow variables for all Fluent |
| participants are set to 'Element'. Make sure that this is consistent with |
| the setup within Fluent. When importing the SCI file (this includes running |
| inside Workbench), you can control the locations of these variables by |
| using the 'SYC_FLUENT_CONSNODAL' environment variable. For more |
| information, refer to System Coupling User's Guide. |
+-----------------------------------------------------------------------------+
Starting /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/bin/mpirun --rsh=/usr/bin/ssh -f /tmp/fluent-appfile.cbc.734361 -genv I_MPI_FABRICS shm:tcp -genv I_MPI_FALLBACK_DEVICE disable -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_PIN disable -genv I_MPI_ADJUST_REDUCE 2 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_ADJUST_BCAST 1 -genv I_MPI_ADJUST_BARRIER 2 -genv I_MPI_ADJUST_ALLGATHER 2 -genv I_MPI_ADJUST_GATHER 2 -genv I_MPI_ADJUST_ALLTOALL 1 -genv I_MPI_ADJUST_SCATTER 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python -genv FLUENT_PROD_DIR /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0 -genv TMI_CONFIG /packages/ansys/20r2/v202/SystemCoupling/runTime/linx64/cnlauncher/fluent/fluent20.2.0/multiport/mpi/lnamd64/intel/etc/tmi.conf -n 2 -host cn108 /packages/ansys/20r2/v202/commonfiles/CPython/3_7/linx64/Release/python/bin/python3 /packages/ansys/20r2/v202/SystemCoupling/PyLib/kernel/Engine/ComputeNode.py -mpiw intel -mport 172.16.2.88:172.16.2.88:41651:0
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 734395 RUNNING AT cn108
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Hi Guys,
Thanks all for your information and help. I appreciate it.
Here is my Slurm script, which produced this error: "Exception encountered when the coupling service requested ChartableData->GetRootLevelName: Communication socket unexpectedly disconnected."
-----------------------------------------------------------------
#!/bin/bash -e
#SBATCH --job-name ANSYS_FSI
#SBATCH --time 02:00:00 # Walltime
#SBATCH --ntasks 3
#SBATCH --mem-per-cpu 10gb # Memory per CPU
#SBATCH --hint nomultithread # No hyperthreading
cd /panfs/roc/groups/14/tranquil/li000096/ansys/dhp/Slurm/New_large/Newfolder
module load ansys/19.2
COMP_CPUS=$((SLURM_NTASKS-1))
MECHANICAL_CPUS=1
FLUID_CPUS=$((COMP_CPUS-MECHANICAL_CPUS))
export SLURM_EXCLUSIVE="" # don't share CPUs
echo "CPUs: Coupler:1 Struct:$MECHANICAL_CPUS Fluid:$FLUID_CPUS"
echo "STARTING SYSTEM COUPLER"
ANSYSDIR=/panfs/roc/msisoft/ansys/19.2/v192
srun -N1 -n1 $ANSYSDIR/aisol/.workbench -cmd ansys.services.systemcoupling.exe -inputFile coupler.sci || scancel $SLURM_JOBID &
SERVERFILE=scServer.scs
### Input files
###export fluent_journal=fluidFlow.jou
###export structural=structural.dat
###export coupling=coupler.sci
### Output files
###export structural_restart=ANSYSRestart1.out
###export fluent_restart=FLUENTRestart1.out
###export scrresult=scResult_01_000500.scr
# Wait till $SERVERFILE is created
while [[ ! -f "$SERVERFILE" ]] ; do
sleep 1 # waiting for SC to start
done
sleep 1
# Parse the data in $SERVERFILE
cat $SERVERFILE
(
read hostport
read count
read ansys_sol
read tmp1
read fluent_sol
read tmp2
set `echo $hostport | sed 's/@/ /'`
echo $1 > out.port
echo $2 > out.host
echo $ansys_sol > out.ansys
echo $fluent_sol > out.fluent
) < $SERVERFILE
read host < out.host
read port < out.port
read ansys_sol < out.ansys
read fluent_sol < out.fluent
echo "Port number: $port"
echo "Host name: $host"
echo "Fluent name: $fluent_sol"
echo "Mechanical name: $ansys_sol"
echo "STARTING ANSYS"
# Run Ansys
mapdl -b -np $MECHANICAL_CPUS -scport $port -schost $host -scname "$ansys_sol" -i structural.dat > struct.out || scancel $SLURM_JOBID &
sleep 2
echo "STARTING FLUENT"
# Run Fluent
fluent 3ddp -g -t$FLUID_CPUS -scport=$port -schost=$host -scname="$fluent_sol" -i fluidFlow.jou > fluent.out || scancel $SLURM_JOBID &
wait
-----------------------------------------------------------------
I'm not sure if the error is due to some issue in my Slurm script that I'm not aware of.
Thanks,
Jirong