randyk
Ansys Employee

Hi Ziqi,

What is the OS version:   cat /etc/*release
What is the SLURM version:  sinfo -V

Note that AnsysEM2021R2 was the first release to officially support SLURM, and a few bugs/workarounds were needed. I would consider migrating to a current version of AEDT.

Here are my notes regarding AEDT2021R2 on SLURM

If you get a case with AEDT2021R2 and SLURM 20.x - run through this first.

 
Important: the customer must be running AEDT 2021R2 and SLURM 20.x for the following steps.
-- Gather the following to confirm the OS, scheduler version, and network info:
$ cat /etc/*release
$ sinfo -V
$ ifconfig
 
 
Enabling tight integration requires editing slurm_srun_wrapper.sh and setting the batchoption 'RemoteSpawnCommand'.
 
1. (TFS447753) Edit slurm_srun_wrapper.sh.
Archive/copy .../AnsysEM21.2/Linux64/schedulers/scripts/utils/slurm_srun_wrapper.sh to slurm_srun_wrapper.sh.ORIG.
 
Edit .../AnsysEM21.2/Linux64/schedulers/scripts/utils/slurm_srun_wrapper.sh and insert the following at line 28 (a scripted version of this edit is sketched after the example below): host=$(echo "${host}" | cut -d'.' -f1)
ex:
if [[ -n "$ANSYSEM_SLURM_JOB_ID" ]]
then
 export SLURM_JOB_ID="${ANSYSEM_SLURM_JOB_ID}"
 echo "set SLURM_JOB_ID=${SLURM_JOB_ID}" >> "$DEBUG_FILE"
fi
=> # tfs447753
=> host=$(echo "${host}" | cut -d'.' -f1)
verStr=$(scontrol --version)
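If preferred, the backup and edit can be scripted. This is a minimal sketch assuming GNU sed and the /opt/AnsysEM install path used later in these notes; adjust both for the actual system:

AEDT_ROOT=/opt/AnsysEM/AnsysEM21.2/Linux64
WRAPPER=$AEDT_ROOT/schedulers/scripts/utils/slurm_srun_wrapper.sh
cp -p "$WRAPPER" "$WRAPPER.ORIG"                      # keep the original copy
# insert the hostname truncation so it lands at line 28 of the wrapper
sed -i "28i host=\$(echo \"\${host}\" | cut -d'.' -f1)" "$WRAPPER"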
 
 
2. (TFS554680) If the SLURM version is 20.x but older than 20.11: the generic SLURM scheduler integration does not check the SLURM minor version.
     The srun '--overlap' option was introduced in SLURM 20.11,
     but slurm_srun_wrapper.sh only checks whether the version is >= 20 and attempts to apply the srun '--overlap' option anyway.
Edit .../AnsysEM21.2/Linux64/schedulers/scripts/utils/slurm_srun_wrapper.sh and comment out the appropriate srun and if/else/fi lines at the bottom of the file:
 
# if [ "${ver[0]}" -ge 20 ];then # SLURM ver >= 20.**.**
#     echo "srun --overcommit --overlap --export=ALL -n 1 -N 1 --cpu-bind=none --mem-per-cpu=0 -w ${host} $@" >> "$DEBUG_FILE"
#     srun --overcommit --overlap --export=ALL -n 1 -N 1 --cpu-bind=none --mem-per-cpu=0 -w $host "$@"
# else
    echo "srun --overcommit --export=ALL -n 1 -N 1 --cpu-bind=none --mem-per-cpu=0 -w ${host} $@" >> "$DEBUG_FILE"
    srun --overcommit --export=ALL -n 1 -N 1 --cpu-bind=none --mem-per-cpu=0 -w $host "$@"
# fi
 
This should enable the customer to run manual jobs for adaptive meshing and then auto jobs for frequency sweeps. (A quick check of whether this workaround is needed is sketched below.)
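A small sketch to confirm whether the workaround applies, assuming sinfo -V reports the version as "slurm <major>.<minor>.<patch>":

ver=$(sinfo -V | awk '{print $2}')          # e.g. 20.02.7
major=${ver%%.*}
minor=${ver#*.}; minor=${minor%%.*}
if [ "$major" -eq 20 ] && [ "$minor" -lt 11 ]; then
    echo "SLURM $ver: srun --overlap not available; comment out the --overlap branch"
fi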
 
 
 
3. After making the slurm_srun_wrapper.sh change, run the following to set the defaults for tight integration (this will create/modify .../AnsysEM21.2/Linux64/config/default.XML):
 
As root or the installation owner:
cd /opt/AnsysEM/AnsysEM21.2/Linux64
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Desktop/Settings/ProjectOptions/ProductImprovementOptStatus" -RegistryValue 0 -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "HFSS/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "HFSS 3D Layout Design/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Maxwell 2D/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Maxwell 3D/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Q3D Extractor/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Icepak/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "HFSS/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "HFSS 3D Layout Design/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Maxwell 3D/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Maxwell 2D/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Q3D Extractor/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Icepak/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
# ./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "Desktop/Settings/ProjectOptions/AnsysEMPreferredSubnetAddress" -RegistryValue "192.168.1.0/24" -RegistryLevel install
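For reference, the per-product MPIVendor and RemoteSpawnCommand settings above can also be applied in a loop; this is just a convenience sketch using the same UpdateRegistry options and the same /opt/AnsysEM install path:

cd /opt/AnsysEM/AnsysEM21.2/Linux64
for prod in "HFSS" "HFSS 3D Layout Design" "Maxwell 2D" "Maxwell 3D" "Q3D Extractor" "Icepak"; do
    ./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "$prod/MPIVendor" -RegistryValue "Intel" -RegistryLevel install
    ./UpdateRegistry -set -ProductName ElectronicsDesktop2021.2 -RegistryKey "$prod/RemoteSpawnCommand" -RegistryValue "scheduler" -RegistryLevel install
done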
 
 
 
4. An example batch script:
 
Create $HOME/anstest/job.sh with the following contents:
 
#!/bin/bash
#SBATCH -N 3        # allocate 3 nodes
#SBATCH -n 12       # 12 tasks total
##SBATCH --exclusive    # no other jobs on the nodes while job is running
#SBATCH -J AnsysEMTest   # sensible name for the job
 
#Set job folder, scratch folder, project, and design (Design is optional)
JobFolder=$(pwd)
ProjName=OptimTee-DiscreteSweep-FineMesh.aedt
DsnName="TeeModel:Nominal"
 
# Executable path and SLURM custom integration variables
AppFolder=/opt/AnsysEM/AnsysEM21.2/Linux64
 
# setup environment and srun
export ANSYSEM_GENERIC_MPI_WRAPPER=${AppFolder}/schedulers/scripts/utils/slurm_srun_wrapper.sh
export ANSYSEM_COMMON_PREFIX=${AppFolder}/common
export ANSYSEM_TASKS_PER_NODE=${SLURM_TASKS_PER_NODE}  
 
# setup srun
srun_cmd="srun --overcommit --export=ALL -n 1 --cpu-bind=none --mem-per-cpu=0 --overlap "
# note: srun '--overlap' option was introduced in SLURM VERSION 20.11
 
 
# MPI timeout defaults to 30 min (appropriate for cloud); suggest lowering to 120 or 240 seconds for on-prem
export MPI_TIMEOUT_SECONDS=120
 
# System networking environment variables - HPC-system dependent; these should not be user edits!
# export ANSOFT_MPI_INTERCONNECT=ib
# export ANSOFT_MPI_INTERCONNECT_VARIANT=ofed
 
# Skip dependency check
# export ANS_NODEPCHECK=1
 
# Autocompute total cores from node allocation
CoreCount=$((SLURM_JOB_NUM_NODES * SLURM_CPUS_ON_NODE))
 
# Run Job
${srun_cmd} ${AppFolder}/ansysedt -ng -monitor -waitforlicense -useelectronicsppe=1 -distributed -auto -machinelist numcores=$CoreCount -batchoptions "" -batchsolve ${DsnName} ${JobFolder}/${ProjName}
 
 
 
Then run it:
$ dos2unix $HOME/anstest/job.sh
$ chmod +x  $HOME/anstest/job.sh
$ sbatch $HOME/anstest/job.sh
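While it runs, the job can be monitored with standard SLURM commands (sbatch prints the job ID at submission; substitute it for <jobid> below):
$ squeue -u "$USER"
$ sacct -j <jobid> --format=JobID,State,Elapsed,NodeList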
 
When complete, send the resulting log files:
$HOME/anstest/OptimTee-DiscreteSweep-FineMesh.aedt.batchinfo/*.log
 
 
5. If the customer is using Windows-to-Linux submission:
Ansoftrsmservice cannot run as root.
-- Make sure to set ANSYSEM_GENERIC_EXEC_PATH before starting ansoftrsmservice (an example setup is sketched after the cfg below).
-- In ansoftrsmservice.cfg, the content should be:
$begin 'Scheduler'
   'SchedulerName'='generic'
   'ConfigString'='{"Proxy":"slurm", "Data":""}'
$end 'Scheduler'
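A minimal setup sketch for the Linux side, assuming ANSYSEM_GENERIC_EXEC_PATH should point to the directory containing the SLURM client commands (sbatch/squeue/scancel) and that the service is started with the ansoftrsmservice script in the Linux64 directory; please verify both assumptions against the actual installation:

$ export ANSYSEM_GENERIC_EXEC_PATH=/usr/bin    # assumption: directory containing sbatch, squeue, scancel
$ cd /opt/AnsysEM/AnsysEM21.2/Linux64
$ ./ansoftrsmservice start                     # as a non-root user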