Ansys Products

Problem running Maxwell on an HPC

    • tesla
      Subscriber

      When I run Maxwell 2021 R1 simulations on an HPC cluster, I get the following error.


      terminate called after throwing an instance of 'std::invalid_argument'

       what(): stoi


      The same simulations run correctly on a Windows 10 desktop installation of Maxwell 2021 R1. Additionally, the same simulations run correctly using Maxwell 2019 R1 on the HPC cluster.

      I would appreciate any help fixing this issue that you can provide. Thank you.

    • AndyJP
      Subscriber
      https://www.cplusplus.com/reference/string/stoi/
      Shorthand for "string to int": it parses str, interpreting its content as an integral number of the specified base, which is returned as an int value.
      ...a bug in the project data interpreter in RSM?
    • Randy Kosarik
      Ansys Employee
      Can you please provide more details regarding this situation?
      Is the HPC cluster a scheduler environment? If so, which scheduler and version (LSF, PBS, SLURM, UGE, Windows Server 2016 U3)?
      Are you submitting with batch?
      Are you submitting from Windows to Linux using the AEDT Job Submission UI?

    • carpenterjj
      Subscriber
      We're getting the same issue on our HPC. Details as follows:
      Scheduler: Slurm 20.02.6
      OS: Red Hat 8.3
      ANSYS EM version: 2021R1
      Further info:
      Submission on HPC's login node using sbatch command.
      Failing command: ansysedt -ng -BatchSolve -Distributed -UseElectronicsPPE -machinelist num="${SLURM_NTASKS}" Patch_Antenna_Folded_Slot_CPW.aedt
      Running the same command via a Slurm interactive job (i.e. on a compute node) doesn't cause the issue. This only happens when it's submitted as a batch job.
      Error:
      terminate called after throwing an instance of 'std::invalid_argument'
      what(): stoi
      /rds/bear-apps/2020a/EL8-cas/software/ANSYSEM/2021R1-GCCcore-9.3.0/AnsysEM21.1/Linux64/.setup_runtime: line 426: 3434145 Aborted (core dumped) "$@"

    • Randy Kosarik
      Ansys Employee
      Hi. Officially, RHEL 8.3 is not tested/supported at AEDT 2021 R1:
      https://www.ansys.com/content/dam/it-solutions/platform-support/ansys-platform-support-strategy-plans-february-2021.pdf
      I suspect our included Intel MPI bits are failing and suggest the following.
      Create: /tmp/anstest/mpitest.sh with the following contents:
      #!/bin/bash
      #AEDT path
      BasePath=/rds/bear-apps/2020a/EL8-cas/software/ANSYSEM/2021R1-GCCcore-9.3.0/AnsysEM21.1/Linux64
      #MPI paths
      MPIDir=${BasePath}/common/fluent_mpi/multiport/mpi/lnamd64/intel
      export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MPIDir}/lib
      export PATH=${PATH}:${MPIDir}/bin
      #MPI test: call PingPong
      ${MPIDir}/bin/mpirun -genv I_MPI_FABRICS shm:tcp -np 2 -host localhost IMB-MPI1 pingpong
      #===END===

      Correct format/permissions and run:
      $ dos2unix /tmp/anstest/mpitest.sh
      $ chmod +x /tmp/anstest/mpitest.sh
      $ /tmp/anstest/mpitest.sh

      Do you see a successful pingpong result?
      ex:
      -bash-4.1$ ./job.sh
      #------------------------------------------------------------
      #Intel (R) MPI Benchmarks 2018, MPI-1 part
      #------------------------------------------------------------
      # Date: Tue May 25 09:59:08 2021
      # Machine: x86_64
      # System: Linux
      # Release: 2.6.32-642.11.1.el6.x86_64
      # Version: #1 SMP Fri Nov 18 19:25:05 UTC 2016
      # MPI Version: 3.1
      # MPI Thread Environment:
      # Calling sequence was:
      # IMB-MPI1 pingpong
      # Minimum message length in bytes: 0
      # Maximum message length in bytes: 4194304
      #
      # MPI_Datatype: MPI_BYTE
      # MPI_Datatype for reductions: MPI_FLOAT
      # MPI_Op: MPI_SUM
      #
      #
      # List of Benchmarks to run:

      # PingPong

      #---------------------------------------------------
      # Benchmarking PingPong
      # #processes = 2
      #---------------------------------------------------
      #bytes   #repetitions  t[usec]  Mbytes/sec
      0        1000          0.55     0.00
      1        1000          0.69     1.45
      2        1000          0.72     2.79
      4        1000          0.69     5.76
      8        1000          0.70     11.42
      16       1000          0.71     22.44
      32       1000          0.72     44.41
      64       1000          0.78     82.15
      128      1000          0.83     153.48
      256      1000          0.84     305.47
      512      1000          0.85     605.52
      1024     1000          1.06     970.18
      2048     1000          1.30     1577.87
      4096     1000          1.62     2532.97
      8192     1000          2.34     3501.63
      16384    1000          3.90     4204.83
      32768    1000          6.93     4731.44
      65536    640           5.61     11673.26
      131072   320           9.53     13758.41
      262144   160           19.15    13689.35
      524288   80            38.50    13617.82
      1048576  40            84.58    12398.03
      2097152  20            150.60   13925.03
      4194304  10            320.85   13072.40


      # All processes entering MPI_Finalize

      -bash-4.1$

    • tesla
      Subscriber
      I am running the job in a SLURM environment, submitting it with the batchsolve command on the HPC. The fix for my issue was to remove the machinelist argument: that syntax worked on 2019 R1, but 2021 R1 does not appear to accept it in the same form. Adjusting the order or format of that argument might allow it to remain while the command still works.