Lumerical Python API doesn't use the full capacity of the assigned CPUs on HPC

Debin (Member, Posts: 2)

Hi Lumerical Team

I tried to run the Lumerical Python API on our HPC. The process is:

Run Python on the HPC --> Python creates the pattern --> the Lumerical Python API reads the pattern and creates the simulation file via an .lsf script --> run the created simulation file on the HPC --> Python reads the results.

For this process, we assigned 4 nodes to the job, each with 24 CPUs and 24 MPI processes, i.e. 96 CPUs/processes in total. However, while running, the simulation only uses 8 processes, which results in a long simulation time. Also, during this process only one "lum_fdtd_solve" process is active.

In the other case, where we create the simulation file on a local desktop and only use lum_fdtd_solve to solve the simulation, the job uses the whole assigned capacity.

Could you help me with this?


Answers

  • Lito (Ansys Employee, Posts: 212)

    @Debin,

    Can you share how you are running the job? Please paste the command(s) you use to run the simulation on your cluster. Do not send attachments, as we are not allowed to download them. Thanks!

  • Debin (Member, Posts: 9)

    Hi Lito

    I use a PBS submission file to submit the Python code to run on the HPC, as shown in the figure below.

    The Python code basically creates the pattern and uses the Lumerical Python API to run the simulation. We only use commands like FDTD.save and FDTD.run.
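
    For reference, here is a trimmed sketch of what the driver does (the .lsf file, .fsp file, and monitor names below are placeholders, not our exact code):

    import lumapi

    fdtd = lumapi.FDTD(hide=True)            # hidden CAD session, no GUI window
    fdtd.feval("build_pattern.lsf")          # placeholder .lsf script that builds the geometry
    fdtd.save("pattern_sim.fsp")             # FDTD.save
    fdtd.run()                               # FDTD.run, which launches lum_fdtd_solve
    T = fdtd.getresult("T_monitor", "T")     # Python reads the results back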


  • Lito (Ansys Employee, Posts: 212)
    edited November 13

    @Debin,

    The Automation/Python API is part of our scripting environment. It takes the resource configuration settings from the FDTD CAD when running the simulation, i.e. the resource configuration set in the CAD/GUI is stored in your user's preferences file, "FDTD Solutions.ini", and the script uses this when it runs the simulation. It will not run with the resources set in your submission script. If you want to use the resources requested in your submission script, you will have to run the "simulation.fsp" file directly with something like:

    mpirun /path/to/Lumerical/installation/bin/fdtd-engine-impi-lcl -t 1 simulationfile.fsp  
    


  • Debin (Member, Posts: 9)

    Hi Lito

    Thank you for your reply. Unfortunately, we don't have CAD/GUI capability on our HPC. We use pyvirtualdisplay so that lumapi can create the .fsp file on the HPC and run it. Is there any other way to make this process work on the HPC and assign the correct number of nodes and processes to the job?

  • Lito (Ansys Employee, Posts: 212)

    @Debin,

    As you are using a virtual display and scripts on your cluster, you can also set your resources via the script; this will save your settings into the "FDTD Solutions.ini" file in your ~/.config/Lumerical folder.
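
    For example, something along these lines (a minimal sketch; the process count of 96 is only an illustration and should match what your submission script requests):

    # Assumes lumapi is importable and the virtual display is already running.
    import lumapi

    fdtd = lumapi.FDTD(hide=True)
    # Resource 1 is the default "Local Host" entry in the resource manager;
    # these settings are written to "FDTD Solutions.ini" under ~/.config/Lumerical.
    fdtd.setresource("FDTD", 1, "processes", 96)
    fdtd.setresource("FDTD", 1, "threads", 1)
    print(fdtd.getresource("FDTD", 1, "processes"))   # verify the stored value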

  • Debin (Member, Posts: 9)

    Hi Lito

    Thank you for your reply. The issue is that each of our nodes has 24 CPUs and 24 MPI processes. If we assign 2 or more nodes, the simulation gets terminated in the middle of the process; it never starts running, for some reason. I am not quite sure how to solve this problem.


    Best Regards

    Debin Meng

  • Lito (Ansys Employee, Posts: 212)
    edited November 16

    How did you set your resources? Have you tried running on a single node using all of its available cores?

  • Debin (Member, Posts: 9)

    Hi Lito

    This is the FDTD Solutions.ini file:

    [General]

    SaveBackupFile=true

    backgroundB=@Variant(\0\0\0\x87\0\0\0\0)

    backgroundG=@Variant(\0\0\0\x87\0\0\0\0)

    backgroundR=@Variant(\0\0\0\x87\0\0\0\0)

    geometry8.26=@ByteArray(\x1\xd9\xd0\xcb\0\x3\0\0\0\0\a\x80\0\0\0\x17\0\0\xe\xff\0\0\x4\xf\0\0\a\x80\0\0\0\x17\0\0\xe\xff\0\0\x4\xf\0\0\0\0\0\0\0\0\xf\0\0\0\a\x80\0\0\0\x17\0\0\xe\xff\0\0\x4\xf)

    lastexpirewarning=@DateTime(\0\0\0\x10\0\0\0\0\0\0%\x87\x89\0\xe3\xa0\x42\0)

    mainWindowVersion8.26=@ByteArray(\0\0\0\xff\0\0\0\0\xfd\0\0\0\x3\0\0\0\0\0\0\0\xce\0\0\x2\xb3\xfc\x2\0\0\0\x2\xfb\0\0\0\x18\0\x64\0o\0\x63\0k\0T\0r\0\x65\0\x65\0V\0i\0\x65\0w\x1\0\0\0N\0\0\x1\xd3\0\0\0\x92\0\xff\xff\xff\xfb\0\0\0\x1c\0\x64\0o\0\x63\0k\0R\0\x65\0s\0u\0l\0t\0V\0i\0\x65\0w\x1\0\0\x2'\0\0\0\xda\0\0\0[\0\xff\xff\xff\0\0\0\x1\0\0\x2?\0\0\x2\xb3\xfc\x2\0\0\0\x1\xfc\0\0\0N\0\0\x2\xb3\0\0\0\xc7\x1\0\0\x1a\xfa\0\0\0\0\x2\0\0\0\x3\xfb\0\0\0 \0\x64\0o\0\x63\0k\0S\0\x63\0r\0i\0p\0t\0\x45\0\x64\0i\0t\0o\0r\x1\0\0\0\0\xff\xff\xff\xff\0\0\0\xac\0\xff\xff\xff\xfb\0\0\0(\0\x64\0o\0\x63\0k\0\x43\0o\0m\0p\0o\0n\0\x65\0n\0t\0L\0i\0\x62\0r\0\x61\0r\0y\x1\0\0\0\0\xff\xff\xff\xff\0\0\0\x13\0\xff\xff\xff\xfb\0\0\0\x30\0O\0p\0t\0i\0m\0i\0z\0\x61\0t\0i\0o\0n\0 \0\x44\0o\0\x63\0k\0 \0W\0i\0\x64\0g\0\x65\0t\x1\0\0\0\0\xff\xff\xff\xff\0\0\0}\0\xff\xff\xff\0\0\0\x3\0\0\aX\0\0\0\xd5\xfc\x1\0\0\0\x2\xfb\0\0\0 \0\x64\0o\0\x63\0k\0S\0\x63\0r\0i\0p\0t\0P\0r\0o\0m\0p\0t\x1\0\0\0(\0\0\x3\xcb\0\0\0y\0\a\xff\xff\xfb\0\0\0.\0\x64\0o\0\x63\0k\0S\0\x63\0r\0i\0p\0t\0W\0o\0r\0k\0s\0p\0\x61\0\x63\0\x65\0V\0i\0\x65\0w\x1\0\0\x3\xf9\0\0\x3\x87\0\0\0J\0\xff\xff\xff\0\0\x4?\0\0\x2\xb3\0\0\0\x4\0\0\0\x4\0\0\0\b\0\0\0\b\xfc\0\0\0\x2\0\0\0\0\0\0\0\x4\0\0\0\x16\0\x65\0\x64\0i\0t\0T\0o\0o\0l\0\x62\0\x61\0r\x3\0\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x18\0m\0o\0u\0s\0\x65\0T\0o\0o\0l\0\x62\0\x61\0r\x3\0\0\0\xb0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x16\0v\0i\0\x65\0w\0T\0o\0o\0l\0\x62\0\x61\0r\x3\0\0\x1\x87\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x18\0\x61\0l\0i\0g\0n\0T\0o\0o\0l\0\x62\0\x61\0r\x3\0\0\x1\xd7\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x2\0\0\0\x3\0\0\0\x16\0m\0\x61\0i\0n\0T\0o\0o\0l\0\x62\0\x61\0r\x1\0\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x1e\0s\0i\0m\0u\0l\0\x61\0t\0\x65\0T\0o\0o\0l\0\x62\0\x61\0r\x1\0\0\x3\xdd\xff\xff\xff\xff\0\0\0\0\0\0\0\0\0\0\0\x12\0s\0\x65\0\x61\0r\0\x63\0h\0\x42\0\x61\0r\x1\0\0\x4\xb2\xff\xff\xff\xff\0\0\0\0\0\0\0\0)

    perspectiveView=true

    pwd=/scratch/RDS-FEI-Lidar-RW/dmen7911/new_test

    updateCheckDate=@Variant(\0\0\0\xe\0%\x87\x90)

    viewFilenameInTitlebar=true

    viewMode=0

    xyView=true

    xzView=true

    yzView=true


    [jobmanager]

    FDTD=<engines><engine><name>Local Host</name><host>localhost</host><nProc>24</nProc><nThread>1</nThread><nCapacity>1</nCapacity><active>true</active><solverName>FDTD</solverName><exeName>fdtd-engine-mpich2nem</exeName><multiHosts>true</multiHosts><multiProcs>true</multiProcs><advanced><runType>Remote: MPICH2</runType><mpiPath>/usr/local/lumerical/2021-R2.3/mpich2/nemesis/bin/mpiexec</mpiPath><exePath>/usr/local/lumerical/2021-R2.3/bin/fdtd-engine-mpich2nem</exePath><useBinding>false</useBinding><logAll>false</logAll><extraMpiOptions></extraMpiOptions><suppressDefaultMPI>false</suppressDefaultMPI><suppressDefaultEngine>false</suppressDefaultEngine><extraOptions></extraOptions><ckptDirectory></ckptDirectory><bypassLocalMpi>false</bypassLocalMpi><schedulerCommand></schedulerCommand><schedulerSubmissionScript>#!/bin/sh\nmodule load intel-mpi\nmodule load lumerical/2021-R2.3\nmpirun fdtd-engine-impi-lcl -logall -remote {PROJECT_FILE_PATH}</schedulerSubmissionScript></advanced></engine></engines>

    We are using the PBS job scheduler to submit and run the job on the HPC.

  • Lito (Ansys Employee, Posts: 212)

    @Debin,

    Try upgrading to the latest release, 2021 R2.4. We have seen some issues with 2021 R2.3 on Linux without a GUI or X server. Running the Python script is done through the GUI; set the resources as shown in this article in our KB when using a job scheduler. One thing to check as well is whether you can run a "simulation.fsp" file directly on more than 1 node using the job scheduler:

    #PBS -l select=2:ncpus=24:mem=24GB:mpiprocs=24
    cd /path/to/simulation/directory
    module load lumerical/2021-R2.4
    module load intel-mpi
    mpirun /path/to/Lumerical/installation/bin/fdtd-engine-impi-lcl -t 1 your_simulationfile.fsp
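
    (For reference, "select=2:ncpus=24:mpiprocs=24" asks PBS for 2 chunks of 24 cores with 24 MPI ranks each, i.e. 48 ranks in total.)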
    
  • Debin (Member, Posts: 9)

    Hi Lito

    We are able to assign more than 1 node to the job by directly uploading the simulation file to the HPC and running it via a PBS submission script. However, if we use the Python-assisted way to run the process and assign more than one node to the job, the simulation gets terminated. Also, since we are using the PBS job scheduler, I am not sure how we can make the process work even if we have a graphical node on the HPC: the resource settings only seem to support job schedulers of the Torque, LSF, Slurm, and SGE types.


    Best Regards

    Debin Meng

  • Lito (Ansys Employee, Posts: 212)

    @Debin,

    "We are able to assign more than 1 node to the job by directly uploading the simulation file to the HPC and running it via a PBS submission script."

    Are you using 2021 R2.3 when you run simulations directly on the HPC/cluster with the job submission script on more than 1 node, per your message above?

    "We use pyvirtualdisplay so that lumapi can create the .fsp file on the HPC and run it."

    If you are already using a virtual display, you can set the resources in your Python script. Run the Python script through the job scheduler submission script, and the Python script will take care of setting the resources that Lumerical uses when running any simulation file on the cluster, matching the specs requested in your job submission script.
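
    For example (a minimal sketch; it assumes PBS writes one line per requested MPI rank to the file pointed to by PBS_NODEFILE, so adjust it to how your site configures the scheduler):

    import os
    import lumapi

    # Count the MPI ranks granted by PBS: one line per rank in the nodefile.
    with open(os.environ["PBS_NODEFILE"]) as f:
        nprocs = len(f.readlines())

    fdtd = lumapi.FDTD(hide=True)
    fdtd.setresource("FDTD", 1, "processes", nprocs)   # saved to "FDTD Solutions.ini"
    fdtd.setresource("FDTD", 1, "threads", 1)
    fdtd.load("your_simulationfile.fsp")               # placeholder file name
    fdtd.run()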

  • Debin (Member, Posts: 9)

    Hi Lito

    We are using 2021 R2.3 at the moment. We will ask the HPC team to upgrade the version for us. Thank you.
