How to perform an optimization from the terminal on Linux without a GUI?

RyanChung Member Posts: 2

Hi, I was wondering if it's possible to perform an optimization, say PSO with 5 parameters, 20 particles, and 50 generations, on a Linux operating system without graphical support?


I found that a sweep can be done by using the "save to files" feature, uploading the files, running the simulations on Linux without a GUI (with an MPI engine solver such as fdtd-engine-impi-lcl), downloading them back to the local PC with a GUI, and collecting the results of interest with "load from files". But this doesn't work for optimizations, which have no "save to files" or "load from files" functions. Is there any other way to run an optimization on Linux without graphical support, assuming the number of engine licenses is sufficient?
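For reference, each exported file is run on the cluster with a command along these lines (the process count and path here are just placeholders):

    mpirun -n 8 fdtd-engine-impi-lcl -logall /path/to/sweep_1.fsp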


Thanks

Best Answers

  • greg_baethge Ansys Employee Posts: 120
    Accepted Answer

    Hi @RyanChung,

    Thank you for posting your question. Unfortunately, while a sweep can generate all the files up front so they can be run independently of the GUI, for optimizations the GUI is used to create the new files from the results of the previous generation.

    A possibility could be to use the job scheduler integration to launch the jobs remotely on the Linux machine, but this would require having a job scheduler installed on the machine and some modifications to the Python script we use, to enable file transfer from your local machine to the Linux one.

  • greg_baethge Ansys Employee Posts: 120
    Accepted Answer

    Unfortunately, our scheduler integration currently doesn't support PBS (we support SGE, Slurm, Torque and LSF). I guess it could be possible to adapt one of these scripts for PBS. Note that you don't have to run the Python script yourself: when you select one of the "Job Scheduler: ..." options in the resource manager, the Python script is used to submit the job and retrieve the output.


    I had a look at the script; I think there are only two functions with scheduler-dependent parts:

    def submit_job(submission_script, submission_command, job_name):
        # Slurm-specific flags: sbatch takes --job-name and --output (%j expands to the job ID)
        cmd = submission_command + ['--job-name=' + job_name, '--output=' + job_name + '-%j.out']
        if USE_SSH:
            cmd = ssh_wrapper(cmd)

        log('Submission Command: ' + ' '.join(cmd))
        log('Submission Script:\n' + '\n'.join(submission_script))
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf8', universal_newlines=True)
        p.stdin.buffer.write('\n'.join(submission_script).encode()) # ensure unix-style line endings are used
        p.stdin.close()

        result = p.stdout.readlines()
        log('Submission Response:\n' + '\n'.join(result))

        # Slurm-specific: parse the job ID out of "Submitted batch job <id>"
        result = result[0].split()
        job_id = result[3].rstrip()
        assert(job_id.isdigit())
        log(f'Submission successful, Job ID: {job_id}')
        return job_id
    

    and

    def job_in_queue(job_id):
        # Slurm-specific: squeue lists the job while it is queued or running
        cmd = ['squeue', '-j' + str(job_id)]
        if USE_SSH:
            cmd = ssh_wrapper(cmd)
        job_status = check_output(cmd)
        for line in job_status.splitlines():
            fields = line.strip().split()
            if fields and fields[0] == str(job_id):  # skip blank lines defensively
                return True
        return False
    

    For the first one, the job_id extraction should be adapted, as the output string will likely be different. For the second one, the command that lists the jobs in the queue (squeue in Slurm) is probably different too.
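    For what it's worth, below is a rough, untested sketch of how these two functions might be adapted for PBS, reusing the script's existing helpers (ssh_wrapper, log, USE_SSH, check_output). The assumptions are that qsub prints the job ID alone on its first output line (e.g. "12345.headnode") and that qstat <id> errors out once the job has left the queue; both should be verified on your cluster.

    # Hedged PBS sketch -- untested; qsub/qstat output formats are assumptions
    def submit_job(submission_script, submission_command, job_name):
        cmd = submission_command + ['-N', job_name]   # PBS uses -N <name> for the job name
        if USE_SSH:
            cmd = ssh_wrapper(cmd)
        log('Submission Command: ' + ' '.join(cmd))
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf8', universal_newlines=True)
        p.stdin.buffer.write('\n'.join(submission_script).encode())
        p.stdin.close()
        result = p.stdout.readlines()
        log('Submission Response:\n' + '\n'.join(result))
        # assumed qsub output: the full job ID alone, e.g. "12345.headnode"
        job_id = result[0].strip().split('.')[0]
        assert(job_id.isdigit())
        return job_id

    def job_in_queue(job_id):
        cmd = ['qstat', str(job_id)]   # assumed to return non-zero once the job is gone
        if USE_SSH:
            cmd = ssh_wrapper(cmd)
        try:
            job_status = check_output(cmd)
        except subprocess.CalledProcessError:
            return False
        return str(job_id) in job_status   # assuming check_output returns text, as in the Slurm version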

    Then, in:

    USE_SSH = False
    USE_SCP = False
    CLUSTER_CWD = ''
    if USE_SSH:
       USER_NAME = "centos" # usernames can dynamically be assigned with: import getpass; getpass.getuser() 
       SSH_LOGIN = f"{USER_NAME}@<master-node-ip>"
       SSH_KEY = expanduser('~/.ssh/<private-key>.pem')
       # CLUSTER_CWD = "$HOME/Project_1/" # NFS directory shared by all nodes, can be '$HOME/' or a full path, ending in '/'
    if not USE_SCP:
       PATH_TRANSLATION = ('','') # Default, no translation. paths are 1:1 match. Use this in the case that the GUI is running on the cluster already.
    # PATH_TRANSLATION = ("X:/Working Dir/", "/share/working/dir/") # use unix-style path delimiters '/'. Use this in the case that there is a shared file system on a windows machine to a linux cluster, or if the host machine has different mount points for the shared directory.
    

    USE_SSH and USE_SCP should be set to True, and CLUSTER_CWD should be set to the folder where the simulation files should be copied. Then you need passwordless authentication enabled (you have to have an ssh private/public key pair). You should probably check with the cluster's administrators if you're unsure how to set this up.
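    For example, the key setup typically looks something like this (the key path and login are placeholders):

    ssh-keygen -t rsa -f ~/.ssh/cluster_key          # generate the key pair on your local machine
    ssh-copy-id -i ~/.ssh/cluster_key.pub user@host  # append the public key to ~/.ssh/authorized_keys on the cluster
    ssh -i ~/.ssh/cluster_key user@host              # should now log in without a password prompt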

Answers

  • RyanChung Member Posts: 12

    Hi, thanks for the reply. The cluster we use DOES have a similar job scheduler, called PBS, which I don't really understand, but I'll try it.


    Thanks again for the help.

  • RyanChung Member Posts: 12
    edited May 13

    Hi, I've filled in the job scheduler information for the cluster I'm using. But where do I need to enter the login information (say the username, IP, and password of the cluster), and what should I do before I use this modified job scheduler for simulations?

    Is the Python script on the website the one I should use?

    Do I need to run the Python file myself for the simulation?


    Thanks

  • RyanChung Member Posts: 12

    Thanks for the reply. I've finished modifying the Python file (the assignment part at the beginning), adapted the job script for PBS in the "resource advanced options", and generated the private ssh key (id_rsa/id_dsa) required as SSH_KEY by the .py script. Although the resource test in the local CAD is successful, the job keeps showing "initializing" when I try to run the simulation. So I logged in to the cluster node to see what happened there. It seems the PBS job script was not created or submitted, and the .fsp file was not uploaded. I'm guessing there's something wrong with the connection between the local PC and the cluster node.

    Is there a simple way to check that the ssh and scp connections work before I retry the steps mentioned above?

  • greg_baethge Ansys Employee Posts: 120
    edited May 13

    I think you can try and "manually" test ssh and scp from your local machine. To test ssh:

    ssh -i <ssh_key>  <ssh_login>
    

    where <ssh_key> is the path to the key (for example, /home/user/.ssh/private.pem) and <ssh_login> is the login information in the form user@host, where host is either the hostname or the IP address of the remote machine.

    For scp:

    scp -i <ssh_key> <file> <ssh_login>:<destination_path>
    

    where <file> is a file to copy to the remote machine, and <destination_path> is the full path to the folder where you want to copy the file. Make sure the path is correct and exists.
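    For example, with placeholder values (the "centos" user from the script above and a made-up IP):

    ssh -i ~/.ssh/private.pem centos@192.0.2.10
    scp -i ~/.ssh/private.pem simulation.fsp centos@192.0.2.10:/home/centos/Project_1/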

  • RyanChung Member Posts: 12
    edited May 13

    Thanks. I found how they are used in the .py file, and scp works well with the private key (id_ecdsa) after I used chmod to fix the key's permissions on the cluster submission node. However, with the same private key, the ssh command keeps asking for the password along with the OTP. I've tried a few things but still get prompted for the password and OTP.

    Can you tell me what might help with this problem?

  • greg_baethge Ansys Employee Posts: 120

    Unfortunately, my knowledge of ssh is fairly limited! Do you get any error message?

  • RyanChung Member Posts: 12
    edited May 14

    No error, only the password and OTP prompts, as if the -i <private_key> option didn't exist.

    It seems that the primary cluster node used for job submission, unlike the data-exchange node, forbids logins with only the private ssh key.

    I'll try something else, such as Python's interactive modules, to log in with the password and the extracted TOTP.
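    Maybe something along these lines, a rough sketch using the third-party paramiko library (the hostname, username and command are placeholders, and whether the cluster's keyboard-interactive auth accepts this needs testing):

    import getpass
    import paramiko  # third-party SSH library: pip install paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # paramiko falls back to keyboard-interactive auth with this password
    # if the server refuses plain password authentication
    client.connect('cluster.example.com', username='user',
                   password=getpass.getpass('Password + OTP: '))
    stdin, stdout, stderr = client.exec_command('qsub job.pbs')
    print(stdout.read().decode())
    client.close()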


    Thanks

  • RyanChung Member Posts: 12
    edited May 15

    Hi, I still have problems with sweeps and optimizations on the non-GUI Linux cluster.

    For now, I've tried all the possibilities I can think of. I still can't use the GUI-based commands on the cluster, and I can't log in with only the private ssh key via "ssh -i <key>", even after running ssh-keygen on the local PC, copying the public key over with scp, and appending it to the authorized_keys file on the remote login node (i.e. the submission node). So I can't run optimizations or sweeps with the GUI job manager on the local PC. I'm fairly sure this pay-per-use cluster doesn't let users ssh in with only a private key, without the password and TOTP; my guess is that the login node and the compute nodes have different passwords and users never get to know the compute nodes' passwords. So I'm stuck and looking for other possible ways.

    Since a parameterized structure can be built with an .lsf file but the GUI (*-solutions) still has to be called to run it, and the engine cannot be called from the terminal with an .lsf file as an argument, sweeps and optimizations are difficult to perform on a non-GUI Linux cluster if passwordless ssh is not allowed.

    I'm curious whether it's possible to run a parameterized .fsp file by passing a "userprop" of the ::model on the command line, say "*engine-solver-mpi *.fsp -R 5e-6" to set the model's radius to 5 um and have engine-solver-mpi solve and manage it; or to run the parameterized model with a combination of engine-solver-mpi and an .lsf file, with the results of interest written to a text file via the script command "write". That way the GUI job manager could be avoided, and users could apply their own job scheduler scripts, in shell or Python, for submission.


    Or maybe the job manager on the local GUI-supported PC could offer one more login method with an interactive console, letting users enter the password and TOTP themselves, and all the jobs of each sweep or each generation of an optimization could be submitted within a single job script (avoiding queue waiting time, and covering the case where the queue status is not always available), like this:

    #!...
    #PBS ...
    #PBS ...
    #PBS ...
    ...

    module load ...
    <mpirun> <lumerical-engine-mpi-lcl> -logall -remote <cad file 1, e.g. .fsp, .lms, or .ldev>
    <mpirun> <lumerical-engine-mpi-lcl> -logall -remote <cad file 2>
    <mpirun> <lumerical-engine-mpi-lcl> -logall -remote <cad file 3>
    <mpirun> <lumerical-engine-mpi-lcl> -logall -remote <cad file 4>
    ...

    But this way, the interactive ssh login for job submission would have to be performed manually for each sweep or each generation.



    Or maybe I could start a sweep or optimization, right-click the job to pause it (after the job files have been created), manually scp and ssh to the cluster for the file exchange and job submission via an interactive terminal, download the results back to the local PC to replace the job files, hit "quit and don't save" or "force quit" to end the sweep or generation so that the job manager collects the results from all the job files, and then repeat the steps above if there is a next generation?

    Still, this would require a manual ssh login for each sweep or each generation.




    Anyway, any advice you could give would be highly appreciated. Thanks.


    Have a nice weekend

  • greg_baethge Ansys Employee Posts: 120

    Hi @RyanChung,

    Apologies for the delay, and sorry you're still having issues with this. I find it quite confusing that you are able to scp but not ssh! I can't see why that wouldn't be allowed, but I guess this is something you should check with the cluster's admins.

    Regarding your other points, unfortunately, only the GUI can set up the files (i.e. run scripts, modify the structure, etc.); the engine can only run the actual FDTD calculation. Also, our optimization tool wasn't built to run that way: it has to be able to run the simulations and get the results to generate the next batch of files, and so on. An alternative could be to drive the optimization by other means, for instance via Python or Matlab. If the optimization is run "externally", it could be paused while you get the files onto the cluster, run them, etc. That said, I still think it would be better to get ssh working, to avoid all these manual steps.
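    Schematically, such an externally driven loop could look like the sketch below. Everything here is a placeholder: the helper functions are hypothetical, generating the simulation files and extracting the FOM would still require a CAD session, and the run step depends on your cluster setup.

    import random

    def make_simulation_files(params):   # hypothetical: would use a CAD session/script
        pass

    def run_on_cluster():                # hypothetical: copy files over, submit jobs, wait, copy back
        pass

    def figure_of_merit(params):         # hypothetical: would read results exported by the CAD
        return -sum(p**2 for p in params)

    def optimize(n_params=5, n_particles=20, n_generations=50):
        particles = [[random.uniform(-1, 1) for _ in range(n_params)]
                     for _ in range(n_particles)]
        for gen in range(n_generations):
            for p in particles:
                make_simulation_files(p)
            run_on_cluster()
            scores = [figure_of_merit(p) for p in particles]
            best = particles[scores.index(max(scores))]
            # toy update rule standing in for a real PSO velocity update:
            # nudge every particle halfway toward the current best
            particles = [[x + 0.5 * (b - x) for x, b in zip(p, best)]
                         for p in particles]
        return best

    best_params = optimize()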

  • RyanChung Member Posts: 12
    edited May 18

    Thanks for the reply. Although the optimization can be driven externally, for example from Python or Matlab, it seems to me that the results still have to be extracted via the CAD (i.e. "fdtd-solutions -run script.lsf" to export; I've done that before), and there's no way to obtain the FoM without the CAD. That's why I hope the engine command could be made more flexible. If there's any way to extract results with only the engine, without an .lsf file or the CAD, please let me know how. Thanks again.

  • greg_baethge Ansys Employee Posts: 120

    Unfortunately, this is not possible :( the engine can only process the FDTD calculation. Extracting data can only be done from the GUI.
