Comparing performance is very challenging and depends on several factors. Lumerical's FDTD has been the benchmark in the market for a long time, and with proper settings it is the fastest of its kind, in particular for 3D simulations.
Are you interested only in the quality factor and resonance frequency, or also in the total transmission? If it is only the quality factor and resonance frequency, the simulation should be much faster, because it does not need to run until the autoshutoff min is reached. You can extract both with either the Fourier transform method or findresonance. There are several examples online. As early as 2008, Harvard was the first to demonstrate Q ~1 million from a photonic crystal cavity.
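To illustrate why a Q/resonance run can stop well before the fields have fully decayed, here is a minimal Python sketch of the general idea: take the resonance frequency from the spectral peak and Q from the slope of the log of the field envelope. This is not Lumerical's findresonance implementation and not the exact method in the online examples. The function names (analytic_signal, estimate_resonance, synthetic_ringdown) and the test signal are made up for illustration, and it assumes a single dominant mode with a field envelope of the form exp(-pi*f0*t/Q).

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1D array (FFT-based Hilbert transform)."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def estimate_resonance(t, signal):
    """Estimate (f0, Q) of a single decaying mode from a uniformly sampled time trace.

    Assumes the field envelope decays as exp(-pi * f0 * t / Q).
    """
    dt = t[1] - t[0]
    # Resonance frequency: peak of the windowed spectrum.
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
    freqs = np.fft.rfftfreq(signal.size, d=dt)
    f0 = freqs[np.argmax(spectrum)]
    # Decay rate: linear fit to log(envelope), skipping the edges of the record
    # where the FFT-based envelope is least reliable.
    envelope = np.abs(analytic_signal(signal))
    n = signal.size
    middle = slice(n // 10, 9 * n // 10)
    slope, _ = np.polyfit(t[middle], np.log(envelope[middle]), 1)
    q = -np.pi * f0 / slope
    return f0, q


def synthetic_ringdown(f0, q, periods=1000, samples_per_period=20):
    """Hypothetical test signal: one decaying cosine, truncated long before full decay."""
    dt = 1.0 / (f0 * samples_per_period)
    t = np.arange(periods * samples_per_period) * dt
    return t, np.exp(-np.pi * f0 * t / q) * np.cos(2 * np.pi * f0 * t)


if __name__ == "__main__":
    t, field = synthetic_ringdown(f0=200e12, q=5000.0)
    f0_est, q_est = estimate_resonance(t, field)
    print(f"f0 ~ {f0_est:.4e} Hz, Q ~ {q_est:.0f}")
```

In the synthetic example the record ends while the field is still at roughly half of its initial amplitude, yet the decay slope is enough to estimate Q. A real cavity with several nearby modes would need filtering around each peak first.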
If it is for transmission, the autoshutoff min should be reached. However, it may not need to take about 12 hours. FDTD, and Lumerical FDTD in particular, is well designed for parallel distributed computing. However, depending on the hardware configuration and specs, the speed may not scale linearly with the number of cores. In other words, more cores do not necessarily mean a faster simulation, and too many cores for a small job can actually slow it down. You will need to test the Resource configuration and see which number of cores/processes/threads is the most efficient by checking the estimated simulation time.
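One simple way to compare the configurations you try is to note the estimated simulation time for each one and tabulate speedup and parallel efficiency. The Python sketch below does only that bookkeeping. It does not call any Lumerical API, the function names are made up, and the timings in example_timings are hypothetical numbers chosen to show the typical pattern, not measurements.

```python
def summarize_scaling(timings):
    """Return (processes, time_s, speedup, efficiency) rows.

    timings: dict mapping number of processes -> estimated simulation time in seconds.
    Speedup and efficiency are relative to the smallest process count tested.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        efficiency = speedup * base_procs / procs
        rows.append((procs, timings[procs], speedup, efficiency))
    return rows


def fastest_configuration(timings):
    """Return the process count with the shortest estimated time."""
    return min(timings, key=timings.get)


# Hypothetical estimated times (seconds) read off for each Resource setting.
example_timings = {1: 3600.0, 2: 1900.0, 4: 1050.0, 8: 700.0, 16: 760.0, 32: 980.0}

if __name__ == "__main__":
    for procs, time_s, speedup, efficiency in summarize_scaling(example_timings):
        print(f"{procs:>3} procs  {time_s:>7.0f} s  speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
    print("fastest:", fastest_configuration(example_timings), "processes")
```

With these made-up numbers the run time stops improving at 8 processes and gets worse at 16 and 32, which is the "too many cores for a small job" situation described above.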
I have a post on slow convergence: https://forum.ansys.com/forums/topic/ansys-insight-slow-convergence-in-fdtd/#post-205484
Please have a look.
If it is still a problem, please post some screenshots of the device and the FDTD settings. If you have premium support, you can also email us.