You mention that you tested Fluent SETUP. Did you run every test with the same model (same mesh size, physics, and setup)? And is the total time in the graph wall-clock time or simulation time?
In some cases, using more cores actually reduces performance, typically because the communication overhead between partitions outweighs the gain once each core has too few cells to work on. See this section of the user's guide: 40.9. Checking and Improving Parallel Performance (ansys.com). I would also suggest not displaying any report definitions (or any graphics) while the simulation is running if you are measuring wall-clock time.
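
If it helps to compare your runs, here is a minimal Python sketch that turns wall-clock times into speedup and parallel efficiency. It is not part of Fluent; the function name `parallel_scaling` and the timings are hypothetical placeholders, and it assumes every run used the same case and the same number of iterations.

```python
def parallel_scaling(wall_times):
    """Return (cores, speedup, efficiency) for each run.

    wall_times: dict mapping core count -> wall-clock seconds,
    all measured on the same case with the same iteration count.
    The run with the fewest cores is used as the baseline.
    """
    base_cores = min(wall_times)
    base_time = wall_times[base_cores]
    rows = []
    for cores in sorted(wall_times):
        # Speedup relative to the baseline run
        speedup = base_time / wall_times[cores]
        # Efficiency: achieved speedup divided by the ideal speedup (cores / base_cores)
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings, replace with your own measurements
    timings = {4: 1000.0, 8: 520.0, 16: 300.0, 32: 320.0}
    for cores, speedup, efficiency in parallel_scaling(timings):
        print(f"{cores:>3} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

If the efficiency drops sharply or the speedup falls when you add cores, the case is probably too small for that core count.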