General Mechanical

Why do varying “arrangements” of element density cost so differently?

    • piknockyou
      Subscriber

      Hello.

       

      I have computed a "similar" structural model in 2 different ways.

       

      I would appreciate it if you could explain the significant discrepancies in computational cost (memory and CPU time) between the two "similar" models.

       

      The geometry is identical.

      The boundary conditions and analysis settings are identical (e.g., load steps: 1, solver: direct).

      No nonlinearities are present; thus, iterations/convergence are irrelevant.

      No adaptive meshing.

      The element type is identical (quadratic tetrahedra).

      The number of nodes is the same.

      The number of elements is the same.

      The number of equations/DOFs is the same.

      The computations were done in-core & on a CPU with 4 cores.

       

      The most significant difference between the 2 models is, in fact, the ELEMENT DENSITY:

      Model 1: a homogeneous (fine) element density distributed over the entire volume.

      Model 2: a homogeneous (coarse) element density distributed over the entire volume, except for one location where the mesh density is extremely high, with a small growth rate.

       

      In the solve.out files, one can see the computational cost of the matrix factorizations when comparing Model 1 to Model 2: 4 vs. 12 lines.

      Comparing Model 1 to Model 2, over 6x the CPU time and over 3x the memory (in-core) are required.

       

      With my limited knowledge of FEM theory, I would have intuitively expected similar computational effort, given the same number of nodes, elements, and equations/DOFs and otherwise identical boundary conditions.

      However, my intuition appears to be entirely incorrect.

      Therefore, I would be very grateful if you could explain to me:

       

      Why do varying "arrangements" of element densities lead to such huge discrepancies in computational cost?

       

      Thank you!

    • peteroznewman
      Subscriber

      What type of analysis was performed?

    • piknockyou
      Subscriber

      Static Structural

    • peteroznewman
      Subscriber

      What is the acceleration load?

    • piknockyou
      Subscriber

      25000 mm/s²

    • peteroznewman
      Subscriber

      I misunderstood at first, but now I see your question, which has nothing to do with material mass density, but with element size!

      A relevant question: are you using the recommended default of Distributed Ansys?

      If so, insert the following Command into the Static Structural section of your model.

      DSPOPTION,,,,,,PERFORMANCE
       
      Put that command in each static structural model and solve. That will cause the Solution Output to contain all the performance data generated during the solution.
       
      The Direct solver, also called the Sparse solver, works on the assembled stiffness matrix, in which most of the values are zeros; the number of non-zero values is an important factor in how many floating-point operations are required (see the output below). I didn’t make the two models have an identical number of nodes, but I made them close enough, as you can see in the number of equations.
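      As a side note, the effect is easy to reproduce outside of Ansys. Here is a minimal sketch in Python/SciPy (my own illustration using SuperLU, not the Ansys DSP solver): two matrices of the same size with a similar number of non-zeros in A can produce very different numbers of non-zeros in the factor, and it is the factor that drives the memory and floating-point counts you see in solve.out.

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.linalg import splu

      def factor_nnz(A):
          # Non-zeros in the computed L and U factors: a proxy for the
          # solver memory and floating-point work reported in solve.out.
          return splu(sp.csc_matrix(A)).nnz

      n = 40
      N = n * n                                # number of "equations"

      # Case 1: uniform 5-point grid Laplacian -- every node is coupled
      # only to its immediate neighbours, like a uniform mesh.
      T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n))
      S = sp.diags([-1.0, -1.0], [-1, 1], shape=(n, n))
      A_uniform = sp.kron(sp.eye(n), T) + sp.kron(S, sp.eye(n))

      # Case 2: same size and a similar non-zero count in A, but the
      # couplings are scattered across the matrix (a deliberately
      # exaggerated stand-in for a less regular connectivity pattern).
      rng = np.random.default_rng(0)
      k = A_uniform.nnz // 2
      R = sp.coo_matrix((rng.standard_normal(k),
                         (rng.integers(0, N, k), rng.integers(0, N, k))),
                        shape=(N, N))
      A_scattered = R + R.T + 10.0 * N * sp.eye(N)

      print("nnz in A:      uniform", A_uniform.nnz, " scattered", A_scattered.nnz)
      print("nnz in factor: uniform", factor_nnz(A_uniform), " scattered", factor_nnz(A_scattered))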
       
      When the stiffness matrix is created, the solver reorders the rows and columns to make the factorization cheaper. That reordering puts the non-zero values near the diagonal and keeps the values far off the diagonal as zeros. The more connections an element has to other elements, the wider the band of non-zero values near the diagonal. A model of beam elements arranged in a single line has the minimum width of non-zero values about the diagonal; 3D solid elements have a much wider set of non-zero values about the diagonal. This width is called the bandwidth of the matrix. It seems that the sparse factorization is strongly affected by the connections between the elements when the mesh is less uniform.
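      If you want to see what that reordering does, here is another small sketch; I am using SciPy's reverse Cuthill-McKee as a stand-in for the solver's own reordering (which Ansys does not expose), so treat it only as an illustration of the bandwidth idea.

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.csgraph import reverse_cuthill_mckee

      def bandwidth(A):
          # Largest distance of any non-zero entry from the diagonal.
          A = sp.coo_matrix(A)
          return int(np.max(np.abs(A.row - A.col)))

      # Toy connectivity: a chain of 200 nodes (like beam elements in a
      # single line), but stored with a scrambled node numbering.
      n = 200
      chain = sp.diags([1.0, 2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
      scramble = np.random.permutation(n)
      A = sp.csr_matrix(chain[scramble][:, scramble])   # same "mesh", bad ordering

      p = reverse_cuthill_mckee(A, symmetric_mode=True)
      A_reordered = A[p][:, p]

      print("bandwidth before reordering:", bandwidth(A))
      print("bandwidth after  reordering:", bandwidth(A_reordered))

      The scrambled chain has non-zeros far from the diagonal; after reordering they sit right next to it, which is exactly what the solver tries to achieve before factoring.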
       
      There are many methods for solving sparse linear systems. Section 10 of this paper describes frontal methods. That may be relevant when you look at the solution output below, where the maximum size of a front matrix is 6.5 million for the uniform mesh and 27.8 million for the non-uniform mesh.
       
      Below are sections from the two Solution Output files, labelled with the two types of mesh.
      UNIFORM MESH
       ===========================
       = multifrontal statistics =
       ===========================

           number of equations                     =          179178
           no. of nonzeroes in lower triangle of a =         6694203
           no. of nonzeroes in the factor l        =        96111205
           ratio of nonzeroes in factor (min/max)  =          0.8874
           number of super nodes                   =            6267
           maximum order of a front matrix         =            3597
           maximum size of a front matrix          =         6471003
           maximum size of a front trapezoid       =         4566471
           no. of floating point ops for factor    =      1.2764D+11

        Solver Memory allocated on core    0       =      331.057 MB
        Solver Memory allocated on core    1       =      304.230 MB
        Solver Memory allocated on core    2       =      308.788 MB
        Solver Memory allocated on core    3       =      293.040 MB
        Total Solver Memory allocated by all cores =     1237.114 MB

        DSP Matrix Solver         CPU Time (sec) =          5.078
        DSP Matrix Solver     ELAPSED Time (sec) =          5.101
        DSP Matrix Solver      Memory Used ( MB) =        331.057

        EQUIL ITER   1   CPU TIME =   6.797      ELAPSED TIME =   6.454
      NON-UNIFORM MESH
       ===========================
       = multifrontal statistics =
       ===========================

           number of equations                     =          181173
           no. of nonzeroes in lower triangle of a =         7717884
           no. of nonzeroes in the factor l        =       231164534
           ratio of nonzeroes in factor (min/max)  =          0.8010
           number of super nodes                   =            5671
           maximum order of a front matrix         =            7458
           maximum size of a front matrix          =        27814611
           maximum size of a front trapezoid       =        15050733
           no. of floating point ops for factor    =      8.0454D+11

        Solver Memory allocated on core    0       =      769.905 MB
        Solver Memory allocated on core    1       =      763.813 MB
        Solver Memory allocated on core    2       =      672.730 MB
        Solver Memory allocated on core    3       =      638.205 MB
        Total Solver Memory allocated by all cores =     2844.652 MB

        DSP Matrix Solver         CPU Time (sec) =         21.734
        DSP Matrix Solver     ELAPSED Time (sec) =         21.757
        DSP Matrix Solver      Memory Used ( MB) =        769.905

        EQUIL ITER   1   CPU TIME =   23.75      ELAPSED TIME =   23.44
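      A rough sanity check on those two listings (just arithmetic on the numbers quoted above, nothing Ansys-specific):

      # Ratios taken from the two solve.out excerpts above.
      flops_ratio  = 8.0454e11 / 1.2764e11      # ~6.3x the factorization work
      factor_ratio = 231164534 / 96111205       # ~2.4x the non-zeros in the factor
      memory_ratio = 2844.652 / 1237.114        # ~2.3x the total solver memory
      cpu_ratio    = 21.734 / 5.078             # ~4.3x the solver CPU time

      print(f"flops ratio:  {flops_ratio:.1f}")
      print(f"factor ratio: {factor_ratio:.1f}")
      print(f"memory ratio: {memory_ratio:.1f}")
      print(f"cpu ratio:    {cpu_ratio:.1f}")

      So with nearly the same number of equations, the non-uniform mesh carries roughly 2.4x the non-zeros in the factor, which tracks the roughly 2.3x solver memory, and roughly 6.3x the factorization operations, which accounts for most of the roughly 4.3x solver CPU time. It is the arrangement of the elements, not their count, that sets those numbers.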