Fluids

When reading a 23 GB data file in Fluent: Error: Invalid section id: 9$??8q??8o??8ys8v

    • lei2019
      Subscriber

      Since the data file is big, I requested 150 GB of memory on the HPC cluster, but it still gives this error. Please give me some suggestions.
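      For reference, a minimal sketch of a Torque/PBS batch script that reserves memory for a Fluent read test (the job name, node layout, and journal file name are hypothetical, and whether the scheduler wants mem or pmem depends on the site configuration):

          #!/bin/bash
          #PBS -N fluent_read_test
          #PBS -l nodes=1:ppn=40        # hypothetical layout: one 40-core node
          #PBS -l mem=150gb             # total memory request for the job
          #PBS -l walltime=04:00:00

          cd "$PBS_O_WORKDIR"
          module load ansys/fluids_19.2

          # -g: no GUI, -t40: 40 solver processes, -i: journal file to execute
          fluent 3d -g -t40 -i read_case_data.jou > read_test.log 2>&1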


    • Keyur Kanade
      Ansys Employee
      Is it a data file of 23 GB?
      Can you please check by increasing memory?
      Are you able to read any smaller data file with 150 GB of memory?
      Please go through the help manual for more details.
      Regards,
      Keyur
    • lei2019
      Subscriber
      Thank you, Kanade.
      Yes, the data file is 23 GB (a .dat file).
      Yes, I can read smaller files. Since I run LES, I save the files automatically. Fluent could read a 5 GB .dat.gz file, but if the .dat.gz file is greater than 6 GB, I receive this error.
      I post my HPC staff's reply here:
      "Lei, I have been unable to successfully read the 20+GB .dat files. I made a very simple journal file that would read the .cas file and then one of the .dat files. No matter how I ran it, single-threaded, multi-threaded, etc., it would fail with something along the lines of: Error: Invalid section id: �9 Error Object: #f gzip: stdout: Broken pipe. When monitoring the jobs, I would see them use as much as 100GB of RAM before exiting, but they were on a node where I had requested 512GB of RAM, so memory pressure was not an issue. I think at this point you may need to consult with your colleagues or with ANSYS to determine why these files cannot be read."
      ----------------------------------
      The system info is shown in the attached screenshot.
      I also submitted a request in the customer portal: 11067416291. Please help me with this issue; it has been bugging us for more than a year. My colleague's post is attached:
      https://forum.ansys.com/discussion/15214/error-invalid-section-id-error-object/p1
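      For context, the kind of minimal journal the HPC staff describes (read the case, then one data file, then exit) might look like the sketch below; the file names are hypothetical placeholders, and the journal can be fed to a batch launch like the one sketched earlier:

          # Write a minimal journal that only exercises the read step
          printf '%s\n' \
            'file/read-case my-case.cas.gz' \
            'file/read-data my-data.dat.gz' \
            'exit' 'yes' > read_case_data.jou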
    • Keyur Kanade
      Ansys Employee
      There is an invalid section id error. This is a somewhat generic error.
      Please check if you can read this file in serial mode.
      Also, please check in which version this file was created.
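      As a rough sketch of that serial check, using the same hypothetical journal as above (with the 19.2 launcher, omitting -t starts the serial solver, while -t1 runs a single parallel process):

          # Serial read test
          fluent 3d -g -i read_case_data.jou > serial_read.log 2>&1

          # Single-process parallel read test
          fluent 3d -g -t1 -i read_case_data.jou > t1_read.log 2>&1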
    • Rob
      Ansys Employee
      Is the file being saved locally or onto a network drive? Given the size, it's possible data packet loss is responsible for the problem.
    • lei2019
      Subscriber
      Thank you, Kanade and Rob. The module I use is module load ansys/fluids_19.2, so the version is R19.2.
      When I can't read it with 40 cores, somehow I can read a 6 GB .dat.gz file with 80 cores. But I can't request more CPU resources; 40 cores for two weeks is my quota. Now I have to run the case day by day in order to request 80 cores.
      The case has run on the HPC cluster, and the files are saved on the HPC cluster, so I think it is a local drive.
      Since I am doing coal combustion with the DPM model, the number of particles at flow time 1.6 s is about 15 million; each time step, 9400 particles are injected into the burner. It is one of the reasons the file size keeps growing.
    • Rob
      Ansys Employee
      I wonder if you've run out of RAM; how much RAM is available per core? If you write out unzipped you can sometimes reduce the RAM need, but you finish up with very big files.
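      A minimal sketch of the two write options from a journal, on the assumption that Fluent compresses only when the target file name ends in .gz (file names hypothetical):

          # Writes an uncompressed (larger) data file
          printf '%s\n' 'file/write-data my-data.dat'    > write_uncompressed.jou

          # Writes a gzip-compressed data file
          printf '%s\n' 'file/write-data my-data.dat.gz' > write_compressed.jou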
    • lei2019
      Subscriber
      3 GB per core.
      Update:
      Now my data file is 7.6 GB. I use 120 cores and 360 GB of memory.
    • lei2019
      Subscriber
      reading 19186200 particles for injection injection-0.

      Fluent host process is out of memory (dpm/dpm.c:10025 9362865600 bytes).

      ==============================================================================
      Node 0: Process 309147: Received signal SIGSEGV.
      ==============================================================================
      ==============================================================================
      Node 999999: Process 305767: Received signal SIGSEGV.
      ==============================================================================
      ===============Message from the Cortex Process================================
      Fatal error in one of the compute processes.
      ==============================================================================
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000005f52680 ***
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000067f1680 ***
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000059e6680 ***
      ======= Backtrace: =========
      ======= Backtrace: =========
      /usr/lib64/libc.so.6(+0x81489)[0x7f7f248f0489]
      /shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7f7f2466fece]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x7f7f258367bd]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7fdbe8500a27]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7f7f25848a27]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyDict_SetItem+0x67)[0x7fdbe8502487]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyDict_SetItemString+0x38)[0x7f50fb7d9b18]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyDict_SetItemString+0x38)[0x7f7f2584bb18]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyImport_Cleanup+0x127)[0x7fdbe8587767]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyImport_Cleanup+0x127)[0x7f7f258cf767]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(Py_Finalize+0xfe)[0x7fdbe85998de]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(Py_Finalize+0xfe)[0x7f7f258e18de]
    • lei2019
      Subscriber
      tcocs018.hpc.wvu.edu-14869.sh
      sched_setaffinity() call failed: Invalid argument
      sched_setaffinity() call failed: Invalid argument
      sched_setaffinity() call failed: Invalid argument
      sched_setaffinity() call failed: Invalid argument
      sched_setaffinity() call failed: Invalid argument

      Error: Invalid section id: %7:Ýø:~Ïà9
      Error Object: #f

      gzip: stdout: Broken pipe

      Error: Error reading | gunzip -c 50-0kw-coal-tip-300k-25-2.040000.dat.gz.
      Error Object: #f

      ==============================================================================
      Stack backtrace generated for process id 14709 on signal 11 :
      1000000: fluent() [0x67f3b9]
      1000000: /usr/lib64/libc.so.6(+0x36280) [0x7f2f9826c280]
      1000000: /usr/lib64/libc.so.6(+0x39acd) [0x7f2f9826facd]
      1000000: /usr/lib64/libc.so.6(+0x39bb7) [0x7f2f9826fbb7]
      1000000: fluent(main+0x9c) [0x67f99c]
      1000000: /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2f982583d5]
      1000000: fluent() [0x5ed4fd]
      Please include this information with any bug report you file on this issue!
      ==============================================================================
    • YasserSelima
      Subscriber
      360 GB should be enough, but it is distributed evenly across all the nodes. When you have particle tracking, the host requires enough memory to store the entire mesh. I mean the host is now storing the 15 million particle paths...
    • YasserSelima
      Subscriber
      Node 999999 is the host, and it is compute node 0 as well. That is why you get the error on this node.
    • lei2019
      Subscriber
      So the host node (999999) is storing the 15 million paths alone? How do I solve this issue? Thank you, Yasser.
    • lei2019
      Subscriber
      Every time step (0.001 s), 9405 particles are injected into the burner. A Rosin-Rammler distribution is used to define the particle diameters, and the number of particles is determined by this method. Are there any other ways to reduce the particle count?
    • Rob
      Ansys Employee
      You can increase the particle time step to inject less frequently, but also look at why you have so many particles in the system. Why aren't they leaving the domain? The new h5 format in 2021R1 may help, and I had around 9-12M particles in a recent run with similar hardware (128 cores spread over 4 boxes).
    • lei2019
      Subscriber
      The burner is about 2 meters long, and it takes at least 3 seconds for particles to leave the domain. The West Virginia University HPC doesn't have the latest 2021R1 module.
    • lei2019
      Subscriber
      The particle time step is the same as the flow (iteration) time step; I can't increase the particle time step now because it may cause the simulation to stop converging.
    • lei2019
      Subscriber
      Using shared memory for DPM with 80 cores and 1000 GB:

      Filling Host Domain 1 [38.7291 sec]
      Warning: new_object_mt: Unable to allocate objects. Out of memory
      ==============================================================================
      Node 999999: Process 421484: Received signal SIGSEGV.
      ==============================================================================
      MPI Application rank 32 killed before MPI_Finalize() with signal 11
      MPI Application rank 77 exited before MPI_Finalize() with status 2
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000006943f70 ***
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000063cef70 ***
      *** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000005217f70 ***
      ======= Backtrace: =========
      ======= Backtrace: =========
      ======= Backtrace: =========
      /usr/lib64/libc.so.6(+0x81489)[0x7fd43a1b1489]
      /usr/lib64/libc.so.6(+0x81489)[0x7f9d1e724489]
      /usr/lib64/libc.so.6(+0x81489)[0x7fa92226e489]
      /shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7f9d1e4a3ece]
      /shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7fd439f30ece]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x7fd43b0f77bd]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x7f9d1f66a7bd]
      /shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7fa921fedece]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7fd43b109a27]
      /shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7f9d1f67ca27]
    • Rob
      Ansys Employee
      Can the head node see all that RAM?
    • lei2019
      Subscriber
      I used top, but it doesn't show the RAM used by the task.
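      For what it's worth, a rough sketch of checking memory per process on the node from a shell (the process name below matches the 19.2 MPI binary seen in the logs above; adjust it to your install):

          # Resident (RSS) and virtual (VSZ) memory, in kB, of the Fluent ranks on this node
          ps -C fluent_mpi.19.2.0 -o pid,rss,vsz,cmd

          # Overall memory on the node, in GB
          free -g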
    • electroknit
      Subscriber
      Rob, if I'm understanding correctly here, the head node in this case means the node where the Ansys process is started? In the above example from Lei, that is the node with 756 GB of RAM. I have tried a variety of methods, spreading out the job so that there would be (for example) a smaller number of processes so that each MPI process would have 300+ GB of RAM available to it, but with similar results.
    • lei2019
      Subscriber
      By using pbsnodes -a | less | egrep 487198 at the head node, here is some info about a case I am running now:

      jobs = 0-39/487198.trcis002.hpc.wvu.edu
      status = opsys=linux,uname=Linux tcocs020.hpc.wvu.edu 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64,sessions=6008,nsessions=1,nusers=1,idletime=19231116,totmem=108959308kb,availmem=69232560kb,physmem=98473552kb,ncpus=40,loadave=43.75,gres=,netload=53571989872582,state=free,varattr= ,cpuclock=Fixed,macaddr=20:67:7c:57:47:d4,version=6.1.3,rectime=1619104148,jobs=487198.trcis002.hpc.wvu.edu
      jobs = 0-39/487198.trcis002.hpc.wvu.edu
      status = opsys=linux,uname=Linux tcocs039.hpc.wvu.edu 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64,sessions=28820,nsessions=1,nusers=1,idletime=12870432,totmem=108959308kb,availmem=66161132kb,physmem=98473552kb,ncpus=40,loadave=46.52,gres=,netload=17351822237817,state=free,varattr= ,cpuclock=Fixed,macaddr=20:67:7c:57:48:40,version=6.1.3,rectime=1619104139,jobs=487198.trcis002.hpc.wvu.edu
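      A small sketch of pulling just the memory figures out of that pbsnodes output (the job id is the one quoted above; the field names follow the Torque status line):

          # Total, available, and physical memory plus core count for the nodes running job 487198
          pbsnodes -a | grep 487198 | grep -oE '(totmem|availmem|physmem|ncpus)=[0-9]+(kb)?'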
    • Rob
      Ansys Employee
      The head node is where node 0 is. You're mostly correct, except that it's possible to launch the GUI on one system and all the nodes, including the host, elsewhere.
    • electroknit
      Subscriber
      Gotcha. In this case, Lei is either launching the GUI on the node assigned by the system scheduler, or running non-interactively by reading in the journal file.
    • Rob
      Ansys Employee
      And there aren't any RAM limits on the system side, i.e. IT haven't set anything up?
    • electroknit
      Subscriber
      No limits in this case. I've also just installed 2021R1, as the University just upgraded the license server, and will see if that makes any difference.
    • lei2019
      Subscriber
      The H5 format seems to be working now, and Fluent can load the 7 GB data file. Thank you.

      -----------------------------------------------------------------
      reading 18245700 particles for injection injection-0.
       locating particles on compute nodes...
       using the new algorithm for particle relocation.
       Relocated particles using cell ID.
       all particles successfully re-located.
      Warning: The time step size (0.001) in the session did not match the time step size in the data file (0.001), and has been overwritten by the value from the data file.

      Parallel variables...
      Done.
    • Rob
      Ansys Employee
      Excellent. You should see a read/write speed-up with the h5 format too.
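      For anyone converting existing files, a minimal sketch of re-writing a legacy case/data pair in the CFF (.h5) format from a 2021R1 batch session; the file names are hypothetical, and it assumes that giving the .h5 extensions to the standard write commands selects the CFF format in that version (check the file/ TUI menu for your release):

          # Journal: read the legacy files, write them back out in .h5 form
          printf '%s\n' \
            'file/read-case my-case.cas.gz' \
            'file/read-data my-data.dat.gz' \
            'file/write-case my-case.cas.h5' \
            'file/write-data my-data.dat.h5' \
            'exit' 'yes' > convert_to_h5.jou

          fluent 3d -g -t80 -i convert_to_h5.jou > convert_to_h5.log 2>&1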
    • lei2019
      Subscriber
      Yes, it is quite fast. Cheers.