March 16, 2021 at 3:37 pm
lei2019 (Subscriber)
Since the data file is big, I requested 150 GB of memory on the HPC, but it still fails with an error. Please give me some suggestions.
March 17, 2021 at 3:43 am
Keyur Kanade (Ansys Employee)
Is the data file 23 GB?
Can you please check by increasing the memory?
Are you able to read any smaller data file with 150 GB of memory?
Please go through the help manual for more details.
Regards,
Keyur
April 18, 2021 at 2:22 am
lei2019 (Subscriber)
Thank you, Kanade.
Yes, the data file is 23 GB (a .dat file).
Yes, I can read smaller files. Since I run LES, I save the files automatically. Fluent can read a 5 GB .dat.gz file, but if the .dat.gz file is larger than 6 GB, I receive this error.
I post my HPC staff's reply here:

"Lei, I have been unable to successfully read the 20+ GB .dat files. I made a very simple journal file that would read the .cas file and then one of the .dat files. No matter how I ran it (single-threaded, multi-threaded, etc.), it would fail with something along the lines of:

Error: Invalid section id: �9
Error Object: #f
gzip: stdout: Broken pipe

When monitoring the jobs, I would see them use as much as 100 GB of RAM before exiting, but they were on a node where I had requested 512 GB of RAM, so memory pressure was not an issue. I think at this point you may need to consult with your colleagues or with ANSYS to determine why these files cannot be read."

The system info is: (screenshot omitted)

I also submitted a request in the customer portal: 11067416291. Please help me with this issue; it has been bugging us for more than a year. My colleague's post is attached:
https://forum.ansys.com/discussion/15214/error-invalid-section-id-error-object/p1
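Given the "gzip: stdout: Broken pipe" in that report (the logs later in this thread show Fluent reading the compressed file through a `gunzip -c ... |` pipe), one sanity check is to confirm the archive itself decompresses cleanly outside Fluent. A minimal sketch, assuming nothing about Fluent's own reader; `verify_gzip` is my own helper, not an Ansys tool. Note that `gzip -l` is unreliable here because the gzip format stores the uncompressed size in a 32-bit field, so it wraps around for data files larger than 4 GiB; streaming the whole file is the robust check.

```python
import gzip
import os
import tempfile

def verify_gzip(path, chunk=16 * 1024 * 1024):
    """Stream-decompress a .gz file and return the uncompressed size in bytes.

    Raises an exception if the archive is truncated or corrupt. Unlike
    `gzip -l`, whose stored size field is only 32 bits (wrong for >4 GiB
    data files), this counts the real decompressed length.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total

# Demo on a small throwaway archive; on the cluster you would point this
# at the real file, e.g. verify_gzip("50-0kw-coal-tip-300k-25-2.040000.dat.gz").
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.dat.gz")
    with gzip.open(demo, "wb") as f:
        f.write(b"\0" * 1_000_000)
    print(verify_gzip(demo))  # 1000000
```

If this runs to completion without an error on the 23 GB file, the archive itself is intact and the failure is on Fluent's side of the pipe.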
April 19, 2021 at 5:20 am
Keyur Kanade (Ansys Employee)
The "invalid section id" error is a somewhat generic one.
Please check whether you can read the file in serial mode.
Also, please check which version the file was created in.
April 19, 2021 at 3:54 pm
Rob (Ansys Employee)
Is the file being saved locally or onto a network drive? Given the size, it's possible data packet loss is responsible for the problem.
April 20, 2021 at 1:10 pm
lei2019 (Subscriber)
Thank you, Kanade and Rob. The module I use is "module load ansys/fluids_19.2", so the version is R19.2.
I can't read the file on 40 cores, but somehow I can read a 6 GB .dat.gz file on 80 cores. However, I can't request more CPU resources; 40 cores for two weeks is my quota, so now I have to run the case day by day in order to get 80 cores.
The case ran on the HPC, and the files are saved on the HPC, so I think it is a local drive.
Since I am doing coal combustion with the DPM model, the number of particles at flow time 1.6 s is about 15 million; each time step, 9400 particles are injected into the burner. That is one reason the file size keeps growing.
April 20, 2021 at 2:41 pm
Rob (Ansys Employee)
I wonder if you've run out of RAM. How much RAM is available per core? If you write out unzipped files you can sometimes reduce the RAM needed, but you finish up with very big files.
April 20, 2021 at 8:38 pm
lei2019 (Subscriber)

reading 19186200 particles for injection injection-0.

Fluent host process is out of memory (dpm/dpm.c:10025 9362865600 bytes).

==============================================================================
Node 0: Process 309147: Received signal SIGSEGV.
==============================================================================
Node 999999: Process 305767: Received signal SIGSEGV.
==============================================================================
===============Message from the Cortex Process================================
Fatal error in one of the compute processes.
==============================================================================
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000005f52680 ***
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000067f1680 ***
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000059e6680 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x81489)[0x7f7f248f0489]
/shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7f7f2466fece]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x7f7f258367bd]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7f7f25848a27]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyDict_SetItemString+0x38)[0x7f7f2584bb18]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyImport_Cleanup+0x127)[0x7f7f258cf767]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(Py_Finalize+0xfe)[0x7f7f258e18de]
(interleaved backtraces from the other ranks omitted)
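The "out of memory" line in that log is informative: 9362865600 bytes requested while reading 19186200 particles works out to exactly 488 bytes per particle on the host. A back-of-the-envelope sketch of where that leads; the `host_particle_gib` helper and the flow-time extrapolation are my own, not Fluent figures.

```python
# Host-side memory cost per particle, taken from the log above:
# the failed allocation was 9362865600 bytes for 19186200 particles.
FAILED_ALLOC_BYTES = 9_362_865_600
N_PARTICLES = 19_186_200

bytes_per_particle = FAILED_ALLOC_BYTES / N_PARTICLES  # exactly 488.0

def host_particle_gib(n_particles):
    """Rough host memory (GiB) just to hold n_particles of DPM data."""
    return n_particles * bytes_per_particle / 2**30

# At 9400 particles injected per 0.001 s step, the population after
# t seconds of flow time (if none have left the domain) is 9400 * 1000 * t.
for t in (1.6, 2.0, 3.0):
    n = int(9400 * 1000 * t)
    print(f"t = {t} s: {n:,} particles -> ~{host_particle_gib(n):.1f} GiB on the host")
```

So the host's particle storage alone grows by several GiB per second of flow time, on top of whatever the mesh and solver data need, which is consistent with a read that succeeds for smaller data files and fails for larger ones.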
April 20, 2021 at 10:53 pm
lei2019 (Subscriber)

tcocs018.hpc.wvu.edu-14869.sh
sched_setaffinity() call failed: Invalid argument   (repeated 5 times)

Error: Invalid section id: %7:Ýø:~Ïà9
Error Object: #f

gzip: stdout: Broken pipe

Error: Error reading | gunzip -c 50-0kw-coal-tip-300k-25-2.040000.dat.gz.
Error Object: #f

==============================================================================
Stack backtrace generated for process id 14709 on signal 11 :
1000000: fluent() [0x67f3b9]
1000000: /usr/lib64/libc.so.6(+0x36280) [0x7f2f9826c280]
1000000: /usr/lib64/libc.so.6(+0x39acd) [0x7f2f9826facd]
1000000: /usr/lib64/libc.so.6(+0x39bb7) [0x7f2f9826fbb7]
1000000: fluent(main+0x9c) [0x67f99c]
1000000: /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2f982583d5]
1000000: fluent() [0x5ed4fd]
Please include this information with any bug report you file on this issue!
==============================================================================
April 21, 2021 at 10:26 am
YasserSelima (Subscriber)
360 GB should be enough, but the memory is distributed evenly between all nodes. When you have particle tracking, the host requires enough memory to store the entire mesh. I mean the host is now storing the 15 million paths.
April 21, 2021 at 10:31 am
YasserSelima (Subscriber)
Node 999999 is the host, and it is compute node 0 as well. That is why you get the error at this node.
April 21, 2021 at 2:24 pm
lei2019 (Subscriber)
So the host (node 999999) is storing the 15 million paths alone? How can I solve this issue? Thank you, Yasser.
April 21, 2021 at 2:44 pm
lei2019 (Subscriber)
Every time step (0.001 s), 9405 particles are injected into the burner. A Rosin-Rammler distribution is used to define the particle diameters, and the number of particles is determined by this method. Are there any other ways to reduce the particle number?
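For reference, the Rosin-Rammler distribution describes the size spread, not the particle count: it gives the mass fraction of particles with diameter greater than d as Y_d = exp(-(d/d_mean)^n). A sketch of inverse-transform sampling from it, with illustrative parameter values that are NOT from this case:

```python
import math
import random

def sample_rosin_rammler(d_mean, n, rng=random.Random(0)):
    """Draw one diameter from a Rosin-Rammler distribution.

    The distribution is parameterized by the fraction of particles with
    diameter greater than d:  Y_d = exp(-(d / d_mean)**n),
    so inverse-transform sampling gives d = d_mean * (-ln(Y))**(1/n).
    """
    y = rng.random()  # uniform (0, 1) plays the role of Y_d
    return d_mean * (-math.log(y)) ** (1.0 / n)

# Illustrative coal-dust numbers (assumptions, not this case's setup):
# 60 micron mean diameter, spread parameter n = 1.5.
d_mean, n = 60e-6, 1.5
samples = [sample_rosin_rammler(d_mean, n) for _ in range(100_000)]

# Consistency check: by definition Y_d at d = d_mean is exp(-1) ~ 0.368,
# so about 36.8% of sampled diameters should exceed d_mean.
frac_above = sum(d > d_mean for d in samples) / len(samples)
print(f"fraction above d_mean: {frac_above:.3f}")
```

The distribution itself only maps each injected stream to a diameter, so reducing the particle count has to come from the injection settings (fewer streams or less frequent injections), not from the distribution parameters.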
April 21, 2021 at 3:01 pm
Rob (Ansys Employee)
You can increase the particle time step to inject less frequently, but also look at why you have so many particles in the system. Why aren't they leaving the domain? The new h5 format in 2021R1 may help, and I had around 9-12M particles in a fairly recent run on similar hardware (128 cores spread over 4 boxes).
April 21, 2021 at 4:52 pm
lei2019 (Subscriber)
The burner is about 2 meters long, and it takes at least 3 seconds for particles to leave the domain. The West Virginia University HPC doesn't have the latest 2021R1 module.
April 21, 2021 at 5:03 pm
lei2019 (Subscriber)
The particle time step is the same as the iteration time step; I can't increase the particle time step now because it may cause the simulation to diverge.
April 21, 2021 at 6:43 pm
lei2019 (Subscriber)
Using shared memory for DPM with 80 cores, 1000 GB:

Filling Host Domain 1 [38.7291 sec]
Warning: new_object_mt: Unable to allocate objects. Out of memory

==============================================================================
Node 999999: Process 421484: Received signal SIGSEGV.
==============================================================================
MPI Application rank 32 killed before MPI_Finalize() with signal 11
MPI Application rank 77 exited before MPI_Finalize() with status 2
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000006943f70 ***
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x00000000063cef70 ***
*** Error in `/shared/software/ansys/v192/fluent/fluent19.2.0/lnamd64/3d_node/fluent_mpi.19.2.0': free(): corrupted unsorted chunks: 0x0000000005217f70 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x81489)[0x7fd43a1b1489]
/shared/software/ansys/v192/fluent/fluent19.2.0/multiport/mpi/lnamd64/ibmmpi/lib/linux_amd64/libmpi.so.1(free+0x2e)[0x7fd439f30ece]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x8a7bd)[0x7fd43b0f77bd]
/shared/software/ansys/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x7fd43b109a27]
(interleaved backtraces from the other ranks omitted)
April 22, 2021 at 12:26 pm
Rob (Ansys Employee)
Can the head node see all that RAM?
April 22, 2021 at 3:08 pm
electroknit (Subscriber)
Rob, if I'm understanding correctly, "head node" in this case means the node where the Ansys process is started? In the example above from Lei, that is the node with 756 GB of RAM. I have tried a variety of methods, such as spreading out the job so that there would be a smaller number of processes and each MPI process would have 300+ GB of RAM available to it, but with similar results.
April 22, 2021 at 3:13 pm
lei2019 (Subscriber)
By running "pbsnodes -a | less | egrep 487198" on the head node, here is some info about a case I am running now:

jobs = 0-39/487198.trcis002.hpc.wvu.edu
status = opsys=linux,uname=Linux tcocs020.hpc.wvu.edu 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64,sessions=6008,nsessions=1,nusers=1,idletime=19231116,totmem=108959308kb,availmem=69232560kb,physmem=98473552kb,ncpus=40,loadave=43.75,gres=,netload=53571989872582,state=free,varattr= ,cpuclock=Fixed,macaddr=20:67:7c:57:47:d4,version=6.1.3,rectime=1619104148,jobs=487198.trcis002.hpc.wvu.edu

jobs = 0-39/487198.trcis002.hpc.wvu.edu
status = opsys=linux,uname=Linux tcocs039.hpc.wvu.edu 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64,sessions=28820,nsessions=1,nusers=1,idletime=12870432,totmem=108959308kb,availmem=66161132kb,physmem=98473552kb,ncpus=40,loadave=46.52,gres=,netload=17351822237817,state=free,varattr= ,cpuclock=Fixed,macaddr=20:67:7c:57:48:40,version=6.1.3,rectime=1619104139,jobs=487198.trcis002.hpc.wvu.edu
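Those kb figures are easier to compare once converted: physmem=98473552kb is roughly 94 GiB of physical memory per node, of which about 66 GiB was still available, so these particular compute nodes are nowhere near the 756 GB machine mentioned above. A small parsing sketch; `mem_fields_gib` is my own helper, and the status string is a trimmed copy of the first node's line:

```python
import re

# Trimmed copy of the first node's status line from the pbsnodes output above.
STATUS = (
    "opsys=linux,totmem=108959308kb,availmem=69232560kb,"
    "physmem=98473552kb,ncpus=40,loadave=43.75,state=free"
)

def mem_fields_gib(status):
    """Extract the *mem=...kb fields from a pbsnodes status string, in GiB."""
    out = {}
    for key, kb in re.findall(r"(\w*mem)=(\d+)kb", status):
        out[key] = int(kb) / 2**20  # KiB -> GiB
    return out

for key, gib in mem_fields_gib(STATUS).items():
    print(f"{key}: {gib:.1f} GiB")
```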
April 22, 2021 at 3:20 pm
Rob (Ansys Employee)
The head node is where node 0 is. You're mostly correct, except that it's possible to launch the GUI on one system and all the nodes, including the host, elsewhere.
April 22, 2021 at 3:35 pm
electroknit (Subscriber)
Gotcha. In this case, Lei is either launching the GUI on the node assigned by the system scheduler, or running non-interactively by reading in the journal file.
April 22, 2021 at 4:01 pm
Rob (Ansys Employee)
And there aren't any RAM limits on the system side, i.e. IT haven't set anything up?
April 22, 2021 at 4:15 pm
electroknit (Subscriber)
No limits in this case. I've also just installed 2021R1, as the University just upgraded the license server, and will see if that makes any sort of difference.
April 23, 2021 at 3:15 pm
lei2019 (Subscriber)
The H5 format seems to be working now, and Fluent can load the 7 GB data file. Thank you.

-----------------------------------------------------------------
reading 18245700 particles for injection injection-0.
 locating particles on compute nodes...
 using the new algorithm for particle relocation.
 Relocated particles using cell ID.
 all particles successfully re-located.
Warning: The time step size (0.001) in the session did not match the time step size in the data file (0.001), and has been overwritten by the value from the data file.

Parallel variables...
Done.
April 23, 2021 at 3:40 pm
Rob (Ansys Employee)
Excellent. You should see a read/write speed-up with the h5 format too.
April 23, 2021 at 3:44 pm
lei2019 (Subscriber)
Yes, it is quite fast. Cheers.