Python cPickle MemoryError
Pickle dump huge file without memory error (6 votes)
I have a program where I basically adjust the probability of certain things happening based on what is already known. My file of data is already saved as a pickled dictionary object at Dictionary.txt. The problem is that every time I run the program it pulls in Dictionary.txt, turns it into a dictionary object, makes its edits, and overwrites Dictionary.txt. This is
pretty memory intensive, as Dictionary.txt is 123 MB. When I dump I get a MemoryError; everything seems fine when I pull it in.

- Is there a better (more efficient) way of doing the edits, perhaps without having to overwrite the entire file every time?
- Is there a way that I can invoke garbage collection (through the gc module)? (I already have it auto-enabled via gc.enable().)
- I know that besides readlines() you can read line by line. Is there a way to edit the dictionary incrementally, line by line, when I already have a fully completed dictionary object file in the program?
- Any other solutions?

Thank you for your time. (tags: python, memory, file-io, pickle; asked Jul 7 '13 by user2543682)

Comment: There are a few compression and other libraries. Personally, I like dill and h5py for large objects. If you are using scikit-learn and have a dictionary-based model, perhaps you could use joblib as well (only really for those models). –Andrew Scott Evans, Nov 5 '15

Top answer (3 votes): I am the author of a package called klepto (and also the author of …
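One way to avoid re-reading and re-writing the whole 123 MB dictionary on every run, in the same spirit as an on-disk archive like klepto, is to keep the data in a persistent key-value store and only write the entries that actually change. The sketch below uses the standard-library shelve module purely as an illustration of that idea; it is not the klepto-based answer, and the file name probability.db and the update function are invented for the example.

    import shelve

    # Open (or create) an on-disk dictionary. With the default
    # writeback=False, each assignment re-serializes only that one
    # entry into the backing file, not the whole 123 MB object.
    db = shelve.open("probability.db")  # hypothetical file name

    def bump_probability(key, delta):
        # Read one entry, adjust it, and write just that entry back.
        current = db.get(key, 0.0)
        db[key] = current + delta

    bump_probability("some_event", 0.05)  # made-up key and adjustment
    db.close()

With this layout the garbage-collection question largely disappears as well, because the full dictionary never has to be materialized and re-dumped in memory at once.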
Memory Error when using Pickle in Anaconda (1 vote)

The following code:

    import cPickle as pickle

    with open(r"file.pkl", "rb") as fid:
        data = pickle.load(fid)

returns:

    File "
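A MemoryError on a pickle that should easily fit in RAM often just means the interpreter is a 32-bit build (common with older Anaconda installs on Windows), which caps the process at roughly 2 GB of address space. A quick diagnostic sketch, assuming the file name from the question:

    import os
    import struct
    import sys

    # Is this a 32-bit or 64-bit interpreter, and how big is the pickle
    # we are about to load?
    print("Pointer size: %d-bit" % (struct.calcsize("P") * 8))
    print("sys.maxsize:  %d" % sys.maxsize)
    print("Pickle size:  %.1f MB" % (os.path.getsize("file.pkl") / 1024.0 / 1024.0))

    # On a 32-bit build (sys.maxsize == 2**31 - 1) a large pickle can fail
    # to load even though the machine has plenty of free RAM; installing a
    # 64-bit Python/Anaconda is the usual fix.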
cPickle.load() in Python consumes a large amount of memory

I have a large dictionary whose structure looks like:

    dcPaths = {'id_jola_001': CPath instance}

where CPath is a self-defined class:

    class CPath(object):
        def __init__(self):
            # some attributes
            self.m_dAvgSpeed = 0.0
            ...
            # a list of CNode instances
            self.m_lsNodes = []

where m_lsNodes is a list of CNode:

    class CNode(object):
        def __init__(self):
            # some attributes
            self.m_nLoc = 0
            # a list of Apps
            self.m_lsApps = []

Here, m_lsApps is a list of CApp, which is another self-defined class:

    class CApp(object):
        def __init__(self):
            # some attributes
            self.m_nCount = 0
            self.m_nUpPackets = 0

I serialize this dictionary using cPickle:

    def serialize2File(strFileName, strOutDir, obj):
        if len(obj) != 0:
            strOutFilePath = "%s%s" % (strOutDir, strFileName)
            with open(strOutFilePath, 'w') as hOutFile:
                cPickle.dump(obj, hOutFile, protocol=0)
            return strOutFilePath
        else:
            print("Nothing to serialize!")

It works fine, and the size of the serialized file is about 6.8 GB. However, when I try to deserialize this object:

    def deserializeFromFile(strFilePath):
        obj = 0
        with open(strFilePath) as hFile:
            obj = cPickle.load(hFile)
        return obj

I find it consumes more than 90 GB of memory and takes a long time. Why would this happen? Is there any way I could optimize this? BTW, I'm using Python 2.7.6. (tags: python, memory, pickle; asked Apr 24 '14 by Jason Yang)

Comments:
- How much memory did the process consume before it dumped the data to the pickle? –WeaselFox, Apr 24 '14
- It may depend on the way the stored classes are defined, so it will help if you add the code. –bereal, Apr 24 '14
- Are you sure it's the deserialization? Serialization is known to take up quite a bit more memory to handle cyclic references. –KillianDS, Apr 24 '14
- 1. Can you tell us: of which type are the most used objects? 2. Could you dump it with the lowest protocol and show/link some smaller chunks? 3. Do you use
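Two things that usually help with a pickle like this are dumping with the newest binary protocol instead of protocol=0 (a smaller file and far less parsing work on load) and temporarily disabling the cyclic garbage collector while loading, since creating millions of small container objects makes the collector run constantly. The sketch below is only an illustration of those two ideas, reusing the questioner's function names and Python 2 syntax; it is not a fix confirmed by the thread.

    import cPickle
    import gc

    def serialize2File(strFileName, strOutDir, obj):
        # Binary protocol: smaller output and much cheaper to parse back
        # in than the text-based protocol 0 used in the question.
        strOutFilePath = "%s%s" % (strOutDir, strFileName)
        with open(strOutFilePath, 'wb') as hOutFile:
            cPickle.dump(obj, hOutFile, protocol=cPickle.HIGHEST_PROTOCOL)
        return strOutFilePath

    def deserializeFromFile(strFilePath):
        gc.disable()  # skip GC passes over the millions of new objects
        try:
            with open(strFilePath, 'rb') as hFile:
                return cPickle.load(hFile)
        finally:
            gc.enable()

Defining __slots__ on CNode and CApp would also cut per-instance memory noticeably, since otherwise every instance carries its own attribute dictionary.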
waf-project/waf, issue #1330 (closed): Out of memory during cPickle.dumps()

GoogleCodeExporter commented on Apr 3, 2015:

    operating system: Windows 7 (64-bit)
    python version: 2.7.1 (32-bit)
    waf version (or svn revision): 1.7.10

Observed output and expected output:

    Traceback (most recent call last):
      File "submodules\external\tools\waf\waflib\Scripting.py", line 138, in waf_entry_point
        run_commands()
      File "submodules\external\tools\waf\waflib\Scripting.py", line 232, in run_commands
        ctx = run_command(cmd_name)
      File "submodules\external\tools\waf\waflib\Scripting.py", line 219, in run_command
        ctx.execute()
      File "submodules\external\tools\waf\waflib\Scripting.py", line 576, in execute
        return execute_method(self)
      File "submodules\external\tools\waf\waflib\Build.py", line 251, in execute
        self.execute_build()
      File "submodules\external\tools\waf\waflib\Build.py", line 273, in execute_build
        self.compile()
      File "submodules\external\tools\waf\waflib\Build.py", line 369, in compile
        self.store()
      File "submodules\external\tools\waf\waflib\extras\relocation.py", line 24, in store
        old1(self)
      File "submodules\external\tools\waf\waflib\Utils.py", line 669, in f
        ret = fun(*k, **kw)
      File "submodules\external\tools\waf\waflib\Build.py", line 334, in store
        x = cPickle.dumps(data, -1)
    MemoryError: out of memory

How to reproduce the problem? This happens after building our large project many times. We have thousands of compiled files and a few variants, so the pickle file can grow to 100 MB. This error can crop up with the file anywhere between 50 and 100 MB; it isn't consistent. Once in this state, every build will fail with the error until a clean build is done or the pickle file is deleted. In commit 5c6d626048c1613450b00eff23dd8beaddd4e5ef the store() function was changed from having cPickle dump directly to a file to dumping to a string which is then written to a file. If I undo that change then this error goes away.

Original issue reported on code.google.com by dj88we...@gmail.com on 21 Jun 2013.

GoogleCodeExporter commented on Apr 3, 2015: Sorry, I pasted the wrong commit id. The change that caused problems is: 7e
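The traceback points at cPickle.dumps(data, -1), which has to build the entire pickle as one contiguous string before anything is written; in a 32-bit Python process that single large allocation can fail long before the machine runs out of RAM. Streaming straight to the file avoids the giant intermediate string, which matches the reporter's observation that reverting to the direct-dump behaviour makes the error go away. A minimal illustration of the difference, using generic names rather than waf's actual code:

    import cPickle

    def store_via_string(data, path):
        # What the failing code does: serialize to one big in-memory
        # string, then write it out. The whole pickle must fit in RAM
        # as a single contiguous allocation.
        blob = cPickle.dumps(data, -1)
        with open(path, 'wb') as f:
            f.write(blob)

    def store_streaming(data, path):
        # Alternative: let cPickle write incrementally to the file
        # object, so no single 50-100 MB string has to be allocated.
        with open(path, 'wb') as f:
            cPickle.dump(data, f, -1)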