Python Memory Error Numpy Array
Python/Numpy MemoryError

Question (tylerthemiler): I am getting a memory error in Python when performing an algebraic operation on a numpy matrix. The variable u is a large matrix of doubles (in the failing case a 288x288x156 matrix of doubles; I only get this error in this huge case, and I am able to do the same operation on other large matrices, just not this big). Here is the Python error:

    Traceback (most recent call last):
      File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pc t perim erosion flattop\SwSim.py", line 121, in __init__
        self.mainSimLoop()
      File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pc t perim erosion flattop\SwSim.py", line 309, in mainSimLoop
        u = solver.solve_cg(u,b,tensors,param,fdHold,resid) # Solve the left hand side of the equation Au=b with conjugate gradient method to approximate u
      File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pc t perim erosion flattop\conjugate_getb.py", line 47, in solve_cg
        u = u + alpha*p
    MemoryError

u = u + alpha*p is the line of code that fails. alpha is just a double, while u and p are the large matrices described above (both of the same size). I don't know much about memory errors, especially in Python; any insight or tips on solving this would be very appreciated!

Accepted answer (luispedro): Rewrite it as

    p *= alpha
    u += p

and this will use much less memory. p = p*alpha allocates a whole new matrix for the result of p*alpha and then discards the old p; p *= alpha does the same thing in place. In general, with big matrices, try to use op= assignment.

Comment (tylerthemiler): This is very helpful, I didn't know this.

Another answer: Another tip I have found to avoid memory errors is to manually control garbage collection. When objects are deleted or go out of scope, the memory they used is not always released right away; forcing a collection with gc.collect() can reclaim it sooner.
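To make the accepted answer concrete, here is a minimal sketch using the question's array shape (each float64 array is roughly 100 MB). The variable names follow the question, but the initial values and alpha = 0.5 are placeholders, and the gc.collect() call at the end only illustrates the second answer's tip, it is not part of the original solver:

    import gc
    import numpy as np

    shape = (288, 288, 156)          # the failing case from the question
    u = np.zeros(shape)
    p = np.random.random(shape)
    alpha = 0.5                      # placeholder; alpha is "just a double"

    # u = u + alpha*p would allocate one temporary for alpha*p and another
    # full-size array for the sum before rebinding u.
    p *= alpha                       # scale p in place, no temporary array
    u += p                           # accumulate into u's existing buffer

    del p                            # drop the reference to p ...
    gc.collect()                     # ... and force a collection so its memory is reclaimed promptly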
Processing a very very big data set in Python - memory error

Question (maheshakya): I'm trying to process data obtained from a CSV file using the csv module in Python. There are about 50 columns and 401125 rows. I used the following code to put the data into a list:

    csv_file_object = csv.reader(open(r'some_path\Train.csv', 'rb'))
    header = csv_file_object.next()
    data = []
    for row in csv_file_object:
        data.append(row)

I can get the length of this list using len(data) and it returns 401125. I can even get each individual record by indexing into the list. But when I try to get the size of the list by calling np.size(data) (I imported numpy as np), I get the following stack trace:

    MemoryError                 Traceback (most recent call last)
    in ()
    ----> 1 np.size(data)

    C:\Python27\lib\site-packages\numpy\core\fromnumeric.pyc in size(a, axis)
       2198         return a.size
       2199     except AttributeError:
    -> 2200         return asarray(a).size
       2201     else:
       2202         try:

    C:\Python27\lib\site-packages\numpy\core\numeric.pyc in asarray(a, dtype, order)
        233
        234     """
    --> 235     return array(a, dtype, copy=False, order=order)
        236
        237 def asanyarray(a, dtype=None, order=None):

    MemoryError:

I can't even divide that list into multiple parts using list indices, or convert it into a numpy array; that gives the same memory error. How can I deal with this kind of big data sample? Is there any other way to process large data sets like this one? I'm using IPython Notebook on Windows 7 Professional.

Comment (gauden): Can you process the file line by line, as suggested in a related answer?
Comment: Do you need to have all rows in memory at once? Are you just doing one-of...
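The thread's answers are not reproduced above, so here is only a sketch of what the comments suggest: iterate over the reader instead of materialising all 401125 rows at once. Python 3 syntax; the path comes from the question, and the row counter stands in for whatever per-row processing is actually needed:

    import csv

    with open(r'some_path\Train.csv', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = 0
        for row in reader:    # only the current row is held in memory
            n_rows += 1       # replace with the real per-row processing
    print(n_rows)             # 401125 for the file in the question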
Python, numpy, optimize module to handle big file

Question (Code Review): With the help of some people here and there, I wrote the module below, which works very well as long as the file targets.csv is not too big. As soon as this file exceeds 100 MB, a MemoryError is raised. The problem is that I used numpy to make the computations faster (and they became really fast compared to the previous versions of this module), and I think numpy requires loading the entire file into memory. However, there should be a way to handle the MemoryError that happens here:

    targets = np.array([(float(X), float(Y), float(Z)) for X, Y, Z in csv.reader(open('targets.csv'))])

Any ideas? Let me know if you need more details (files, etc.).

    import csv
    import numpy as np
    import scipy.spatial
    import cv2
    import random

    """loading files"""
    points = np.array([(int(R), int(G), int(B), float(X), float(Y), float(Z))
                       for R, G, B, X, Y, Z in csv.reader(open('colorlist.csv'))])
    # load X,Y,Z coordinates of 'points' in a np.array
    print "colorlist loaded"

    targets = np.array([(float(X), float(Y), float(Z))
                        for X, Y, Z in csv.reader(open('targets.csv'))])
    # load the XYZ target values in a np.array
    print "targets loaded"

    img = cv2.imread("MAP.tif", -1)
    height, width = img.shape
    total = height * width
    # load dimensions of tif image
    print "MAP loaded"

    ppm = file("walladrien.ppm", 'w')
    ppm.write("P3" + "\n" + str(height) + " " + str(width) + "\n" + "255" + "\n")
    # write PPM file header

    """doing geometry"""
    tri = scipy.spatial.Delaunay(points[:, [3, 4, 5]], furthest_site=False)  # True makes an almost BW picture
    # Delaunay triangulation indic...
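One way to cut the peak memory of the failing line is to stream the values straight into a numpy buffer instead of first building a Python list of tuples of floats. This is a sketch rather than the reviewed answer, and it assumes targets.csv is a plain three-column numeric CSV with no header:

    import numpy as np

    def iter_values(path):
        """Yield the numbers of a 3-column CSV one float at a time."""
        with open(path) as f:
            for line in f:
                for field in line.split(','):
                    yield float(field)

    # np.fromiter fills the array directly from the generator, so the large
    # intermediate list built by the original comprehension never exists.
    targets = np.fromiter(iter_values('targets.csv'), dtype=np.float64).reshape(-1, 3)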
Table of Contents
1. Intro
2. The test
2.1. Numpy
2.2. Native lists
2.3. Array module
3. The results
4. Summary

Intro

We just had to find out whether a given dataset would be shareable without complex trickery, so we took the easiest road and checked the memory requirements of the data structure. If you have such a need, there's always a first stop: fire up the interpreter and try it out.

The test

We created a three-dimensional numpy array of floats and then looked at the memory requirement of the process in the system monitor, conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and get the cost per stored value directly by dividing the total memory of the process by the number of values. All our tests are done in Python 3.

Numpy

For numpy we just create an array of random values cast to floats:

    import numpy as np
    a = np.array(np.random.random((100, 100, 10000)), dtype="float")

We also tested what happens when we use "f4" and "f2" instead of "float" as the dtype.

Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

    import numpy as np
    a = [[[float(i) for i in j] for j in k]
         for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

Array module

Instead of using full-blown numpy, we can also turn the innermost list into an array:

    import array
    import numpy as np
    a = [[array.array("d", [float(i) for i in j]) for j in k]
         for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

The results

With a numpy array we need roughly 8 bytes per float. A native Python list, however, requires roughly 32 bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4. Using an inner array (via the array module) instead of the innermost list provides roughly the same gain. I would have expected a factor of 3 (the value plus a pointer to the next and to the previous entry); the extra overhead comes from the fact that each list entry is a pointer to a full float object, which carries its own object header in addition to the 8-byte value. The details are in the following table.

Table 1: Memory requirement of different ways to store values in Python
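As a rough cross-check of these numbers (a sketch; the exact figures depend on the platform and Python build, the values in the comments are for 64-bit CPython), numpy reports the size of its buffer directly, and sys.getsizeof shows the per-object cost of a native float:

    import sys
    import numpy as np

    a = np.array(np.random.random((100, 100, 100)), dtype="float")
    print(a.nbytes / a.size)    # 8.0 bytes per stored float64 value

    x = 1.0
    print(sys.getsizeof(x))     # 24 bytes: object header plus the 8-byte double
    # a list additionally stores an 8-byte pointer per entry,
    # which is where the roughly 32 bytes per float come from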