Numpy Loadtxt Memory Error
numpy loadtxt function seems to be consuming too much memory
(asked Oct 27 '11 by Ian Langmore; edited May 26 '15 by mob; tags: python, numpy, out-of-memory)

When I load an array using numpy.loadtxt, it seems to take too much memory. E.g.

    a = numpy.zeros(int(1e6))

causes an increase of about 8 MB in memory (measured with htop, and consistent with 8 bytes × 1 million ≈ 8 MB). On the other hand, if I save and then load this array:

    numpy.savetxt('a.csv', a)
    b = numpy.loadtxt('a.csv')

my memory usage increases by about 100 MB! Again I observed this with htop, both in the IPython shell and while stepping through code using Pdb++. Any idea what's going on here?

After reading jozzas's answer, I realized that if I know the array size ahead of time, there is a much more memory-efficient way to do things. Say 'a' was an m x n array:

    import csv
    import numpy
    b = numpy.zeros((m, n))
    with open('a.csv', 'r') as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            b[i, :] = numpy.array(row)

Comments:
- Do you need to load and save text files, rather than using numpy's binary formats? There's a lot of potentially unnecessary processing going on to parse the text file and convert strings back into the correct data types. – jozzas, Oct 27 '11
- Yes, I work with people who only use text files. – Ian Langmore, Oct 27 '11
- a is an array of integers. b is an array of strings. – Falmarri, Oct 27 '11
- @Falmarri: a is an array of floats, 64-bit if on a 64-bit machine. – jozzas, Oct 27 '11
- @Falmarri: b is an array of floats, e.g. type(b[0]) returns numpy.float64.
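As jozzas's comment suggests, the parsing overhead disappears entirely with NumPy's binary format. A minimal sketch of the binary round trip (the filename 'a.npy' is just illustrative):

    import numpy as np

    a = np.zeros(int(1e6))
    np.save('a.npy', a)   # writes the raw float64 bytes plus a small header
    b = np.load('a.npy')  # no text parsing; memory use stays near the 8 MB of data

For arrays too large to hold in memory at all, np.load('a.npy', mmap_mode='r') returns a memory-mapped view instead of reading the whole file in.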
Python MemoryError or ValueError in np.loadtxt and iter_loadtxt
(http://stackoverflow.com/questions/26558278/python-memoryerror-or-valueerror-in-np-loadtxt-and-iter-loadtxt)

My starting point was a problem with NumPy's loadtxt function (http://stackoverflow.com/questions/7910591/numpy-loadtxt-function-seems-to-be-consuming-too-much-memory):

    X = np.loadtxt(filename, delimiter=",")

which gave a MemoryError inside np.loadtxt(..). I googled it and came to a question on Stack Overflow that gave the following solution:

    import numpy as np

    def iter_loadtxt(filename, delimiter=',', skiprows=0, dtype=float):
        def iter_func():
            with open(filename, 'r') as infile:
                for _ in range(skiprows):
                    next(infile)
                for line in infile:
                    line = line.rstrip().split(delimiter)
                    for item in line:
                        yield dtype(item)
            iter_loadtxt.rowlength = len(line)

        data = np.fromiter(iter_func(), dtype=dtype)
        data = data.reshape((-1, iter_loadtxt.rowlength))
        return data

    data = iter_loadtxt('your_file.ext')

So I tried that, but then encountered the following error message:

    data = data.reshape((-1, iter_loadtxt.rowlength))
    ValueError: total size of new array must be unchanged

Then I tried to add the number of rows and the maximum number of columns to the code, with the fragments below, which I partly got from another question and partly wrote myself:

    num_rows = 0
    max_cols = 0
    with open(filename, 'r') as infile:
        for line in infile:
            num_rows += 1
            tmp = line.split(",")
            if len(tmp) > max_cols:
                max_cols = len(tmp)

    def iter_func():
        # didn't change

    data = np.fromiter(iter_func(), dtype=dtype, count=num_rows)
    data = data.reshape((num_rows, max_cols))

But this still gave the same error message, though I thought it should have been solved. On the other hand, I'm not sure I'm calling data.reshape(..) in the correct manner. I commented out the line where data.reshape(..) is called to see what would happen. That gave this error message:

    ValueError: need more than 1 value to unpack

which happened at the first point where something is done with X, the variable this problem is all about. I know this code can work on the input files I have, because I saw it in use with them, but I can't find out why I can't solve this problem. My reasoning goes as far as this: because I'm using a 32-bit Python version (on a 64-bit Windows machine), some memory limit is probably being hit.
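For what it's worth, the reshape failure is exactly what ragged rows would produce: if some lines have fewer fields than others, the total number of values yielded is less than num_rows * max_cols, so no reshape to that shape can succeed. And commenting out the reshape leaves data one-dimensional, so a later unpacking such as rows, cols = X.shape would raise precisely "need more than 1 value to unpack". A minimal sketch that tolerates ragged rows by padding short ones with NaN (iter_loadtxt_padded is a hypothetical name, and the file is assumed to be purely numeric):

    import numpy as np

    def iter_loadtxt_padded(filename, delimiter=',', dtype=float, pad=float('nan')):
        # First pass: count the rows and find the widest one.
        num_rows = 0
        max_cols = 0
        with open(filename, 'r') as infile:
            for line in infile:
                num_rows += 1
                max_cols = max(max_cols, len(line.rstrip().split(delimiter)))

        # Second pass: parse into a preallocated array; short rows keep the padding.
        data = np.full((num_rows, max_cols), pad, dtype=dtype)
        with open(filename, 'r') as infile:
            for i, line in enumerate(infile):
                fields = line.rstrip().split(delimiter)
                data[i, :len(fields)] = [dtype(item) for item in fields]
        return data

    data = iter_loadtxt_padded('your_file.ext')

The two passes keep peak memory at roughly the final array's size, at the cost of reading the file twice.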
Python, numpy, optimize module to handle big file
(Code Review Stack Exchange, http://codereview.stackexchange.com/questions/41316/python-numpy-optimize-module-to-handle-big-file)

With the help of some people here and there, I wrote the module below, which works very well as long as the file targets.csv is not too big. As soon as this file exceeds 100 MB, a MemoryError is raised. The problem is that I used numpy to make the computations faster (and they became really fast compared to previous versions of this module), and I think numpy requires loading the entire file into memory. There should, however, be a way to handle the MemoryError that happens here:

    targets = np.array([(float(X), float(Y), float(Z))
                        for X, Y, Z in csv.reader(open('targets.csv'))])

Any ideas? Let me know if you need more details (files, etc.).

    import csv
    import numpy as np
    import scipy.spatial
    import cv2
    import random

    """loading files"""
    points = np.array([(int(R), int(G), int(B), float(X), float(Y), float(Z))
                       for R, G, B, X, Y, Z in csv.reader(open('colorlist.csv'))])
    # load the R,G,B values and X,Y,Z coordinates of 'points' into a np.array
    print "colorlist loaded"

    targets = np.array([(float(X), float(Y), float(Z))
                        for X, Y, Z in csv.reader(open('targets.csv'))])
    # load the X,Y,Z target values into a np.array
    print "targets loaded"

    img = cv2.imread("MAP.tif", -1)
    height, width = img.shape[:2]
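The list comprehension is the likely culprit here: it builds a Python list of tuples of boxed floats before np.array ever sees the data, so peak memory is several times the final array's size. One way to sidestep that, sketched under the assumption that targets.csv has exactly three numeric columns (load_targets is a hypothetical helper, not part of the asker's module):

    import csv
    import numpy as np

    def load_targets(path):
        # First pass: count data rows so the array can be preallocated.
        with open(path) as f:
            num_rows = sum(1 for _ in f)

        # Second pass: parse straight into the float64 array, one row at a
        # time, so no intermediate list of tuples is ever built.
        targets = np.empty((num_rows, 3), dtype=np.float64)
        with open(path) as f:
            for i, (X, Y, Z) in enumerate(csv.reader(f)):
                targets[i, 0] = float(X)
                targets[i, 1] = float(Y)
                targets[i, 2] = float(Z)
        return targets

    targets = load_targets('targets.csv')

If even the final array will not fit in RAM, np.memmap can back it with a file on disk instead, at the cost of slower element access.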