Numpy Dot Memory Error
million programmers, just like you, helping each other. Join them; it only takes a minute: Sign up numpy.dot -> MemoryError, my_dot -> very slow, but works. Why? up vote 3 down vote favorite I am trying to scipy sparse compute the dot product of two numpy arrays sized respectively (162225, 10000) and (10000, 100). However, if I call numpy.dot(A, B) a MemoryError happens. I, then, tried to write my implementation: def slower_dot (A, B): """Low-memory implementation of dot product""" #Assuming A and B are of the right type and size R = np.empty([A.shape[0], B.shape[1]]) for i in range(A.shape[0]): for j in range(B.shape[1]): R[i,j] = np.dot(A[i,:], B[:,j]) return R and it works just fine, but is of course very slow. Any idea of 1) what is the reason behind this behaviour and 2) how I could circumvent / solve the problem? I am using Python 3.4.2 (64bit) and Numpy 1.9.1 on a 64bit equipped computer with 16GB of ram running Ubuntu 14.10. python arrays numpy share|improve this question edited Dec 27 '14 at 23:52 asked Dec 27 '14 at 15:02 tamzord 4781513 Just to clear that up up-front, are you running a 64-bit version of Python? –filmor Dec 27 '14 at 15:08 You might try np.einsum. I've used it in cases where a np.dot returns memory errors or bogs down with memory swapping. –hpaulj Dec 27 '14 at 22:24 Yes, it is a 64 bit version. –tamzord Dec 27 '14 at 23:52 add a comment| 2 Answers 2 active oldest votes up vote 1 down vote accepted I think the problem starts from the matrix A itself as a 16225 * 10000 size matrix already occupies about 12GB of memory if each element is a double precision floating point number. That together with how numpy creates temporary copies to do the dot operation will cause the error. The extra copies is because numpy uses the underlying BLAS operations for dot which needs the mat
Numpy dot product MemoryError for small matrices (Stack Overflow, http://stackoverflow.com/questions/40081115/numpy-dot-product-memoryerror-for-small-matrices)

I wanted to implement Singular Value Decomposition (SVD) as the collaborative filtering method for a recommendation system. I have this sparse_matrix, with rows representing users, columns representing items, and each entry a user-item rating.

```python
>>> type(sparse_matrix)
scipy.sparse.csr.csr_matrix
```

First I factorized this matrix using SVD:

```python
from scipy.sparse.linalg import svds
u, s, vt = svds(sparse_matrix.asfptype(), k=2)
s_diag = np.diag(s)
```

Then I make the prediction by taking the dot product of u, s_diag, and vt:

```python
>>> tmp = np.dot(u, s_diag)
>>> pred = np.dot(tmp, vt)
Traceback (most recent call last):
  File "
```
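One way to shrink this computation is to fold the singular values into u by broadcasting instead of materializing s_diag and doing two matrix products. A self-contained sketch; it uses numpy's dense np.linalg.svd in place of scipy's svds so the example needs no sparse input, and the ratings array is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.random((6, 4))  # hypothetical dense user-item matrix

# np.linalg.svd stands in here for scipy.sparse.linalg.svds, which
# returns the same kind of (u, s, vt) triple for sparse input.
u, s, vt = np.linalg.svd(ratings, full_matrices=False)
k = 2  # keep the top-k singular values, as svds(..., k=2) would
u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k]

# u_k * s_k scales each column of u_k by one singular value -- the
# same result as np.dot(u_k, np.diag(s_k)) without ever building the
# k x k diagonal matrix or the intermediate product tmp.
pred = np.dot(u_k * s_k, vt_k)
```

With svds the shapes are identical, so the same `u * s` trick applies directly to the code in the question.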
numpy.dot out of memory when multiplying big matrix (numpy/numpy issue #4062, https://github.com/numpy/numpy/issues/4062, opened Nov 19, 2013 by grfo; closed, 12 comments, 6 participants)

grfo commented Nov 19, 2013: I have a 2000 x 1,000,000 matrix A and want to calculate the 2000 x 2000 matrix

```python
B = numpy.dot(A, A.T)
```

but numpy just eats up all my memory, slows down my whole computer, and crashes after a couple of hours. I then rewrote the matrix multiplication as

```python
B = numpy.zeros((2000, 2000))
A.shape = (2000, 10000, 100)
for M in numpy.rollaxis(A, 2):
    B += numpy.dot(M, M.T)
```

and it runs fine in a couple of minutes. I don't see why numpy needs so much memory for a matrix multiplication, but at the very least it should not crash.

seberg (NumPy member) commented Nov 19, 2013: Got to ask, but are you sure you didn't transpose the wrong argument, making the result 1 million x 1 million instead of 2000 x 2000?

seberg commented Nov 19, 2013: Ah, sorry. There is a point. The thing is that np.dot(A, A.T) is not a valid BLAS operation, I think, because the memory orders of A and A.T do not match, so one of them needs to be copied. Requiring that one copy (i.e. A.T.copy() instead of A.T) might already be eating your RAM. If this is not the problem, maybe check the BLAS you are linking against; some OpenBLAS versions are apparently known to sometimes misbehave for large arrays.

grfo commented Nov 19, 2013: This would explain the situation; the matrix A does indeed use over half of my memory. I would still put it on the wishlist: np.dot should divide a problem into subproblems if it cannot deal with it at once.
This should be easy to fix. Thanks!

nouiz commented Nov 19, 2013: Hi, NumPy doesn't do the minimum amount of copying. For example, it was copying C-contiguous input up to the new numpy 1.8 release. What version of numpy do
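seberg's explanation can be checked directly: A.T is just a view with the opposite memory order, so a BLAS gemm on A and A.T may force an internal copy of one operand, while the blockwise accumulation from the issue avoids the big temporary entirely. A small-scale sketch, with sizes shrunk from the issue's 2000 x 1,000,000 for illustration:

```python
import numpy as np

A = np.random.rand(200, 1000)  # C-contiguous, like a freshly created array
At = A.T                       # a view: F-ordered, no data copied yet

# The two operands of np.dot(A, A.T) have mismatched memory orders,
# which is why a copy of one of them may be made internally.
assert A.flags['C_CONTIGUOUS'] and not At.flags['C_CONTIGUOUS']

# Forcing that copy explicitly doubles A's footprint up front:
At_c = np.ascontiguousarray(At)
assert At_c.flags['C_CONTIGUOUS']

# Blockwise accumulation over column blocks, equivalent to the
# reshape/rollaxis loop in the issue: A @ A.T == sum_k M_k @ M_k.T
B = np.zeros((A.shape[0], A.shape[0]))
for start in range(0, A.shape[1], 250):
    M = A[:, start:start + 250]
    B += np.dot(M, M.T)
assert np.allclose(B, np.dot(A, A.T))
```

Each loop iteration only ever materializes one (200, 250) block and its transpose, so peak memory scales with the block width rather than with the full matrix.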
From Cyrille Rossant's post on Numpy performance tricks (http://cyrille.rossant.net/numpy-performance-tricks/): of millions of elements) and the performance of my code started to be disappointing. Then, through extensive line-by-line profiling, I discovered some subtleties that explain why seemingly harmless lines of code can become major bottlenecks. Very often, a small trick can significantly improve performance. Here is what I've learnt. These tips are intended for regular Numpy users rather than pure beginners.