I'm running the following code in an ipython
session:Perché len (<a list object>) so slow?
# This call is slow, but that is expected. (It loads 3 GB of data.)
In [3]: arc, arc_sub, upls, go = foo_mod.ready_set()
# This call is also slow, as `upls` is huge.
In [4]: upls = list(upls)
# This call is slow in meatspace, but `%timeit` doesn't notice!
In [5]: %timeit -n1 -r1 len(upls)
1 loops, best of 1: 954 ns per loop
%timeit
is straight-up lying here. With or without the %timeit
, the command takes upwards of 10s to actually run. Only the first time, however; subsequent calls to len
are quick.
Even time.time()
sings a similar tune:
In [5]: import time
In [6]: s = time.time(); len_ = len(upls); e = time.time()
In [7]: e - s
Out[7]: 7.104873657226562e-05
But it took seconds in the real world for In [6]
to actually complete. I just don't seem to be able to capture where the actual time is being spent!
There's nothing terribly unusual about the list, aside from it's huge: it's a real list
; it holds ~¼ billion bson.ObjectId
objects. (Prior to the list()
call, it's a set
object; that call is also slow, but that makes sense; list(<set instance>)
is O(n), and my set is huge.)
Edit re GC
If I run gc.set_debug(gc.DEBUG_STATS)
just prior to ready_set
, which itself is a slow call, I see tons of GC cycles. This is expected. gen3 grows:
gc: objects in each generation: 702 701 3289802
gc: done, 0.0000s elapsed.
gc: collecting generation 0...
gc: objects in each generation: 702 1402 3289802
gc: done, 0.0000s elapsed.
gc: collecting generation 0...
gc: objects in each generation: 702 2103 3289802
Unfortunately the console outputs make this runtime of this impossibly slow. If I instead delay the gc.set_debug
call until just after ready_set
, I don't see any GC cycles, but gc.get_count()
claims the generations are tiny:
In [6]: gc.get_count()
Out[6]: (43, 1, 193)
In [7]: len(upls)
Out[7]: 125636395
(but why/how is get_count
less objects than what's in the list?; they're definitely all unique, since they just went through a set
…) The fact that involving gc
in the code makes len
speedy leads me to believe I'm paused for a collect-the-world.
(Versions, just in case:
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
IPython 3.2.0 -- An enhanced Interactive Python.
)
Questo è strano. Succede la stessa cosa nel normale interprete interattivo? Gli elenchi – user2357112
conoscono le loro dimensioni. una volta in una lista, len (thelist) è O (1) –
@ChadS. Perché le liste conoscano le loro dimensioni, devono calcolarle a un certo punto. Penso che il caching avvenga in prima chiamata a 'len()'. – freakish