2016-04-03 31 views
8

Sono in esecuzione Tensor Flow versione 0.7.1, 64-bit GPU-enabled, installato con pip e su un PC con Ubuntu 14.04. Il mio problema è che Tensor Flow sta esaurendo la memoria durante la costruzione della mia rete, anche se sulla base dei miei calcoli, ci dovrebbe essere spazio sufficiente sulla mia GPU.Flusso tensore: esaurito la memoria che tenta di allocare

Di seguito è riportato un esempio minimo del mio codice, che è basato sul tutorial TENSOR Flow MNIST. La rete è una rete completamente connessa a due livelli e il numero di nodi nel livello nascosto è definito dalla variabile n. La dimensione del minibatch formazione è 1. Ecco il mio codice:

n = 23000 

mnist = read_data_sets('MINST_Data', one_hot=True) 
session = tf.InteractiveSession() 
x = tf.placeholder(tf.float32, [None, 784]) 
W1 = tf.Variable(tf.truncated_normal([784, n], stddev=0.1)) 
b1 = tf.Variable(tf.constant(0.1, shape=[n])) 
nn1 = tf.matmul(x, W1) + b1 
W2 = tf.Variable(tf.truncated_normal([n, 10], stddev=0.1)) 
b2 = tf.Variable(tf.constant(0.1, shape=[10])) 
nn2 = tf.matmul(nn1, W2) + b2 
y = tf.nn.softmax(nn2) 
y_ = tf.placeholder(tf.float32, [None, 10]) 
cross_entropy = -tf.reduce_sum(y_*tf.log(y)) 
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

init = tf.initialize_all_variables() 
sess = tf.Session() 
sess.run(init) 
for i in range(1000): 
    batch_xs, batch_ys = mnist.train.next_batch(1) 
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) 

Ora, se n <= 22000, allora la rete funziona bene. Tuttavia, se n >= 23000, ottengo il seguente errore:

W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:211] Ran out of memory trying to allocate 877.38MiB. See logs for memory state 
W tensorflow/core/kernels/cwise_ops_common.cc:56] Resource exhausted: OOM when allocating tensor with shape[10000,23000] 

Tuttavia, secondo i miei calcoli, ci dovrebbe essere un problema con la memoria. Il numero di parametri nella rete è la seguente:

First layer weights: 784 * n 
First layer biases: n 
Second layer weights: 10 * n 
Second layer biases: 10 
Total: 795n + 10 

Quindi, con n = 23000, e utilizzando i dati float32, la memoria totale necessaria per la rete dovrebbe pertanto essere 73,1 MB.

Ora, la mia scheda grafica è la NVIDIA GeForce GTX 780 Ti, che ha 3072 MB di memoria. Dopo aver trovato la mia scheda grafica, Tensor flusso stampa il seguente:

Total memory: 3.00GiB 
Free memory: 2.32GiB 

Quindi, non ci dovrebbe essere intorno a 2,32 GB di memoria disponibile, che è di gran lunga maggiore rispetto ai 73,1 MB calcolati in precedenza. La dimensione del minibatch è 1, quindi questo ha un effetto minimo. Perché ricevo questo errore?


Ho anche provato questo sul mio portatile, che ha una GPU NVIDIA GeForce GTX 880M. Qui, Tensor Flow legge Free memory: 7.60GiB. Eseguendo lo stesso codice di cui sopra, mi dà un errore di memoria intorno a n = 700,000, che equivale a 2,2 GB. Questo ha un po 'più senso, ed è significativamente superiore al punto in cui si rompe il mio codice PC. Tuttavia, mi stupisce ancora il motivo per cui non si avvicina al marchio da 7,6 GB.


L'uscita completa dalla Tensor di flusso, mentre l'esecuzione del codice di cui sopra sul mio PC, con n = 23000, è:

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 780 Ti 
major: 3 minor: 5 memoryClockRate (GHz) 1.0455 
pciBusID 0000:01:00.0 
Total memory: 3.00GiB 
Free memory: 2.32GiB 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 780 Ti, pci bus id: 0000:01:00.0) 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 780 Ti, pci bus id: 0000:01:00.0) 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 2.03GiB bytes. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0xb04720000 extends to 0xb86295000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (16384):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (32768):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (65536):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (524288): Total Chunks: 2, Chunks in use: 0 819.0KiB allocated for chunks. 390.6KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (134217728):  Total Chunks: 1, Chunks in use: 0 68.79MiB allocated for chunks. 29.91MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (268435456):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (536870912):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (1073741824): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (2147483648): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:431] Bin (4294967296): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:450] Bin for 877.38MiB was 1.00GiB, Chunk State: 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d239400 of size 80128 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7600 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d24cd00 of size 438528 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7500 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb1a3e3200 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb1a302800 of size 920064 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15d58800 of size 920064 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb08cf7500 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04736b00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d2b7f00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15e39200 of size 72128000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb08c16b00 of size 920064 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15c61500 of size 92160 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04736d00 of size 72128000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d2b8100 of size 72128000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15c4ad00 of size 92160 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04736a00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d2b7e00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7900 of size 400128 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04720200 of size 92160 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04736c00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb08cf7600 of size 72128000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb1a3e3300 of size 1810570496 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1c0c00 of size 92160 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb08c00300 of size 92160 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d2b8000 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7800 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04720100 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7700 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb04720000 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb0d1d7400 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb11781700 of size 72128000 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15c77d00 of size 256 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:465] Chunk at 0xb15c77e00 of size 920064 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:468]  Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 16 Chunks of size 256 totalling 4.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 1 Chunks of size 80128 totalling 78.2KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 5 Chunks of size 92160 totalling 450.0KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 1 Chunks of size 400128 totalling 390.8KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 1 Chunks of size 438528 totalling 428.2KiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 4 Chunks of size 920064 totalling 3.51MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 5 Chunks of size 72128000 totalling 343.93MiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:471] 1 Chunks of size 1810570496 totalling 1.69GiB 
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:475] Sum Total of in-use chunks: 2.03GiB 
W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:211] Ran out of memory trying to allocate 877.38MiB. See logs for memory state 
W tensorflow/core/kernels/cwise_ops_common.cc:56] Resource exhausted: OOM when allocating tensor with shape[10000,23000] 
W tensorflow/core/common_runtime/executor.cc:1102] 0x50f40e0 Compute status: Resource exhausted: OOM when allocating tensor with shape[10000,23000] 
    [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul, Variable_1/read)]] 
W tensorflow/core/common_runtime/executor.cc:1102] 0x3234d30 Compute status: Resource exhausted: OOM when allocating tensor with shape[10000,23000] 
    [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul, Variable_1/read)]] 
    [[Node: range_1/_13 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_97_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 
W tensorflow/core/common_runtime/executor.cc:1102] 0x3234d30 Compute status: Resource exhausted: OOM when allocating tensor with shape[10000,23000] 
    [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul, Variable_1/read)]] 
    [[Node: Cast/_11 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_96_Cast", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 
Traceback (most recent call last): 
    File "/home/jrowlay/Projects/Tensor_Flow_Tutorial/MNIST_CNN_Simple/memory_test.py", line 232, in <module> 
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 315, in run 
    return self._run(None, fetches, feed_dict) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 511, in _run 
    feed_dict_string) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 564, in _do_run 
    target_list) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 586, in _do_call 
    e.code) 
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[10000,23000] 
    [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul, Variable_1/read)]] 
    [[Node: range_1/_13 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_97_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 
Caused by op u'add', defined at: 
    File "/home/jrowlay/Projects/Tensor_Flow_Tutorial/MNIST_CNN_Simple/memory_test.py", line 215, in <module> 
    nn1 = tf.matmul(x, W1) + b1 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 468, in binary_op_wrapper 
    return func(x, y, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 44, in add 
    return _op_def_lib.apply_op("Add", x=x, y=y, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2040, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1087, in __init__ 
    self._traceback = _extract_stack() 
+0

Solo supposizioni ma può essere che il set di dati sia in qualche modo in memoria nella GPU? Prova a rimuovere alcuni dati dal set di dati e controlla di nuovo la memoria. Non dovrebbe essere il set di dati ma ... chi lo sa. – jorgemf

risposta

6

A giudicare dall'errore, tensorflow OOM'ed cercando di allocare una [10000, 23000] -sized tensore. Dato che 10.000 risulta essere il numero di esempi di solito nel set di test MNIST, assumerò che tu abbia un codice di valutazione che tenti di valutare l'intero set di test contemporaneamente. Solo per le attivazioni avresti bisogno di 10000 * (784 + n + 10) ~= 1GB, che di per sé non dovrebbe essere sufficiente per OOM. Ma c'è anche un tensore di 1,7 GB allocato per qualche motivo che è difficile da spiegare.

Per il caso sul laptop, mancano alcune variabili nel calcolo. Adam tracks the first and second moments for each variable quindi i 2,2 GB triplicano per diventare 6,6 GB. Aggiungi un sovraccarico per i gradienti che saranno in memoria e questo spiega che OOM.

Mi dispiace, questo non risponde completamente alla tua domanda, avrei aggiunto questo come commento ma non ho ancora la reputazione per questo.

1

Solo per vostra informazione. Ho avuto lo stesso errore sul mio Macbook Pro. Dopo aver chiuso poche altre applicazioni, il problema è scomparso. Ma ho ancora altri errori come:

Blockquote W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 214.51MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

Quindi, questo è un vero e proprio fuori di problema di memoria.

0

Incontrare lo stesso errore, basta riavviare il programma dal notebook jupyter, funziona correttamente. Ancora non trovi il motivo. Anche eseguire il singolo session = tf.InteractiveSession() viene visualizzato lo stesso errore. Spero che sia d'aiuto.