TensorFlow GPU Memory error - the error you get when the GPU simply does not have enough memory (2020. 3. 9. 18:02)
I tried to run deep-learning training on data with large images, but it kept failing with an error.
I concluded that the GPU simply did not have enough memory.
The batch size was already 1, so there was nothing left to reduce there. Cutting the images into smaller tiles (sketched below) made training run.
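For reference, this is roughly what I mean by cutting the image into smaller tiles; a minimal sketch, assuming an (H, W, C) numpy image and hypothetical tile sizes chosen so that one tile fits in GPU memory:

import numpy as np

def split_into_tiles(image, tile_h=512, tile_w=512):
    """Split an (H, W, C) image into non-overlapping tiles.

    tile_h / tile_w are hypothetical values; pick them small enough that
    batch_size * tile_h * tile_w * channels * 4 bytes fits on the GPU.
    Edge tiles may come out smaller than the requested size.
    """
    h, w, _ = image.shape
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            tiles.append(image[top:top + tile_h, left:left + tile_w])
    return tiles

# Example: train on one tile at a time instead of the full image.
# for tile in split_into_tiles(full_image):
#     sess.run(train_op, feed_dict={input_ph: tile[np.newaxis, ...]})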
The error message I got looked like the one below.
Part of the error log message -------------------------------------------------------------
As the lines below show, memory keeps being allocated without any complaint until it finally runs out, and then the allocator reports the failure.
2020-02-19 00:42:50.537629: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 1176764416 totalling 2.19GiB
2020-02-19 00:42:50.537650: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 3.67GiB
2020-02-19 00:42:50.537674: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 5740167168
InUse: 3937463552
MaxInUse: 4687955968
NumAllocs: 4401
MaxAllocSize: 2199650304
2020-02-19 00:42:50.537790: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***************************________________****_________**********************************************__
2020-02-19 00:42:50.537834: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at image_ops.cc:86 : Resource exhausted: OOM when allocating tensor with shape[64,67,67,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-02-19 00:42:50.540794: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x149c7a40
INFO:tensorflow:Error reported to Coordinator: OOM when allocating tensor with shape[64,67,67,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: cond_1/rotated_subsampling/transform/ImageProjectiveTransform = ImageProjectiveTransform[dtype=DT_FLOAT, interpolation="BILINEAR", _device="/job:localhost/replica:0/task:0/device:GPU:0"](cond_1/rotated_subsampling/resize_image_with_crop_or_pad/control_dependency_3, cond_1/rotated_subsampling/compose_transforms/strided_slice_3)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: total_loss_1/_1889 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_11682_total_loss_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
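For scale: a single float32 tensor of shape [64,67,67,1024] is 64 × 67 × 67 × 1024 × 4 bytes = 1,176,764,416 bytes (about 1.1 GiB), which matches the 1176764416-byte chunks in the allocator stats above; against a Limit of 5,740,167,168 bytes (about 5.3 GiB) that fills up quickly. Following the hint in the last log lines, the allocation dump can be requested roughly like this (a minimal TF 1.x-style sketch; train_op, feed_dict and sess stand in for whatever the real training loop uses):

import tensorflow as tf

# Per the hint in the OOM message: with report_tensor_allocations_upon_oom
# set on RunOptions, TensorFlow prints the list of live tensor allocations
# when an OOM occurs, which helps identify what is eating GPU memory.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Hypothetical training step; only the options= argument is the point here.
# sess.run(train_op, feed_dict=feed_dict, options=run_options)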