1) Build the sample with nvcc:

    gpu-basics-similarity]$ nvcc -O3 gpu-basics-similarity.cpp \
        -I /THL5/home/te33334.16/include -L /THL5/ho33334.16/lib64 \
        -lopencv_cudabgsegm -lopencv_cudaobjdetect -lopencv_cudastereo -lopencv_stitching \
        -lopencv_cudafeatures2d -lopencv_superres -lopencv_cudacodec -lopencv_videostab \
        -lopencv_cudaoptflow -lopencv_cudalegacy -lopencv_cudawarping -lopencv_aruco \
        -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_dnn_objdetect \
        -lopencv_dpm -lopencv_highgui -lopencv_videoio -lopencv_face \
        -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_hfs \
        -lopencv_img_hash -lopencv_line_descriptor -lopencv_optflow -lopencv_reg \
        -lopencv_rgbd -lopencv_saliency -lopencv_stereo -lopencv_structured_light \
        -lopencv_phase_unwrapping -lopencv_surface_matching -lopencv_tracking -lopencv_datasets \
        -lopencv_text -lopencv_dnn -lopencv_plot -lopencv_xfeatures2d \
        -lopencv_shape -lopencv_video -lopencv_ml -lopencv_ximgproc \
        -lopencv_xobjdetect -lopencv_objdetect -lopencv_calib3d -lopencv_imgcodecs \
        -lopencv_features2d -lopencv_flann -lopencv_xphoto -lopencv_photo \
        -lopencv_cudaimgproc -lopencv_cudafilters -lopencv_imgproc -lopencv_cudaarithm \
        -lopencv_core -lopencv_cudev

2) a.out  gpu-basics-similarity.cpp  lena.jpg  lena_tmpl.jpg

3) ./a.out lena.jpg lena_tmpl.jpg

    --------------------------------------------------------------------------
    This program shows how to port your CPU code to CUDA or write that from scratch.
    You can see the performance improvement for the similarity check methods (PSNR and SSIM).
    Usage:
    ./gpu-basics-similarity referenceImage comparedImage numberOfTimesToRunTest(like 10).
    --------------------------------------------------------------------------
    terminate called after throwing an instance of 'std::logic_error'
      what():  basic_string::_M_construct null not valid

The run fails. First guess: the local node has no GPU.

3) yhrun -p TH_GPU -N 1 ./a.out lena.jpg lena_tmpl.jpg

    --------------------------------------------------------------------------
    This program shows how to port your CPU code to CUDA or write that from scratch.
    You can see the performance improvement for the similarity check methods (PSNR and SSIM).
    Usage:
    ./gpu-basics-similarity referenceImage comparedImage numberOfTimesToRunTest(like 10).
    --------------------------------------------------------------------------
    terminate called after throwing an instance of 'std::logic_error'
      what():  basic_string::_M_construct null not valid
    yhrun: error: gn10: task 0: Aborted

Same error on the GPU node, so the missing GPU was not the cause.

Debugging with gdb:

    #11 0x00000000004063ec in main(int, char**) (argv=0x7fffffffda48) at gpu-basics-similarity.cpp:81

    int TIMES = 10;
    stringstream sstr(argv[3]);
    sstr >> TIMES;
    double time, result = 0;

The program needs three arguments. Oops. With only two given, argv[3] is a null pointer, and building the stringstream's string from it throws the std::logic_error above.

4) With the third argument it runs very well on the K80 too:

    yhrun -p TH_GPU -N 1 ./a.out lena.jpg lena_tmpl.jpg 10

    --------------------------------------------------------------------------
    This program shows how to port your CPU code to CUDA or write that from scratch.
    You can see the performance improvement for the similarity check methods (PSNR and SSIM).
    Usage:
    ./gpu-basics-similarity referenceImage comparedImage numberOfTimesToRunTest(like 10).
    --------------------------------------------------------------------------
    Time of PSNR CPU (averaged for 10 runs): 5.35158 milliseconds. With result of: 16.7378
    Time of PSNR CUDA (averaged for 10 runs): 559.219 milliseconds. With result of: 16.7378
    Initial call CUDA optimized: 1.8707 milliseconds. With result of: 16.7378
    Time of PSNR CUDA OPTIMIZED ( / 10 runs): 1.25884 milliseconds. With result of: 16.7378
    [ WARN:0] OpenCV/MatExpr: processing of multi-channel arrays might be changed in the future: https://github.com/opencv/opencv/issues/16739
    [ WARN:0] OpenCV/MatExpr: processing of multi-channel arrays might be changed in the future: https://github.com/opencv/opencv/issues/16739
    Time of MSSIM CPU (averaged for 10 runs): 152.462 milliseconds. With result of B0.904694 G0.905915 R0.909984
    Time of MSSIM CUDA (averaged for 10 runs): 194.516 milliseconds. With result of B0.904694 G0.905915 R0.909984
    Time of MSSIM CUDA Initial Call 12.9246 milliseconds. With result of B0.904694 G0.905915 R0.909984
    Time of MSSIM CUDA OPTIMIZED ( / 10 runs): 11.4991 milliseconds. With result of B0.904694 G0.905915 R0.909984

6) gpu-thrust-interop]$ yhrun -p TH_GPU -N 1 ./a.out 40

This one also runs, though I don't know what it actually does.
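
A note on the crash found with gdb in the similarity sample: the three lines around gpu-basics-similarity.cpp:81 read argv[3] without checking argc first. Below is a minimal sketch of a guard that would fall back to the default of 10 runs instead of aborting. The helper name parse_times and its index/fallback parameters are made up here for illustration; they are not part of the OpenCV sample.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not in the OpenCV sample): returns the run count taken
// from argv[index], or `fallback` when that argument is missing or is not a
// positive integer.
int parse_times(int argc, char** argv, int index = 3, int fallback = 10)
{
    assert(argv != nullptr);

    // argv[argc] is a null pointer, so argv[index] must only be converted to
    // a std::string when index < argc.
    if (index >= argc)
        return fallback;

    std::istringstream sstr{std::string(argv[index])};
    int times = 0;
    if (!(sstr >> times) || times <= 0)
        return fallback;  // not a number, or zero/negative
    return times;
}
```

In the sample this would presumably replace the three lines shown by gdb with something like `int TIMES = parse_times(argc, argv);`, so that `./a.out lena.jpg lena_tmpl.jpg` runs 10 times instead of throwing std::logic_error.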