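Large gap in model frame rate between performance evaluation on the development board and on the simulator
(The log is identical to the run output below.) Steps to reproduce:
1. Run test.py in examples/tflite/mobilenet_v1.
2. Evaluate once with the simulator and once with the development board connected. The results are as follows: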
----------Simulator evaluation-----------------
--> config model
done
--> Loading model
W The target_platform is not set in config, using default target platform rk1808.
done
--> Building model
done
--> Export RKNN model
done
--> Init runtime environment
librknn_runtime version 1.7.3 (5047ff8 build: 2022-08-13 12:11:22 base: 1131)
done
--> Running model
mobilenet_v1
-----TOP 5-----
: 0.8642578125
: 0.083740234375
: 0.01241302490234375
: 0.006565093994140625
: 0.002044677734375
done
--> Evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.
========================================================================
Performance
========================================================================
Layer ID Name Time(us)
60 openvx.tensor_transpose_3 72
1 convolution.relu.pooling.layer2_2 369
3 convolution.relu.pooling.layer2_2 211
5 convolution.relu.pooling.layer2_2 184
7 convolution.relu.pooling.layer2_2 315
9 convolution.relu.pooling.layer2_2 99
11 convolution.relu.pooling.layer2_2 137
13 convolution.relu.pooling.layer2_2 103
15 convolution.relu.pooling.layer2_2 116
17 convolution.relu.pooling.layer2_2 95
19 convolution.relu.pooling.layer2_2 102
21 convolution.relu.pooling.layer2_2 151
23 convolution.relu.pooling.layer2_2 95
25 convolution.relu.pooling.layer2_2 109
27 convolution.relu.pooling.layer2_2 106
29 convolution.relu.pooling.layer2_2 211
31 convolution.relu.pooling.layer2_2 106
33 convolution.relu.pooling.layer2_2 211
35 convolution.relu.pooling.layer2_2 106
37 convolution.relu.pooling.layer2_2 211
39 convolution.relu.pooling.layer2_2 106
41 convolution.relu.pooling.layer2_2 211
43 convolution.relu.pooling.layer2_2 106
45 convolution.relu.pooling.layer2_2 211
47 convolution.relu.pooling.layer2_2 108
49 convolution.relu.pooling.layer2_2 163
51 convolution.relu.pooling.layer2_2 206
53 convolution.relu.pooling.layer2_2 319
55 pooling.layer2 34
56 fullyconnected.relu.layer_3 110
58 softmaxlayer2.layer 39
Total Time(us): 4722
FPS(600MHz): 158.83
FPS(800MHz): 211.77
Note: Time of each layer is converted according to 800MHz!
========================================================================
done
---------Development board evaluation-----------
--> config model
done
--> Loading model
done
--> Building model
done
--> Export RKNN model
done
--> Init runtime environment
W Flag perf_debug has been set, it will affect the performance of inference!
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.7.3 (0cfd4a1 build: 2022-08-15 17:08:57)
D RKNNAPI: DRV: 1.7.0 (7880361 build: 2021-08-16 14:05:08)
D RKNNAPI: ==============================================
done
--> Running model
mobilenet_v1
-----TOP 5-----
: 0.8515625
: 0.091796875
: 0.0135955810546875
: 0.0064697265625
: 0.002239227294921875
done
--> Evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.
========================================================================
Performance
#### The performance result is just for debugging, ####
#### may worse than actual performance! ####
========================================================================
Layer ID Name Operator Uid Time(us)
0 MobilenetV1/MobilenetV1/Conv2d_0/Relu6_1 TENSOR_TRANS 60 361
_RKNN_mark_perm_60_0
2 MobilenetV1/MobilenetV1/Conv2d_0/Relu6_1 CONVOLUTION 1 920
_2
3 MobilenetV1/MobilenetV1/Conv2d_1_depthwi DEPTH_WISE_CONV 3 896
se/Relu6_3_2
4 MobilenetV1/MobilenetV1/Conv2d_1_pointwi CONVOLUTION 5 1106
se/Relu6_5_2
5 MobilenetV1/MobilenetV1/Conv2d_2_depthwi DEPTH_WISE_CONV 7 430
se/Relu6_7_2
6 MobilenetV1/MobilenetV1/Conv2d_2_pointwi CONVOLUTION 9 5080
se/Relu6_9_2
7 MobilenetV1/MobilenetV1/Conv2d_3_depthwi DEPTH_WISE_CONV 11 4888
se/Relu6_11_2
8 MobilenetV1/MobilenetV1/Conv2d_3_pointwi CONVOLUTION 13 5043
se/Relu6_13_2
9 MobilenetV1/MobilenetV1/Conv2d_4_depthwi DEPTH_WISE_CONV 15 148
se/Relu6_15_2
10 MobilenetV1/MobilenetV1/Conv2d_4_pointwi CONVOLUTION 17 243
se/Relu6_17_2
11 MobilenetV1/MobilenetV1/Conv2d_5_depthwi DEPTH_WISE_CONV 19 161
se/Relu6_19_2
12 MobilenetV1/MobilenetV1/Conv2d_5_pointwi CONVOLUTION 21 196
se/Relu6_21_2
13 MobilenetV1/MobilenetV1/Conv2d_6_depthwi DEPTH_WISE_CONV 23 74
se/Relu6_23_2
14 MobilenetV1/MobilenetV1/Conv2d_6_pointwi CONVOLUTION 25 109
se/Relu6_25_2
15 MobilenetV1/MobilenetV1/Conv2d_7_depthwi DEPTH_WISE_CONV 27 74
se/Relu6_27_2
16 MobilenetV1/MobilenetV1/Conv2d_7_pointwi CONVOLUTION 29 157
se/Relu6_29_2
17 MobilenetV1/MobilenetV1/Conv2d_8_depthwi DEPTH_WISE_CONV 31 69
se/Relu6_31_2
18 MobilenetV1/MobilenetV1/Conv2d_8_pointwi CONVOLUTION 33 160
se/Relu6_33_2
19 MobilenetV1/MobilenetV1/Conv2d_9_depthwi DEPTH_WISE_CONV 35 69
se/Relu6_35_2
20 MobilenetV1/MobilenetV1/Conv2d_9_pointwi CONVOLUTION 37 157
se/Relu6_37_2
21 MobilenetV1/MobilenetV1/Conv2d_10_depthw DEPTH_WISE_CONV 39 67
ise/Relu6_39_2
22 MobilenetV1/MobilenetV1/Conv2d_10_pointw CONVOLUTION 41 158
ise/Relu6_41_2
23 MobilenetV1/MobilenetV1/Conv2d_11_depthw DEPTH_WISE_CONV 43 72
ise/Relu6_43_2
24 MobilenetV1/MobilenetV1/Conv2d_11_pointw CONVOLUTION 45 159
ise/Relu6_45_2
25 MobilenetV1/MobilenetV1/Conv2d_12_depthw DEPTH_WISE_CONV 47 73
ise/Relu6_47_2
26 MobilenetV1/MobilenetV1/Conv2d_12_pointw CONVOLUTION 49 128
ise/Relu6_49_2
27 MobilenetV1/MobilenetV1/Conv2d_13_depthw DEPTH_WISE_CONV 51 69
ise/Relu6_51_2
28 MobilenetV1/MobilenetV1/Conv2d_13_pointw CONVOLUTION 53 209
ise/Relu6_53_2
29 MobilenetV1/Logits/AvgPool_1a/AvgPool_55 DEPTH_WISE_CONV 55 128
_2
30 MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd FULLYCONNECTED 56 477
_56_0
1 Softmax2Layer_1 SOFTMAX 945
Total Time(us): 22826
FPS: 43.81
========================================================================
done
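For the same model, the inference time differs by a factor of about 5.
(In a later test with the yolov5s model, the frame rate evaluated on the development board was about 4 FPS.)
What causes this, and how should it be optimized or fixed?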
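The "about 5x" figure can be checked directly from the two Total Time values in the logs above. The sketch below only redoes that arithmetic; the constant and function names are illustrative and are not part of RKNN Toolkit.

```python
# Totals copied from the two performance reports above (microseconds per inference).
SIM_TOTAL_US = 4722      # simulator; layer times are normalized to 800 MHz
BOARD_TOTAL_US = 22826   # development board, with perf_debug enabled

# Board layers 6, 7 and 8 stand out in the per-layer report.
BOARD_SLOW_LAYERS_US = [5080, 4888, 5043]


def fps(total_us):
    """Frames per second for one inference taking total_us microseconds."""
    return 1_000_000 / total_us


sim_fps_800 = fps(SIM_TOTAL_US)
sim_fps_600 = sim_fps_800 * 600 / 800   # scale linearly with NPU clock
board_fps = fps(BOARD_TOTAL_US)
slowdown = BOARD_TOTAL_US / SIM_TOTAL_US
slow_share = sum(BOARD_SLOW_LAYERS_US) / BOARD_TOTAL_US

print(f"simulator FPS (800MHz): {sim_fps_800:.2f}")   # 211.77
print(f"simulator FPS (600MHz): {sim_fps_600:.2f}")   # 158.83
print(f"board FPS:              {board_fps:.2f}")     # 43.81
print(f"board / simulator time: {slowdown:.2f}x")     # 4.83x
print(f"share of layers 6-8:    {slow_share:.1%}")    # 65.8%
```

The numbers reproduce the FPS lines printed by the tool, and show that three layers account for roughly two thirds of the time measured on the board.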
I had set perf_debug=True. After changing it to False, the results are normal.
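A minimal sketch of that change, assuming the script initializes the board runtime through `init_runtime` and then calls `eval_perf`, as the log messages above suggest. The helper name `evaluate_on_board` is hypothetical, and `target="rk1808"` is an assumption (the board type is not stated in the post), so adjust it to the actual device.

```python
def evaluate_on_board(rknn, inputs=None, target="rk1808", perf_debug=False):
    """Initialize the board runtime and run the performance evaluation.

    rknn       : an RKNN object whose model has already been built or loaded
    inputs     : input list for eval_perf; None lets the toolkit use fake inputs
    target     : assumed board type, change it to match the connected device
    perf_debug : False gives the overall timing; True collects per-layer
                 timings, which the toolkit warns will slow inference down
    """
    ret = rknn.init_runtime(target=target, perf_debug=perf_debug)
    if ret != 0:
        raise RuntimeError(f"init_runtime failed with code {ret}")
    return rknn.eval_perf(inputs=inputs, is_print=True)
```

With `perf_debug=False` the report contains only the overall time, so the per-layer table is no longer printed. Enable the flag only when a layer-by-layer breakdown is needed, and do not treat its total as the real frame rate.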