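RK3588: ReduceMax op runs on the CPU and takes too long
In a simple custom network, I need to reduce a (B, C, H, W) tensor to (B, C, W). I implemented this with a ReduceMax op followed by a Reshape op, and found that the ReduceMax op runs on the CPU and is very slow (about 140 ms).
Results measured on the RK3588 development board: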
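For readers unfamiliar with the two ops, here is a minimal sketch in plain Python (nested lists, no RKNN or framework calls) of what the reduction computes. The helper names `reduce_max_h`, `max_pool_h`, and `squeeze_h` are hypothetical and only illustrate the semantics: taking the max over H is the same as a MaxPool whose kernel spans the full height, followed by dropping the H axis of size 1.

```python
def reduce_max_h(x):
    """Max over the H axis of a nested list shaped (B, C, H, W) -> (B, C, W)."""
    # zip(*channel) turns H rows of W values into W columns of H values.
    return [[[max(col) for col in zip(*channel)] for channel in batch] for batch in x]


def max_pool_h(x, kernel_h):
    """MaxPool with kernel (kernel_h, 1) and stride (kernel_h, 1): (B, C, H, W) -> (B, C, H // kernel_h, W)."""
    out = []
    for batch in x:
        out_batch = []
        for channel in batch:
            height, width = len(channel), len(channel[0])
            rows = [
                [max(channel[h0 + dh][w] for dh in range(kernel_h)) for w in range(width)]
                for h0 in range(0, height - kernel_h + 1, kernel_h)
            ]
            out_batch.append(rows)
        out.append(out_batch)
    return out


def squeeze_h(x):
    """Drop an H axis of size 1: (B, C, 1, W) -> (B, C, W)."""
    return [[channel[0] for channel in batch] for batch in x]


# Tiny example tensor with B=1, C=2, H=3, W=4 (the real model uses H=32, W=10000).
x = [[[[(c * 7 + h * 5 + w * 3) % 11 for w in range(4)] for h in range(3)] for c in range(2)]]

reduced = reduce_max_h(x)
pooled = squeeze_h(max_pool_h(x, kernel_h=3))
print(reduced == pooled)
```

This only shows that the two formulations produce the same values; it says nothing about which one a given runtime can schedule on the NPU.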
D RKNN: ID OpType DataType Target InputShape OutputShape DDR Cycles NPU Cycles Total Cycles Time(us) MacUsage(%) RW(KB) FullName
D RKNN: 0 InputOperator INT8 CPU \ (1,10,32,10000) 0 0 0 4 \ 5000.00 InputOperator:voxels_input
D RKNN: 1 ConvRelu INT8 NPU (1,10,32,10000),(64,10,1,1),(64) (1,64,32,10000) 811751 200000 811751 3871 6.89 25001.50 Conv:Conv_0
D RKNN: 2 ReduceMax INT8 CPU (1,64,32,10000) (1,64,1,10000) 0 0 0 139036 \ 20625.00 ReduceMax:ReduceMax_2
D RKNN: 3 Reshape INT8 CPU (1,64,1,10000),(4) (1,64,10000,1) 0 0 0 1048 \ 1250.03 Reshape:Squeeze_3_2reshape
D RKNN: 4 OutputOperator INT8 CPU (1,64,10000,1) \ 0 0 0 40 \ 625.00 OutputOperator:pillar_features
D RKNN: Total Operator Elapsed Time(us): 143999
------------------------------------------------------------------------------------------------------------------------------------------------------------------
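To make the breakdown in the log above easier to read, the following sketch sums the Time(us) column per target. It assumes the row layout shown in the pasted log (whitespace-separated fields, with Time(us) as the fourth field from the end); `parse_rows` and `time_by_target` are hypothetical helper names, not part of any RKNN tool.

```python
from collections import defaultdict

# Operator rows copied from the ReduceMax profile above.
# A raw string keeps the "\" placeholders exactly as they appear in the log.
PROFILE = r"""
D RKNN: 0 InputOperator INT8 CPU \ (1,10,32,10000) 0 0 0 4 \ 5000.00 InputOperator:voxels_input
D RKNN: 1 ConvRelu INT8 NPU (1,10,32,10000),(64,10,1,1),(64) (1,64,32,10000) 811751 200000 811751 3871 6.89 25001.50 Conv:Conv_0
D RKNN: 2 ReduceMax INT8 CPU (1,64,32,10000) (1,64,1,10000) 0 0 0 139036 \ 20625.00 ReduceMax:ReduceMax_2
D RKNN: 3 Reshape INT8 CPU (1,64,1,10000),(4) (1,64,10000,1) 0 0 0 1048 \ 1250.03 Reshape:Squeeze_3_2reshape
D RKNN: 4 OutputOperator INT8 CPU (1,64,10000,1) \ 0 0 0 40 \ 625.00 OutputOperator:pillar_features
"""


def parse_rows(log_text):
    """Yield (op_type, target, time_us) for each operator row in the log."""
    for line in log_text.splitlines():
        tokens = line.split()
        # Expected: D RKNN: <id> <OpType> <DataType> <Target> ... <Time> <Mac> <RW> <FullName>
        if len(tokens) < 15 or tokens[:2] != ["D", "RKNN:"] or not tokens[2].isdigit():
            continue
        yield tokens[3], tokens[5], int(tokens[-4])


def time_by_target(log_text):
    """Sum Time(us) per target (CPU or NPU)."""
    totals = defaultdict(int)
    for _op_type, target, time_us in parse_rows(log_text):
        totals[target] += time_us
    return dict(totals)


rows = list(parse_rows(PROFILE))
totals = time_by_target(PROFILE)
total_us = sum(totals.values())
slowest = max(rows, key=lambda row: row[2])

print(totals, total_us)
print(f"{slowest[0]} on {slowest[1]}: {slowest[2] / total_us:.1%} of total time")
```

With the rows above, the per-op times add up to the 143999 us reported in the log, and almost all of that time is spent in the single ReduceMax op on the CPU.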
I also tried replacing the ReduceMax op with MaxPool, but it likewise runs on the CPU and is very slow (about 130 ms). Results measured on the RK3588 development board:
D RKNN: ID OpType DataType Target InputShape OutputShape DDR Cycles NPU Cycles Total Cycles Time(us) MacUsage(%) RW(KB) FullName
D RKNN: 0 InputOperator INT8 CPU \ (1,10,32,10000) 0 0 0 4 \ 5000.00 InputOperator:voxels_input
D RKNN: 1 ConvRelu INT8 NPU (1,10,32,10000),(64,10,1,1),(64) (1,64,32,10000) 811751 200000 811751 3873 6.89 25001.50 Conv:Conv_0
D RKNN: 2 MaxPool INT8 CPU (1,64,32,10000) (1,64,1,10000) 0 0 0 130099 \ 20625.00 MaxPool:MaxPool_2
D RKNN: 3 Reshape INT8 CPU (1,64,1,10000),(4) (1,64,10000,1) 0 0 0 779 \ 1250.03 Reshape:Squeeze_3_2reshape
D RKNN: 4 OutputOperator INT8 CPU (1,64,10000,1) \ 0 0 0 28 \ 625.00 OutputOperator:pillar_features
D RKNN: Total Operator Elapsed Time(us): 134783
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Can this be optimized so that the ReduceMax op runs on the NPU and becomes faster? Also, why does the MaxPool op run on the CPU rather than the NPU?
Liuth posted on 2022-8-24 14:29:
It looks like you are using the wrong tool. The RK3588 requires the second-generation toolkit, rknn-toolkit2-v1.3.0; you are using the first-generation one.
Sorry, I filled in the wrong information. The tool I am using is rknn-toolkit2 (RK_NPU_SDK_1.3.0).
Download link: https://eyun.baidu.com/enterprise/share/init?cid=8272257679089781337&uk=1883176049&sid=202205053973938618
Extraction code: rknn