| Freq | width | height | cin | cout | kernel | CPU Direct | CPU GEMM | CPU Winograd | GPU Direct | GPU GEMM | GPU Winograd | comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 415 | 28 | 28 | 64 | 64 | 3 | 4894 | 8329 | 8764 | 3702 | 3160 | 2073 | comment |
| 415 | 56 | 56 | 64 | 64 | 3 | 10550 | 8928 | 6389 | 11480 | 9794 | 4364 | comment |
| 415 | 112 | 112 | 64 | 64 | 3 | 61924 | 36448 | 29850 | 49285 | 44137 | 10729 | comment |
| 415 | 224 | 224 | 64 | 64 | 3 | 221346 | 156849 | 90626 | 609088 | 553315 | 31557 | comment |
| 415 | 448 | 448 | 64 | 64 | 3 | 938130 | 707488 | 584700 | 2083466 | 2036609 | 387459 | comment |
| 415 | 56 | 56 | 16 | 16 | 3 | 2078 | 2222 | 2407 | 1502 | 1878 | 1377 | comment |
| 415 | 56 | 56 | 32 | 32 | 3 | 5924 | 4531 | 3929 | 3619 | 2770 | 2256 | comment |
| 415 | 56 | 56 | 64 | 64 | 3 | 20023 | 8636 | 5254 | 12443 | 9336 | 3677 | comment |
| 415 | 56 | 56 | 128 | 128 | 3 | 39891 | 27909 | 16788 | 64911 | 43961 | 18008 | comment |
| 767 | 28 | 28 | 64 | 64 | 3 | 13135 | 4867 | 3867 | 2569 | 2174 | 4876 | comment |
| 767 | 56 | 56 | 64 | 64 | 3 | 20223 | 8148 | 6814 | 7177 | 5901 | 4322 | comment |
| 767 | 112 | 112 | 64 | 64 | 3 | 76097 | 37595 | 28982 | 26862 | 25174 | 6157 | comment |
| 767 | 224 | 224 | 64 | 64 | 3 | 195877 | 167989 | 87208 | 529305 | 684542 | 120056 | comment |
| 767 | 448 | 448 | 64 | 64 | 3 | 790089 | 610065 | 534801 | 3177016 | 488922 | 180504 | comment |
| 767 | 56 | 56 | 16 | 16 | 3 | 2570 | 3334 | 4277 | 3186 | 5256 | 3109 | comment |
| 767 | 56 | 56 | 32 | 32 | 3 | 5593 | 6777 | 4587 | 12356 | 9398 | 4785 | comment |
| 767 | 56 | 56 | 64 | 64 | 3 | 20750 | 9228 | 6636 | 46598 | 37182 | 13012 | comment |
| 767 | 56 | 56 | 128 | 128 | 3 | 39743 | 24085 | 16577 | 225332 | 24101 | 10531 | comment |

Unit: microseconds (us)

| index | Freq | width | height | cin | cout | kernel | GPU Direct | GPU GEMM | GPU Winograd | Speedup Binary | Speedup Ternary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 415 | 28 | 28 | 64 | 64 | 3 | 3702 | 3160 | 2073 | 6.7 | 6.1 |
| 2 | 415 | 56 | 56 | 64 | 64 | 3 | 11480 | 9794 | 4364 | 7.9 | 7.9 |
| 3 | 415 | 112 | 112 | 64 | 64 | 3 | 49285 | 44137 | 10729 | 10.8 | 11.1 |
| 4 | 415 | 224 | 224 | 64 | 64 | 3 | 609088 | 553315 | 31557 | 36.3 | 37.3 |
| 5 | 415 | 448 | 448 | 64 | 64 | 3 | 2083466 | 2036609 | 387459 | 33.9 | 35.0 |
| 6 | 415 | 56 | 56 | 16 | 16 | 3 | 1502 | 1878 | 1377 | 5.5 | 5.2 |
| 7 | 415 | 56 | 56 | 32 | 32 | 3 | 3619 | 2770 | 2256 | 5.0 | 4.9 |
| 8 | 415 | 56 | 56 | 64 | 64 | 3 | 12443 | 9336 | 3677 | 7.4 | 7.7 |
| 9 | 415 | 56 | 56 | 128 | 128 | 3 | 64911 | 43961 | 18008 | 11.2 | 12.1 |
| 10 | 415 | 56 | 56 | 256 | 256 | 3 | 799638 | 344602 | 142474 | 24.4 | 27.1 |
| 11 | 767 | 28 | 28 | 64 | 64 | 3 | 2569 | 2174 | 4876 | 5.8 | 6.5 |
| 12 | 767 | 56 | 56 | 64 | 64 | 3 | 7177 | 5901 | 4322 | 7.0 | 8.3 |
| 13 | 767 | 112 | 112 | 64 | 64 | 3 | 26862 | 25174 | 6157 | 10.9 | 11.3 |
| 14 | 767 | 224 | 224 | 64 | 64 | 3 | 529305 | 684542 | 120056 | 76.0 | 84.3 |
| 15 | 767 | 448 | 448 | 64 | 64 | 3 | 3177016 | 488922 | 180504 | 15.0 | 15.5 |
| 16 | 767 | 56 | 56 | 16 | 16 | 3 | 3186 | 5256 | 3109 | 9.2 | 9.7 |
| 17 | 767 | 56 | 56 | 32 | 32 | 3 | 12356 | 9398 | 4785 | 7.4 | 7.9 |
| 18 | 767 | 56 | 56 | 64 | 64 | 3 | 46598 | 37182 | 13012 | 46.5 | 21.7 |
| 19 | 767 | 56 | 56 | 128 | 128 | 3 | 225332 | 24101 | 10531 | 11.0 | 11.8 |
| 20 | 767 | 56 | 56 | 256 | 256 | 3 | 500958 | 88924 | 53745 | 10.6 | 12.7 |

Note: For certain indices (for example 4, 5, and 14), the speedup is abnormally high. This is not because our implementation is really that much faster; it is caused by bad corner cases in the ARM Compute Library.
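
One rough way to see the corner case behind the note is to compare how the GPU Direct time grows against how the amount of work grows between consecutive input sizes. The sketch below uses the GPU Direct timings of indices 1 to 5 (Freq 415, 64 channels, 3x3 kernel) copied from the table. The function names and the `tolerance` threshold of 2.0 are assumptions made for illustration and are not part of the original benchmark; the baseline behind the speedup columns is not stated here, so the sketch only looks at the raw timings.

```python
# GPU Direct timings (us) for indices 1-5: Freq 415, cin = cout = 64, kernel 3.
SIZES = [28, 56, 112, 224, 448]                 # width == height
GPU_DIRECT_US = [3702, 11480, 49285, 609088, 2083466]


def growth_factors(sizes, times):
    """Return (size, time_ratio, work_ratio) for each consecutive pair of rows.

    With channels and kernel fixed, the work is assumed to scale with
    width * height, so doubling the size gives a work ratio of 4.
    """
    factors = []
    for i in range(1, len(sizes)):
        work_ratio = (sizes[i] / sizes[i - 1]) ** 2
        time_ratio = times[i] / times[i - 1]
        factors.append((sizes[i], time_ratio, work_ratio))
    return factors


def flag_jumps(sizes, times, tolerance=2.0):
    """List the sizes whose time grew more than `tolerance` times the work growth.

    `tolerance` is an arbitrary threshold chosen for this illustration.
    """
    return [size for size, t, w in growth_factors(sizes, times) if t > tolerance * w]


if __name__ == "__main__":
    for size, t, w in growth_factors(SIZES, GPU_DIRECT_US):
        print(f"size {size}: time x{t:.2f}, work x{w:.2f}")
    print("suspicious sizes:", flag_jumps(SIZES, GPU_DIRECT_US))
```

With these numbers, the step from 112 to 224 is the only one flagged: the time grows by roughly 12x while the work grows by 4x, which matches index 4 in the note. The same check can be repeated for the other GPU columns or for the Freq 767 rows by swapping in the corresponding timings.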