PKTQ: Per-Kernel Thresholded Quantization With Size-Constrained Two-Stage Optimization for CNNs

ABSTRACT Convolutional neural networks are challenging to deploy on resource-constrained computing platforms due to their high storage and computational demands. Mixed-precision quantization can mitigate these issues by assigning a different bit-width to each layer. However, finding an optimal quantization policy is difficult because the policy search space grows exponentially with the number of layers and each candidate policy requires a costly evaluation. Additionally, existing works often overlook strict model size constraints