| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
2 | Tensor Type | Float32 | ANEURALNETWORKS_TENSOR_FLOAT32 | MPSImageFeatureChannelFormatFloat32 (MPS uses float16 internally) | DML_TENSOR_DATA_TYPE_FLOAT32 | BNNSDataTypeFloat32 | memory::data_type::f32 | cldnn_f32 | float32 (Tensor Element Types) |
3 | | Float16 | ANEURALNETWORKS_TENSOR_FLOAT16 | MPSImageFeatureChannelFormatFloat16 | DML_TENSOR_DATA_TYPE_FLOAT16 | BNNSDataTypeFloat16 | memory::data_type::f16 | cldnn_f16 | float16 (Tensor Element Types) |
4 | | Quantized Int8 | ANEURALNETWORKS_TENSOR_QUANT8_ASYMM real_value = (integer_value - zeroPoint) * scale | [Note]: not supported | DML_TENSOR_DATA_TYPE_UINT8 | [Note]: not supported | memory::data_type::s8 | [Note]: not supported | [Note]: not mentioned |
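
NNAPI is the only API in this row that bakes a quantization scheme into the tensor type (real_value = (integer_value - zeroPoint) * scale). A minimal NumPy sketch of that scheme; `quantize`/`dequantize` are illustrative helpers, not part of any of the listed APIs:

```python
import numpy as np

# NNAPI ANEURALNETWORKS_TENSOR_QUANT8_ASYMM:
# real_value = (integer_value - zeroPoint) * scale
def dequantize(q, scale, zero_point):
    return (q.astype(np.int32) - zero_point) * scale

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)   # asymmetric uint8 range

q = np.array([0, 128, 255], dtype=np.uint8)
print(dequantize(q, scale=0.5, zero_point=128))  # [-64.   0.   63.5]
```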

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
6 | Convolution | op | ANEURALNETWORKS_CONV_2D | MPSCNNConvolutionNode + MPSCNNConvolutionDescriptor | DML_OPERATOR_CONVOLUTION + DML_CONVOLUTION_OPERATOR_DESC | BNNSFilterCreateConvolutionLayer | convolution_forward | cldnn_convolution_desc | Conv |
7 | input | 4-D tensor (NHWC) input[0]: [batches, height, width, depth_in] | 4-D tensor (NHWC) MPSImageDescriptor.{numberOfImages, featureChannels, width, height, MPSImageFeatureChannelFormatFloat16} sourceImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_CONVOLUTION_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | 4-D tensor (NCHW) BNNSImageStackDescriptor: P(c,x,y) at position (x,y) in channel c is stored in data[x + row_stride * y + image_stride * c], row_stride = width, image_stride = row_stride * height | 4-D tensor (NCHW) src_desc of convolution_forward::desc | 4-D tensor (NHWC or NCHW) cldnn_convolution_desc.input | 4-D tensor (NCHW) Inputs["X"]: [batches, channels, height, width] [Note]: also supports tensors > 4-D with [N, C, D1, D2, ..., Dn] |
8 | filter | input[1]: [depth_out, filter_height, filter_width, depth_in] | weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups] MPSCNNConvolutionDescriptor.{kernelWidth, kernelHeight, inputFeatureChannels, outputFeatureChannels} MPSCNNConvolutionDataSource.weights | 4-D tensor [depth_out, depth_in, filter_height, filter_width] DML_CONVOLUTION_OPERATOR_DESC.FilterTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_OWNED_BY_DML, dimensions, strides} | 4-D tensor [depth_out, depth_in, filter_height, filter_width] BNNSConvolutionLayerParameters.{in_channels, out_channels, k_width, k_height} The weight (o, i, kx, ky) for output channel o, input channel i and kernel point (kx, ky) is stored in weights[kx + k_width * (ky + k_height * (i + in_channels * o))] | 4-D tensor [depth_out, depth_in, filter_height, filter_width] weights_desc of convolution_forward::desc | cldnn_convolution_desc.weights | Inputs["W"]: [M, C/group, kH, kW], where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For common convolution, group is 1 [Note]: also supports tensors > 4-D with size (M x C/group x k1 x k2 x ... x kn) |
9 | bias | input[2]: [depth_out] | MPSCNNConvolutionDataSource.biasTerms | 4-D tensor [1, depth_out, 1, 1] DML_CONVOLUTION_OPERATOR_DESC.BiasTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_OWNED_BY_DML, dimensions, strides} | BNNSConvolutionLayerParameters.bias (BNNSLayerData) | bias_desc of convolution_forward::desc | cldnn_convolution_desc.bias | Inputs["B"]: 1-D tensor [M], M is the number of feature maps |
10 | padding | explicit padding: input[3:6]: left, right, top, bottom implicit padding: input[3]: SAME, VALID | MPSCNNConvolutionNode.offset.{x, y} [Note]: right and bottom of explicit padding are not supported? [Note]: implicit padding is not supported? | DML_CONVOLUTION_OPERATOR_DESC.StartPadding {padding_top, padding_left} DML_CONVOLUTION_OPERATOR_DESC.EndPadding {padding_bottom, padding_right} [Note]: implicit padding is not supported? | BNNSConvolutionLayerParameters.x_padding BNNSConvolutionLayerParameters.y_padding [Note]: only supports the same left and right paddings, and the same top and bottom paddings [Note]: implicit padding is not supported | padding_l of convolution_forward::desc: [top, left] padding_r of convolution_forward::desc: [bottom, right] [Note]: implicit padding is not supported | cldnn_convolution_desc.input_offset [Note]: implicit padding is not supported | explicit padding: Attributes["pads"]: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number added at the end of axis i; for a 2-D image it is [top, bottom, left, right] implicit padding: Attributes["auto_pad"]: NOTSET, SAME_UPPER, SAME_LOWER or VALID [Note]: implicit padding's SAME_UPPER or SAME_LOWER are not mapped [Note]: implicit padding has a DEPRECATION NOTE |
11 | stride | input[4:5]: stride_width, stride_height | MPSCNNConvolutionDescriptor.strideInPixelsX MPSCNNConvolutionDescriptor.strideInPixelsY | DML_CONVOLUTION_OPERATOR_DESC.Strides {stride_width, stride_height} | BNNSConvolutionLayerParameters.x_stride BNNSConvolutionLayerParameters.y_stride | strides of convolution_forward::desc: [stride_width, stride_height] | cldnn_convolution_desc.stride | Attributes["strides"]: list of ints, stride along each axis, for 2D image, stride_height, stride_width | ||||||||||||||||||||||
12 | fused activation | input[6]: RELU, RELU1, RELU6 | MPSCNNConvolutionDescriptor.neuron MPSCNNNeuronReLU MPSCNNNeuronReLUN [Note]: RELU1 min(1.f, max(-1.f, input)) not supported | DML_OPERATOR_ACTIVATION_RELU DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1} DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6} | BNNSActivationFunctionRectifiedLinear BNNSActivationFunctionClamp: {min(max(x, alpha), beta)} | post_ops::append_eltwise: eltwise_relu eltwise_bounded_relu | cldnn_convolution_desc.with_activation [Note]: fused RELU1 and RELU6 are not supported | [Note]: not mentioned, might not be needed at IR level |
13 | dilation rate | input[11:12]: dilation_width, dilation_height | MPSCNNConvolutionDescriptor.{dilationRateX, dilationRateY} | DML_CONVOLUTION_OPERATOR_DESC.Dilations {dilation_width, dilation_height} | [Note]: not supported | dilates of convolution_forward::desc: [dilation_width, dilation_height] | cldnn_convolution_desc.dilation | Attributes["dilations"]: list of ints, dilation value along each axis of the filter | ||||||||||||||||||||||
14 | output | 4-D tensor (NHWC) output[0]: [batches, out_height, out_width, depth_out] | 4-D tensor (NHWC) MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_CONVOLUTION_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | 4-D tensor (NCHW) BNNSImageStackDescriptor: P(c,x,y) at position (x,y) in channel c is stored in data[x + row_stride * y + image_stride * c], row_stride = width, image_stride = row_stride * height | 4-D tensor (NCHW) dst_desc of convolution_forward::desc | 4-D tensor (NHWC or NCHW) cldnn_convolution_desc.with_output_size cldnn_convolution_desc.output_size Output tensor of primitive | Outputs["Y"]: 4-D tensor (NCHW) |
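
The main portability hazard in this table is layout: NNAPI and MPS describe NHWC tensors while DirectML, BNNS and DNNL describe NCHW. A small NumPy sketch showing that the BNNS indexing formula quoted above is plain planar CHW storage, and that the NHWC-to-NCHW move is a transpose:

```python
import numpy as np

# BNNS stores pixel P(c, x, y) at data[x + row_stride * y + image_stride * c]
# with row_stride = width, image_stride = width * height: contiguous CHW order.
c_, h, w = 3, 4, 5
chw = np.arange(c_ * h * w, dtype=np.float32).reshape(c_, h, w)
data = chw.ravel()
row_stride, image_stride = w, w * h
c, y, x = 2, 1, 3
assert data[x + row_stride * y + image_stride * c] == chw[c, y, x]

# Feeding an NNAPI-style NHWC tensor to an NCHW backend is a transpose.
nhwc = np.random.rand(1, h, w, c_).astype(np.float32)
nchw = nhwc.transpose(0, 3, 1, 2)
print(nchw.shape)   # (1, 3, 4, 5)
```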

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
16 | Depthwise Convolution (additional to Convolution) | op | ANEURALNETWORKS_DEPTHWISE_CONV_2D | MPSCNNConvolutionNode + MPSCNNDepthWiseConvolutionDescriptor | DML_OPERATOR_CONVOLUTION + DML_CONVOLUTION_OPERATOR_DESC Set DML_CONVOLUTION_OPERATOR_DESC.GroupCount to in_channels | [Note]: not supported | convolution_forward with weights_format = dnnl_hwigo and group size = number of filters | Set cldnn_convolution_desc.split as depth_out | Use the same Conv op with Attributes["group"] equal to in_channels [Note]: not well documented in the ONNX doc; hints from https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/tfonnx.py#L519 |
17 | depthwise multiplier | input[6]: depthwise multiplier | MPSCNNDepthWiseConvolutionDescriptor.channelMultiplier | [Note]: only tested with multiplier 1 | [Note]: not supported | [Note]: only multiplier 1 is supported | [Note]: only supports multiplier 1 | out_channels = K * in_channels, where K is the depthwise multiplier [Note]: not well documented in the ONNX doc; hints from https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/tfonnx.py#L519 |
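
The depthwise mapping above reduces to a grouped convolution with group = in_channels. A NumPy sketch (stride 1, no padding, multiplier K = 1, matching the backends that only support K = 1; `depthwise_conv_valid` is a hypothetical helper):

```python
import numpy as np

# Depthwise convolution: each output channel convolves only its own input
# channel, i.e. a grouped convolution with group == number of input channels.
def depthwise_conv_valid(x_nhwc, w_hwc):
    n, h, w, c = x_nhwc.shape
    kh, kw, _ = w_hwc.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, c), dtype=x_nhwc.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x_nhwc[:, i:i + kh, j:j + kw, :]          # [N, kh, kw, C]
            out[:, i, j, :] = np.einsum('nhwc,hwc->nc', patch, w_hwc)
    return out

x = np.random.rand(1, 5, 5, 3).astype(np.float32)
w = np.random.rand(3, 3, 3).astype(np.float32)                # [kH, kW, C]
print(depthwise_conv_valid(x, w).shape)                       # (1, 3, 3, 3)
```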

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
19 | op | ANEURALNETWORKS_AVERAGE_POOL_2D | MPSCNNPoolingAverageNode | DML_OPERATOR_AVERAGE_POOLING + DML_AVERAGE_POOLING_OPERATOR_DESC | BNNSFilterCreatePoolingLayer BNNSPoolingLayerParameters.BNNSPoolingFunction = BNNSPoolingAverage | dnnl_pooling_forward alg_kind = dnnl_pooling_avg | cldnn_pooling_desc.mode = cldnn_pooling_average | AveragePool |
20 | Average Pooling | input | 4-D tensor (NHWC) input[0]: [batches, height, width, depth_in] | 4-D tensor (NHWC) MPSImageDescriptor.{numberOfImages, featureChannels, width, height} sourceImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_AVERAGE_POOLING_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | 4-D tensor (NCHW) BNNSImageStackDescriptor: P(c,x,y) at position (x,y) in channel c is stored in data[x + row_stride * y + image_stride * c], row_stride = width, image_stride = row_stride * height | 4-D tensor (NCHW) src_desc of pooling_forward::desc | 4-D tensor (NHWC or NCHW) cldnn_pooling_desc.input | 4-D tensor (NCHW) Inputs["X"]: [batches, channels, height, width] [Note]: also supports tensors > 4-D with [N, C, D1, D2, ..., Dn] |
21 | padding | explicit padding: input[1:4]: left, right, top, bottom implicit padding: input[1]: SAME, VALID | MPSCNNPoolingAverageNode.offset.{x, y} [Note]: right and bottom of explicit padding are not supported [Note]: implicit padding is not supported | DML_AVERAGE_POOLING_OPERATOR_DESC.StartPadding {padding_top, padding_left} DML_AVERAGE_POOLING_OPERATOR_DESC.EndPadding {padding_bottom, padding_right} | BNNSPoolingLayerParameters.x_padding BNNSPoolingLayerParameters.y_padding [Note]: right and bottom of explicit padding are not supported [Note]: implicit padding is not supported | padding_l of pooling_forward::desc: [top, left] padding_r of pooling_forward::desc: [bottom, right] [Note]: implicit padding is not supported | cldnn_pooling_desc.input_offset [Note]: implicit padding is not supported | explicit padding: Attributes["pads"]: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number added at the end of axis i; for a 2-D image it is [top, bottom, left, right] implicit padding: Attributes["auto_pad"]: NOTSET, SAME_UPPER, SAME_LOWER or VALID [Note]: implicit padding's SAME_UPPER or SAME_LOWER are not mapped [Note]: implicit padding has a DEPRECATION NOTE |
22 | stride | input[2:3]: stride_width, stride_height | MPSCNNPoolingAverageNode.{strideInPixelsX, strideInPixelsY} | DML_AVERAGE_POOLING_OPERATOR_DESC.Strides {stride_width, stride_height} | BNNSPoolingLayerParameters.x_stride BNNSPoolingLayerParameters.y_stride | strides of pooling_forward::desc: [stride_width, stride_height] | cldnn_pooling_desc.stride | Attributes["strides"] : list of ints, stride along each axis, for 2D image, stride_height, stride_width | ||||||||||||||||||||||
23 | filter size | input[4:5]: filter_width, filter_height | MPSCNNPoolingAverageNode.{kernelWidth, kernelHeight} | DML_AVERAGE_POOLING_OPERATOR_DESC.WindowSize {filter_width, filter_height} | BNNSPoolingLayerParameters.k_width, BNNSPoolingLayerParameters.k_height | 2-D tensor [filter_width, filter_height] kernel of pooling_forward::desc | cldnn_pooling_desc.size | Attributes["kernel_shape"]: list of ints, the size of the kernel along each axis; for a 2-D image, filter_height, filter_width |
24 | fused activation | input[6]: RELU, RELU1, RELU6 | [Note]: not supported | DML_OPERATOR_ACTIVATION_RELU DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1} DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6} | BNNSActivationFunctionRectifiedLinear BNNSActivationFunctionClamp: {min(max(x, alpha), beta)} | post_ops::append_eltwise: eltwise_relu eltwise_bounded_relu | [Note]: not supported | [Note]: not mentioned, might not be needed at IR level |
25 | output | 4-D tensor (NHWC) output[0]: [batches, out_height, out_width, depth] | 4-D tensor (NHWC) MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_AVERAGE_POOLING_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | 4-D tensor (NCHW) BNNSImageStackDescriptor: P(c,x,y) at position (x,y) in channel c is stored in data[x + row_stride * y + image_stride * c], row_stride = width, image_stride = row_stride * height | 4-D tensor (NCHW) dst_desc of pooling_forward::desc | 4-D tensor (NHWC or NCHW) cldnn_pooling_desc.with_output_size cldnn_pooling_desc.output_size Output tensor of primitive | Outputs["Y"]: 4-D tensor (NCHW) |
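
Several backends above only take explicit pads, so NNAPI's implicit SAME/VALID modes have to be resolved first. A sketch of the TensorFlow-style arithmetic implicit padding follows (assuming standard SAME/VALID semantics):

```python
import math

def resolve_padding(in_size, kernel, stride, mode):
    """Return (pad_begin, pad_end) for one spatial axis."""
    if mode == 'VALID':
        return 0, 0
    out = math.ceil(in_size / stride)                    # SAME output size
    pad = max((out - 1) * stride + kernel - in_size, 0)  # total padding needed
    return pad // 2, pad - pad // 2

print(resolve_padding(7, 3, 2, 'SAME'))    # (1, 1) -> output size 4
print(resolve_padding(7, 3, 2, 'VALID'))   # (0, 0) -> output size 3
```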

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
27 | Max Pooling | op | ANEURALNETWORKS_MAX_POOL_2D | MPSCNNPoolingMaxNode | DML_OPERATOR_MAX_POOLING + DML_MAX_POOLING_OPERATOR_DESC | BNNSFilterCreatePoolingLayer BNNSPoolingLayerParameters.BNNSPoolingFunction = BNNSPoolingMax | dnnl_pooling_forward alg_kind = dnnl_pooling_max | cldnn_pooling_desc.mode = cldnn_pooling_max | MaxPool |
28 | | [Note]: other parameters are the same as Average Pooling 2D | [Note]: Attributes["storage_order"] is not mapped (issue: https://github.com/onnx/onnx/issues/1370) [Note]: optional Outputs["Indices"] is not mapped |

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
30 | Softmax | op | ANEURALNETWORKS_SOFTMAX | MPSCNNSoftMaxNode | DML_OPERATOR_ACTIVATION_SOFTMAX + DML_ACTIVATION_SOFTMAX_OPERATOR_DESC | BNNSFilterCreateVectorActivationLayer BNNSActivation = BNNSActivationFunctionSoftmax | dnnl_softmax_forward_desc | cldnn_softmax_desc | Softmax | |||||||||||||||||||||
31 | input | input[0]: 2-D or 4-D tensor (NHWC) | sourceImage: 2-D or 4-D tensor (NHWC) | 4-D tensor (NCHW) DML_ACTIVATION_SOFTMAX_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor {size, data_type, data_scale, data_bias}, size represents the vector dimension | data_desc of dnnl_softmax_forward_desc softmax_axis: axis over which softmax is computed | Up to 4-D tensor (NHWC or NCHW) cldnn_softmax_desc.with_output_size cldnn_softmax_desc.output_size | Inputs["input"]: 2-D tensor [batch_size, input_feature_dimensions] [Note]: the document says input "X" (issue: https://github.com/onnx/onnx/issues/1369) does not need to explicitly be a 2-D vector; rather, it will be coerced into one. Attributes["axis"] describes the axis of the inputs when coerced to 2-D |
32 | beta | input[1]: float32 value | [Note]: only supports beta = 1.0 | [Note]: only supports beta = 1.0 | BNNSActivation.beta | [Note]: only supports beta = 1.0 | [Note]: only supports beta = 1.0 | [Note]: not supported |
33 | output | output[0]: output tensor of the same shape as input0 | destinationImage: NHWC | 4-D tensor (NCHW) DML_ACTIVATION_SOFTMAX_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor {size, data_type, data_scale, data_bias}, size represents the vector dimension | data_desc of dnnl_softmax_forward_desc softmax_axis: axis over which softmax is computed | Output tensor of primitive | Outputs["output"]: the output values with the same shape as the input tensor |
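
NNAPI's softmax carries a beta scalar that most backends above pin to 1.0. Since softmax(x, beta) = softmax(beta * x), beta can be folded into the logits before dispatching; a NumPy sketch:

```python
import numpy as np

def softmax(x, beta=1.0, axis=-1):
    z = beta * x
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0]])
# Folding beta into the input gives the same result as a beta-aware softmax.
np.testing.assert_allclose(softmax(x, beta=2.0), softmax(2.0 * x))
```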

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
35 | Element-wise Add | op | ANEURALNETWORKS_ADD | MPSCNNAddNode | DML_OPERATOR_ELEMENT_WISE_ADD + DML_ELEMENT_WISE_ADD_OPERATOR_DESC | vDSP_vadd | dnnl_sum_primitive_desc | cldnn_eltwise_desc.mode = cldnn_eltwise_sum | Add | |||||||||||||||||||||
36 | input | input[0, 1]: tensor 0 and 1, up to 4-D | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} primaryImage of MPSNNGraph.encode, up to 4-D secondaryImage of MPSNNGraph.encode, up to 4-D | 4-D tensor (NCHW) DML_ELEMENT_WISE_ADD_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | src[0] (dnnl_query_src_md, 0) src[1] (dnnl_query_src_md, 1) ... src[n-1] (dnnl_query_src_md, n-1) | 4-D tensor (NHWC or NCHW) cldnn_eltwise_desc.input (size = 2) | Inputs["A"] and Inputs["B"]: tensor [Note]: no limitation on tensor dimension |
37 | scale | [Note]: not supported | MPSCNNArithmeticNode.{primaryScale, secondaryScale} | [Note]: not supported | scales of sum_primitive_desc: vector of scales to multiply the data in each source memory by | [Note]: not supported | [Note]: not supported |
38 | bias | [Note]: not supported | MPSCNNArithmeticNode.bias | [Note]: not supported | [Note]: not supported | [Note]: not supported | [Note]: not supported |
39 | fused activation | input[2]: RELU, RELU1, RELU6 | [Note]: not supported | DML_OPERATOR_ACTIVATION_RELU DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1} DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6} | [Note]: not supported | cldnn_eltwise_desc.with_activation [Note]: fused RELU1 and RELU6 are not supported | [Note]: not mentioned, might not be needed at IR level |
40 | output | output[0]: a tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_ELEMENT_WISE_ADD_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | dst_desc of sum_primitive_desc | Output tensor of primitive | Outputs["C"]: tensor | |||||||||||||||||||||||
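
NNAPI's fused-activation enum maps onto the clip/bounded-relu primitives listed above (RELU1 as clip to [-1, 1], RELU6 as clip to [0, 6]). A NumPy sketch of that emulation for backends without native RELU1/RELU6:

```python
import numpy as np

FUSED = {
    'NONE':  lambda x: x,
    'RELU':  lambda x: np.maximum(x, 0.0),
    'RELU1': lambda x: np.clip(x, -1.0, 1.0),  # DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1}
    'RELU6': lambda x: np.clip(x, 0.0, 6.0),   # DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6}
}

def add_fused(a, b, act='NONE'):
    return FUSED[act](a + b)

print(add_fused(np.array([3.0, -2.0]), np.array([5.0, 1.0]), 'RELU6'))  # [6. 0.]
```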

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
42 | Element-wise Multiply | op | ANEURALNETWORKS_MUL | MPSNNMultiplicationNode | DML_OPERATOR_ELEMENT_WISE_MULTIPLY + DML_ELEMENT_WISE_MULTIPLY_OPERATOR_DESC | vDSP_vmul | [Note]: not supported | cldnn_eltwise_desc.mode = cldnn_eltwise_prod | Mul | |||||||||||||||||||||
43 | input | input[0, 1]: tensor 0 and 1, up to 4-D | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} primaryImage of MPSNNGraph.encode, up to 4-D secondaryImage of MPSNNGraph.encode, up to 4-D | 4-D tensor (NCHW) DML_ELEMENT_WISE_MULTIPLY_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | 4-D tensor (NHWC or NCHW) cldnn_eltwise_desc.input (size = 2) | Inputs["A"] and Inputs["B"]: tensor [Note]: no limitation on tensor dimension |
44 | scale | [Note]: not supported | MPSCNNArithmeticNode.{primaryScale, secondaryScale} | [Note]: not supported | [Note]: not supported | [Note]: not supported |
45 | bias | [Note]: not supported | MPSCNNArithmeticNode.bias | [Note]: not supported | [Note]: not supported | [Note]: not supported |
46 | fused activation | input[2]: RELU, RELU1, RELU6 | [Note]: not supported | DML_OPERATOR_ACTIVATION_RELU DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1} DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6} | cldnn_eltwise_desc.with_activation [Note]: fused RELU1 and RELU6 are not supported | [Note]: not mentioned, might not be needed at IR level |
47 | output | output[0]: a tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_ELEMENT_WISE_MULTIPLY_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | Output tensor of primitive | Outputs["C"]: tensor | ||||||||||||||||||||||||
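
MPSCNNArithmeticNode is the only binary-op mapping above with per-operand scales and a bias; per Apple's documentation it computes roughly primaryScale * a &lt;op&gt; secondaryScale * b + bias. A hedged sketch of that formula (clamping omitted):

```python
import numpy as np

def mps_arithmetic(a, b, op, primary_scale=1.0, secondary_scale=1.0, bias=0.0):
    # Approximation of MPSCNNArithmeticNode's formula (result clamping omitted).
    return op(primary_scale * a, secondary_scale * b) + bias

a, b = np.array([2.0, 4.0]), np.array([3.0, 5.0])
print(mps_arithmetic(a, b, np.multiply, secondary_scale=0.5))   # [3. 10.]
```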

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
49 | Concatenation | op | ANEURALNETWORKS_CONCATENATION | MPSNNConcatenationNode | DML_OPERATOR_JOIN + DML_JOIN_OPERATOR_DESC | [Note]: not supported. BNNS stores pixels plane by plane per channel, so concatenation along the channel axis can be implemented with memcpy as a workaround | dnnl_concat_primitive_desc | cldnn_concatenation_desc | Concat |
50 | inputs | input[0 ~ n-1]: the list of n input tensors | MPSNNImageNode of input tensors | DML_JOIN_OPERATOR_DESC.InputTensors DML_JOIN_OPERATOR_DESC.InputCount | src[0] (dnnl_query_src_md, 0) src[1] (dnnl_query_src_md, 1) ... src[n-1] (dnnl_query_src_md, n-1) | cldnn_concatenation_desc.input (size 2 - n) | Inputs["inputs"]: list of tensors |
51 | axis | input[n]: specifies the concatenation axis | [Note]: only supports concatenation along the depth channel | DML_JOIN_OPERATOR_DESC.Axis | concat_dimension | cldnn_concatenation_desc.axis | Attributes["axis"]: which axis to concat on |
52 | output | output[0]: the output tensor | 4-D tensor (NHWC) MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_JOIN_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | dst(dnnl_query_dst_md, 0) | Output tensor of primitive | Outputs["concat_result"]: concatenated tensor | |||||||||||||||||||||||
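
The BNNS workaround above relies on planar CHW storage: concatenating along the channel axis is just copying each source's data block back to back. A NumPy sketch (`concat_channels_chw` is a hypothetical helper):

```python
import numpy as np

# With planar CHW storage, channel concatenation is back-to-back memcpy.
def concat_channels_chw(tensors):                 # each tensor: [C_i, H, W]
    flat = np.concatenate([t.ravel() for t in tensors])   # the "memcpy" step
    c = sum(t.shape[0] for t in tensors)
    h, w = tensors[0].shape[1:]
    return flat.reshape(c, h, w)

a = np.random.rand(2, 4, 4).astype(np.float32)
b = np.random.rand(3, 4, 4).astype(np.float32)
np.testing.assert_array_equal(concat_channels_chw([a, b]),
                              np.concatenate([a, b], axis=0))
```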

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
54 | Reshape | op | ANEURALNETWORKS_RESHAPE | MPSNNReshape | DML_OPERATOR_CAST + DML_CAST_OPERATOR_DESC | [Note]: not supported | dnnl_reorder_primitive_desc | cldnn_reshape_desc | Reshape | |||||||||||||||||||||
55 | input | input[0]: a tensor, up to 4-D | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} sourceImage of MPSNNGraph.encode, up to 4-D | 4-D tensor (NCHW) DML_CAST_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | src_desc of reorder_primitive_desc 4-D tensor {N,C,H,W} | cldnn_reshape_desc.input | Inputs["data"]: tensor |
56 | output shape | input[1]: a 1-D tensor of int32 defining the shape of the output tensor | [Note]: no need, output shape specified by destination image | [Note]: no need, output shape specified by destination tensor | [Note]: no need, output shape specified by destination image | cldnn_reshape_desc.output_shape | Inputs["shape"]: tensor(int64) |
57 | output | output[0]: the output tensor, of shape specified by the input shape | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} destinationImage of MPSNNGraph.encode, up to 4-D | 4-D tensor (NCHW) DML_CAST_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | dst_desc of reorder_primitive_desc 4-D tensor {N,C,H,W} | Output tensor of primitive | Outputs["reshaped"]: reshaped tensor |
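
One caveat the table cannot show: Reshape reinterprets elements in linear memory order, so reshaping NHWC data and reshaping the equivalent NCHW data give differently ordered results. A NumPy sketch:

```python
import numpy as np

nhwc = np.arange(24, dtype=np.float32).reshape(1, 2, 3, 4)   # N, H, W, C
nchw = nhwc.transpose(0, 3, 1, 2)                            # same values, NCHW

# Same target shape, different element order -> not interchangeable.
print(np.array_equal(nhwc.reshape(1, 24), nchw.reshape(1, 24)))   # False
```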

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
59 | Fully Connected | op | ANEURALNETWORKS_FULLY_CONNECTED | MPSCNNFullyConnectedNode | DML_OPERATOR_GEMM + DML_GEMM_OPERATOR_DESC | BNNSFilterCreateFullyConnectedLayer | dnnl_inner_product_forward_desc | cldnn_fully_connected_desc | ||||||||||||||||||||||
60 | input | input[0]: a tensor of at least rank 2 | Reshape to MPSImageDescriptor.{1, 1, product(dimensions) / weights[1], weights[1]} | Reshape {n, c, h, w} to {1, 1, input_batch_size, input_size} dimensions = {1, 1, input_batch_size, input_size} 4-D tensor (NCHW) DML_GEMM_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor: Represents a vector of dimension size. Each vector element is a scalar value, stored using the type specified in data_type. | input logical order is nc {input_batch_size, input_size} | cldnn_fully_connected_desc.input (reshape to 2-D tensor) |
61 | weights | input[1]: a 2-D tensor | MPSCNNConvolutionDescriptor.{1, 1, weights[0], weights[1]} MPSCNNConvolutionDataSource.weights | 4-D tensor (NCHW) DML_GEMM_OPERATOR_DESC.BTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSFullyConnectedLayerParameters.weights | weights logical order is oi {num_units, input_size} | cldnn_fully_connected_desc.weights |
62 | bias | input[2]: a 1-D tensor | a 1-D tensor whose size equals outputFeatureChannels MPSCNNConvolutionDataSource.biasTerms | dimensions = {1, 1, output_batch_size, output_num_units} bias_strides = {1, 1, 0, 1} 4-D tensor (NCHW) DML_GEMM_OPERATOR_DESC.CTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSFullyConnectedLayerParameters.bias | 1-D {bias_num_units} | cldnn_fully_connected_desc.bias |
63 | fused activation | input[3]: RELU, RELU1, RELU6 | MPSCNNConvolutionDescriptor.neuron MPSCNNNeuronReLU MPSCNNNeuronReLUN | DML_OPERATOR_ACTIVATION_RELU DML_OPERATOR_ELEMENT_WISE_CLIP {-1, 1} DML_OPERATOR_ELEMENT_WISE_CLIP {0, 6} | BNNSFullyConnectedLayerParameters.activation | post_ops::append_eltwise: eltwise_relu eltwise_bounded_relu | cldnn_fully_connected_desc.with_activation [Note]: fused RELU1 and RELU6 are not supported | |||||||||||||||||||||||
64 | output | output[0]: the output tensor | MPSImageDescriptor.{1, 1, output_size, outputFeatureChannels} destinationImage of MPSNNGraph.encode | 4-D tensor (NCHW) DML_GEMM_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor: Represents a vector of dimension size. Each vector element is a scalar value, stored using the type specified in data_type | output logical order is nc: {output_batch_size, output_num_units} | cldnn_fully_connected_desc.output_shape | |||||||||||||||||||||||
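
The fully-connected mapping above is a GEMM on a flattened input, which is how the DirectML column phrases it: reshape the input to [batch, input_size], multiply by the transposed [num_units, input_size] weights, add the 1-D bias. A NumPy sketch:

```python
import numpy as np

x = np.random.rand(2, 8, 2, 2).astype(np.float32)   # N, C, H, W
w = np.random.rand(10, 32).astype(np.float32)       # [num_units, input_size]
b = np.random.rand(10).astype(np.float32)           # [num_units]

out = x.reshape(x.shape[0], -1) @ w.T + b           # [batch, num_units]
print(out.shape)                                    # (2, 10)
```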

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
66 | Resize Bilinear | op | ANEURALNETWORKS_RESIZE_BILINEAR | MPSCNNUpsamplingBilinearNode | DML_OPERATOR_UPSAMPLE_2D + DML_UPSAMPLE_2D_OPERATOR_DESC | vImageVerticalShear and vImageHorizontalShear | dnnl_resampling_forward_desc alg_kind = dnnl_resampling_linear | [Note]: not supported (could be implemented via cldnn_custom_gpu_primitive_desc with a custom OpenCL kernel) |
67 | input | input[0]: a 4-D tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_UPSAMPLE_2D_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | vImage_Buffer | src_desc of resampling_forward_desc 4-D tensor{N,C,H,W} | ||||||||||||||||||||||||
68 | output height | input[1]: height of output tensor | [Note]: must be an integral multiple of the input height | [Note]: must be an integral multiple of the input height | vImage_Buffer.height | the height of output tensor |
69 | output width | input[2]: width of output tensor | [Note]: must be an integral multiple of the input width | [Note]: must be an integral multiple of the input width | vImage_Buffer.width | the width of output tensor |
70 | output | output[0]: output tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_UPSAMPLE_2D_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | vImage_Buffer | dst_desc of resampling_forward_desc 4-D tensor {N,C,H,W} |
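
MPS and DirectML above only upsample by integral factors, so a portability layer has to validate the requested output size before dispatching. A small sketch (`scale_factor` is a hypothetical helper):

```python
def scale_factor(in_size, out_size):
    # MPS/DirectML upsampling requires an integral scale factor per axis.
    if out_size % in_size != 0:
        raise ValueError(f'{out_size} is not an integral multiple of {in_size}')
    return out_size // in_size

print(scale_factor(8, 16))   # 2
```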

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
72 | Transpose | op | ANEURALNETWORKS_TRANSPOSE | |||||||||||||||||||||||||||
73 | input | input[0]: An n-D tensor | ||||||||||||||||||||||||||||
74 | permutation | input[1]: an optional 1-D tensor of int32 |
75 | output | output[0]: a tensor of the same type as input0 |
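
A NumPy sketch of ANEURALNETWORKS_TRANSPOSE's semantics: the optional permutation reorders the dimensions, and [0, 3, 1, 2] is the NHWC-to-NCHW conversion this table keeps needing; when the permutation is omitted, the dimensions are reversed:

```python
import numpy as np

x = np.random.rand(1, 4, 5, 3)                 # NHWC
print(np.transpose(x, (0, 3, 1, 2)).shape)     # NHWC -> NCHW: (1, 3, 4, 5)
print(np.transpose(x).shape)                   # no perm: reversed, (3, 5, 4, 1)
```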

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
77 | BatchToSpace | op | ANEURALNETWORKS_BATCH_TO_SPACE_ND | |||||||||||||||||||||||||||
78 | input | Input[0]: An n-D tensor to be reshaped Input[2]: An optional boolean scalar | ||||||||||||||||||||||||||||
79 | block sizes | Input[1]: A 1-D Tensor | ||||||||||||||||||||||||||||
80 | output | Output[0]: A tensor of the same type as input0. | ||||||||||||||||||||||||||||
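
A NumPy sketch of what ANEURALNETWORKS_BATCH_TO_SPACE_ND does for a 4-D NHWC tensor: blocks of the batch dimension are interleaved back into the spatial dimensions (the inverse of space-to-batch; optional cropping omitted; `batch_to_space_nhwc` is a hypothetical helper):

```python
import numpy as np

def batch_to_space_nhwc(x, bh, bw):
    nb, h, w, c = x.shape
    n = nb // (bh * bw)
    x = x.reshape(bh, bw, n, h, w, c)
    x = x.transpose(2, 3, 0, 4, 1, 5)          # -> n, h, bh, w, bw, c
    return x.reshape(n, h * bh, w * bw, c)     # batch blocks become pixels

print(batch_to_space_nhwc(np.zeros((4, 2, 2, 1)), 2, 2).shape)   # (1, 4, 4, 1)
```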

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
82 | Element-wise Maximum | op | ANEURALNETWORKS_MAXIMUM | |||||||||||||||||||||||||||
83 | input | Inputs[0]: A tensor. Inputs[1]: A tensor of the same type and compatible dimensions with input0. | ||||||||||||||||||||||||||||
84 | output | output[0]: A tensor of the same type as input0. | ||||||||||||||||||||||||||||
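
"Compatible dimensions" in the Maximum rows above means NumPy-style broadcasting; a short sketch:

```python
import numpy as np

a = np.array([[1.0, 5.0], [3.0, 2.0]])
b = np.array([2.0, 4.0])            # broadcast against the last axis of a
print(np.maximum(a, b))             # [[2. 5.] [3. 4.]]
```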

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
86 | Tanh | op | ANEURALNETWORKS_TANH | |||||||||||||||||||||||||||
87 | input | input[0]: A tensor | ||||||||||||||||||||||||||||
88 | output | output[0]: A tensor of same shape as input0. | ||||||||||||||||||||||||||||

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
90 | Argmax | op | ANEURALNETWORKS_ARGMAX | MPSNNReductionFeatureChannelsArgumentMaxNode | DML_OPERATOR_REDUCE + DML_REDUCE_OPERATOR_DESC | cldnn_arg_max_min_desc |
91 | input | input[0]: An n-D tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_REDUCE_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | cldnn_arg_max_min_desc.input | |||||||||||||||||||||||||
92 | axis | input[1]: an int32 scalar | [Note]: only supports argmax along the depth channel | DML_REDUCE_OPERATOR_DESC.Axis | cldnn_arg_max_min_desc.axis |
93 | output | output[0]: an (n - 1)-D int32 tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_REDUCE_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE_UINT32, DML_TENSOR_FLAG_NONE, dimensions, strides} | cldnn_arg_max_min_desc.output |
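
The Argmax output row above says the reduced axis disappears: an n-D input yields an (n-1)-D int32 index tensor. A sketch:

```python
import numpy as np

x = np.random.rand(2, 3, 4)
idx = np.argmax(x, axis=1).astype(np.int32)   # reduce axis 1
print(idx.shape, idx.dtype)                   # (2, 4) int32
```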

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
95 | Sigmoid | op | ANEURALNETWORKS_LOGISTIC | MPSCNNNeuronSigmoidNode | DML_OPERATOR_ACTIVATION_SIGMOID + DML_ACTIVATION_SIGMOID_OPERATOR_DESC | BNNSFilterCreateVectorActivationLayer BNNSActivation = BNNSActivationFunctionSigmoid | dnnl_eltwise_forward_desc alg_kind = dnnl_eltwise_logistic | cldnn_activation_desc with activation_desc.activation_func = activation_logistic |
96 | input | input[0]: a tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_ACTIVATION_SIGMOID_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor {size, data_type, data_scale, data_bias}, size represents the vector dimension | src_desc 4-D tensor {N,C,H,W} | cldnn_activation_desc.input |
97 | output | output[0]: the output tensor of the same shape as input0 | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_ACTIVATION_SIGMOID_OPERATOR_DESC.OutputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | BNNSVectorDescriptor {size, data_type, data_scale, data_bias}, size represents the vector dimension | dst_desc 4-D tensor {N,C,H,W} | cldnn_activation_desc.output |
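
The same op is named LOGISTIC (NNAPI), Sigmoid (MPS/DirectML/BNNS) and eltwise_logistic (DNNL/clDNN); all compute y = 1 / (1 + exp(-x)) with the input shape preserved:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-1.0, 0.0, 1.0])))   # [0.26894142 0.5 0.73105858]
```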

| # | Operation | Parameter | NNAPI | MPS | DirectML | BNNS | DNNL (MKL-DNN) | clDNN | ONNX |
|---|---|---|---|---|---|---|---|---|---|
99 | Prelu | op | ANEURALNETWORKS_PRELU | MPSCNNNeuronPReLUNode | DML_OPERATOR_ACTIVATION_PARAMETERIZED_RELU + DML_ACTIVATION_PARAMETERIZED_RELU_OPERATOR_DESC | cldnn_activation_desc with activation_desc.activation_func = activation_relu_negative_slope |
100 | input | input[0]: a tensor | MPSImageDescriptor.{numberOfImages, featureChannels, width, height} | 4-D tensor (NCHW) DML_ACTIVATION_PARAMETERIZED_RELU_OPERATOR_DESC.InputTensor {DML_TENSOR_DATA_TYPE, DML_TENSOR_FLAG_NONE, dimensions, strides} | cldnn_activation_desc.input |
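
A sketch of the PReLU formula behind the mappings above (DirectML's parameterized ReLU, clDNN's relu with negative slope): y = x for x >= 0 and alpha * x otherwise, where alpha may be per-channel:

```python
import numpy as np

def prelu(x, alpha):
    return np.where(x >= 0, x, alpha * x)

print(prelu(np.array([-2.0, 3.0]), 0.25))   # [-0.5  3. ]
```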