Open Inference Protocol

The Open Inference Protocol is an industry-wide effort to provide a standardized protocol to communicate with different inference servers (e.g. MLServer, Triton, etc.) and orchestrating frameworks (e.g. Seldon Core, KServe, etc.). The spec of the Open Inference Protocol defines both the endpoints and payload schemas for REST and gRPC interfaces.

As part of the Open Inference Protocol definition, you can find dedicated endpoints for:

  • Model controls: Call model inference, interact with your model, and load and unload models dynamically

  • Health: Assess liveness and readiness of your model.

  • Metadata: Query your model metadata (e.g. expected inputs, expected outputs, etc.).

REST

gRPC

ServerLive

Check liveness of the inference server.

rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest)

returns ServerLiveResponse

ServerReady

Check readiness of the inference server.

rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest)

returns ServerReadyResponse

ModelReady

Check readiness of a model in the inference server.

rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest)

returns ModelReadyResponse

ServerMetadata

Get server metadata.

rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest)

returns ServerMetadataResponse

ModelMetadata

Get model metadata.

rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest)

returns ModelMetadataResponse

ModelInfer

Perform inference using a specific model.

rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest)

returns ModelInferResponse

RepositoryIndex

Get the index of model repository contents.

rpc inference.GRPCInferenceService/RepositoryIndex(RepositoryIndexRequest)

returns RepositoryIndexResponse

RepositoryModelLoad

Load or reload a model from a repository.

rpc inference.GRPCInferenceService/RepositoryModelLoad(RepositoryModelLoadRequest)

returns RepositoryModelLoadResponse

RepositoryModelUnload

Unload a model.

rpc inference.GRPCInferenceService/RepositoryModelUnload(RepositoryModelUnloadRequest)

returns RepositoryModelUnloadResponse

Messages

InferParameter

An inference parameter value.

Field

Type

Description

oneof parameter_choice.bool_param

bool

A boolean parameter value.

oneof parameter_choice.int64_param

int64

An int64 parameter value.

oneof parameter_choice.string_param

string

A string parameter value.

InferTensorContents

The data contained in a tensor. For a given data type the tensor contents can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.

Field

Type

Description

bool_contents

repeated bool

Representation for BOOL data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

int_contents

repeated int32

Representation for INT8, INT16, and INT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

int64_contents

repeated int64

Representation for INT64 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

uint_contents

repeated uint32

Representation for UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

uint64_contents

repeated uint64

Representation for UINT64 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

fp32_contents

repeated float

Representation for FP32 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

fp64_contents

repeated double

Representation for FP64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

bytes_contents

repeated bytes

Representation for BYTES data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

ModelInferRequest

ModelInfer messages.

Field

Type

Description

model_name

string

The name of the model to use for inferencing.

model_version

string

The version of the model to use for inference. If not given the server will choose a version based on the model and internal policy.

id

string

Optional identifier for the request. If specified will be returned in the response.

parameters

map ModelInferRequest.ParametersEntry

Optional inference parameters.

inputs

repeated ModelInferRequest.InferInputTensor

The input tensors for the inference.

outputs

repeated ModelInferRequest.InferRequestedOutputTensor

The requested output tensors for the inference. Optional, if not specified all outputs produced by the model will be returned.

raw_input_contents

repeated bytes

The data contained in an input tensor can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Using the “raw” bytes form will typically allow higher performance due to the way protobuf allocation and reuse interacts with GRPC. For example, see https://github.com/grpc/grpc/issues/23231.

To use the raw representation ‘raw_input_contents’ must be initialized with data for each tensor in the same order as ‘inputs’. For each tensor, the size of this content must match what is expected by the tensor’s shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type.

If this field is specified then InferInputTensor::contents must not be specified for any input tensor. |

ModelInferRequest.InferInputTensor

An input tensor for an inference request.

Field

Type

Description

name

string

The tensor name.

datatype

string

The tensor data type.

shape

repeated int64

The tensor shape.

parameters

map ModelInferRequest.InferInputTensor.ParametersEntry

Optional inference input tensor parameters.

contents

InferTensorContents

The input tensor data. This field must not be specified if tensor contents are being specified in ModelInferRequest.raw_input_contents.

ModelInferRequest.InferInputTensor.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelInferRequest.InferRequestedOutputTensor

An output tensor requested for an inference request.

Field

Type

Description

name

string

The tensor name.

parameters

map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry

Optional requested output tensor parameters.

ModelInferRequest.InferRequestedOutputTensor.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelInferRequest.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelInferResponse

Field

Type

Description

model_name

string

The name of the model used for inference.

model_version

string

The version of the model used for inference.

id

string

The id of the inference request if one was specified.

parameters

map ModelInferResponse.ParametersEntry

Optional inference response parameters.

outputs

repeated ModelInferResponse.InferOutputTensor

The output tensors holding inference results.

raw_output_contents

repeated bytes

The data contained in an output tensor can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Using the “raw” bytes form will typically allow higher performance due to the way protobuf allocation and reuse interacts with GRPC. For example, see https://github.com/grpc/grpc/issues/23231.

To use the raw representation ‘raw_output_contents’ must be initialized with data for each tensor in the same order as ‘outputs’. For each tensor, the size of this content must match what is expected by the tensor’s shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type.

If this field is specified then InferOutputTensor::contents must not be specified for any output tensor. |

ModelInferResponse.InferOutputTensor

An output tensor returned for an inference request.

Field

Type

Description

name

string

The tensor name.

datatype

string

The tensor data type.

shape

repeated int64

The tensor shape.

parameters

map ModelInferResponse.InferOutputTensor.ParametersEntry

Optional output tensor parameters.

contents

InferTensorContents

The output tensor data. This field must not be specified if tensor contents are being specified in ModelInferResponse.raw_output_contents.

ModelInferResponse.InferOutputTensor.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelInferResponse.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelMetadataRequest

ModelMetadata messages.

Field

Type

Description

name

string

The name of the model.

version

string

The version of the model to check for readiness. If not given the server will choose a version based on the model and internal policy.

ModelMetadataResponse

Field

Type

Description

name

string

The model name.

versions

repeated string

The versions of the model available on the server.

platform

string

The model’s platform. See Platforms.

inputs

repeated ModelMetadataResponse.TensorMetadata

The model’s inputs.

outputs

repeated ModelMetadataResponse.TensorMetadata

The model’s outputs.

parameters

map ModelMetadataResponse.ParametersEntry

Optional default parameters for the request / response. NOTE: This is an extension to the standard

ModelMetadataResponse.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelMetadataResponse.TensorMetadata

Metadata for a tensor.

Field

Type

Description

name

string

The tensor name.

datatype

string

The tensor data type.

shape

repeated int64

The tensor shape. A variable-size dimension is represented by a -1 value.

parameters

map ModelMetadataResponse.TensorMetadata.ParametersEntry

Optional default parameters for input. NOTE: This is an extension to the standard

ModelMetadataResponse.TensorMetadata.ParametersEntry

Field

Type

Description

key

string

N/A

value

InferParameter

N/A

ModelReadyRequest

ModelReady messages.

Field

Type

Description

name

string

The name of the model to check for readiness.

version

string

The version of the model to check for readiness. If not given the server will choose a version based on the model and internal policy.

ModelReadyResponse

Field

Type

Description

ready

bool

True if the model is ready, false if not ready.

ModelRepositoryParameter

An model repository parameter value.

Field

Type

Description

oneof parameter_choice.bool_param

bool

A boolean parameter value.

oneof parameter_choice.int64_param

int64

An int64 parameter value.

oneof parameter_choice.string_param

string

A string parameter value.

oneof parameter_choice.bytes_param

bytes

A bytes parameter value.

RepositoryIndexRequest

Field

Type

Description

repository_name

string

The name of the repository. If empty the index is returned for all repositories.

ready

bool

If true return only models currently ready for inferencing.

RepositoryIndexResponse

Field

Type

Description

models

repeated RepositoryIndexResponse.ModelIndex

An index entry for each model.

RepositoryIndexResponse.ModelIndex

Index entry for a model.

Field

Type

Description

name

string

The name of the model.

version

string

The version of the model.

state

string

The state of the model.

reason

string

The reason, if any, that the model is in the given state.

RepositoryModelLoadRequest

Field

Type

Description

repository_name

string

The name of the repository to load from. If empty the model is loaded from any repository.

model_name

string

The name of the model to load, or reload.

parameters

map RepositoryModelLoadRequest.ParametersEntry

Optional model repository request parameters.

RepositoryModelLoadRequest.ParametersEntry

Field

Type

Description

key

string

N/A

value

ModelRepositoryParameter

N/A

RepositoryModelLoadResponse

RepositoryModelUnloadRequest

Field

Type

Description

repository_name

string

The name of the repository from which the model was originally loaded. If empty the repository is not considered.

model_name

string

The name of the model to unload.

parameters

map RepositoryModelUnloadRequest.ParametersEntry

Optional model repository request parameters.

RepositoryModelUnloadRequest.ParametersEntry

Field

Type

Description

key

string

N/A

value

ModelRepositoryParameter

N/A

RepositoryModelUnloadResponse

ServerLiveRequest

ServerLive messages.

ServerLiveResponse

Field

Type

Description

live

bool

True if the inference server is live, false if not live.

ServerMetadataRequest

ServerMetadata messages.

ServerMetadataResponse

Field

Type

Description

name

string

The server name.

version

string

The server version.

extensions

repeated string

The extensions supported by the server.

ServerReadyRequest

ServerReady messages.

ServerReadyResponse

Field

Type

Description

ready

bool

True if the inference server is ready, false if not ready.

Scalar Value Types

double

Notes

C++ Type

Java Type

Python Type

double

double

float

float

Notes

C++ Type

Java Type

Python Type

float

float

float

int32

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.

int32

int

int

int64

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.

int64

long

int/long

uint32

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding.

uint32

int

int/long

uint64

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding.

uint64

long

int/long

sint32

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.

int32

int

int

sint64

Notes

C++ Type

Java Type

Python Type

Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.

int64

long

int/long

fixed32

Notes

C++ Type

Java Type

Python Type

Always four bytes. More efficient than uint32 if values are often greater than 2^28.

uint32

int

int

fixed64

Notes

C++ Type

Java Type

Python Type

Always eight bytes. More efficient than uint64 if values are often greater than 2^56.

uint64

long

int/long

sfixed32

Notes

C++ Type

Java Type

Python Type

Always four bytes.

int32

int

int

sfixed64

Notes

C++ Type

Java Type

Python Type

Always eight bytes.

int64

long

int/long

bool

Notes

C++ Type

Java Type

Python Type

bool

boolean

boolean

string

Notes

C++ Type

Java Type

Python Type

A string must always contain UTF-8 encoded or 7-bit ASCII text.

string

String

str/unicode

bytes

Notes

C++ Type

Java Type

Python Type

May contain any arbitrary sequence of bytes.

string

ByteString

str