Open Inference Protocol
The Open Inference Protocol is an industry-wide effort to provide a standardized protocol for communicating with different inference servers (e.g. MLServer, Triton) and orchestrating frameworks (e.g. Seldon Core, KServe). The Open Inference Protocol specification defines both the endpoints and the payload schemas for the REST and gRPC interfaces.
As part of the Open Inference Protocol definition, you can find dedicated endpoints for:
- Model controls: run inference against your model, interact with it, and load or unload models dynamically.
- Health: assess the liveness and readiness of your model.
- Metadata: query your model’s metadata (e.g. expected inputs and expected outputs).
REST
gRPC
ServerLive
Check liveness of the inference server.
- rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest) returns ServerLiveResponse
ServerReady
Check readiness of the inference server.
- rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest) returns ServerReadyResponse
ModelReady
Check readiness of a model in the inference server.
- rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest) returns ModelReadyResponse
ServerMetadata
Get server metadata.
- rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest) returns ServerMetadataResponse
ModelMetadata
Get model metadata.
- rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest) returns ModelMetadataResponse
ModelInfer
Perform inference using a specific model.
- rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest) returns ModelInferResponse
RepositoryIndex
Get the index of model repository contents.
- rpc inference.GRPCInferenceService/RepositoryIndex(RepositoryIndexRequest) returns RepositoryIndexResponse
RepositoryModelLoad
Load or reload a model from a repository.
- rpc inference.GRPCInferenceService/RepositoryModelLoad(RepositoryModelLoadRequest) returns RepositoryModelLoadResponse
RepositoryModelUnload
Unload a model.
- rpc inference.GRPCInferenceService/RepositoryModelUnload(RepositoryModelUnloadRequest) returns RepositoryModelUnloadResponse
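All of the RPCs above belong to the inference.GRPCInferenceService service, and gRPC addresses a method by a path of the form /package.Service/Method. The Python sketch below derives those full method paths and groups them under the endpoint categories from the introduction. The grouping, the constant names, and the helper functions are illustrative assumptions, not part of the protocol.

```python
# Sketch: derive gRPC method paths for the RPCs listed above.
# The category grouping below is an illustrative assumption, not part of the spec.
SERVICE = "inference.GRPCInferenceService"

RPCS_BY_CATEGORY: dict[str, list[str]] = {
    "health": ["ServerLive", "ServerReady", "ModelReady"],
    "metadata": ["ServerMetadata", "ModelMetadata"],
    "model_controls": [
        "ModelInfer",
        "RepositoryIndex",
        "RepositoryModelLoad",
        "RepositoryModelUnload",
    ],
}


def method_path(rpc: str, service: str = SERVICE) -> str:
    """Return the gRPC method path in '/package.Service/Method' form."""
    return f"/{service}/{rpc}"


def all_method_paths() -> dict[str, list[str]]:
    """Map each category to the full method paths of its RPCs."""
    return {
        category: [method_path(rpc) for rpc in rpcs]
        for category, rpcs in RPCS_BY_CATEGORY.items()
    }
```

For example, method_path("ModelInfer") gives "/inference.GRPCInferenceService/ModelInfer". With the grpcio package, such a path can be passed to channel.unary_unary() together with serializers from code generated from the protocol’s .proto file. Generated stubs normally do this for you.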
Messages
InferParameter
An inference parameter value.
| Field | Type | Description |
|---|---|---|
| oneof parameter_choice.bool_param | | A boolean parameter value. |
| oneof parameter_choice.int64_param | | An int64 parameter value. |
| oneof parameter_choice.string_param | | A string parameter value. |
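Because parameter_choice is a oneof, each InferParameter carries exactly one of the three values. When converting plain Python values, the bool check must come before the int check, because bool is a subclass of int in Python. The following is a minimal sketch that assumes parameters are modelled as plain dicts of the form {field_name: value} rather than generated protobuf classes. The helper names are hypothetical.

```python
from typing import Union

# Range of a signed 64-bit integer (the int64_param field).
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

ParamValue = Union[bool, int, str]


def choose_parameter_field(value: ParamValue) -> tuple[str, ParamValue]:
    """Pick the parameter_choice field that should hold a Python value."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_param", value
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{value} does not fit in int64_param")
        return "int64_param", value
    if isinstance(value, str):
        return "string_param", value
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def encode_parameters(params: dict[str, ParamValue]) -> dict[str, dict[str, ParamValue]]:
    """Turn {key: value} into {key: {oneof_field: value}}, one entry per map key."""
    encoded: dict[str, dict[str, ParamValue]] = {}
    for key, value in params.items():
        field, checked = choose_parameter_field(value)
        encoded[key] = {field: checked}
    return encoded
```

If you use generated protobuf classes instead, the chosen field name could be passed as a keyword argument when constructing the message. The exact class names depend on how your stubs were generated.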
InferTensorContents
The data contained in a tensor. For a given data type, the tensor contents can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.
| Field | Type | Description |
|---|---|---|
| bool_contents | | Representation for the BOOL data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int_contents | | Representation for the INT8, INT16, and INT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int64_contents | | Representation for the INT64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint_contents | | Representation for the UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint64_contents | | Representation for the UINT64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp32_contents | | Representation for the FP32 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp64_contents | | Representation for the FP64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| bytes_contents | | Representation for the BYTES data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
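The table above maps each data type to exactly one repeated field, and every field holds the tensor flattened in row-major order, with the last index varying fastest. The Python sketch below encodes that mapping and checks the element count against the shape. It returns a plain {field_name: values} dict as a stand-in for an InferTensorContents message, and the helper names are hypothetical.

```python
from math import prod
from typing import Any

# Typed InferTensorContents field for each data type, per the table above.
CONTENTS_FIELD = {
    "BOOL": "bool_contents",
    "INT8": "int_contents",
    "INT16": "int_contents",
    "INT32": "int_contents",
    "INT64": "int64_contents",
    "UINT8": "uint_contents",
    "UINT16": "uint_contents",
    "UINT32": "uint_contents",
    "UINT64": "uint64_contents",
    "FP32": "fp32_contents",
    "FP64": "fp64_contents",
    "BYTES": "bytes_contents",
}


def flatten_row_major(nested: Any) -> list:
    """Flatten nested lists so the last index varies fastest (row-major order)."""
    if isinstance(nested, list):
        return [x for item in nested for x in flatten_row_major(item)]
    return [nested]


def to_contents(datatype: str, shape: list[int], nested: Any) -> dict[str, list]:
    """Build a {field_name: flat_values} dict standing in for InferTensorContents."""
    field = CONTENTS_FIELD.get(datatype)
    if field is None:
        # FP16 and BF16 have no typed field; they must be sent as raw contents.
        raise ValueError(f"{datatype} has no typed contents field; use raw contents")
    flat = flatten_row_major(nested)
    if len(flat) != prod(shape):
        raise ValueError(f"expected {prod(shape)} elements for shape {shape}, got {len(flat)}")
    return {field: flat}
```

For example, a 2×2 INT16 tensor [[1, 2], [3, 4]] becomes {"int_contents": [1, 2, 3, 4]}.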
ModelInferRequest
Request message for the ModelInfer RPC.
| Field | Type | Description |
|---|---|---|
| model_name | | The name of the model to use for inference. |
| model_version | | The version of the model to use for inference. If not given, the server chooses a version based on the model and its internal policy. |
| id | | Optional identifier for the request. If specified, it is returned in the response. |
| parameters | | Optional inference parameters. |
| inputs | | The input tensors for the inference. |
| outputs | | The requested output tensors for the inference. Optional; if not specified, all outputs produced by the model are returned. |
| raw_input_contents | | The data contained in an input tensor can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Using the “raw” bytes form typically allows higher performance because of the way protobuf allocation and reuse interact with gRPC; see, for example, https://github.com/grpc/grpc/issues/23231. To use the raw representation, raw_input_contents must be initialized with data for each tensor in the same order as inputs. For each tensor, the size of this content must match what is expected by the tensor’s shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content, as there is no specific data type for a 16-bit float type. If this field is specified, InferInputTensor::contents must not be specified for any input tensor. |
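The raw_input_contents rules above can be sketched with Python’s struct module. The sketch packs one contiguous buffer per input, in input order, with no stride or padding. FP16 goes through struct’s "e" (half-precision) code, which is the kind of tensor that has no typed field. The byte order is an assumption: this page does not state it, and the sketch uses little-endian. BF16 and BYTES are deliberately left out, and the function names are hypothetical.

```python
import struct
from math import prod

# struct format codes for fixed-width data types.
# Assumption: little-endian byte order (this page does not state the byte order).
STRUCT_CODE = {
    "BOOL": "?",
    "INT8": "b", "INT16": "h", "INT32": "i", "INT64": "q",
    "UINT8": "B", "UINT16": "H", "UINT32": "I", "UINT64": "Q",
    "FP16": "e", "FP32": "f", "FP64": "d",
}


def pack_raw(datatype: str, shape: list[int], flat_values: list) -> bytes:
    """Pack row-major values into one contiguous buffer with no stride or padding."""
    code = STRUCT_CODE.get(datatype)
    if code is None:
        # BF16 and BYTES are not handled by this sketch.
        raise ValueError(f"pack_raw does not support {datatype}")
    count = prod(shape)
    if len(flat_values) != count:
        raise ValueError(f"shape {shape} needs {count} values, got {len(flat_values)}")
    return struct.pack(f"<{count}{code}", *flat_values)


def build_raw_input_contents(inputs: list[tuple[str, str, list[int], list]]) -> list[bytes]:
    """Return one raw buffer per (name, datatype, shape, values) input, in input order."""
    return [pack_raw(datatype, shape, values) for _name, datatype, shape, values in inputs]
```

A two-element FP16 tensor packs into 4 bytes, and a 1×2 INT32 tensor [1, 2] packs into b"\x01\x00\x00\x00\x02\x00\x00\x00" under the little-endian assumption.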
ModelInferRequest.InferInputTensor
An input tensor for an inference request.
| Field | Type | Description |
|---|---|---|
| name | | The tensor name. |
| datatype | | The tensor data type. |
| shape | | The tensor shape. |
| parameters | | Optional inference input tensor parameters. |
| contents | | The input tensor data. This field must not be specified if tensor contents are being specified in ModelInferRequest.raw_input_contents. |
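Together, InferInputTensor.contents and ModelInferRequest.raw_input_contents give a few checkable rules. If raw contents are used, there must be one buffer per input, no input may also set contents, and each buffer’s size must match shape × element width. A minimal sketch of such a check follows. It assumes a hypothetical InputTensor dataclass in place of the generated message and standard element widths (for example 1 byte for BOOL). BYTES is variable-length, so it is not size-checked.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional

# Element width in bytes for fixed-size data types (assumed standard widths).
# BYTES is variable-length and is not size-checked in this sketch.
ITEM_SIZE = {
    "BOOL": 1, "INT8": 1, "UINT8": 1,
    "INT16": 2, "UINT16": 2, "FP16": 2, "BF16": 2,
    "INT32": 4, "UINT32": 4, "FP32": 4,
    "INT64": 8, "UINT64": 8, "FP64": 8,
}


@dataclass
class InputTensor:
    """Hypothetical stand-in for ModelInferRequest.InferInputTensor."""
    name: str
    datatype: str
    shape: list[int]
    contents: Optional[dict] = None  # e.g. {"fp32_contents": [...]}


def validate_raw_inputs(inputs: list[InputTensor], raw_input_contents: list[bytes]) -> list[str]:
    """Return the rule violations found (an empty list means the request looks consistent)."""
    if not raw_input_contents:
        return []  # typed contents are in use; there is nothing raw to check
    errors = []
    if len(raw_input_contents) != len(inputs):
        errors.append(f"{len(inputs)} inputs but {len(raw_input_contents)} raw buffers")
    for tensor, raw in zip(inputs, raw_input_contents):
        if tensor.contents is not None:
            errors.append(f"{tensor.name}: contents must not be set with raw_input_contents")
        size = ITEM_SIZE.get(tensor.datatype)
        if size is not None and len(raw) != prod(tensor.shape) * size:
            errors.append(f"{tensor.name}: expected {prod(tensor.shape) * size} bytes, got {len(raw)}")
    return errors
```

Returning a list of messages instead of raising on the first problem is a design choice for this sketch. It lets a caller report every inconsistency in one response.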
ModelInferRequest.InferInputTensor.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelInferRequest.InferRequestedOutputTensor
An output tensor requested for an inference request.
| Field | Type | Description |
|---|---|---|
| name | | The tensor name. |
| parameters | map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry | Optional requested output tensor parameters. |
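Because ModelInferRequest.outputs is optional and an empty list means "return every output the model produced", a server has to choose what to send back. The following hypothetical server-side helper sketches that choice. Treating an unknown requested name as an error, rather than silently ignoring it, is an assumption of this sketch, not something this page specifies.

```python
def select_outputs(produced: dict[str, object], requested: list[str]) -> dict[str, object]:
    """Choose which produced output tensors to return.

    An empty `requested` list means "return every output", mirroring the
    optional outputs field of ModelInferRequest.
    """
    if not requested:
        return dict(produced)
    # Design choice for this sketch: unknown names are an error rather than ignored.
    unknown = [name for name in requested if name not in produced]
    if unknown:
        raise KeyError(f"unknown output tensor(s): {unknown}")
    # Keep the order in which the client requested the outputs.
    return {name: produced[name] for name in requested}
```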
ModelInferRequest.InferRequestedOutputTensor.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelInferRequest.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelInferResponse
| Field | Type | Description |
|---|---|---|
| model_name | | The name of the model used for inference. |
| model_version | | The version of the model used for inference. |
| id | | The id of the inference request, if one was specified. |
| parameters | | Optional inference response parameters. |
| outputs | | The output tensors holding inference results. |
| raw_output_contents | | The data contained in an output tensor can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Using the “raw” bytes form typically allows higher performance because of the way protobuf allocation and reuse interact with gRPC; see, for example, https://github.com/grpc/grpc/issues/23231. To use the raw representation, raw_output_contents must be initialized with data for each tensor in the same order as outputs. For each tensor, the size of this content must match what is expected by the tensor’s shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content, as there is no specific data type for a 16-bit float type. If this field is specified, InferOutputTensor::contents must not be specified for any output tensor. |
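On the client side, entry i of raw_output_contents belongs to entry i of outputs and can be decoded using that output’s datatype and shape. The Python sketch below unpacks one buffer and rebuilds nested lists in row-major order. As with packing, it assumes little-endian byte order, which this page does not state. It does not handle BF16 or BYTES, and the function names are hypothetical.

```python
import struct
from math import prod
from typing import Any

# Assumption: little-endian byte order; BF16 and BYTES are not handled.
STRUCT_CODE = {
    "BOOL": "?",
    "INT8": "b", "INT16": "h", "INT32": "i", "INT64": "q",
    "UINT8": "B", "UINT16": "H", "UINT32": "I", "UINT64": "Q",
    "FP16": "e", "FP32": "f", "FP64": "d",
}


def unflatten_row_major(flat: list, shape: list[int]) -> Any:
    """Rebuild nested lists from row-major values (last index varies fastest)."""
    if not shape:
        return flat[0]  # a scalar tensor holds exactly one element
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [unflatten_row_major(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def decode_raw_output(datatype: str, shape: list[int], raw: bytes) -> Any:
    """Decode one entry of raw_output_contents into nested Python lists."""
    code = STRUCT_CODE[datatype]
    count = prod(shape)
    expected = count * struct.calcsize("<" + code)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes for shape {shape}, got {len(raw)}")
    return unflatten_row_major(list(struct.unpack(f"<{count}{code}", raw)), list(shape))
```

For example, six little-endian INT32 values 1..6 with shape [2, 3] decode to [[1, 2, 3], [4, 5, 6]].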
ModelInferResponse.InferOutputTensor
An output tensor returned for an inference request.
| Field | Type | Description |
|---|---|---|
| name | | The tensor name. |
| datatype | | The tensor data type. |
| shape | | The tensor shape. |
| parameters | | Optional output tensor parameters. |
| contents | | The output tensor data. This field must not be specified if tensor contents are being specified in ModelInferResponse.raw_output_contents. |
ModelInferResponse.InferOutputTensor.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelInferResponse.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelMetadataRequest
Request message for the ModelMetadata RPC.
| Field | Type | Description |
|---|---|---|
| name | | The name of the model. |
| version | | The version of the model to retrieve metadata for. If not given, the server chooses a version based on the model and its internal policy. |
ModelMetadataResponse
| Field | Type | Description |
|---|---|---|
| name | | The model name. |
| versions | | The versions of the model available on the server. |
| platform | | The model’s platform. See Platforms. |
| inputs | | The model’s inputs. |
| outputs | | The model’s outputs. |
| parameters | | Optional default parameters for the request and response. NOTE: This is an extension to the standard. |
ModelMetadataResponse.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelMetadataResponse.TensorMetadata
Metadata for a tensor.
| Field | Type | Description |
|---|---|---|
| name | | The tensor name. |
| datatype | | The tensor data type. |
| shape | | The tensor shape. A variable-size dimension is represented by a -1 value. |
| parameters | | Optional default parameters for input. NOTE: This is an extension to the standard. |
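Since a -1 in TensorMetadata.shape marks a variable-size dimension, a client can check a request tensor against the model metadata before sending it. The rank must match, and every fixed dimension must be equal. The minimal sketch below assumes metadata and request tensors are plain dicts with name, datatype, and shape keys. The helper names are hypothetical.

```python
def shape_matches(metadata_shape: list[int], actual_shape: list[int]) -> bool:
    """True if actual_shape fits metadata_shape, where -1 marks a variable-size dimension."""
    if len(metadata_shape) != len(actual_shape):
        return False
    return all(m == -1 or m == a for m, a in zip(metadata_shape, actual_shape))


def check_inputs_against_metadata(metadata_inputs: list[dict], request_inputs: list[dict]) -> list[str]:
    """Compare request tensors with TensorMetadata entries (dicts with name/datatype/shape)."""
    by_name = {m["name"]: m for m in metadata_inputs}
    problems = []
    for tensor in request_inputs:
        meta = by_name.get(tensor["name"])
        if meta is None:
            problems.append(f"{tensor['name']}: not a model input")
            continue
        if tensor["datatype"] != meta["datatype"]:
            problems.append(f"{tensor['name']}: datatype {tensor['datatype']} != {meta['datatype']}")
        if not shape_matches(meta["shape"], tensor["shape"]):
            problems.append(f"{tensor['name']}: shape {tensor['shape']} does not fit {meta['shape']}")
    return problems
```

For example, metadata shape [-1, 3] accepts [8, 3] but rejects [8, 4] and [3].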
ModelMetadataResponse.TensorMetadata.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
ModelReadyRequest
Request message for the ModelReady RPC.
| Field | Type | Description |
|---|---|---|
| name | | The name of the model to check for readiness. |
| version | | The version of the model to check for readiness. If not given, the server chooses a version based on the model and its internal policy. |
ModelReadyResponse
| Field | Type | Description |
|---|---|---|
| ready | | True if the model is ready, false if not. |
ModelRepositoryParameter
A model repository parameter value.
| Field | Type | Description |
|---|---|---|
| oneof parameter_choice.bool_param | | A boolean parameter value. |
| oneof parameter_choice.int64_param | | An int64 parameter value. |
| oneof parameter_choice.string_param | | A string parameter value. |
| oneof parameter_choice.bytes_param | | A bytes parameter value. |
RepositoryIndexRequest
| Field | Type | Description |
|---|---|---|
| repository_name | | The name of the repository. If empty, the index is returned for all repositories. |
| ready | | If true, return only models currently ready for inference. |
RepositoryIndexResponse
| Field | Type | Description |
|---|---|---|
| models | | An index entry for each model. |
RepositoryIndexResponse.ModelIndex
Index entry for a model.
| Field | Type | Description |
|---|---|---|
| name | | The name of the model. |
| version | | The version of the model. |
| state | | The state of the model. |
| reason | | The reason, if any, that the model is in the given state. |
RepositoryModelLoadRequest
| Field | Type | Description |
|---|---|---|
| repository_name | | The name of the repository to load from. If empty, the model is loaded from any repository. |
| model_name | | The name of the model to load or reload. |
| parameters | | Optional model repository request parameters. |
RepositoryModelLoadRequest.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
RepositoryModelLoadResponse
RepositoryModelUnloadRequest
| Field | Type | Description |
|---|---|---|
| repository_name | | The name of the repository from which the model was originally loaded. If empty, the repository is not considered. |
| model_name | | The name of the model to unload. |
| parameters | | Optional model repository request parameters. |
RepositoryModelUnloadRequest.ParametersEntry
| Field | Type | Description |
|---|---|---|
| key | | N/A |
| value | | N/A |
RepositoryModelUnloadResponse
ServerLiveRequest
Request message for the ServerLive RPC.
ServerLiveResponse
| Field | Type | Description |
|---|---|---|
| live | | True if the inference server is live, false if not. |
ServerMetadataRequest
Request message for the ServerMetadata RPC.
ServerMetadataResponse
| Field | Type | Description |
|---|---|---|
| name | | The server name. |
| version | | The server version. |
| extensions | | The extensions supported by the server. |
ServerReadyRequest
Request message for the ServerReady RPC.
ServerReadyResponse
| Field | Type | Description |
|---|---|---|
| ready | | True if the inference server is ready, false if not. |
Scalar Value Types
double
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| | double | double | float |
float
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| | float | float | float |
int32
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int |
int64
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long |
uint32
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. | uint32 | int | int/long |
uint64
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. | uint64 | long | int/long |
sint32
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int |
sint64
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long |
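The efficiency notes for int32 versus sint32 (and int64 versus sint64) come from how protobuf writes integers on the wire. An int32 or int64 value is written as a base-128 varint, and negative values are sign-extended to 64 bits, so they always take 10 bytes. sint32 and sint64 first apply ZigZag encoding, which maps small negative numbers to small unsigned numbers. The sketch below follows the rules in the Protocol Buffers encoding documentation. It is written for illustration only and is not code from MLServer or from a protobuf library.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low groups first, high bit = 'more bytes follow'."""
    if value < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32 wire form: negatives are sign-extended to 64 bits, so they take 10 bytes."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(value)
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... (used by sint32)."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag64(value: int) -> int:
    """64-bit variant of the same mapping (used by sint64)."""
    return ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32 wire form: ZigZag first, then varint, so small negatives stay short."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(value)
    return encode_varint(zigzag32(value))
```

With these rules, -1 takes 10 bytes as an int32 but a single byte (0x01) as an sint32. The field tag byte is not included in either count.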
fixed32
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int |
fixed64
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long |
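The 2^28 and 2^56 thresholds follow from varint sizing. Each varint byte carries 7 payload bits, so a 32-bit value needs 5 bytes once it reaches 2^28, which is more than fixed32’s 4. A 64-bit value needs 9 bytes once it reaches 2^56, which is more than fixed64’s 8. The sketch below computes the size of the value alone and ignores the field tag. The helper names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Bytes needed to store a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint_size expects a non-negative integer")
    # Each varint byte carries 7 payload bits; zero still takes one byte.
    return max(1, (value.bit_length() + 6) // 7)


def cheaper_encoding(value: int, width_bits: int) -> str:
    """Compare varint (uint32/uint64) with fixed-width (fixed32/fixed64) size for one value."""
    if width_bits not in (32, 64):
        raise ValueError("width_bits must be 32 or 64")
    fixed = width_bits // 8
    varint = varint_size(value)
    if varint < fixed:
        return "varint"
    if varint > fixed:
        return "fixed"
    return "tie"
```

For example, varint_size(2**28 - 1) is 4 but varint_size(2**28) is 5, so cheaper_encoding(2**28, 32) returns "fixed".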
sfixed32
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Always four bytes. | int32 | int | int |
sfixed64
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| Always eight bytes. | int64 | long | int/long |
bool
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| | bool | boolean | bool |
string
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode |
bytes
| Notes | C++ Type | Java Type | Python Type |
|---|---|---|---|
| May contain any arbitrary sequence of bytes. | string | ByteString | bytes |