V2 Inference Protocol¶

The V2 Inference Protocol is an industry-wide effort to provide a standardised protocol for communicating with different inference servers (e.g. MLServer, Triton, etc.) and orchestration frameworks (e.g. Seldon Core, KServe, etc.). The V2 Inference Protocol spec defines both the endpoints and the payload schemas for the REST and gRPC interfaces.

As part of the V2 Protocol definition, you can find dedicated endpoints for:

  • Health endpoints, to assess liveness and readiness of your model.

  • Inference endpoints, to interact with your model.

  • Metadata endpoints, to query your model metadata (e.g. expected inputs, expected outputs, etc.).

  • Model repository endpoints, to load and unload models dynamically.

REST¶

gRPC¶

ServerLive¶

Check liveness of the inference server.

rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest) returns ServerLiveResponse

ServerReady¶

Check readiness of the inference server.

rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest) returns ServerReadyResponse

ModelReady¶

Check readiness of a model in the inference server.

rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest) returns ModelReadyResponse

ServerMetadata¶

Get server metadata.

rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest) returns ServerMetadataResponse

ModelMetadata¶

Get model metadata.

rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest) returns ModelMetadataResponse

ModelInfer¶

Perform inference using a specific model.

rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest) returns ModelInferResponse

RepositoryIndex¶

Get the index of model repository contents.

rpc inference.GRPCInferenceService/RepositoryIndex(RepositoryIndexRequest) returns RepositoryIndexResponse

RepositoryModelLoad¶

Load or reload a model from a repository.

rpc inference.GRPCInferenceService/RepositoryModelLoad(RepositoryModelLoadRequest) returns RepositoryModelLoadResponse

RepositoryModelUnload¶

Unload a model.

rpc inference.GRPCInferenceService/RepositoryModelUnload(RepositoryModelUnloadRequest) returns RepositoryModelUnloadResponse
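In gRPC, each RPC above is addressed on the wire by a method path of the form `/<package>.<Service>/<Method>`, which is also the form a client such as grpcio's `Channel.unary_unary(method, ...)` expects. The sketch below collects the RPCs listed above with their request and response message names and derives those paths. The `RPCS` table and the `method_path` helper are illustrative only and are not part of MLServer.

```python
# Illustrative table of the GRPCInferenceService RPCs listed above:
# RPC name -> (request message, response message).
SERVICE = "inference.GRPCInferenceService"

RPCS = {
    "ServerLive": ("ServerLiveRequest", "ServerLiveResponse"),
    "ServerReady": ("ServerReadyRequest", "ServerReadyResponse"),
    "ModelReady": ("ModelReadyRequest", "ModelReadyResponse"),
    "ServerMetadata": ("ServerMetadataRequest", "ServerMetadataResponse"),
    "ModelMetadata": ("ModelMetadataRequest", "ModelMetadataResponse"),
    "ModelInfer": ("ModelInferRequest", "ModelInferResponse"),
    "RepositoryIndex": ("RepositoryIndexRequest", "RepositoryIndexResponse"),
    "RepositoryModelLoad": ("RepositoryModelLoadRequest", "RepositoryModelLoadResponse"),
    "RepositoryModelUnload": ("RepositoryModelUnloadRequest", "RepositoryModelUnloadResponse"),
}


def method_path(rpc: str) -> str:
    """Return the wire path gRPC uses for an RPC: /<package>.<Service>/<Method>."""
    if rpc not in RPCS:
        raise KeyError(f"unknown RPC: {rpc}")
    return f"/{SERVICE}/{rpc}"


for name, (req, resp) in RPCS.items():
    print(f"{method_path(name)}: {req} -> {resp}")
```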


Messages¶

InferParameter¶

An inference parameter value.

Field | Type | Description
----- | ---- | -----------
oneof parameter_choice.bool_param | bool | A boolean parameter value.
oneof parameter_choice.int64_param | int64 | An int64 parameter value.
oneof parameter_choice.string_param | string | A string parameter value.
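Because InferParameter is a protobuf oneof, at most one of its fields is set for a given value. The sketch below, using plain Python dicts in place of the generated protobuf classes, shows which field a Python value would map to. The helper `to_infer_parameter` and the parameter names in the usage line are hypothetical. Note that `bool` has to be tested before `int`, since in Python `bool` is a subclass of `int`.

```python
from typing import Dict, Union

ParamValue = Union[bool, int, str]


def to_infer_parameter(value: ParamValue) -> Dict[str, ParamValue]:
    """Map a Python value onto the single InferParameter oneof field it would set."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_param": value}
    if isinstance(value, int):
        # The wire type is a signed 64-bit integer, so reject out-of-range values.
        if not -(2**63) <= value < 2**63:
            raise OverflowError(f"{value} does not fit in int64")
        return {"int64_param": value}
    if isinstance(value, str):
        return {"string_param": value}
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


# Hypothetical parameter names, used only to exercise each branch.
params = {"sequence_start": True, "priority": 1, "content_type": "np"}
print({key: to_infer_parameter(val) for key, val in params.items()})
```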

InferTensorContents¶

The data contained in a tensor. For a given data type the tensor contents can be represented in “raw” bytes form or in the repeated type that matches the tensor’s data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.

Field | Type | Description
----- | ---- | -----------
bool_contents | repeated bool | Representation for BOOL data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
int_contents | repeated int32 | Representation for INT8, INT16, and INT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
int64_contents | repeated int64 | Representation for INT64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
uint_contents | repeated uint32 | Representation for UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
uint64_contents | repeated uint64 | Representation for UINT64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
fp32_contents | repeated float | Representation for FP32 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
fp64_contents | repeated double | Representation for FP64 data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
bytes_contents | repeated bytes | Representation for BYTES data type. The size must match what is expected by the tensor’s shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
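The two rules in the table above (the element count must equal the product of the shape, and elements are laid out in row-major order) can be sketched in plain Python as follows. The dict stands in for an InferTensorContents message; `CONTENTS_FIELD`, `flatten_row_major` and `build_contents` are hypothetical helpers, and the lookup only covers the datatypes listed in the table, raising `KeyError` for anything else.

```python
from math import prod
from typing import Any, Dict, List

# Contents field used for each datatype, per the table above.
CONTENTS_FIELD = {
    "BOOL": "bool_contents",
    "INT8": "int_contents", "INT16": "int_contents", "INT32": "int_contents",
    "INT64": "int64_contents",
    "UINT8": "uint_contents", "UINT16": "uint_contents", "UINT32": "uint_contents",
    "UINT64": "uint64_contents",
    "FP32": "fp32_contents",
    "FP64": "fp64_contents",
    "BYTES": "bytes_contents",
}


def flatten_row_major(data: Any) -> List[Any]:
    """Flatten nested lists so that the last index varies fastest (row-major)."""
    if isinstance(data, list):
        return [x for item in data for x in flatten_row_major(item)]
    return [data]


def build_contents(datatype: str, shape: List[int], data: Any) -> Dict[str, List[Any]]:
    """Return an InferTensorContents-shaped dict, checking size against shape."""
    flat = flatten_row_major(data)
    expected = prod(shape)  # prod([]) == 1, i.e. a scalar tensor
    if len(flat) != expected:
        raise ValueError(f"shape {shape} needs {expected} elements, got {len(flat)}")
    return {CONTENTS_FIELD[datatype]: flat}


print(build_contents("FP32", [2, 3], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
# {'fp32_contents': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
```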

ModelInferRequest¶

ModelInfer messages.

Field | Type | Description
----- | ---- | -----------
model_name | string | The name of the model to use for inferencing.
model_version | string | The version of the model to use for inference. If not given, the server will choose a version based on the model and internal policy.
id | string | Optional identifier for the request. If specified, it will be returned in the response.
parameters | map ModelInferRequest.ParametersEntry | Optional inference parameters.
inputs | repeated ModelInferRequest.InferInputTensor | The input tensors for the inference.
outputs | repeated ModelInferRequest.InferRequestedOutputTensor | The requested output tensors for the inference. Optional; if not specified, all outputs produced by the model will be returned.

ModelInferRequest.InferInputTensor¶

An input tensor for an inference request.

Field | Type | Description
----- | ---- | -----------
name | string | The tensor name.
datatype | string | The tensor data type.
shape | repeated int64 | The tensor shape.
parameters | map ModelInferRequest.InferInputTensor.ParametersEntry | Optional inference input tensor parameters.
contents | InferTensorContents | The input tensor data.
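Putting ModelInferRequest and InferInputTensor together, the sketch below assembles a request as nested Python dicts keyed by the field names in the two tables above. The dict layout stands in for the generated protobuf classes, and the helpers `make_input` and `make_request`, the model name `my-model`, the input name `input-0` and the request id are all hypothetical. Unset optional fields are simply omitted; leaving out `outputs` means all model outputs are returned.

```python
import json
from math import prod
from typing import Any, Dict, List, Optional


def make_input(name: str, datatype: str, shape: List[int],
               contents: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Build an InferInputTensor-shaped dict."""
    (values,) = contents.values()  # this sketch expects exactly one contents field
    if len(values) != prod(shape):
        raise ValueError(f"{name}: contents size does not match shape {shape}")
    return {"name": name, "datatype": datatype, "shape": shape, "contents": contents}


def make_request(model_name: str, inputs: List[Dict[str, Any]],
                 request_id: Optional[str] = None,
                 model_version: Optional[str] = None) -> Dict[str, Any]:
    """Build a ModelInferRequest-shaped dict, omitting unset optional fields."""
    request: Dict[str, Any] = {"model_name": model_name, "inputs": inputs}
    if model_version is not None:
        request["model_version"] = model_version  # otherwise the server picks one
    if request_id is not None:
        request["id"] = request_id  # echoed back in ModelInferResponse.id
    return request


req = make_request(
    "my-model",
    [make_input("input-0", "INT32", [1, 3], {"int_contents": [1, 2, 3]})],
    request_id="req-1",
)
print(json.dumps(req, indent=2))
```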

ModelInferRequest.InferInputTensor.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelInferRequest.InferRequestedOutputTensor¶

An output tensor requested for an inference request.

Field | Type | Description
----- | ---- | -----------
name | string | The tensor name.
parameters | map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry | Optional requested output tensor parameters.
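The `outputs` field of ModelInferRequest is optional: when it is empty, all outputs produced by the model are returned. A minimal sketch of that rule as a server might apply it, assuming the model's results are held in a dict keyed by output name. The `select_outputs` helper and the output names are hypothetical, and raising `KeyError` for an output the model did not produce is an assumption of this sketch rather than behaviour defined in the tables above.

```python
from typing import Any, Dict, List


def select_outputs(produced: Dict[str, Any],
                   requested: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply the ModelInferRequest.outputs rule to a model's results."""
    if not requested:
        return dict(produced)  # nothing requested: return every output
    selected = {}
    for out in requested:
        name = out["name"]  # InferRequestedOutputTensor.name
        if name not in produced:
            raise KeyError(f"model did not produce output {name!r}")
        selected[name] = produced[name]
    return selected


produced = {"probabilities": [0.1, 0.9], "label": ["cat"]}
print(select_outputs(produced, []))                   # both outputs
print(select_outputs(produced, [{"name": "label"}]))  # {'label': ['cat']}
```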

ModelInferRequest.InferRequestedOutputTensor.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelInferRequest.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelInferResponse¶

Field | Type | Description
----- | ---- | -----------
model_name | string | The name of the model used for inference.
model_version | string | The version of the model used for inference.
id | string | The id of the inference request if one was specified.
parameters | map ModelInferResponse.ParametersEntry | Optional inference response parameters.
outputs | repeated ModelInferResponse.InferOutputTensor | The output tensors holding inference results.

ModelInferResponse.InferOutputTensor¶

An output tensor returned for an inference request.

Field | Type | Description
----- | ---- | -----------
name | string | The tensor name.
datatype | string | The tensor data type.
shape | repeated int64 | The tensor shape.
parameters | map ModelInferResponse.InferOutputTensor.ParametersEntry | Optional output tensor parameters.
contents | InferTensorContents | The output tensor data.
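On the client side, an output tensor's flattened, row-major `contents` can be folded back into nested lists using its `shape`. The sketch below does this in plain Python; `unflatten_row_major` is a hypothetical helper, and the output tensor is a hand-written dict standing in for a ModelInferResponse.InferOutputTensor message.

```python
from math import prod
from typing import Any, List


def unflatten_row_major(flat: List[Any], shape: List[int]) -> Any:
    """Rebuild nested lists from row-major contents and a tensor shape."""
    if len(flat) != prod(shape):
        raise ValueError(f"{len(flat)} elements cannot fill shape {shape}")
    if not shape:
        return flat[0]  # zero-dimensional (scalar) tensor
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])  # number of elements per slice along the first axis
    return [unflatten_row_major(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


output = {"name": "output-0", "datatype": "FP32", "shape": [2, 2],
          "contents": {"fp32_contents": [0.1, 0.9, 0.8, 0.2]}}
print(unflatten_row_major(output["contents"]["fp32_contents"], output["shape"]))
# [[0.1, 0.9], [0.8, 0.2]]
```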

ModelInferResponse.InferOutputTensor.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelInferResponse.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelMetadataRequest¶

ModelMetadata messages.

Field | Type | Description
----- | ---- | -----------
name | string | The name of the model.
version | string | The version of the model whose metadata is requested. If not given, the server will choose a version based on the model and internal policy.

ModelMetadataResponse¶

Field | Type | Description
----- | ---- | -----------
name | string | The model name.
versions | repeated string | The versions of the model available on the server.
platform | string | The model’s platform. See Platforms.
inputs | repeated ModelMetadataResponse.TensorMetadata | The model’s inputs.
outputs | repeated ModelMetadataResponse.TensorMetadata | The model’s outputs.
parameters | map ModelMetadataResponse.ParametersEntry | Optional default parameters for the request / response. NOTE: This is an extension to the standard.

ModelMetadataResponse.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelMetadataResponse.TensorMetadata¶

Metadata for a tensor.

Field | Type | Description
----- | ---- | -----------
name | string | The tensor name.
datatype | string | The tensor data type.
shape | repeated int64 | The tensor shape. A variable-size dimension is represented by a -1 value.
parameters | map ModelMetadataResponse.TensorMetadata.ParametersEntry | Optional default parameters for the input. NOTE: This is an extension to the standard.
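Since a `-1` in TensorMetadata.shape marks a variable-size dimension, a client can check a concrete input shape against the model's metadata before sending a request. A minimal sketch, assuming that the ranks must match exactly and that every dimension other than `-1` must match exactly; `shape_matches` is a hypothetical helper, not part of any server API.

```python
from typing import List


def shape_matches(metadata_shape: List[int], actual_shape: List[int]) -> bool:
    """Check a concrete shape against TensorMetadata.shape, where -1 is variable."""
    if len(metadata_shape) != len(actual_shape):
        return False  # assumption: the rank must agree exactly
    return all(expected == -1 or expected == actual
               for expected, actual in zip(metadata_shape, actual_shape))


print(shape_matches([-1, 3], [8, 3]))  # True: the first dimension is variable
print(shape_matches([-1, 3], [8, 4]))  # False: the second dimension is fixed at 3
```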

ModelMetadataResponse.TensorMetadata.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | InferParameter | N/A

ModelReadyRequest¶

ModelReady messages.

Field | Type | Description
----- | ---- | -----------
name | string | The name of the model to check for readiness.
version | string | The version of the model to check for readiness. If not given, the server will choose a version based on the model and internal policy.

ModelReadyResponse¶

Field | Type | Description
----- | ---- | -----------
ready | bool | True if the model is ready, false if not ready.
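ModelReadyResponse carries a single boolean, so a client that needs to wait for a model (or, with ServerReady, for the server) can poll until it turns true. The sketch below is one way to write that loop, assuming `check` is any callable that performs the RPC and returns the `ready` field. The `wait_until_ready` helper is hypothetical, and its timeout and interval defaults are arbitrary choices rather than values defined by the protocol.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll a readiness check until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


# Stand-in for a ModelReady call that reports ready on the third attempt.
attempts = iter([False, False, True])
print(wait_until_ready(lambda: next(attempts), timeout=5.0, interval=0.01))  # True
```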

ModelRepositoryParameter¶

A model repository parameter value.

Field | Type | Description
----- | ---- | -----------
oneof parameter_choice.bool_param | bool | A boolean parameter value.
oneof parameter_choice.int64_param | int64 | An int64 parameter value.
oneof parameter_choice.string_param | string | A string parameter value.
oneof parameter_choice.bytes_param | bytes | A bytes parameter value.

RepositoryIndexRequest¶

Field | Type | Description
----- | ---- | -----------
repository_name | string | The name of the repository. If empty, the index is returned for all repositories.
ready | bool | If true, return only models currently ready for inferencing.

RepositoryIndexResponse¶

Field | Type | Description
----- | ---- | -----------
models | repeated RepositoryIndexResponse.ModelIndex | An index entry for each model.

RepositoryIndexResponse.ModelIndex¶

Index entry for a model.

Field | Type | Description
----- | ---- | -----------
name | string | The name of the model.
version | string | The version of the model.
state | string | The state of the model.
reason | string | The reason, if any, that the model is in the given state.
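RepositoryIndexRequest.ready asks for only the models that are currently ready. The sketch below applies the same filter to ModelIndex-shaped dicts on the client side. It assumes the `state` string for a ready model is `"READY"`; that value, the helper `filter_index`, and the model names and reasons are all assumptions for illustration, since the tables above do not enumerate the possible states.

```python
from typing import Any, Dict, List

READY_STATE = "READY"  # assumption: the tables above do not list state values


def filter_index(models: List[Dict[str, Any]], ready_only: bool) -> List[Dict[str, Any]]:
    """Mimic RepositoryIndexRequest.ready on a list of ModelIndex entries."""
    if not ready_only:
        return list(models)
    return [m for m in models if m["state"] == READY_STATE]


# Hypothetical index entries.
index = [
    {"name": "iris", "version": "1", "state": "READY", "reason": ""},
    {"name": "mnist", "version": "2", "state": "UNAVAILABLE",
     "reason": "failed to load weights"},
]
for entry in filter_index(index, ready_only=True):
    print(entry["name"], entry["version"])  # iris 1
```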

RepositoryModelLoadRequest¶

Field | Type | Description
----- | ---- | -----------
repository_name | string | The name of the repository to load from. If empty, the model is loaded from any repository.
model_name | string | The name of the model to load, or reload.
parameters | map RepositoryModelLoadRequest.ParametersEntry | Optional model repository request parameters.

RepositoryModelLoadRequest.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | ModelRepositoryParameter | N/A

RepositoryModelLoadResponse¶

RepositoryModelUnloadRequest¶

Field | Type | Description
----- | ---- | -----------
repository_name | string | The name of the repository from which the model was originally loaded. If empty, the repository is not considered.
model_name | string | The name of the model to unload.
parameters | map RepositoryModelUnloadRequest.ParametersEntry | Optional model repository request parameters.

RepositoryModelUnloadRequest.ParametersEntry¶

Field | Type | Description
----- | ---- | -----------
key | string | N/A
value | ModelRepositoryParameter | N/A

RepositoryModelUnloadResponse¶

ServerLiveRequest¶

ServerLive messages.

ServerLiveResponse¶

Field | Type | Description
----- | ---- | -----------
live | bool | True if the inference server is live, false if not live.

ServerMetadataRequest¶

ServerMetadata messages.

ServerMetadataResponse¶

Field | Type | Description
----- | ---- | -----------
name | string | The server name.
version | string | The server version.
extensions | repeated string | The extensions supported by the server.

ServerReadyRequest¶

ServerReady messages.

ServerReadyResponse¶

Field | Type | Description
----- | ---- | -----------
ready | bool | True if the inference server is ready, false if not ready.