HuggingFace Server

Thanks to our collaboration with the HuggingFace team, you can now easily deploy your models from the HuggingFace Hub with Seldon Core.

We also support the high-performance optimizations provided by the HuggingFace Optimum framework.

Pipeline parameters

The parameters that are available for you to configure include:

| Name | Description |
| --- | --- |
| `task` | The transformers pipeline task |
| `pretrained_model` | The name of the pretrained model in the Hub |
| `pretrained_tokenizer` | The name of the tokenizer in the Hub, if different from the one provided with the model |
| `optimum_model` | Boolean flag to load the model with the Optimum framework |
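
Conceptually, these parameters map onto the arguments of the `transformers.pipeline` factory. The server builds the pipeline for you, so the sketch below is illustrative only:

```python
from transformers import pipeline

# Illustrative mapping of the server parameters onto transformers.pipeline:
#   task                 -> pipeline task name
#   pretrained_model     -> model checkpoint on the HuggingFace Hub
#   pretrained_tokenizer -> tokenizer on the Hub, if different from the model's
pipe = pipeline(
    task="text-generation",   # task
    model="distilgpt2",       # pretrained_model
    tokenizer="distilgpt2",   # pretrained_tokenizer (optional)
)
print(pipe("Hello, my name is", max_new_tokens=10))
```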

Simple Example

You can deploy a HuggingFace model by providing the pipeline parameters in your deployment graph:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
    name: default
    replicas: 1
```
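
Once the deployment is ready, you can query it over the V2 (Open Inference) protocol. The sketch below assumes an ingress reachable at `localhost:8003` and a `seldon` namespace; both are placeholders for your own cluster setup, and the input name `"args"` follows the convention used by MLServer's HuggingFace runtime for plain-text inputs:

```python
import requests

# Placeholders -- replace with your own ingress host and namespace.
INGRESS = "http://localhost:8003"
NAMESPACE = "seldon"
DEPLOYMENT = "gpt2-model"   # metadata.name from the manifest above
MODEL = "transformer"       # the graph node name from the manifest above

# V2 (Open Inference) protocol request body.
inference_request = {
    "inputs": [
        {
            "name": "args",   # text input, per MLServer's HuggingFace runtime
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Seldon Core is"],
        }
    ]
}

url = f"{INGRESS}/seldon/{NAMESPACE}/{DEPLOYMENT}/v2/models/{MODEL}/infer"
response = requests.post(url, json=inference_request)
print(response.json())
```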

Quantized & Optimized Models with Optimum

You can deploy a HuggingFace model loaded with the Optimum library by setting the `optimum_model` parameter to `true`:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
      - name: optimum_model
        type: BOOL
        value: true
    name: default
    replicas: 1
```
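
When `optimum_model` is enabled, the server loads the model through Optimum's ONNX Runtime backend rather than the vanilla `transformers` weights. A rough stand-alone equivalent is sketched below; it shows the idea, not the server's exact code path (recent Optimum releases use `export=True`, while older ones used `from_transformers=True`):

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

# Export the PyTorch checkpoint to ONNX and run it with ONNX Runtime.
model = ORTModelForCausalLM.from_pretrained("distilgpt2", export=True)
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Seldon Core is", max_new_tokens=20))
```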