# HuggingFace Server

Thanks to our collaboration with the HuggingFace team, you can now easily deploy your models from the [HuggingFace Hub](https://huggingface.co/models) with Seldon Core. We also support the high-performance optimizations provided by the [Transformer Optimum framework](https://huggingface.co/docs/optimum/index).

## Pipeline parameters

The parameters that are available for you to configure include:

| Name                   | Description                                                                              |
|------------------------|------------------------------------------------------------------------------------------|
| `task`                 | The transformer pipeline task                                                            |
| `pretrained_model`     | The name of the pretrained model in the Hub                                              |
| `pretrained_tokenizer` | The name of the tokenizer in the Hub, if different from the one provided with the model  |
| `optimum_model`        | Boolean to enable loading the model with the Optimum framework                           |

## Simple Example

You can deploy a HuggingFace model by providing parameters to your [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines).

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
    name: default
    replicas: 1
```

## Quantized & Optimized Models with Optimum

You can deploy a HuggingFace model loaded with the Optimum library by setting the `optimum_model` parameter.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
      - name: optimum_model
        type: BOOL
        value: true
    name: default
    replicas: 1
```

## Custom Model Example

You can deploy a custom HuggingFace model by providing the location of the model artefacts using the `modelUri` field.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: custom-tiny-stories-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      modelUri: gs://seldon-models/v1.18.0/huggingface/text-gen-custom-tiny-stories
      parameters:
      - name: task
        type: STRING
        value: text-generation
    name: default
    replicas: 1
```

.. note::
   As a next step, why not try running a larger-scale model? You can find one in `gs://seldon-models/v1.18.0/huggingface/text-gen-custom-gpt2`. However, you may need to request more memory!
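As a starting point for that next step, the manifest below sketches how the larger GPT-2 model could be deployed with an explicit memory request via `componentSpecs`. The resource figures here are illustrative assumptions rather than measured requirements, so tune them to your model and cluster.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: custom-gpt2-model
spec:
  protocol: v2
  predictors:
  - componentSpecs:
    - spec:
        containers:
        # The container name must match the graph node name below so that
        # Seldon Core merges these resource settings into the model container.
        - name: transformer
          resources:
            requests:
              memory: 2Gi   # assumed baseline; adjust for your model size
            limits:
              memory: 4Gi   # assumed ceiling; adjust for your cluster
    graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      modelUri: gs://seldon-models/v1.18.0/huggingface/text-gen-custom-gpt2
      parameters:
      - name: task
        type: STRING
        value: text-generation
    name: default
    replicas: 1
```

As with the earlier examples, this manifest can be applied with `kubectl apply -f`.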
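Finally, the `pretrained_tokenizer` parameter from the table above can be combined with any of these manifests when the tokenizer is published under a different Hub name than the model. A minimal sketch of the relevant `parameters` block, in which the `gpt2` tokenizer name is purely illustrative:

```yaml
parameters:
- name: task
  type: STRING
  value: text-generation
- name: pretrained_model
  type: STRING
  value: distilgpt2
- name: pretrained_tokenizer
  type: STRING
  value: gpt2   # illustrative: load a tokenizer from a different Hub entry than the model
```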