Empirical can test how different models and model configurations work for your application. You can define which models and configurations to test in the configuration file.

Empirical supports two types of model providers:

  • model: API calls to off-the-shelf LLMs, like OpenAI’s GPT-4
  • py-script: Custom models or applications defined by a Python module

To configure a custom model with Python, see the Python guide.

The rest of this doc focuses on the model type.

Run configuration for LLMs

To test an LLM, specify the following properties in the configuration:

  • provider: Name of the inference provider (e.g. openai; see the supported providers below)
  • model: Name of the model (e.g. gpt-3.5-turbo or claude-3-haiku)
  • prompt: Prompt sent to the model, with optional placeholders
  • name [optional]: A name or label for this run (auto-generated if not specified)

You can configure as many model providers as you like. These models will be shown in a side-by-side comparison view in the web reporter.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}"
  }
]
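
For example, the sketch below (model choices are illustrative) adds a second run so the two models can be compared side by side in the reporter; the optional name field labels the second run.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}"
  },
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-haiku",
    "prompt": "Hey I'm {{user_name}}",
    "name": "claude-haiku-baseline"
  }
]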

Placeholders

Define placeholders in the prompt with Handlebars syntax (like {{user_name}}) to inject values from the dataset sample. These placeholders will be replaced with the corresponding input value during execution.

See dataset to learn more about sample inputs.
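
For instance, assuming a dataset sample that carries an inputs object keyed by placeholder name (the exact sample shape is described in the dataset doc), a sample like this:

{
  "inputs": {
    "user_name": "Ada"
  }
}

would turn the prompt "Hey I'm {{user_name}}" into "Hey I'm Ada" during execution.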

Supported providers

  • openai: All chat models are supported. Requires OPENAI_API_KEY environment variable.
  • anthropic: Claude 3 models are supported. Requires ANTHROPIC_API_KEY environment variable.
  • mistral: All chat models are supported. Requires MISTRAL_API_KEY environment variable.
  • google: Gemini Pro models are supported. Requires GOOGLE_API_KEY environment variable.
  • fireworks: Models hosted on Fireworks (e.g. dbrx-instruct) are supported. Requires FIREWORKS_API_KEY environment variable.

Environment variables

API calls to model providers require API keys, which are stored as environment variables. The CLI can work with:

  • Existing environment variables (using process.env)
  • Environment variables defined in .env or .env.local files in the current working directory
    • For .env files located elsewhere, pass the --env-file flag:
npx @empiricalrun/cli --env-file <PATH_TO_ENV_FILE>
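
For example, a minimal .env file in the working directory could look like the following (include only the keys for the providers you use; values are placeholders):

.env
OPENAI_API_KEY=<your-openai-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>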

Model parameters

To override parameters like temperature or max_tokens, you can pass parameters along with the provider configuration. All OpenAI parameters (see their API reference) are supported, except for a few limitations listed below.

For non-OpenAI models, we coerce these parameters to the most appropriate target parameter (e.g. stop in OpenAI becomes stop_sequences for Anthropic); see the second example below.

You can add other parameters or override this behavior with passthrough.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1
    }
  }
]
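
As a sketch of the coercion described above (values are illustrative), the stop parameter in this Anthropic run would be sent to the model as stop_sequences:

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-haiku",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1,
      "stop": ["\n"]
    }
  }
]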

Passthrough

If your models rely on other parameters, you can still specify them in the configuration. These parameters will be passed as-is to the model.

For example, Mistral models support a safePrompt parameter for guardrailing.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "mistral",
    "model": "mistral-tiny",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1,
      "safePrompt": true
    }
  }
]

Configuring request timeout

You can set the timeout duration, in milliseconds, under model parameters in the empiricalrc.json file. This may be needed for prompt completions that are expected to take more time, for example when running models like Claude Opus. If no value is specified, the default timeout of 30 seconds is applied.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-opus",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "timeout": 10000
    }
  }
]

Limitations

  • These parameters are not supported today: logit_bias, tools, tool_choice, user, stream
  • The Google Gemini model provider does not support any configuration parameters

If any of these limitations are blocking your use of Empirical, please file a feature request.