pydantic_ai.settings

ModelSettings

Bases: TypedDict

Settings to configure an LLM.

Here we include only settings which apply to multiple models / model providers, though not all of these settings are supported by all models.

Source code in pydantic_ai_slim/pydantic_ai/settings.py
class ModelSettings(TypedDict, total=False):
    """Settings to configure an LLM.

    Here we include only settings which apply to multiple models / model providers,
    though not all of these settings are supported by all models.
    """

    max_tokens: int
    """The maximum number of tokens to generate before stopping.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    """

    temperature: float
    """Amount of randomness injected into the response.

    Use `temperature` closer to `0.0` for analytical / multiple choice, and closer to a model's
    maximum `temperature` for creative and generative tasks.

    Note that even with `temperature` of `0.0`, the results will not be fully deterministic.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    """

    top_p: float
    """An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

    So 0.1 means only the tokens comprising the top 10% probability mass are considered.

    You should either alter `temperature` or `top_p`, but not both.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    """

    timeout: float | Timeout
    """Override the client-level default timeout for a request, in seconds.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Mistral
    """

    parallel_tool_calls: bool
    """Whether to allow parallel tool calls.

    Supported by:
    * OpenAI
    * Groq
    * Anthropic
    """

max_tokens instance-attribute

max_tokens: int

The maximum number of tokens to generate before stopping.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
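
Settings can also be passed per run, so a tight max_tokens cap can be applied to a single request without changing the agent's defaults. A sketch (model name and prompt are illustrative):

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent('openai:gpt-4o')

# Override only for this call: stop generation after at most 64 tokens.
result = agent.run_sync(
    'List three uses of max_tokens.',
    model_settings=ModelSettings(max_tokens=64),
)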

temperature instance-attribute

temperature: float

Amount of randomness injected into the response.

Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model's maximum temperature for creative and generative tasks.

Note that even with temperature of 0.0, the results will not be fully deterministic.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral

top_p instance-attribute

top_p: float

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

So 0.1 means only the tokens comprising the top 10% probability mass are considered.

You should either alter temperature or top_p, but not both.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
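
Since temperature and top_p are alternative sampling controls, a sketch of choosing one or the other (the values shown are illustrative):

from pydantic_ai.settings import ModelSettings

# Analytical / multiple-choice style tasks: low temperature, leave top_p alone.
analytical = ModelSettings(temperature=0.0)

# More exploratory output: nucleus sampling instead, leaving temperature alone.
creative = ModelSettings(top_p=0.9)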

timeout instance-attribute

timeout: float | Timeout

Override the client-level default timeout for a request, in seconds.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Mistral
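
The timeout may be a plain number of seconds or a Timeout object for finer control (assuming the Timeout type here is httpx's, as imported in pydantic-ai's settings module; the values shown are illustrative):

from httpx import Timeout

from pydantic_ai.settings import ModelSettings

# Simple form: give the whole request up to 30 seconds.
quick = ModelSettings(timeout=30.0)

# Finer control: 5 seconds to connect, 60 seconds overall.
detailed = ModelSettings(timeout=Timeout(60.0, connect=5.0))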

parallel_tool_calls instance-attribute

parallel_tool_calls: bool

Whether to allow parallel tool calls.

Supported by:

  • OpenAI
  • Groq
  • Anthropic
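
For example, an agent with registered tools can be told to call them one at a time rather than in a single response. A sketch assuming the tool_plain decorator for context-free tools (the model name and tool body are illustrative):

from datetime import datetime

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Disallow the model issuing several tool calls in a single response.
agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(parallel_tool_calls=False),
)

@agent.tool_plain
def get_time() -> str:
    """Return the current time as an ISO 8601 string."""
    return datetime.now().isoformat()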