
Completion Types

Data models and types for completion operations.

any_llm.types.completion

CompletionParams

Bases: BaseModel

Normalized parameters for chat completions.

This model is used internally to pass structured parameters from the public API layer to provider implementations, avoiding very long function signatures while keeping type safety.
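
For orientation, here is a minimal sketch of constructing the model directly (the values are illustrative; in normal use the public API layer builds this object and hands it to a provider implementation):

from any_llm.types.completion import CompletionParams

# The only required fields are model_id and a non-empty messages list.
params = CompletionParams(
    model_id="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    max_tokens=256,
)

print(params.reasoning_effort)  # "auto" is the only field default that is not None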

Source code in src/any_llm/types/completion.py
from typing import Any, Literal

from pydantic import BaseModel, ConfigDict, field_validator


class CompletionParams(BaseModel):
    """Normalized parameters for chat completions.

    This model is used internally to pass structured parameters from the public
    API layer to provider implementations, avoiding very long function
    signatures while keeping type safety.
    """

    model_config = ConfigDict(extra="forbid")

    model_id: str
    """Model identifier (e.g., 'mistral-small-latest')"""

    messages: list[dict[str, Any]]
    """List of messages for the conversation"""

    @field_validator("messages")
    def check_messages_not_empty(cls, v: list[dict[str, Any]]) -> list[dict[str, Any]]:  # noqa: N805
        if not v:
            msg = "The `messages` list cannot be empty."
            raise ValueError(msg)
        return v

    tools: list[dict[str, Any]] | None = None
    """List of tools for tool calling. Should be converted to OpenAI tool format dicts"""

    tool_choice: str | dict[str, Any] | None = None
    """Controls which tools the model can call"""

    temperature: float | None = None
    """Controls randomness in the response (0.0 to 2.0)"""

    top_p: float | None = None
    """Controls diversity via nucleus sampling (0.0 to 1.0)"""

    max_tokens: int | None = None
    """Maximum number of tokens to generate"""

    response_format: dict[str, Any] | type[BaseModel] | None = None
    """Format specification for the response"""

    stream: bool | None = None
    """Whether to stream the response"""

    n: int | None = None
    """Number of completions to generate"""

    stop: str | list[str] | None = None
    """Stop sequences for generation"""

    presence_penalty: float | None = None
    """Penalize new tokens based on presence in text"""

    frequency_penalty: float | None = None
    """Penalize new tokens based on frequency in text"""

    seed: int | None = None
    """Random seed for reproducible results"""

    user: str | None = None
    """Unique identifier for the end user"""

    parallel_tool_calls: bool | None = None
    """Whether to allow parallel tool calls"""

    logprobs: bool | None = None
    """Include token-level log probabilities in the response"""

    top_logprobs: int | None = None
    """Number of top alternatives to return when logprobs are requested"""

    logit_bias: dict[str, float] | None = None
    """Bias the likelihood of specified tokens during generation"""

    stream_options: dict[str, Any] | None = None
    """Additional options controlling streaming behavior"""

    max_completion_tokens: int | None = None
    """Maximum number of tokens for the completion (provider-dependent)"""

    reasoning_effort: Literal["minimal", "low", "medium", "high", "auto"] | None = "auto"
    """Reasoning effort level for models that support it. "auto" will map to each provider's default."""

Attributes

frequency_penalty = None
    Penalize new tokens based on frequency in text

logit_bias = None
    Bias the likelihood of specified tokens during generation

logprobs = None
    Include token-level log probabilities in the response

max_completion_tokens = None
    Maximum number of tokens for the completion (provider-dependent)

max_tokens = None
    Maximum number of tokens to generate

messages
    List of messages for the conversation

model_id
    Model identifier (e.g., 'mistral-small-latest')

n = None
    Number of completions to generate

parallel_tool_calls = None
    Whether to allow parallel tool calls

presence_penalty = None
    Penalize new tokens based on presence in text

reasoning_effort = 'auto'
    Reasoning effort level for models that support it. "auto" will map to each provider's default.

response_format = None
    Format specification for the response

seed = None
    Random seed for reproducible results

stop = None
    Stop sequences for generation

stream = None
    Whether to stream the response

stream_options = None
    Additional options controlling streaming behavior

temperature = None
    Controls randomness in the response (0.0 to 2.0)

tool_choice = None
    Controls which tools the model can call

tools = None
    List of tools for tool calling. Should be converted to OpenAI tool format dicts

top_logprobs = None
    Number of top alternatives to return when logprobs are requested

top_p = None
    Controls diversity via nucleus sampling (0.0 to 1.0)

user = None
    Unique identifier for the end user
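
Finally, a hedged sketch of the richer fields (the CityAnswer model and get_weather tool below are illustrative, not part of any-llm): response_format accepts either an OpenAI-style dict or a Pydantic model class, and tools are expected as OpenAI tool-format dicts.

from pydantic import BaseModel

from any_llm.types.completion import CompletionParams


class CityAnswer(BaseModel):
    # Illustrative structured-output schema, not part of any-llm.
    city: str
    country: str


weather_tool = {
    # Illustrative OpenAI tool-format definition.
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

params = CompletionParams(
    model_id="mistral-small-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[weather_tool],
    tool_choice="auto",
    response_format=CityAnswer,
)

print(params.response_format)  # the CityAnswer class itself is stored on the model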