aleph_alpha_client package

Module contents

class aleph_alpha_client.AsyncClient(token: str, host: str, limit: int = 100, hosting: str | None = None, request_timeout_seconds: int = 305, total_retries: int = 8, nice: bool = False, verify_ssl=True, tags: Sequence[str] | None = None)[source]

Bases: object

Construct a context object for asynchronous requests given a user token

Parameters:

token (string, required):

The API token that will be used for authentication.

host (string, required):

The hostname of the API host.

hosting(string, optional, default None):

Determines in which datacenters the request may be processed. You can either set the parameter to “aleph-alpha” or omit it (defaulting to None).

Not setting this value, or setting it to None, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to “aleph-alpha” allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

request_timeout_seconds (int, optional, default 305):

Client timeout that will be set for HTTP requests in the aiohttp library’s API calls. Server will close all requests after 300 seconds with an internal server error.

total_retries(int, optional, default 8)

The number of retries made in case requests fail with certain retryable status codes. If the last retry fails a corresponding exception is raised. Note, that between retries an exponential backoff is applied, starting with 0.25 s after the first request and doubling for each retry made. So with the default setting of 8 retries a total wait time of 63.75 s is added between the retries.

nice(bool, required, default False):

Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.

verify_ssl(bool, optional, default True)

Setting this to False will disable checking for SSL when doing requests.

tags(Optional[Sequence[str]], optional, default None)

Internal feature.

Example usage:

>>> request = CompletionRequest(prompt=Prompt.from_text(f"Request"), maximum_tokens=64)
>>> async with AsyncClient(
        token=os.environ["TEST_TOKEN"],
        host=os.environ["TEST_API_URL"],
    ) as client:
        response: CompletionResponse = await client.complete(request, "pharia-1-llm-7b-control")

SSE_DATA_PREFIX = 'data: '

async batch_semantic_embed(request: BatchSemanticEmbeddingRequest, model: str | None = None, num_concurrent_requests: int = 1, batch_size: int = 100, progress_bar: bool = False) → BatchSemanticEmbeddingResponse[source]

Embeds a sequence of texts or images and returns vectors in the same order as they were provided. If more than batch_size prompts are provided then this method will chunk them into batches of up to batch_size prompts that will be sent to the API.

Parameters:

request (BatchSemanticEmbeddingRequest, required):: Parameters for the requested semantic embeddings.
model (string, optional, default None):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.
num_concurrent_requests (int, optional, default 1):: Maximum number of concurrent requests to send to the API.
batch_size (int, optional, default 100):: Number of prompts per batch sent to the API. This value must be between 1 and 100 (inclusive).
progress_bar (bool, optional, default False):: Whether to show a progress bar using tqdm.

Examples:

>>> # function for symmetric embedding
>>> def embed_symmetric(texts: Sequence[str]):
        # Create an embeddingrequest with the type set to symmetric
        request = BatchSemanticEmbeddingRequest(
            prompts=[Prompt.from_text(text) for text in texts],
            representation=SemanticRepresentation.Symmetric
        )
        # create the embedding
        result = client.batch_semantic_embed(request, model=model_name)
        return result.embedding

async chat(request: ChatRequest, model: str) → ChatResponse[source]

Chat with a model.

Parameters:

request (ChatRequest, required):: Parameters for the requested chat.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> import os
>>> from aleph_alpha_client import AsyncClient, ChatRequest, Message
>>>
>>> client = AsyncClient(token=os.environ["TOKEN"], host="https://inference-api.your.domain")
>>> model = "llama-3.1-8b-instruct"
>>> # create a chat request
>>> request = ChatRequest(
        messages=[Message(role="user", content="Hello, how are you?")],
        model=model,
    )
>>>
>>> # chat with the model
>>> result = await client.chat(request, model=model)
>>> print(result.message)

async chat_with_streaming(request: ChatRequest, model: str) → AsyncGenerator[ChatStreamChunk | Usage | ToolCall | FinishReason, None][source]

Generates streamed chat completions.

The first yielded chunk contains the role, while subsequent chunks only contain the content delta.

Parameters:

request (ChatRequest, required):: Parameters for the requested chat.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> import os
>>> from aleph_alpha_client import AsyncClient, ChatRequest, Message
>>>
>>> client = AsyncClient(token=os.environ["TOKEN"], host="https://inference-api.your.domain")
>>> model = "llama-3.1-8b-instruct"
>>> # create a chat request
>>> request = ChatRequest(
        messages=[Message(role="user", content="Hello, how are you?")],
        model=model,
    )
>>>
>>> # chat with the model
>>> result = client.chat_with_streaming(request, model=model)
>>>
>>> # consume the chat stream
>>> async for stream_item in result:
>>>     print(stream_item)

async close()[source]: Needs to be called at end of lifetime if the AsyncClient object is not used as a context manager.

async complete(request: CompletionRequest, model: str) → CompletionResponse[source]

Generates completions given a prompt.

Parameters:

request (CompletionRequest, required):: Parameters for the requested completion.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # create a prompt
>>> prompt = Prompt.from_text("An apple a day, ")
>>>
>>> # create a completion request
>>> request = CompletionRequest(
        prompt=prompt,
        maximum_tokens=32,
        stop_sequences=["###","\n"],
        temperature=0.12
    )
>>>
>>> # complete the prompt
>>> result = await client.complete(request, model=model_name)

async complete_with_streaming(request: CompletionRequest, model: str) → AsyncGenerator[StreamChunk | StreamSummary | CompletionSummary, None][source]

Generates streamed completions given a prompt.

Parameters:

request (CompletionRequest, required):: Parameters for the requested completion.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # create a prompt
>>> prompt = Prompt.from_text("An apple a day, ")
>>>
>>> # create a completion request
>>> request = CompletionRequest(
        prompt=prompt,
        maximum_tokens=32,
        stop_sequences=["###","\n"],
        temperature=0.12
    )
>>>
>>> # complete the prompt
>>> result = await client.complete_with_streaming(request, model=model_name)
>>>
>>> # consume the completion stream
>>> async for stream_item in result:
>>>     do_something_with(stream_item)

async create_steering_concept(request: SteeringConceptCreationRequest) → SteeringConceptCreationResponse[source]

Creates a steering concept.

A steering concept consists of a list of steering examples. A steering example is a pair of a “negative” and a “positive” string, describing how you want to alter the model’s output.

This request will return a unique ID for the newly created steering concept that you can then use in completion and chat requests.

Parameters:

request (SteeringConceptCreationRequest, required): Parameters for the steering concepts to create.

Examples:

>>> request = SteeringConceptCreationRequest(
>>>     examples=[
>>>         SteeringPairedExample(
>>>             negative="I appreciate your valuable feedback on this matter.",
>>>             positive="Thanks for the real talk, fam.",
>>>         ),
>>>         SteeringPairedExample(
>>>             negative="The financial projections indicate significant growth potential.",
>>>             positive="Yo, these numbers are looking mad stacked!",
>>>         ),
>>>     ]
>>> )
>>> response = client.create_steering_concept(request)

async detokenize(request: DetokenizationRequest, model: str) → DetokenizationResponse[source]

Detokenizes the given prompt for the given model.

Parameters:

request (DetokenizationRequest, required):: Parameters for the requested detokenization.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = DetokenizationRequest(token_ids=[2, 3, 4])
>>> response = await client.detokenize(request, model=model_name)

async embed(request: EmbeddingRequest, model: str) → EmbeddingResponse[source]

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

request (EmbeddingRequest, required):: Parameters for the requested embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = EmbeddingRequest(prompt=Prompt.from_text("This is an example."), layers=[-1], pooling=["mean"])
>>> result = await client.embed(request, model=model_name)

async evaluate(request: EvaluationRequest, model: str) → EvaluationResponse[source]

Evaluates the model’s likelihood to produce a completion given a prompt.

Parameters:

request (EvaluationRequest, required):: Parameters for the requested evaluation.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = EvaluationRequest(
        prompt=Prompt.from_text("hello"), completion_expected=" world"
    )
>>> response = await client.evaluate(request, model=model_name)

async explain(request: ExplanationRequest, model: str) → ExplanationResponse[source]

Better understand the source of a completion, specifically on how much each section of a prompt impacts each token of the completion.

Parameters:

request (ExplanationRequest, required):: Parameters for the requested explanation.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = ExplanationRequest(
        prompt=Prompt.from_text("Andreas likes"),
        target=" pizza."
    )
>>> response = await client.explain(request, model="luminous-base")

async get_version() → str[source]: Gets version of the AlephAlpha HTTP API.

async instructable_embed(request: InstructableEmbeddingRequest, model: str) → InstructableEmbeddingResponse[source]

Embeds a text and returns vectors that can be used for classification according to a given instruction.

Parameters:

request (InstructableEmbeddingRequest, required):: Parameters for the requested instructable embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # function for salutation embedding
>>> async def embed_salutation(text: str):
        # Create an embeddingrequest with a given instruction
        request = InstructableEmbeddingRequest(
            input=Prompt.from_text(text),
            instruction="Represent the text to query a database of salutations"
        )
        # create the embedding
        result = await client.instructable_embed(request, model=model_name)
        return result.embedding
>>>
>>> # function to calculate similarity
>>> def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
        "compute cosine similarity of v1 to v2: (v1 dot v2)/{||v1||*||v2||)"
        sumxx, sumxy, sumyy = 0, 0, 0
        for i in range(len(v1)):
            x = v1[i]; y = v2[i]
            sumxx += x*x
            sumyy += y*y
            sumxy += x*y
        return sumxy/math.sqrt(sumxx*sumyy)
>>>
>>> # define the texts
>>> text_a = "Hello"
>>> text_b = "Good morning"
>>>
>>> # show the similarity
>>> print(cosine_similarity(await embed_salutation(text_a), await embed_salutation(text_b)))

async models() → List[Mapping[str, Any]][source]

Queries all models which are currently available.

For documentation of the response, see https://docs.aleph-alpha.com/products/apis/pharia-inference/available-models/

async rerank(request: RerankRequest, model: str) → RerankResponse[source]

Reranks documents against a query.

This endpoint takes in a query and a list of documents and produces an array with each document assigned a relevance score.

Parameters:

request (RerankRequest, required):: Parameters for the requested reranking.
model (string, required):: Name of the model to use for reranking.

Examples:

>>> request = RerankRequest(
        query="What is the capital of France?",
        documents=[
            "The capital of Brazil is Brasilia.",
            "The capital of France is Paris.",
            "Horses and cows are both animals.",
        ],
        top_n=2,
    )
>>> response = await client.rerank(request, model="your-reranker-model")
>>> for result in response.results:
>>>     print(f"Document {result.index}: {result.relevance_score}")

async semantic_embed(request: SemanticEmbeddingRequest, model: str) → SemanticEmbeddingResponse[source]

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

request (SemanticEmbeddingRequest, required):: Parameters for the requested semantic embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # function for symmetric embedding
>>> async def embed_symmetric(text: str):
        # Create an embeddingrequest with the type set to symmetric
        request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
        # create the embedding
        result = await client.semantic_embed(request, model=model_name)
        return result.embedding
>>>
>>> # function to calculate similarity
>>> def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
        "compute cosine similarity of v1 to v2: (v1 dot v2)/{||v1||*||v2||)"
        sumxx, sumxy, sumyy = 0, 0, 0
        for i in range(len(v1)):
            x = v1[i]; y = v2[i]
            sumxx += x*x
            sumyy += y*y
            sumxy += x*y
        return sumxy/math.sqrt(sumxx*sumyy)
>>>
>>> # define the texts
>>> text_a = "The sun is shining"
>>> text_b = "Il sole splende"
>>>
>>> # show the similarity
>>> print(cosine_similarity(await embed_symmetric(text_a), await embed_symmetric(text_b)))

async tokenize(request: TokenizationRequest, model: str) → TokenizationResponse[source]

Tokenizes the given prompt for the given model.

Parameters:

request (TokenizationRequest, required):: Parameters for the requested tokenization.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = TokenizationRequest(prompt="hello", token_ids=True, tokens=True)
>>> response = await client.tokenize(request, model=model_name)

async tokenizer(model: str) → Tokenizer[source]

Returns a Tokenizer instance with the settings that were used to train the model.

Examples:

>>> tokenizer = await client.tokenizer(model="luminous-base")
>>> tokenized_prompt = tokenizer.encode("Hello world")

async translate(request: TranslationRequest) → TranslationResponse[source]

Translates text from one language to another.

Parameters:

request (TranslationRequest, required):: Parameters for the requested translation.

Examples:

>>> request = TranslationRequest(
        model="pharia-1-mt-translation",
        source="Hello, how are you?",
        target_language="de"
    )
>>> response = await client.translate(request)
>>> print(response.translation)

async validate_version() → None[source]

class aleph_alpha_client.BatchSemanticEmbeddingRequest(prompts: Sequence[Prompt], representation: SemanticRepresentation, compress_to_size: int | None = None, normalize: bool = False, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True)[source]

Bases: object

Embeds multiple multi-modal prompts and returns their embeddings in the same order as they were supplied.

Parameters:

prompts

A list of texts and/or images to be embedded.

representation

Semantic representation to embed the prompt with.

compress_to_size

Options available: 128

The default behavior is to return the full embedding, but you can optionally request an embedding compressed to a smaller set of dimensions.

Full embedding sizes for supported models:

luminous-base: 5120

The 128 size is expected to have a small drop in accuracy performance (4-6%), with the benefit of being much smaller, which makes comparing these embeddings much faster for use cases where speed is critical.

The 128 size can also perform better if you are embedding really short texts or documents.

normalize

Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric.

contextual_control_threshold (float, default None)

If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

control_log_additive (bool, default True)

True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor

Examples

>>> texts = [
        "deep learning",
        "artificial intelligence",
        "deep diving",
        "artificial snow",
    ]
>>> # Texts to compare
>>> request = BatchSemanticEmbeddingRequest(prompts=[Prompt.from_text(text) for text in texts], representation=SemanticRepresentation.Symmetric)
    result = model.batch_semantic_embed(request)

compress_to_size: int | None = None

contextual_control_threshold: float | None = None

control_log_additive: bool | None = True

normalize: bool = False

prompts: Sequence[Prompt]

representation: SemanticRepresentation

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.BatchSemanticEmbeddingResponse(model_version: str, embeddings: Sequence[List[float]], num_tokens_prompt_total: int)[source]

Bases: object

Response of a batch semantic embedding request

Parameters:

model_version: Model name and version (if any) of the used model for inference
embeddings: A list of embeddings.

embeddings: Sequence[List[float]]

static from_json(json: Dict[str, Any]) → BatchSemanticEmbeddingResponse[source]

model_version: str

num_tokens_prompt_total: int

to_json() → Mapping[str, Any][source]

exception aleph_alpha_client.BusyError(*args, **kwargs)[source]: Bases: Exception

Bases: object

Describes a chat request.

Only supports a subset of the parameters of CompletionRequest for simplicity. See CompletionRequest for documentation on the parameters.

maximum_tokens: int | None = None

messages: Sequence[Message | TextMessage]

model: str

parallel_tool_calls: bool | None = None

response_format: JSONSchema | Type[BaseModel] | None = None

steering_concepts: List[str] | None = None

stream_options: StreamOptions | None = None

temperature: float | None = None

to_json() → Mapping[str, Any][source]

tool_choice: Literal['auto', 'required', 'none'] | ToolFunction | None = None

tools: List[Any] | None = None

top_k: int | None = None

top_p: float | None = None

class aleph_alpha_client.ChatResponse(finish_reason: FinishReason, message: TextMessage)[source]

Bases: object

A simplified version of the chat response.

As the ChatRequest does not support the n parameter (allowing for multiple return values), the ChatResponse assumes there to be only one choice.

finish_reason: FinishReason

static from_json(json: Dict[str, Any]) → ChatResponse[source]

message: TextMessage

class aleph_alpha_client.Client(token: str, host: str, hosting: str | None = None, request_timeout_seconds: int = 305, total_retries: int = 8, nice: bool = False, verify_ssl=True, tags: Sequence[str] | None = None, pool_size: int = 10)[source]

Bases: object

Construct a client for synchronous requests given a user token

Parameters:

token (string, required):

The API token that will be used for authentication.

host (string, required):

The hostname of the API host.

hosting(string, optional, default None):

Determines in which datacenters the request may be processed. You can either set the parameter to “aleph-alpha” or omit it (defaulting to None).

Not setting this value, or setting it to None, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to “aleph-alpha” allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

request_timeout_seconds (int, optional, default 305):

Client timeout that will be set for HTTP requests in the requests library’s API calls. Server will close all requests after 300 seconds with an internal server error.

total_retries(int, optional, default 8)

The number of retries made in case requests fail with certain retryable status codes. If the last retry fails a corresponding exception is raised. Note, that between retries an exponential backoff is applied, starting with 0.5 s after the first retry and doubling for each retry made. So with the default setting of 8 retries a total wait time of 63.5 s is added between the retries.

nice(bool, required, default False):

Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.

verify_ssl(bool, optional, default True)

Setting this to False will disable checking for SSL when doing requests.

tags(Optional[Sequence[str]], optional, default None)

Internal feature.

Example usage:

>>> request = CompletionRequest(prompt=Prompt.from_text(f"Request"), maximum_tokens=64)
>>> client = Client(
        token=os.environ["TEST_TOKEN"],
        host=os.environ["TEST_API_URL"],
    )
>>> response: CompletionResponse = client.complete(request, "pharia-1-llm-7b-control")

batch_semantic_embed(request: BatchSemanticEmbeddingRequest, model: str | None = None) → BatchSemanticEmbeddingResponse[source]

Embeds a sequence of texts or images and returns vectors in the same order as they were provided. If more than 100 prompts are provided then this method will chunk them into batches of 100 prompts that will be sent to the API.

Parameters:

request (BatchSemanticEmbeddingRequest, required):: Parameters for the requested semantic embeddings.
model (string, optional, default None):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # function for symmetric embedding
>>> def embed_symmetric(texts: Sequence[str]):
        # Create an embeddingrequest with the type set to symmetric
        request = BatchSemanticEmbeddingRequest(
            prompts=[Prompt.from_text(text) for text in texts],
            representation=SemanticRepresentation.Symmetric
        )
        # create the embedding
        result = client.batch_semantic_embed(request, model=model_name)
        return result.embedding

chat(request: ChatRequest, model: str) → ChatResponse[source]

Chat with a model.

Parameters:

request (ChatRequest, required):: Parameters for the requested chat.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> import os
>>> from aleph_alpha_client import Client, ChatRequest, Message
>>>
>>> client = Client(token=os.environ["TOKEN"], host="https://inference-api.your.domain")
>>> model = "llama-3.1-8b-instruct"
>>> # create a chat request
>>> request = ChatRequest(
        messages=[Message(role="user", content="Hello, how are you?")],
        model=model,
    )
>>>
>>> result = client.chat(request, model=model)
>>> print(result.message)

complete(request: CompletionRequest, model: str) → CompletionResponse[source]

Generates completions given a prompt.

Parameters:

request (CompletionRequest, required):: Parameters for the requested completion.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # create a prompt
>>> prompt = Prompt.from_text("An apple a day, ")
>>>
>>> # create a completion request
>>> request = CompletionRequest(
        prompt=prompt,
        maximum_tokens=32,
        stop_sequences=["###","\n"],
        temperature=0.12
    )
>>>
>>> # complete the prompt
>>> result = client.complete(request, model=model_name)

create_steering_concept(request: SteeringConceptCreationRequest) → SteeringConceptCreationResponse[source]

Creates a steering concept.

A steering concept consists of a list of steering examples. A steering example is a pair of a “negative” and a “positive” string, describing how you want to alter the model’s output.

This request will return a unique ID for the newly created steering concept that you can then use in completion and chat requests.

Parameters:

request (SteeringConceptCreationRequest, required): Parameters for the steering concepts to create.

Examples:

>>> request = SteeringConceptCreationRequest(
>>>     examples=[
>>>         SteeringPairedExample(
>>>             negative="I appreciate your valuable feedback on this matter.",
>>>             positive="Thanks for the real talk, fam.",
>>>         ),
>>>         SteeringPairedExample(
>>>             negative="The financial projections indicate significant growth potential.",
>>>             positive="Yo, these numbers are looking mad stacked!",
>>>         ),
>>>     ]
>>> )
>>> response = client.create_steering_concept(request)

detokenize(request: DetokenizationRequest, model: str) → DetokenizationResponse[source]

Detokenizes the given prompt for the given model.

Parameters:

request (DetokenizationRequest, required):: Parameters for the requested detokenization.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = DetokenizationRequest(token_ids=[2, 3, 4])
>>> response = client.detokenize(request, model=model_name)

embed(request: EmbeddingRequest, model: str) → EmbeddingResponse[source]

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

request (EmbeddingRequest, required):: Parameters for the requested embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = EmbeddingRequest(prompt=Prompt.from_text(
        "This is an example."), layers=[-1], pooling=["mean"]
    )
>>> result = client.embed(request, model=model_name)

embeddings(request: EmbeddingV2Request, model: str) → EmbeddingV2Response[source]

Embeds a text and returns vectors that can be used for downstream tasks. This interface is compatible to the OpenAI /embeddings endpoint.

Parameters:

request (EmbeddingV2Request, required):: Parameters for the requested embedding.
model (string, required):: Name of model to use.

Examples:

>>> request = EmbeddingV2Request(input="This is an example", dimensions=20)
>>> result = client.embeddings(request, model=model_name)

evaluate(request: EvaluationRequest, model: str) → EvaluationResponse[source]

Evaluates the model’s likelihood to produce a completion given a prompt.

Parameters:

request (EvaluationRequest, required):: Parameters for the requested evaluation.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = EvaluationRequest(
        prompt=Prompt.from_text("hello"), completion_expected=" world"
    )
>>> response = client.evaluate(request, model=model_name)

explain(request: ExplanationRequest, model: str) → ExplanationResponse[source]

Better understand the source of a completion, specifically on how much each section of a prompt impacts each token of the completion.

Parameters:

request (ExplanationRequest, required):: Parameters for the requested explanation.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = ExplanationRequest(
        prompt=Prompt.from_text("Andreas likes"),
        target=" pizza."
    )
>>> response = client.explain(request, model="luminous-base")

get_version() → str[source]: Gets version of the AlephAlpha HTTP API.

instructable_embed(request: InstructableEmbeddingRequest, model: str) → InstructableEmbeddingResponse[source]

Embeds a text and returns vectors that can be used for classification according to a given instruction.

Parameters:

request (InstructableEmbeddingRequest, required):: Parameters for the requested instructable embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # function for salutation embedding
>>> def embed_salutation(text: str):
        # Create an embeddingrequest with a given instruction
        request = InstructableEmbeddingRequest(
            input=Prompt.from_text(text),
            instruction="Represent the text to query a database of salutations"
        )
        # create the embedding
        result = client.instructable_embed(request, model=model_name)
        return result.embedding
>>>
>>> # function to calculate similarity
>>> def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
        "compute cosine similarity of v1 to v2: (v1 dot v2)/{||v1||*||v2||)"
        sumxx, sumxy, sumyy = 0, 0, 0
        for i in range(len(v1)):
            x = v1[i]; y = v2[i]
            sumxx += x*x
            sumyy += y*y
            sumxy += x*y
        return sumxy/math.sqrt(sumxx*sumyy)
>>>
>>> # define the texts
>>> text_a = "Hello"
>>> text_b = "Good morning"
>>>
>>> # show the similarity
>>> print(cosine_similarity(embed_salutation(text_a), embed_salutation(text_b)))

models() → List[Mapping[str, Any]][source]

Queries all models which are currently available.

For documentation of the response, see https://docs.aleph-alpha.com/products/apis/pharia-inference/available-models/

rerank(request: RerankRequest, model: str) → RerankResponse[source]

Reranks documents against a query.

This endpoint takes in a query and a list of documents and produces an array with each document assigned a relevance score.

Parameters:

request (RerankRequest, required):: Parameters for the requested reranking.
model (string, required):: Name of the model to use for reranking.

Examples:

>>> request = RerankRequest(
        query="What is the capital of France?",
        documents=[
            "The capital of Brazil is Brasilia.",
            "The capital of France is Paris.",
            "Horses and cows are both animals.",
        ],
        top_n=2,
    )
>>> response = client.rerank(request, model="your-reranker-model")
>>> for result in response.results:
>>>     print(f"Document {result.index}: {result.relevance_score}")

semantic_embed(request: SemanticEmbeddingRequest, model: str) → SemanticEmbeddingResponse[source]

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

request (SemanticEmbeddingRequest, required):: Parameters for the requested semantic embedding.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> # function for symmetric embedding
>>> def embed_symmetric(text: str):
        # Create an embeddingrequest with the type set to symmetric
        request = SemanticEmbeddingRequest(prompt=Prompt.from_text(
            text), representation=SemanticRepresentation.Symmetric)
        # create the embedding
        result = client.semantic_embed(request, model=model_name)
        return result.embedding
>>>
>>> # function to calculate similarity
>>> def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
        "compute cosine similarity of v1 to v2: (v1 dot v2)/{||v1||*||v2||)"
        sumxx, sumxy, sumyy = 0, 0, 0
        for i in range(len(v1)):
            x = v1[i]; y = v2[i]
            sumxx += x*x
            sumyy += y*y
            sumxy += x*y
        return sumxy/math.sqrt(sumxx*sumyy)
>>>
>>> # define the texts
>>> text_a = "The sun is shining"
>>> text_b = "Il sole splende"
>>>
>>> # show the similarity
>>> print(cosine_similarity(embed_symmetric(text_a), embed_symmetric(text_b)))

tokenize(request: TokenizationRequest, model: str) → TokenizationResponse[source]

Tokenizes the given prompt for the given model.

Parameters:

request (TokenizationRequest, required):: Parameters for the requested tokenization.
model (string, required):: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used.

Examples:

>>> request = TokenizationRequest(
        prompt="hello", token_ids=True, tokens=True
    )
>>> response = client.tokenize(request, model=model_name)

tokenizer(model: str) → Tokenizer[source]

Returns a Tokenizer instance with the settings that were used to train the model.

Examples:

>>> tokenizer = client.tokenizer(model="luminous-base")
>>> tokenized_prompt = tokenizer.encode("Hello world")

translate(request: TranslationRequest) → TranslationResponse[source]

Translates text from one language to another.

Parameters:

request (TranslationRequest, required):: Parameters for the requested translation.

Examples:

>>> request = TranslationRequest(
        model="pharia-1-mt-translation",
        source="Hello, how are you?",
        target_language="de"
    )
>>> response = client.translate(request)
>>> print(response.translation)

validate_version() → None[source]: Gets version of the AlephAlpha HTTP API.

class aleph_alpha_client.CompletionRequest(prompt: Prompt, maximum_tokens: int | None = None, temperature: float = 0.0, top_k: int = 0, top_p: float = 0.0, presence_penalty: float = 0.0, frequency_penalty: float = 0.0, repetition_penalties_include_prompt: bool = False, use_multiplicative_presence_penalty: bool = False, penalty_bias: str | None = None, penalty_exceptions: List[str] | None = None, penalty_exceptions_include_stop_sequences: bool | None = None, best_of: int | None = None, n: int = 1, logit_bias: Dict[int, float] | None = None, log_probs: int | None = None, stop_sequences: List[str] | None = None, tokens: bool = False, disable_optimizations: bool = False, minimum_tokens: int = 0, echo: bool = False, use_multiplicative_frequency_penalty: bool = False, sequence_penalty: float = 0.0, sequence_penalty_min_length: int = 2, use_multiplicative_sequence_penalty: bool = False, completion_bias_inclusion: Sequence[str] | None = None, completion_bias_inclusion_first_token_only: bool = False, completion_bias_exclusion: Sequence[str] | None = None, completion_bias_exclusion_first_token_only: bool = False, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True, repetition_penalties_include_completion: bool = True, raw_completion: bool = False, steering_concepts: List[str] | None = None)[source]

Bases: object

Describes a completion request

Parameters:

prompt:

The text or image prompt to be completed. Unconditional completion can be started with an empty string (default). The prompt may contain a zero shot or few shot task.

maximum_tokens (int, optional, default None):

The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached. Increase this value to generate longer texts. A text is split into tokens. Usually there are more tokens than words. The maximum supported number of tokens depends on the model (for luminous-base, it may not exceed 2048 tokens). The prompt’s tokens plus the maximum_tokens request must not exceed this number. If set to None, the model will stop generating tokens either if it outputs a sequence specified in stop_sequences or if it reaches its technical limit. For most models, this means that the sum of input and output tokens is equal to its context window.

temperature (float, optional, default 0.0)

A higher sampling temperature encourages the model to produce less probable outputs (“be more creative”). Values are expected in a range from 0.0 to 1.0. Try high values (e.g. 0.9) for a more “creative” response and the default 0.0 for a well defined and repeatable answer.

It is recommended to use either temperature, top_k or top_p and not all at the same time. If a combination of temperature, top_k or top_p is used rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.

top_k (int, optional, default 0)

Introduces random sampling from generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0 if repeatable output is to be produced. It is recommended to use either temperature, top_k or top_p and not all at the same time. If a combination of temperature, top_k or top_p is used rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.

top_p (float, optional, default 0.0)

Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is to be produced. It is recommended to use either temperature, top_k or top_p and not all at the same time. If a combination of temperature, top_k or top_p is used rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.

presence_penalty (float, optional, default 0.0)

The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) respectively the prompt (repetition_penalties_include_prompt=true). Presence penalty is independent of the number of occurrences. Increase the value to produce text that is not repeating the input.

frequency_penalty (float, optional, default 0.0)

The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) respectively the prompt (repetition_penalties_include_prompt=true). Frequency penalty is dependent on the number of occurrences of a token.

repetition_penalties_include_prompt (bool, optional, default False)

Flag deciding whether presence penalty or frequency penalty are updated from the prompt

use_multiplicative_presence_penalty (bool, optional, default True)

Flag deciding whether presence penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for presence and frequency penalty.

penalty_bias (string, optional)

If set, all tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens if repetition_penalties_include_completion is set to true and the prompt tokens, if repetition_penalties_include_prompt is set to true,

Potential use case for a chatbot-based completion:

Instead of using repetition_penalties_include_prompt, construct a new string with only the chatbot’s responses included. You would leave out any tokens you use for stop sequences (i.e. \nChatbot:), and all user messages.

With this bias, if you turn up the repetition penalties, you can avoid having your chatbot repeat itself, but not penalize the chatbot from mirroring language provided by the user.

penalty_exceptions (List(str), optional)

List of strings that may be generated without penalty, regardless of other penalty settings.

This is particularly useful for any completion that uses a structured few-shot prompt. For example, if you have a prompt such as:

I want to travel to a location, where I can enjoy both beaches and mountains.

- Lake Garda, Italy. This large Italian lake in the southern alps features gravel beaches and mountainside hiking trails.
- Mallorca, Spain. This island is famous for its sandy beaches, turquoise water and hilly landscape.
- Lake Tahoe, California. This famous lake in the Sierra Nevada mountains offers an amazing variety of outdoor activities.
-

You could set penalty_exceptions to ["\n-"] to not penalize the generation of a new list item, but still increase other penalty settings to encourage the generation of new list items without repeating itself.

By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by settings penalty_exceptions_include_stop_sequences to false.

penalty_exceptions_include_stop_sequences (bool, optional, default true)

By default, we include any stop_sequences in penalty_exceptions, to not penalize the presence of stop sequences that are present in few-shot prompts to provide structure to your completions.

You can set this to false if you do not want this behavior.

See the description of penalty_exceptions above for more information on what penalty_exceptions are used for.

best_of (int, optional, default None)

Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n.

n (int, optional, default 1)

How many completions to generate for each prompt.

logit_bias (dict mapping token ids to score, optional, default None)

The logit bias allows to influence the likelihood of generating tokens. A dictionary mapping token ids (int) to a bias (float) can be provided. Such bias is added to the logits as generated by the model.

log_probs (int, optional, default None)

Number of top log probabilities to be returned for each generated token. Log probabilities may be used in downstream tasks or to assess the model’s certainty when producing tokens.

If set to 0, you will always get the log probability of the sampled token. 1 or more will return the argmax token(s) plus the sampled one, if not already included.

stop_sequences (List(str), optional, default None)

List of strings which will stop generation if they’re generated. Stop sequences may be helpful in structured texts.

Example: In a question answering scenario a text may consist of lines starting with either “Question: “ or “Answer: “ (alternating). After producing an answer, the model will be likely to generate “Question: “. “Question: “ may therefore be used as stop sequence in order not to have the model generate more questions but rather restrict text generation to the answers.

tokens (bool, optional, default False)

return tokens of completion

disable_optimizations (bool, optional, default False)

We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you.

Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.

minimum_tokens (int, default 0)

Generate at least this number of tokens before an end-of-text token is generated.

echo (bool, default False)

Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt.

use_multiplicative_frequency_penalty (bool, default False)

Flag deciding whether frequency penalty is applied multiplicatively (True) or additively (False).

sequence_penalty (float, default 0.0)

Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is True) and prior completion (if repetition_penalties_include_completion is True).

sequence_penalty_min_length (int, default 2)

Minimal number of tokens to be considered as sequence. Must be greater or equal 2.

use_multiplicative_sequence_penalty (bool, default False)

Flag deciding whether sequence penalty is applied multiplicatively (True) or additively (False).

completion_bias_inclusion (List[str], default [])

Bias the completion to only generate options within this list; all other tokens are disregarded at sampling

Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa

completion_bias_inclusion_first_token_only (bool, default False)

Only consider the first token for the completion_bias_inclusion

completion_bias_exclusion (List[str], default [])

Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling

Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa

completion_bias_exclusion_first_token_only (bool, default False)

Only consider the first token for the completion_bias_exclusion

contextual_control_threshold (float, default None)

If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

control_log_additive (bool, default True)

True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor

repetition_penalties_include_completion (bool, optional, default True)

Flag deciding whether presence penalty or frequency penalty are updated from the completion

raw_completion (bool, default False)

Setting this parameter to true forces the raw completion of the model to be returned. For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion.

steering_concepts (Optional[list[str]], default None)

Names of the steering vectors to apply on this task. This steers the output in the direction given by positive examples, and away from negative examples if provided.

Examples:

>>> prompt = Prompt.from_text("Provide a short description of AI:")
>>> request = CompletionRequest(prompt=prompt, maximum_tokens=20)

best_of: int | None = None

completion_bias_exclusion: Sequence[str] | None = None

completion_bias_exclusion_first_token_only: bool = False

completion_bias_inclusion: Sequence[str] | None = None

completion_bias_inclusion_first_token_only: bool = False

contextual_control_threshold: float | None = None

control_log_additive: bool | None = True

disable_optimizations: bool = False

echo: bool = False

frequency_penalty: float = 0.0

log_probs: int | None = None

logit_bias: Dict[int, float] | None = None

maximum_tokens: int | None = None

minimum_tokens: int = 0

n: int = 1

penalty_bias: str | None = None

penalty_exceptions: List[str] | None = None

penalty_exceptions_include_stop_sequences: bool | None = None

presence_penalty: float = 0.0

prompt: Prompt

raw_completion: bool = False

repetition_penalties_include_completion: bool = True

repetition_penalties_include_prompt: bool = False

sequence_penalty: float = 0.0

sequence_penalty_min_length: int = 2

steering_concepts: List[str] | None = None

stop_sequences: List[str] | None = None

temperature: float = 0.0

to_json() → Mapping[str, Any][source]

tokens: bool = False

top_k: int = 0

top_p: float = 0.0

use_multiplicative_frequency_penalty: bool = False

use_multiplicative_presence_penalty: bool = False

use_multiplicative_sequence_penalty: bool = False

class aleph_alpha_client.CompletionResponse(model_version: str, completions: Sequence[CompletionResult], num_tokens_prompt_total: int, num_tokens_generated: int, optimized_prompt: Prompt | None = None)[source]

Bases: object

Describes a completion response

Parameters:

model_version:: Model name and version (if any) of the used model for inference.
completions:: List of completions; may contain only one entry if no more are requested (see parameter n).
num_tokens_prompt_total:: Number of tokens combined across all completion tasks. In particular, if you set best_of or n to a number larger than 1 then we report the combined prompt token count for all best_of or n tasks.
num_tokens_generated:: Number of tokens combined across all completion tasks. If multiple completions are returned or best_of is set to a value greater than 1 then this value contains the combined generated token count.
optimized_prompt:: Describes prompt after optimizations. This field is only returned if the flag disable_optimizations flag is not set and the prompt has actually changed.

completions: Sequence[CompletionResult]

static from_json(json: Dict[str, Any]) → CompletionResponse[source]

model_version: str

num_tokens_generated: int

num_tokens_prompt_total: int

optimized_prompt: Prompt | None = None

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.ControlTokenOverlap(*values)[source]

Bases: Enum

What to do if a control partially overlaps with a text or image token.

Partial:: The factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only covers 2 of 4 token characters, would be adjusted to 1.5.
Complete:: The full factor will be applied as long as the control overlaps with the token at all. How many explanations should be returned in the output.

Complete = 'complete'

Partial = 'partial'

to_json() → str[source]

class aleph_alpha_client.CustomGranularity(delimiter: str)[source]

Bases: object

Allows for passing a custom delimiter to determine the granularity to to explain the prompt by. The text of the prompt will be split by the delimiter you provide.

Parameters:

delimiter (str, required):: String to split the text in the prompt by for generating explanations for your prompt.

delimiter: str

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.DetokenizationRequest(token_ids: Sequence[int])[source]

Bases: object

Describes a detokenization request.

Parameters

token_ids (Sequence[int]): Ids of the tokens for which the text should be returned.

Examples:

>>> DetokenizationRequest(token_ids=[1730, 387, 300, 4377, 17])

to_json() → Mapping[str, Any][source]

token_ids: Sequence[int]

class aleph_alpha_client.DetokenizationResponse(result: str)[source]

Bases: object

static from_json(json: Dict[str, Any]) → DetokenizationResponse[source]

result: str

Bases: object

A document that can be either a docx document or text/image prompts.

classmethod from_docx_bytes(bytes: bytes)[source]: Pass a docx file in bytes and prepare it to be used as a document

classmethod from_docx_file(path: str)[source]

Load a docx file from disk and prepare it to be used as a document

Examples:

>>> docx_file = "./tests/sample.docx"
>>> document = Document.from_docx_file(docx_file)

classmethod from_prompt(prompt: Prompt | Sequence[str | Image])[source]: Pass a prompt that can contain multiple strings and Image prompts and prepare it to be used as a document

classmethod from_text(text: str)[source]

Pass a single text and prepare it to be used as a document

Example:

>>> prompt = "This is an example."
>>> document = Document.from_text(prompt)

class aleph_alpha_client.EmbeddingRequest(prompt: Prompt, layers: List[int], pooling: List[str], type: str | None = None, tokens: bool = False, normalize: bool = False, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True)[source]

Bases: object

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

prompt

The text and/or image(s) to be embedded.

layers

A list of layer indices from which to return embeddings.

Index 0 corresponds to the word embeddings used as input to the first transformer layer
Index 1 corresponds to the hidden state as output by the first transformer layer, index 2 to the output of the second layer etc.
Index -1 corresponds to the last transformer layer (not the language modelling head), index -2 to the second last layer etc.

pooling

Pooling operation to use. Pooling operations include:

mean: aggregate token embeddings across the sequence dimension using an average
max: aggregate token embeddings across the sequence dimension using a maximum
last_token: just use the last token
abs_max: aggregate token embeddings across the sequence dimension using a maximum of absolute values

type

Type of the embedding (e.g. symmetric or asymmetric)

tokens

Flag indicating whether the tokenized prompt is to be returned (True) or not (False)

normalize

Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric.

contextual_control_threshold (float, default None)

If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

control_log_additive (bool, default True)

True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor

Examples:

>>> prompt = Prompt.from_text("This is an example.")
>>> EmbeddingRequest(prompt=prompt, layers=[-1], pooling=["mean"])

contextual_control_threshold: float | None = None

control_log_additive: bool | None = True

layers: List[int]

normalize: bool = False

pooling: List[str]

prompt: Prompt

to_json() → Mapping[str, Any][source]

tokens: bool = False

type: str | None = None

class aleph_alpha_client.EmbeddingResponse(model_version: str, num_tokens_prompt_total: int, embeddings: Dict[Tuple[str, str], List[float]] | None, tokens: List[str] | None, message: str | None = None)[source]

Bases: object

embeddings: Dict[Tuple[str, str], List[float]] | None

static from_json(json: Dict[str, Any]) → EmbeddingResponse[source]

message: str | None = None

model_version: str

num_tokens_prompt_total: int

tokens: List[str] | None

Bases: object

Embeds a text and returns embedding vectors following the OpenAI API specs.

Parameters:

input: The input to be embedded. Can be a string, list of strings, list of integers (i.e. tokens) or a list of lists of integers (i.e. multiple token strings).
dimensions: The number of dimensions the resulting output embeddings should have. Note, not all models support this parameter.

Examples

>>> request = EmbeddingV2Request(
        input="Hello World!",
        dimensions=64,
    )
    result = model.embeddings(request)

dimensions: int | None = None

encoding_format: Literal['float', 'base64'] | None = None

input: str | List[str] | List[int] | List[List[int]]

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.EmbeddingV2Response(object: str, data: List[EmbeddingV2ResponseData], model: str, usage: Usage)[source]

Bases: object

Response of an OpenAI compatible embedding request

Parameters:

object: The object type, which is always “embedding”.
data: Embeddings output data.
model: Name of the model used to generate the embeddings.
usage: Usage information, including the number of tokens in the input and output.

data: List[EmbeddingV2ResponseData]

static from_json(json: Dict[str, Any]) → EmbeddingV2Response[source]

model: str

object: str

usage: Usage

class aleph_alpha_client.EvaluationRequest(prompt: Prompt, completion_expected: str, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True)[source]

Bases: object

Evaluates the model’s likelihood to produce a completion given a prompt.

Parameters:

prompt (str, optional, default “”):: The text to be completed. Unconditional completion can be used with an empty string (default). The prompt may contain a zero shot or few shot task.
completion_expected (str, required):: The ground truth completion expected to be produced given the prompt.
contextual_control_threshold (float, default None): If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
control_log_additive (bool, default True): True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor

Examples:

>>> request = EvaluationRequest(prompt=Prompt.from_text("The api works"), completion_expected=" well")

completion_expected: str

contextual_control_threshold: float | None = None

control_log_additive: bool | None = True

prompt: Prompt

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.EvaluationResponse(model_version: str, message: str | None, result: Dict[str, Any], num_tokens_prompt_total: int)[source]

Bases: object

static from_json(json: Dict[str, Any]) → EvaluationResponse[source]

message: str | None

model_version: str

num_tokens_prompt_total: int

result: Dict[str, Any]

class aleph_alpha_client.Explanation(target: str, items: List[TextPromptItemExplanation | TargetPromptItemExplanation | TokenPromptItemExplanation | ImagePromptItemExplanation])[source]

Bases: object

Explanations for a given portion of the target.

Parameters:

target (str, required): If target_granularity was set to “complete”, then this will be the entire target. If it was set to “token”, this will be a single target token.
items (List[Union[TextPromptItemExplanation, TargetPromptItemExplanation, TokenPromptItemExplanation, ImagePromptItemExplanation], required): Contains one item for each prompt item (in order), and the last item refers to the target.

static from_json(json: Dict[str, Any]) → Explanation[source]

items: List[TextPromptItemExplanation | TargetPromptItemExplanation | TokenPromptItemExplanation | ImagePromptItemExplanation]

prompt_item_from_json() → TextPromptItemExplanation | ImagePromptItemExplanation | TargetPromptItemExplanation | TokenPromptItemExplanation[source]

target: str

with_image_prompt_items_in_pixels(prompt: Prompt) → Explanation[source]

with_text_from_prompt(prompt: Prompt, target: str) → Explanation[source]

class aleph_alpha_client.ExplanationPostprocessing(*values)[source]

Bases: Enum

Available types of explanation postprocessing.

Square:: Square each score
Absolute:: Take the absolute value of each score

Absolute = 'absolute'

Square = 'square'

to_json() → str[source]

class aleph_alpha_client.ExplanationRequest(prompt: Prompt, target: str, contextual_control_threshold: float | None = None, control_factor: float | None = None, control_token_overlap: ControlTokenOverlap | None = None, control_log_additive: bool | None = None, prompt_granularity: PromptGranularity | str | CustomGranularity | None = None, target_granularity: TargetGranularity | None = None, postprocessing: ExplanationPostprocessing | None = None, normalize: bool | None = None)[source]

Bases: object

Describes an Explanation request you want to make agains the API.

Parameters:

prompt (Prompt, required)

Prompt you want to generate explanations for a target completion.

target (str, required)

The completion string to be explained based on model probabilities.

contextual_control_threshold (float, default None)

If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

control_factor (float, default None):

The amount to adjust model attention by. For Explanation, you want to supress attention, and the API will default to 0.1. Values between 0 and 1 will supress attention. A value of 1 will have no effect. Values above 1 will increase attention.

control_token_overlap (ControlTokenOverlap, default None)

What to do if a control partially overlaps with a text or image token. If set to “partial”, the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only covers 2 of 4 token characters, would be adjusted to 1.5. If set to “complete”, the full factor will be applied as long as the control overlaps with the token at all.

control_log_additive (bool, default None)

True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor If None, the API will default to True

prompt_granularity (Union[PromptGranularity, str, CustomGranularity], default None)

At which granularity should the target be explained in terms of the prompt. If you choose, for example, “sentence” then we report the importance score of each sentence in the prompt towards generating the target output.

If you do not choose a granularity then we will try to find the granularity that brings you closest to around 30 explanations. For large documents, this would likely be sentences. For short prompts this might be individual words or even tokens.

If you choose a custom granularity then you must provide a custom delimiter. We then split your prompt by that delimiter. This might be helpful if you are using few-shot prompts that contain stop sequences.

We currently support providing the prompt_granularity as PromptGranularity (recommended) or CustomGranularity (if needed) or str (deprecated). Note that supplying plain strings only makes sense if you choose one of the values defined in the PromptGranularity enum. All other strings will be rejected by the API. In future versions we might cut support for plain str values.

For image prompt items, the granularities determine into how many tiles we divide the image for the explanation. “token” -> 12x12 “word” -> 6x6 “sentence” -> 3x3 “paragraph” -> 1

target_granularity (TargetGranularity, default None)

How many explanations should be returned in the output.

“complete” -> Return one explanation for the entire target. Helpful in many cases to determine which parts of the prompt contribute overall to the given completion. “token” -> Return one explanation for each token in the target.

If None, API will default to “complete”

postprocessing (ExplanationPostprocessing, default None)

Optionally apply postprocessing to the difference in cross entropy scores for each token. “none”: Apply no postprocessing. “absolute”: Return the absolute value of each value. “square”: Square each value

normalize (bool, default None)

Return normalized scores. Minimum score becomes 0 and maximum score becomes 1. Applied after any postprocessing

contextual_control_threshold: float | None = None

control_factor: float | None = None

control_log_additive: bool | None = None

control_token_overlap: ControlTokenOverlap | None = None

normalize: bool | None = None

postprocessing: ExplanationPostprocessing | None = None

prompt: Prompt

prompt_granularity: PromptGranularity | str | CustomGranularity | None = None

target: str

target_granularity: TargetGranularity | None = None

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.ExplanationResponse(model_version: str, explanations: List[Explanation])[source]

Bases: object

The top-level response data structure that will be returned from an explanation request.

Parameters:

model_version (str, required): Version of the model used to generate the explanation.
explanations (List[Explanation], required): This array will contain one explanation object for each portion of the target.

explanations: List[Explanation]

static from_json(json: Dict[str, Any]) → ExplanationResponse[source]

model_version: str

with_image_prompt_items_in_pixels(prompt: Prompt) → ExplanationResponse[source]

with_text_from_prompt(request: ExplanationRequest) → ExplanationResponse[source]

class aleph_alpha_client.Image(base_64: str, cropping: Cropping | None, controls: Sequence[ImageControl])[source]

Bases: object

An image send as part of a prompt to a model. The image is represented as base64.

Note: The models operate on square images. All non-square images are center-cropped before going to the model, so portions of the image may not be visible.

You can supply specific cropping parameters if you like, to choose a different area of the image than a center-crop. Or, you can always transform the image yourself to a square before sending it.

Examples:

>>> # You need to choose a model with multimodal capabilities for this example.
>>> url = "https://cdn-images-1.medium.com/max/1200/1*HunNdlTmoPj8EKpl-jqvBA.png"
>>> image = Image.from_url(url)

base_64: str

controls: Sequence[ImageControl]

cropping: Cropping | None

dimensions() → Tuple[int, int][source]

classmethod from_bytes(bytes: bytes, cropping: Cropping | None = None, controls: Sequence[ImageControl] | None = None)[source]

classmethod from_file(path: str | Path, controls: Sequence[ImageControl] | None = None)[source]: Load an image from disk and prepare it to be used in a prompt If they are not provided then the image will be [center cropped](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.CenterCrop)

classmethod from_file_with_cropping(path: str, upper_left_x: int, upper_left_y: int, crop_size: int, controls: Sequence[ImageControl] | None = None)[source]: Load an image from disk and prepare it to be used in a prompt upper_left_x, upper_left_y and crop_size are used to crop the image.

classmethod from_image_source(image_source: str | Path | bytes, controls: Sequence[ImageControl] | None = None)[source]: Abstraction on top of the existing methods of image initialization. If you are not sure what the exact type of your image, but you know it is either a Path object, URL, a file path, or a bytes array, just use the method and we will figure out which of the methods of image initialization to use

static from_json(json: Mapping[str, Any]) → Image[source]

classmethod from_url(url: str, controls: Sequence[ImageControl] | None = None)[source]: Downloads a file and prepare it to be used in a prompt. The image will be [center cropped](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.CenterCrop)

classmethod from_url_with_cropping(url: str, upper_left_x: int, upper_left_y: int, crop_size: int, controls: Sequence[ImageControl] | None = None)[source]: Downloads a file and prepare it to be used in a prompt. upper_left_x, upper_left_y and crop_size are used to crop the image.

to_image() → Image[source]

to_json() → Mapping[str, Any][source]: A dict if serialized to JSON is suitable as a prompt element

class aleph_alpha_client.ImageControl(left: float, top: float, width: float, height: float, factor: float, token_overlap: ControlTokenOverlap | None = None)[source]

Bases: object

Attention manipulation for an Image PromptItem.

All coordinates of the bounding box are logical coordinates (between 0 and 1) and relative to the entire image.

Keep in mind, non-square images are center-cropped by default before going to the model. (You can specify a custom cropping if you want.). Since control coordinates are relative to the entire image, all or a portion of your control may be outside the “model visible area”.

Parameters:

left (float, required):

x-coordinate of top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the left corner and 1 is the right corner.

top (float, required):

y-coordinate of top left corner of the control bounding box Must be a value between 0 and 1, where 0 is the top pixel row and 1 is the bottom row.

width (float, required):

width of the control bounding box Must be a value between 0 and 1, where 1 means the full width of the image.

height (float, required):

height of the control bounding box Must be a value between 0 and 1, where 1 means the full height of the image.

factor (float, required):

The amount to adjust model attention by. Values between 0 and 1 will suppress attention. A value of 1 will have no effect. Values above 1 will increase attention.

token_overlap (ControlTokenOverlap, optional):

What to do if a control partially overlaps with an image token.

If set to “partial”, the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only half of the image “tile”, would be adjusted to 1.5.

If set to “complete”, the full factor will be applied as long as the control overlaps with the token at all.

If not set, the API will default to “partial”.

factor: float

height: float

left: float

to_json() → Mapping[str, Any][source]

token_overlap: ControlTokenOverlap | None = None

top: float

width: float

class aleph_alpha_client.ImagePromptItemExplanation(scores: List[ImageScore])[source]

Bases: object

Explains the importance of an image prompt item. The amount of items in the “scores” array depends on the granularity setting. Each score object contains the top-left corner of a rectangular area in the image prompt. The coordinates are all between 0 and 1 in terms of the total image size

static from_json(item: Dict[str, Any]) → ImagePromptItemExplanation[source]

in_pixels(prompt_item: Text | Tokens | Image) → ImagePromptItemExplanation[source]

scores: List[ImageScore]

class aleph_alpha_client.ImageScore(left: float, top: float, width: float, height: float, score: float)[source]

Bases: object

static from_json(score: Any) → ImageScore[source]

height: float

left: float

score: float

top: float

width: float

class aleph_alpha_client.Message(role: Role, content: str | List[str | Image], tool_call_id: str | None = None, tool_calls: List[ToolCall] | None = None)[source]

Bases: object

Describes a message in a chat.

Parameters:

role (Role, required):: The role of the message.
content (str | List[Union[str | Image]], required):: The content of the message.

content: str | List[str | Image]

role: Role

to_json() → Mapping[str, Any][source]

tool_call_id: str | None = None

tool_calls: List[ToolCall] | None = None

class aleph_alpha_client.Prompt(items: str | Sequence[Text | Tokens | Image])[source]

Bases: object

Examples:

>>> prompt = Prompt.from_text("Provide a short description of AI:")
>>> prompt = Prompt([
        Image.from_url(url),
        Text.from_text("Provide a short description of AI:"),
    ])

static from_image(image: Image) → Prompt[source]

static from_json(items_json: Sequence[Mapping[str, Any]]) → Prompt[source]

static from_text(text: str, controls: Sequence[TextControl] | None = None) → Prompt[source]

static from_tokens(tokens: Sequence[int], controls: Sequence[TokenControl] | None = None) → Prompt[source]

Examples:

>>> prompt = Prompt.from_tokens([1, 2, 3])

items: Sequence[Text | Tokens | Image]

to_json() → Sequence[Mapping[str, Any]][source]

class aleph_alpha_client.PromptGranularity(*values)[source]

Bases: Enum

Paragraph = 'paragraph'

Sentence = 'sentence'

Token = 'token'

Word = 'word'

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.PromptTemplate(template_str: str)[source]

Bases: object

Allows to build a Prompt using the liquid template language.

To add non-text prompt items first you have to save it to the template with the template.placeholder() function. To embed the items in the template, pass the placeholder in the place(s) where you would like the items.

Example:

>>> image = Image.from_file(Path("path-to-image"))
>>> template = PromptTemplate(
    '''{%- for name in names -%}
    Hello {{name}}!
    {% endfor -%}
    {{ image }}
    ''')
>>> placeholder = template.placeholder(image)
>>> names = ["World", "Rutger"]
>>> prompt = template.to_prompt(names=names, image=placeholder)
>>> request = CompletionRequest(prompt=prompt)

embed_prompt(prompt: Prompt) → str[source]

Embeds a prompt in a prompt template

Adds whitespace between text items if there is no whitespace between them. In case of non-text prompt items, this embeds them into the end result.

Example:

>>> user_prompt = Prompt(
        [
            Tokens.from_token_ids([1, 2, 3]),
            Text.from_text("cool"),
            Image.from_file(Path("path-to-image")),
        ]
    )
>>> template = PromptTemplate("Question: {{user_prompt}}\n Answer: ")
>>> prompt = template.to_prompt(user_prompt=template.embed_prompt(user_prompt))

Parameters:

prompt: prompt to embed in the template

placeholder(prompt_item: Image | Tokens) → Placeholder[source]

Saves a non-text prompt item to the template and returns a placeholder

The placeholder is used to embed the prompt item in the template

to_prompt(**kwargs) → Prompt[source]

Creates a Prompt from the template string and the given parameters.

Provided parameters are passed to liquid.Template.render.

exception aleph_alpha_client.QuotaError(*args, **kwargs)[source]: Bases: Exception

class aleph_alpha_client.RerankRequest(query: str, documents: Sequence[str], top_n: int | None = None)[source]

Bases: object

Request for reranking documents against a query.

This endpoint takes in a query and a list of documents and produces an array with each document assigned a relevance score.

Parameters:

query (str, required):: The query to rerank the documents against.
documents (Sequence[str], required):: The list of documents to rerank.
top_n (int, optional):: The number of documents to return. Defaults to the number of documents if not provided.

Examples:

>>> request = RerankRequest(
...     query="What is the capital of France?",
...     documents=[
...         "The capital of Brazil is Brasilia.",
...         "The capital of France is Paris.",
...         "Horses and cows are both animals.",
...     ],
...     top_n=2,
... )
>>> response = client.rerank(request, model="your-reranker-model")

documents: Sequence[str]

query: str

to_json() → Mapping[str, Any][source]: Convert the request to a JSON-serializable dictionary.

top_n: int | None = None

class aleph_alpha_client.RerankResponse(results: List[RerankResult], usage: RerankUsage)[source]

Bases: object

Response from a rerank request.

Parameters:

results (List[RerankResult]):: The reranked results, each containing the original document index and its relevance score.
usage (RerankUsage):: Usage statistics for the request.

static from_json(json: Dict[str, Any]) → RerankResponse[source]

results: List[RerankResult]

usage: RerankUsage

class aleph_alpha_client.RerankResult(index: int, relevance_score: float)[source]

Bases: object

A single reranked document result.

Parameters:

index (int):: The index of the document in the original list of documents.
relevance_score (float):: The relevance score of the document.

static from_json(json: Dict[str, Any]) → RerankResult[source]

index: int

relevance_score: float

class aleph_alpha_client.RerankUsage(completion_tokens: int, prompt_tokens: int, total_tokens: int)[source]

Bases: object

Usage statistics for the rerank request.

Parameters:

completion_tokens (int):: Number of tokens in the generated completion. Will always be 0 for rerank tasks.
prompt_tokens (int):: Number of tokens in the prompt. Will always be 0 for rerank tasks.
total_tokens (int):: Total number of tokens used in the request.

completion_tokens: int

static from_json(json: Dict[str, Any]) → RerankUsage[source]

prompt_tokens: int

total_tokens: int

class aleph_alpha_client.SemanticEmbeddingRequest(prompt: Prompt, representation: SemanticRepresentation, compress_to_size: int | None = None, normalize: bool = False, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True)[source]

Bases: object

Embeds a text and returns vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers).

Parameters:

prompt

The text and/or image(s) to be embedded.

representation

Semantic representation to embed the prompt with.

compress_to_size

Options available: 128

The default behavior is to return the full embedding, but you can optionally request an embedding compressed to a smaller set of dimensions.

Full embedding sizes for supported models:

luminous-base: 5120

The 128 size is expected to have a small drop in accuracy performance (4-6%), with the benefit of being much smaller, which makes comparing these embeddings much faster for use cases where speed is critical.

The 128 size can also perform better if you are embedding really short texts or documents.

normalize

Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric.

contextual_control_threshold (float, default None)

If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

control_log_additive (bool, default True)

True: apply control by adding the log(control_factor) to attention scores. False: apply control by (attention_scores - - attention_scores.min(-1)) * control_factor

Examples

>>> texts = [
        "deep learning",
        "artificial intelligence",
        "deep diving",
        "artificial snow",
    ]
>>> # Texts to compare
>>> embeddings = []
>>> for text in texts:
        request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
        result = model.semantic_embed(request)
        embeddings.append(result.embedding)

compress_to_size: int | None = None

contextual_control_threshold: float | None = None

control_log_additive: bool | None = True

normalize: bool = False

prompt: Prompt

representation: SemanticRepresentation

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.SemanticEmbeddingResponse(model_version: str, embedding: List[float], num_tokens_prompt_total: int, message: str | None = None)[source]

Bases: object

Response of a semantic embedding request

Parameters:

model_version: Model name and version (if any) of the used model for inference
embedding: A list of floats that can be used to compare against other embeddings.
message: This field is no longer used.

embedding: List[float]

static from_json(json: Dict[str, Any]) → SemanticEmbeddingResponse[source]

message: str | None = None

model_version: str

num_tokens_prompt_total: int

class aleph_alpha_client.SemanticRepresentation(*values)[source]

Bases: Enum

Available types of semantic representations that prompts can be embedded with.

Symmetric:

Symmetric is useful for comparing prompts to each other, in use cases such as clustering, classification, similarity, etc. Symmetric embeddings should be compared with other Symmetric embeddings.

Document:

Document and Query are used together in use cases such as search where you want to compare shorter queries against larger documents.

Document embeddings are optimized for larger pieces of text to compare queries against.

Query:

Document and Query are used together in use cases such as search where you want to compare shorter queries against larger documents.

Query embeddings are optimized for shorter texts, such as questions or keywords.

Document = 'document'

Query = 'query'

Symmetric = 'symmetric'

class aleph_alpha_client.TargetGranularity(*values)[source]

Bases: Enum

How many explanations should be returned in the output.

Complete:: Return one explanation for the entire target. Helpful in many cases to determine which parts of the prompt contribute overall to the given completion.
Token:: Return one explanation for each token in the target.

Complete = 'complete'

Token = 'token'

to_json() → str[source]

class aleph_alpha_client.TargetPromptItemExplanation(scores: List[TargetScore | TargetScoreWithRaw])[source]

Bases: object

Explains the importance of text in the target string that came before the currently to-be-explained target token. The amount of items in the “scores” array depends on the granularity setting. Each score object contains an inclusive start character and a length of the substring plus a floating point score value.

static from_json(item: Dict[str, Any]) → TargetPromptItemExplanation[source]

scores: List[TargetScore | TargetScoreWithRaw]

with_text(prompt: str) → TargetPromptItemExplanation[source]

class aleph_alpha_client.TargetScore(start: int, length: int, score: float)[source]

Bases: object

static from_json(score: Any) → TargetScore[source]

length: int

score: float

start: int

class aleph_alpha_client.Text(text: str, controls: Sequence[TextControl])[source]

Bases: object

A Text-prompt including optional controls for attention manipulation.

Parameters:

text (str, required):: The text prompt
controls (list of TextControl, required):: A list of TextControls to manipulate attention when processing the prompt. Can be empty if no manipulation is required.

Examples:

>>> Text("Hello, World!", controls=[TextControl(start=0, length=5, factor=0.5)])

controls: Sequence[TextControl]

static from_json(json: Mapping[str, Any]) → Text[source]

static from_text(text: str) → Text[source]

text: str

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.TextControl(start: int, length: int, factor: float, token_overlap: ControlTokenOverlap | None = None)[source]

Bases: object

Attention manipulation for a Text PromptItem.

Parameters:

start (int, required):

Starting character index to apply the factor to.

length (int, required):

The amount of characters to apply the factor to.

factor (float, required):

The amount to adjust model attention by. Values between 0 and 1 will suppress attention. A value of 1 will have no effect. Values above 1 will increase attention.

token_overlap (ControlTokenOverlap, optional):

What to do if a control partially overlaps with a text token.

If set to “partial”, the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only covers 2 of 4 token characters, would be adjusted to 1.5.

If set to “complete”, the full factor will be applied as long as the control overlaps with the token at all.

If not set, the API will default to “partial”.

factor: float

length: int

start: int

to_json() → Mapping[str, Any][source]

token_overlap: ControlTokenOverlap | None = None

class aleph_alpha_client.TextPromptItemExplanation(scores: List[TextScore | TextScoreWithRaw])[source]

Bases: object

Explains the importance of a text prompt item. The amount of items in the “scores” array depends on the granularity setting. Each score object contains an inclusive start character and a length of the substring plus a floating point score value.

static from_json(item: Dict[str, Any]) → TextPromptItemExplanation[source]

scores: List[TextScore | TextScoreWithRaw]

with_text(prompt: Text) → TextPromptItemExplanation[source]

class aleph_alpha_client.TextScore(start: int, length: int, score: float)[source]

Bases: object

static from_json(score: Any) → TextScore[source]

length: int

score: float

start: int

class aleph_alpha_client.TokenControl(pos: int, factor: float)[source]

Bases: object

Used for Attention Manipulation, for a given token index, you can supply the factor you want to adjust the attention by.

Parameters:

pos (int, required):: The index of the token in the prompt item that you want to apply the factor to.
factor (float, required):: The amount to adjust model attention by. Values between 0 and 1 will suppress attention. A value of 1 will have no effect. Values above 1 will increase attention.

Examples:

>>> Tokens([1, 2, 3], controls=[TokenControl(pos=1, factor=0.5)])

factor: float

pos: int

to_json() → Mapping[str, Any][source]

class aleph_alpha_client.TokenPromptItemExplanation(scores: List[TokenScore])[source]

Bases: object

Explains the importance of a request prompt item of type “token_ids”. Will contain one floating point importance value for each token in the same order as in the original prompt.

static from_json(item: Dict[str, Any]) → TokenPromptItemExplanation[source]

scores: List[TokenScore]

class aleph_alpha_client.TokenScore(score: float)[source]

Bases: object

static from_json(score: Any) → TokenScore[source]

score: float

class aleph_alpha_client.TokenizationRequest(prompt: str, tokens: bool, token_ids: bool)[source]

Bases: object

Describes a tokenization request.

Parameters

prompt (str): The text prompt which should be converted into tokens
tokens (bool): True to extract text-tokens
token_ids (bool): True to extract token-ids

Returns

TokenizationResponse

Examples:

>>> request = TokenizationRequest(prompt="This is an example.", tokens=True, token_ids=True)

prompt: str

to_json() → Mapping[str, Any][source]

token_ids: bool

tokens: bool

class aleph_alpha_client.TokenizationResponse(tokens: Sequence[str] | None = None, token_ids: Sequence[int] | None = None)[source]

Bases: object

static from_json(json: Dict[str, Any]) → TokenizationResponse[source]

token_ids: Sequence[int] | None = None

tokens: Sequence[str] | None = None

class aleph_alpha_client.Tokens(tokens: Sequence[int], controls: Sequence[TokenControl])[source]

Bases: object

A list of token ids to be sent as part of a prompt.

Parameters:

tokens (List(int), required):: The tokens you want to be passed to the model as part of your prompt.
controls (List(TokenControl), optional, default None):: Used for Attention Manipulation. Provides the ability to change attention for given token ids.

Examples:

>>> token_ids = Tokens([1, 2, 3], controls=[])
>>> prompt = Prompt([token_ids])

controls: Sequence[TokenControl]

static from_json(json: Mapping[str, Any]) → Tokens[source]

static from_token_ids(token_ids: Sequence[int]) → Tokens[source]

to_json() → Mapping[str, Any][source]: Serialize the prompt item to JSON for sending to the API.

tokens: Sequence[int]

aleph_alpha_client.load_base64_from_file(path_and_filename: str)[source]: load a file from disk and return the base64 encoded content

aleph_alpha_client.load_base64_from_url(url: str)[source]: download a file and return the base64 encoded content