                        ━━━━━━━━━━━━━━━━━━━━━━━
                         LLM PACKAGE FOR EMACS
                        ━━━━━━━━━━━━━━━━━━━━━━━





1 Introduction
══════════════

  This library provides an interface for interacting with Large Language
  Models (LLMs). It allows elisp code to use LLMs while also giving
  end-users the choice to select their preferred LLM. This is
  particularly beneficial when working with LLMs since various
  high-quality models exist, some of which have paid API access, while
  others are locally installed and free but offer medium
  quality. Applications using LLMs can utilize this library to ensure
  compatibility regardless of whether the user has a local LLM or is
  paying for API access.

  This library abstracts several kinds of features:
  • Chat functionality: the ability to query the LLM and get a response,
    and continue to take turns writing to the LLM and receiving
    responses.  The library supports synchronous, asynchronous, and
    streaming responses.
  • Chat with images and other kinds of media input is also supported,
    so that the user can input images and discuss them with the LLM.
  • Function calling (aka "tool use") is supported, for having the LLM
    call elisp functions that it chooses, with arguments it provides.
  • Embeddings: Send text and receive a vector that encodes the semantic
    meaning of the underlying text.  Can be used in a search system to
    find similar passages.
  • Prompt construction: Create a prompt to give to an LLM from one or
    more sources of data.

  Certain functionalities might not be available in some LLMs. Any such
  unsupported functionality will raise a `'not-implemented' signal, or
  it may fail in some other way.  Clients are recommended to check
  `llm-capabilities' when trying to do something beyond basic text chat.
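
  A minimal sketch of such a check follows.  It assumes `provider' is a
  provider struct the user has already set up, and that the capability
  symbol for embedding support is `embeddings'.  The helper name
  `my-app-embedding-or-nil' is hypothetical and not part of `llm'.

```elisp
(require 'llm)

(defun my-app-embedding-or-nil (provider text)
  "Return an embedding of TEXT from PROVIDER, or nil if unsupported.
This is a hypothetical helper, not part of the `llm' package."
  ;; `llm-capabilities' returns a list of symbols for the provider.
  ;; The symbol `embeddings' is assumed here to mean embedding support.
  (when (memq 'embeddings (llm-capabilities provider))
    (llm-embedding provider text)))
```

  Checking up front lets the application degrade gracefully instead of
  relying on how a particular provider fails.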


2 Setting up providers
══════════════════════

  Users of an application that uses this package should not need to
  install it themselves. The llm package should be installed as a
  dependency when you install the package that uses it. However, you do
  need to require the llm module and set up the provider you will be
  using. Typically, applications will have a variable you can set. For
  example, let's say there's a package called "llm-refactoring", which
  has a variable `llm-refactoring-provider'. You would set it up like
  so:

  ┌────
  │ (use-package llm-refactoring
  │   :init
  │   (require 'llm-openai)
  │   (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)))
  └────

  Here `my-openai-key' would be a variable you set up before with your
  OpenAI key. Or, just substitute the key itself as a string. It's
  important to remember never to check your key into a public repository
  such as GitHub, because your key must be kept private. Anyone with
  your key can use the API, and you will be charged.

  You can also use a function as a key, so you can store your key in a
  secure place and retrieve it via a function.  For example, you could
  add a line to `~/.authinfo.gpg':

  ┌────
  │ machine llm.openai password <key>
  └────

  And then set up your provider like:
  ┌────
  │ (setq llm-refactoring-provider (make-llm-openai :key (plist-get (car (auth-source-search :host "llm.openai")) :secret)))
  └────

  All of the providers (except for `llm-fake') can also take default
  parameters that will be used if they are not specified in the prompt.
  These are the same parameters as appear in the prompt, but prefixed
  with `default-chat-'.  So, for example, if you find that you like
  Ollama to be less creative than the default, you can create your
  provider like:

  ┌────
  │ (make-llm-ollama :embedding-model "mistral:latest" :chat-model "mistral:latest" :default-chat-temperature 0.1)
  └────

  For embedding users: if you store the embeddings, you *must* set the
  embedding model.  Even though there's no way for the llm package to
  tell whether you are storing it, if the default model changes, you may
  find yourself storing incompatible embeddings.


2.1 Open AI
───────────

  You can set up with `make-llm-openai', with the following parameters:
  • `:key', the Open AI key that you get when you sign up to use Open
    AI's APIs.  Remember to keep this private.  This is non-optional.
  • `:chat-model': A model name from the [list of Open AI's model
    names.]  Keep in mind some of these are not available to everyone.
    This is optional, and will default to a reasonable model.
  • `:embedding-model': A model name from [list of Open AI's embedding
    model names.]  This is optional, and will default to a reasonable
    model.


[list of Open AI's model names.]
<https://platform.openai.com/docs/models/gpt-4>

[list of Open AI's embedding model names.]
<https://platform.openai.com/docs/guides/embeddings/embedding-models>


2.2 Open AI Compatible
──────────────────────

  There are many Open AI compatible APIs and proxies of Open AI.  You
  can set up one with `make-llm-openai-compatible', with the following
  parameter:
  • `:url', the URL leading up to the command ("embeddings" or
    "chat/completions").  So, for example,
    "<https://api.openai.com/v1/>" is the URL to use for Open AI
    (although if you wanted to do that, you would just use
    `make-llm-openai' instead).


2.3 Azure's Open AI
───────────────────

  Microsoft Azure has an Open AI integration, although it doesn't
  support everything Open AI does, such as function calling.  You can
  set it up with `make-llm-azure', with the following parameters:
  • `:url', the endpoint URL, such as
    "<https://docs-test-001.openai.azure.com/>".
  • `:key', the Azure key for Azure OpenAI service.
  • `:chat-model', the chat model, which must be deployed in Azure.
  • `:embedding-model', the embedding model, which must be deployed in
    Azure.


2.4 Gemini (not via Google Cloud)
─────────────────────────────────

  This is Google's AI model.  You can get an API key via their [page on
  Google AI Studio].  Set this up with `make-llm-gemini', with the
  following parameters:
  • `:key', the Google AI key that you get from Google AI Studio.
  • `:chat-model', the model name, from the [list] of models.  This is
    optional and will default to the text Gemini model.
  • `:embedding-model': the model name, currently must be
    "embedding-001".  This is optional and will default to
    "embedding-001".


[page on Google AI Studio] <https://makersuite.google.com/app/apikey>

[list] <https://ai.google.dev/models>


2.5 Vertex (Gemini via Google Cloud)
────────────────────────────────────

  This is mostly for those who want to use Google Cloud specifically;
  most users should use Gemini instead, which is easier to set up.

  You can set up with `make-llm-vertex', with the following parameters:
  • `:project': Your project number from Google Cloud that has Vertex
    API enabled.
  • `:chat-model': A model name from the [list of Vertex's model names.]
    This is optional, and will default to a reasonable model.
  • `:embedding-model': A model name from the [list of Vertex's
    embedding model names.]  This is optional, and will default to a
    reasonable model.

  In addition to the provider, which you may want multiple of (for
  example, to charge against different projects), there are customizable
  variables:
  • `llm-vertex-gcloud-binary': The binary to use for generating the API
    key.
  • `llm-vertex-gcloud-region': The gcloud region to use.  It's good to
    set this to a region near where you are for best latency.  Defaults
    to "us-central1".

    If you haven't already, you must run the following command before
    using this:
    ┌────
    │ gcloud beta services identity create --service=aiplatform.googleapis.com --project=PROJECT_ID
    └────


[list of Vertex's model names.]
<https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts#supported_model>

[list of Vertex's embedding model names.]
<https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#supported_models>


2.6 Claude
──────────

  [Claude] is Anthropic's large language model.  It does not support
  embeddings.  It does support function calling, but currently not in
  streaming.  You can set it up with the following parameters:

  • `:key': The API key you get from [Claude's settings page].  This is
    required.
  • `:chat-model': One of the [Claude models].  Defaults to
    "claude-3-opus-20240229", the most powerful model.


[Claude] <https://docs.anthropic.com/claude/docs/intro-to-claude>

[Claude's settings page] <https://console.anthropic.com/settings/keys>

[Claude models] <https://docs.anthropic.com/claude/docs/models-overview>


2.7 Ollama
──────────

  [Ollama] is a way to run large language models locally. There are
  [many different models] you can use with it, and some of them support
  function calling. You set it up with the following parameters:
  • `:scheme': The scheme (http/https) for the connection to ollama.
    This defaults to "http".
  • `:host': The host that ollama is run on.  This is optional and will
    default to localhost.
  • `:port': The port that ollama is run on.  This is optional and will
    default to the default ollama port.
  • `:chat-model': The model name to use for chat.  This is not optional
    for chat use, since there is no default.
  • `:embedding-model': The model name to use for embeddings.  Only some
    models (see <https://ollama.com/search?q=&c=embedding>) can be used
    for embeddings.  This is not optional for embedding use, since there
    is no default.


[Ollama] <https://ollama.ai/>

[many different models] <https://ollama.ai/library>


2.8 GPT4All
───────────

  [GPT4All] is a way to run large language models locally.  To use it
  with the `llm' package, you must click "Enable API Server" in the
  settings.  It does not offer embeddings or streaming functionality,
  though, so Ollama might be a better fit for users who are not already
  set up with local models.  You can set it up with the following
  parameters:
  • `:host': The host that GPT4All is run on.  This is optional and will
    default to localhost.
  • `:port': The port that GPT4All is run on.  This is optional and will
    default to the default GPT4All port.
  • `:chat-model': The model name to use for chat.  This is not optional
    for chat use, since there is no default.


[GPT4All] <https://gpt4all.io/index.html>


2.9 llama.cpp
─────────────

  [llama.cpp] is a way to run large language models locally.  To use it
  with the `llm' package, you need to start the server (with the
  "--embedding" flag if you plan on using embeddings).  The server must
  be started with a model, so it is not possible to switch models until
  the server is restarted to use the new model.  As such, model is not a
  parameter to the provider, since the model choice is already set once
  the server starts.

  There is a deprecated llama.cpp-specific provider, but it is no longer
  needed.  llama.cpp is Open AI compatible, so the Open AI Compatible
  provider should work instead.
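
  As a rough sketch, pointing the Open AI Compatible provider at a local
  llama.cpp server could look like the following.  The host, the port
  8080, and the "/v1/" path are assumptions about how the server was
  started, and `my-llama-cpp-provider' is a hypothetical variable name.
  Adjust them to match your setup.

```elisp
(require 'llm-openai)

;; Assumes a llama.cpp server listening on localhost port 8080 and
;; serving its Open AI compatible API under /v1/.  No model is given,
;; because the server was already started with one.
(defvar my-llama-cpp-provider
  (make-llm-openai-compatible :url "http://localhost:8080/v1/")
  "Hypothetical provider that talks to a local llama.cpp server.")
```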


[llama.cpp] <https://github.com/ggerganov/llama.cpp>


2.10 Fake
─────────

  This is a client that makes no calls; it is just there for testing and
  debugging.  Mostly this is of use to programmatic clients of the llm
  package, but end users can also use it to understand what will be sent
  to the LLMs.  It has the following parameters:
  • `:output-to-buffer': if non-nil, the buffer or buffer name to append
    the request sent to the LLM to.
  • `:chat-action-func': a function that will be called to provide a
    string or symbol and message cons which are used to raise an error.
  • `:embedding-action-func': a function that will be called to provide
    a vector or symbol and message cons which are used to raise an
    error.
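
  A small sketch of a fake provider for tests is shown below.  The
  constructor name `make-llm-fake' and the module name `llm-fake' are
  assumed from the naming pattern of the other providers, and the sketch
  assumes that `:chat-action-func' is called with no arguments.  The
  variable `my-test-provider' and the buffer name are hypothetical.

```elisp
(require 'llm)
(require 'llm-fake)

;; Every request is appended to the "*llm-fake-requests*" buffer, and
;; every chat call is answered with the same canned string.
(defvar my-test-provider
  (make-llm-fake
   :output-to-buffer "*llm-fake-requests*"
   :chat-action-func (lambda () "This is a canned reply."))
  "Hypothetical provider used only in tests.")

;; No network call is made here; the prompt is only logged.
(llm-chat my-test-provider (llm-make-chat-prompt "Hello there"))
```

  Inspecting the request buffer afterwards is a convenient way to see
  what a client would have sent to a real LLM.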


3 Models
════════

  When picking a chat or embedding model, anything can be used, as long
  as the service thinks it is valid.  However, models vary on context
  size and capabilities.  The `llm-prompt' module, and any client, can
  depend on the context size of the model via `llm-chat-token-limit'.
  Similarly, some models have different capabilities, exposed in
  `llm-capabilities'.  The `llm-models' module defines a list of popular
  models, but this isn't a comprehensive list.  If you want to add a
  model, it is fairly easy to do.  For example, here is how to add the
  Mistral model (which is already included, though):

  ┌────
  │ (require 'llm-models)
  │ (add-to-list
  │  'llm-models
  │  (make-llm-model
  │   :name "Mistral" :symbol 'mistral
  │   :capabilities '(generation tool-use free-software)
  │   :context-length 8192
  │   :regex "mistral"))
  └────

  The `:regex' needs to uniquely identify the model passed in from a
  provider's chat or embedding model.

  Once this is done, the model will be recognized to have the given
  context length and capabilities.


4 `llm' and the use of non-free LLMs
════════════════════════════════════

  The `llm' package is part of GNU Emacs by being part of GNU ELPA.
  Unfortunately, the most popular LLMs in use are non-free, which is not
  what GNU software should be promoting by inclusion.  On the other
  hand, by use of the `llm' package, the user can make sure that any
  client that codes against it will work with free models that come
  along.  It's likely that sophisticated free LLMs will emerge,
  although it's unclear right now what free software means with respect
  to LLMs.  Because of this tradeoff, we have decided to warn the user
  when using non-free LLMs (which is every LLM supported right now
  except the fake one).  You can turn this off the same way you turn off
  any other warning, by clicking on the left arrow next to the warning
  when it comes up.  Alternatively, you can set `llm-warn-on-nonfree' to
  `nil'.  This can be set via customization as well.

  To build upon the example from before:
  ┌────
  │ (use-package llm-refactoring
  │   :init
  │   (require 'llm-openai)
  │   (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)
  │         llm-warn-on-nonfree nil))
  └────


5 Programmatic use
══════════════════

  Client applications should require the `llm' package, and code against
  it.  Most functions are generic, and take a struct representing a
  provider as the first argument. The client code, or the user
  themselves, can then require the specific module, such as `llm-openai',
  and create a provider with a function such as `(make-llm-openai :key
  user-api-key)'.  The client application will use this provider to call
  all the generic functions.

  For all callbacks, the callback will be executed in the buffer the
  function was first called from.  If the buffer has been killed, it
  will be executed in a temporary buffer instead.
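
  The pattern described above can be sketched as follows.  The names
  `my-app-provider', `my-app-ask', and `my-app-ask-async' are
  hypothetical; only the `llm-' functions come from this library.

```elisp
(require 'llm)

(defvar my-app-provider nil
  "Provider struct chosen by the user, e.g. from `make-llm-openai'.")

(defun my-app-ask (question)
  "Send QUESTION to `my-app-provider' and return the reply, blocking."
  (llm-chat my-app-provider (llm-make-chat-prompt question)))

(defun my-app-ask-async (question)
  "Send QUESTION in the background and echo the reply when it arrives."
  (llm-chat-async
   my-app-provider
   (llm-make-chat-prompt question)
   ;; Called with the text of the response.
   (lambda (response) (message "LLM: %s" response))
   ;; Called with the error symbol and an error message.
   (lambda (type msg) (message "LLM error (%s): %s" type msg))))
```

  Because the application only touches `my-app-provider', the user can
  swap in any provider without changes to the application code.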


5.1 Main functions
──────────────────

  • `llm-chat provider prompt': With a user-chosen `provider' and an
    `llm-chat-prompt' structure (created by `llm-make-chat-prompt'),
    send that prompt to the LLM and wait for the string output.
  • `llm-chat-async provider prompt response-callback error-callback':
    Same as `llm-chat', but executes in the background.  Takes a
    `response-callback' which will be called with the text response.
    The `error-callback' will be called in case of error, with the error
    symbol and an error message.
  • `llm-chat-streaming provider prompt partial-callback
    response-callback error-callback': Similar to `llm-chat-async', but
    requests a streaming response.  As the response is built up,
    `partial-callback' is called with all the text retrieved up to
    the current point.  Finally, `response-callback' is called with the
    complete text.
  • `llm-embedding provider string': With the user-chosen `provider',
    send a string and get an embedding, which is a large vector of
    floating point values.  The embedding represents the semantic
    meaning of the string, and the vector can be compared against other
    vectors, where smaller distances between the vectors represent
    greater semantic similarity.
  • `llm-embedding-async provider string vector-callback
    error-callback': Same as `llm-embedding' but this is processed
    asynchronously. `vector-callback' is called with the vector
    embedding, and, in case of error, `error-callback' is called with
    the same arguments as in `llm-chat-async'.
  • `llm-batch-embedding provider strings': same as `llm-embedding', but
    takes in a list of strings, and returns a list of vectors whose
    order corresponds to the ordering of the strings.
  • `llm-batch-embedding-async provider strings vectors-callback
    error-callback': same as `llm-embedding-async', but takes in a list
    of strings, and returns a list of vectors whose order corresponds to
    the ordering of the strings.
  • `llm-count-tokens provider string': Count how many tokens are in
    `string'.  This may vary by `provider', because some providers
    implement an API for this, but the count is typically about the
    same.  This gives an estimate if the provider has no API support.
  • `llm-cancel-request request': Cancels the given request, if
    possible.  The `request' object is the return value of async and
    streaming functions.
  • `llm-name provider': Provides a short name of the model or
    provider, suitable for showing to users.
  • `llm-chat-token-limit provider': Gets the token limit for the chat
    model.  This isn't possible for some backends like `llama.cpp', in
    which the model isn't selected or known by this library.

    And the following helper functions:
    • `llm-make-chat-prompt text &key context examples functions
      temperature max-tokens response-format non-standard-params': This
      is how you make prompts.  `text' can be a string (the user input
      to the llm chatbot), or a list representing a series of
      back-and-forth exchanges, of odd number, with the last element of
      the list representing the user's latest input.  This supports
      inputting context (also commonly called a system prompt, although
      it isn't guaranteed to replace the actual system prompt),
      examples, and other important elements, all detailed in the
      docstring for this function.  `response-format' can be `'json', to
      force JSON output, or a JSON schema (see below) but the prompt
      also needs to mention and ideally go into detail about what kind
      of JSON response is desired.  Providers with the `json-response'
      capability support JSON output, and it will be ignored if
      unsupported.  The `non-standard-params' let you specify other
      options that might vary per-provider, and for this, the
      correctness is up to the client.
    • `llm-chat-prompt-to-text prompt': From a prompt, return a string
      representation.  This is not usually suitable for passing to LLMs,
      but for debugging purposes.
    • `llm-chat-streaming-to-point provider prompt buffer point
      finish-callback': Same basic arguments as `llm-chat-streaming',
      but will stream to `point' in `buffer'.
    • `llm-chat-prompt-append-response prompt response role': Append a
      new response (from the user, usually) to the prompt.  The `role'
      is optional, and defaults to `'user'.
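
  To compare the vectors returned by `llm-embedding', an application
  needs a similarity measure.  The sketch below uses cosine similarity,
  which is one common choice and not something this library prescribes.
  With cosine similarity, higher values mean more similar.  The function
  names are hypothetical.

```elisp
(require 'cl-lib)

(defun my-app-cosine-similarity (a b)
  "Return the cosine similarity of the numeric vectors A and B.
Values near 1.0 mean the vectors point in nearly the same direction."
  (let ((dot 0.0) (norm-a 0.0) (norm-b 0.0))
    (cl-loop for x across a
             for y across b
             do (setq dot (+ dot (* x y))
                      norm-a (+ norm-a (* x x))
                      norm-b (+ norm-b (* y y))))
    (/ dot (* (sqrt norm-a) (sqrt norm-b)))))

(defun my-app-most-similar (query candidates)
  "Return the entry of CANDIDATES whose vector is closest to QUERY.
CANDIDATES is a list of (TEXT . VECTOR) pairs."
  (car (cl-sort (copy-sequence candidates) #'>
                :key (lambda (candidate)
                       (my-app-cosine-similarity query (cdr candidate))))))
```

  In an application, QUERY and each VECTOR would come from calls such as
  `(llm-embedding provider "some text")', all made with the same
  embedding model.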


5.1.1 JSON schema
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  By using the `response-format' argument to `llm-make-chat-prompt', you
  can ask the LLM to return items according to a specified JSON schema,
  based on the [JSON Schema Spec].  Not everything is supported, but the
  most commonly used parts are.  To specify the JSON schema, we use a
  plist-based approach.  JSON objects are defined with `(:type object
  :properties (:<var1> <schema1> :<var2> <schema2> ... :<varn>
  <scheman>) :required (<req var1> ... <req varn>))'.  Arrays are
  defined with `(:type array :items <schema>)'.  Enums are defined with
  `(:enum (<val1> <val2> <val3>))'.  You can also request integers,
  strings, and other types defined by the JSON Schema Spec, by just
  having `(:type <type>)'.  LLMs often require the top-level schema to
  be an object, and often require that all properties on the top-level
  object be listed as required.

  Some examples:
  ┌────
  │ (llm-chat my-provider
  │           (llm-make-chat-prompt
  │            "How many countries are there?  Return the result as JSON."
  │            :response-format
  │            '(:type object :properties (:num (:type integer)) :required (num))))
  └────

  ┌────
  │ (llm-chat my-provider
  │           (llm-make-chat-prompt
  │            "Which editor is hard to quit?  Return the result as JSON."
  │            :response-format
  │            '(:type object
  │              :properties (:editor (:enum ("emacs" "vi" "vscode"))
  │                           :authors (:type array :items (:type string)))
  │              :required (editor authors))))
  └────
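
  The calls above are expected to hand back the JSON as a string, so the
  client still has to parse it.  A minimal sketch, assuming an Emacs
  built with native JSON support (Emacs 27 or later) and using a
  hypothetical helper name:

```elisp
(defun my-app-parse-json-response (response)
  "Parse RESPONSE, a JSON object string from the LLM, into a plist.
JSON arrays are returned as lists."
  (json-parse-string response :object-type 'plist :array-type 'list))

;; The string here is a made-up sample, not real LLM output.
(plist-get (my-app-parse-json-response "{\"num\": 42}") :num)
```

  Since there is no guarantee that every provider honors the schema, a
  careful client may want to wrap the parse in `condition-case'.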


[JSON Schema Spec] <https://json-schema.org>


5.2 Logging
───────────

  Interactions with the `llm' package can be logged by setting `llm-log'
  to a non-nil value.  This should be done only when developing.  The
  log can be found in the `*llm log*' buffer.


5.3 How to handle conversations
───────────────────────────────

  Conversations can take place by repeatedly calling `llm-chat' and its
  variants.  The prompt should be constructed with
  `llm-make-chat-prompt'. For a conversation, the entire prompt must be
  kept as a variable, because the `llm-chat-prompt-interactions' slot
  will be getting changed by the chat functions to store the
  conversation.  For some providers, this will store the history
  directly in `llm-chat-prompt-interactions', but other LLMs have an
  opaque conversation history.  For that reason, the correct way to
  handle a conversation is to repeatedly call `llm-chat' or variants
  with the same prompt structure, kept in a variable, and after each
  time, add the new user text with `llm-chat-prompt-append-response'.
  The following is an example:

  ┌────
  │ (defvar-local llm-chat-streaming-prompt nil)
  │ (defun start-or-continue-conversation (text)
  │   "Called when the user has input TEXT as the next input.
  │ Assumes `provider' is bound to the provider chosen by the user."
  │   (if llm-chat-streaming-prompt
  │       (llm-chat-prompt-append-response llm-chat-streaming-prompt text)
  │     (setq llm-chat-streaming-prompt (llm-make-chat-prompt text)))
  │   ;; Send the prompt on every turn, not only on the first one.
  │   (llm-chat-streaming-to-point provider llm-chat-streaming-prompt
  │                                (current-buffer) (point-max) (lambda ())))
  └────


5.4 Caution about `llm-chat-prompt-interactions'
────────────────────────────────────────────────

  The interactions in a prompt may be modified by conversation or by the
  conversion of the context and examples to what the LLM understands.
  Different providers require different things from the interactions.
  Some can handle system prompts, some cannot.  Some require alternating
  user and assistant chat interactions, others can handle anything.
  It's important that clients keep to behaviors that work on all
  providers.  Do not attempt to read or manipulate
  `llm-chat-prompt-interactions' after initially setting it up for the
  first time, because you are likely to make changes that only work for
  some providers.  Similarly, don't directly create a prompt with
  `make-llm-chat-prompt', because it is easy to create something that
  wouldn't work for all providers.


5.5 Function calling
────────────────────

  *Note: function calling functionality is currently beta quality.  If
   you want to use function calling, please watch the `llm'
   [discussions] for any announcements about changes.*

  Function calling is a way to give the LLM a list of functions it can
  call, and have it call the functions for you.  The standard
  interaction has the following steps:
  1. The client sends the LLM a prompt with functions it can call.
  2. The LLM may return which functions to execute, and with what
     arguments, or text as normal.
  3. If the LLM has decided to call one or more functions, those
     functions should be called, and their results sent back to the LLM.
  4. The LLM will return with a text response based on the initial
     prompt and the results of the function calling.
  5. The client can now continue the conversation.

  This basic structure is useful because it can guarantee a
  well-structured output (if the LLM does decide to call the
  function). *Not every LLM can handle function calling, and those that
  do not will ignore the functions entirely*. The function
  `llm-capabilities' will return a list with `function-calls' in it if
  the LLM supports function calls. Right now only Gemini, Vertex,
  Claude, and Open AI support function calling. Ollama should get
  function calling soon. However, even for LLMs that handle function
  calling, there is a fair bit of difference in the capabilities. Right
  now, it is possible to write function calls that succeed in Open AI
  but cause errors in Gemini, because Gemini does not appear to handle
  functions that have types that contain other types.  So client
  programs are advised, for now, to keep function arguments to simple
  types.

  The way to call functions is to attach a list of functions to the
  `llm-function-call' slot in the prompt. This is a list of
  `llm-function-call' structs, which takes a function, a name, a
  description, and a list of `llm-function-arg' structs. The docstrings
  give an explanation of the format.

  The various chat APIs will execute the functions defined in
  `llm-function-call' with the arguments supplied by the LLM. Instead of
  returning (or passing to a callback) a string, an alist of function
  names and return values will be returned.

  After sending a function call, the client could use the result, but if
  you want to proceed with the conversation, or get a textual response
  that accompanies the function call, you should just send the prompt
  back with no modifications.  This is because the LLM gives the
  function call to make as a response, and then expects to get back the
  results of that function call.  The functions were already executed at
  the end of the previous call, which also stores the result of that
  execution in the prompt.  This is why it should be sent back without
  further modifications.
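
  The flow can be sketched roughly as below.  This is an illustration
  under assumptions, not a definitive recipe: the constructor names
  `make-llm-function-call' and `make-llm-function-arg', their slot
  keywords, and the `:functions' key are inferred from the struct names
  and the prompt arguments described in this document, so check the
  docstrings for the exact format.  The names `my-provider',
  `my-app-get-weather', `my-weather-prompt', and `my-app-weather-demo'
  are hypothetical.

```elisp
(require 'llm)

(defvar my-provider nil
  "Hypothetical provider that supports function calling.")

(defun my-app-get-weather (city)
  "Hypothetical stand-in that would look up the weather for CITY."
  (format "It is sunny in %s." city))

(defvar my-weather-prompt
  (llm-make-chat-prompt
   "What is the weather in Boston?"
   :functions
   (list (make-llm-function-call
          :function #'my-app-get-weather
          :name "get_weather"
          :description "Get the current weather for a city."
          :args (list (make-llm-function-arg
                       :name "city"
                       :description "The name of the city."
                       :type 'string
                       :required t)))))
  "Prompt that offers one callable function to the LLM.")

(defun my-app-weather-demo ()
  "Ask about the weather and return the LLM's final text answer."
  (let ((first (llm-chat my-provider my-weather-prompt)))
    (if (stringp first)
        ;; The LLM answered directly without calling the function.
        first
      ;; FIRST is an alist of function names and return values.  The
      ;; results are already stored in the prompt, so send the same
      ;; prompt back unmodified to get the textual answer.
      (llm-chat my-provider my-weather-prompt))))
```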

  Be aware that there is no guarantee that the function will be called
  correctly.  While the LLMs mostly get this right, they are trained on
  JavaScript functions, so imitating JavaScript names is
  recommended. So, "write_email" is a better name for a function than
  "write-email".

  Examples can be found in `llm-tester'. There is also a utility to
  generate function calls from existing elisp functions in
  `utilities/elisp-to-function-call.el'.


[discussions] <https://github.com/ahyatt/llm/discussions>


5.6 Media input
───────────────

  *Note: media input functionality is currently alpha quality.  If you
   want to use it, please watch the `llm' [discussions] for any
   announcements about changes.*

  Media can be used in `llm-chat' and related functions.  To use media,
  you can use `llm-multipart' in `llm-make-chat-prompt', and pass it an
  Emacs image or an `llm-media' object for other kinds of media.
  Besides images, some models support video and audio.  Not all
  providers or models support these, with images being the most
  frequently supported media type, and video and audio more rare.


[discussions] <https://github.com/ahyatt/llm/discussions>


5.7 Advanced prompt creation
────────────────────────────

  The `llm-prompt' module provides helper functions to create prompts
  that can incorporate data from your application.  In particular, this
  should be very useful for applications that need a lot of context.

  A prompt defined with `llm-prompt' is a template, with placeholders
  that the module will fill in.  Here's an example of a prompt
  definition, from the [ekg] package:

  ┌────
  │ (llm-defprompt ekg-llm-fill-prompt
  │   "The user has written a note, and would like you to append to it,
  │ to make it more useful.  This is important: only output your
  │ additions, and do not repeat anything in the user's note.  Write
  │ as a third party adding information to a note, so do not use the
  │ first person.
  │ 
  │ First, I'll give you information about the note, then similar
  │ other notes that user has written, in JSON.  Finally, I'll give
  │ you instructions.  The user's note will be your input, all the
  │ rest, including this, is just context for it.  The notes given
  │ are to be used as background material, which can be referenced in
  │ your answer.
  │ 
  │ The user's note uses tags: {{tags}}.  The notes with the same
  │ tags, listed here in reverse date order: {{tag-notes:10}}
  │ 
  │ These are similar notes in general, which may have duplicates
  │ from the ones above: {{similar-notes:1}}
  │ 
  │ This ends the section on useful notes as a background for the
  │ note in question.
  │ 
  │ Your instructions on what content to add to the note:
  │ 
  │ {{instructions}}
  │ ")
  └────

  When this is filled, it is done in the context of a provider, which
  has a known context size (via `llm-chat-token-limit').  Care is taken
  to not overfill the context, which is checked as it is filled via
  `llm-count-tokens'.  We usually want to not fill the whole context,
  but instead leave room for the chat and subsequent turns.  The
  variable `llm-prompt-default-max-pct' controls how much of the context
  window we want to fill.  The way we estimate the number of tokens used
  is quick but inaccurate, so limiting to less than the maximum context
  size is useful for guarding against a miscount leading to an error
  calling the LLM due to too many tokens.  If you want to have a hard
  limit as well that doesn't depend on the context window size, you can
  use `llm-prompt-default-max-tokens'.  We will use the minimum of
  either value.
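
  The arithmetic behind this limit can be sketched as follows.  The
  sketch assumes the percentage is expressed on a 0 to 100 scale, which
  may not match how `llm-prompt-default-max-pct' is actually stored, and
  `my-app-fill-budget' is a hypothetical name, not a function of the
  `llm-prompt' module.

```elisp
(defun my-app-fill-budget (token-limit max-pct &optional max-tokens)
  "Return how many tokens may be used when filling a prompt.
TOKEN-LIMIT is the model's context size, MAX-PCT a percentage from 0 to
100 (an assumption of this sketch), and MAX-TOKENS an optional hard cap."
  (let ((by-pct (floor (* token-limit (/ max-pct 100.0)))))
    ;; Use the smaller of the two limits when a hard cap is given.
    (if max-tokens (min by-pct max-tokens) by-pct)))

;; An 8192-token context filled to at most 50%, with no hard cap:
(my-app-fill-budget 8192 50)       ; => 4096
;; The same context with a hard cap of 1000 tokens:
(my-app-fill-budget 8192 50 1000)  ; => 1000
```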

  Variables are enclosed in double curly braces, like this:
  `{{instructions}}'.  They can just be the variable, or they can also
  denote a number of tickets, like so: `{{tag-notes:10}}'.  Tickets
  should be thought of like lottery tickets, where the prize is a single
  round of context filling for the variable.  So the variable
  `tag-notes' gets 10 tickets for a drawing.  Anything else where
  tickets are unspecified (unless it is just a single variable, which
  will be explained below) will get a number of tickets equal to the
  total number of specified tickets.  So if you have two variables, one
  with 1 ticket, one with 10 tickets, one will be filled 10 times more
  than the other.  If you have two variables, one with 1 ticket, one
  unspecified, the unspecified one will get 1 ticket, so each will have
  an even chance to get filled.  If no variable has tickets specified,
  each will get an equal chance.  If you have one variable, it could
  have any number of tickets, but the result would be the same, since it
  would win every round.  This algorithm is the contribution of David
  Petrou.
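
  The ticket rules above can be illustrated with a small stand-alone
  sketch.  It is not the code used by `llm-prompt', and the function
  names are hypothetical.  Variables are given as an alist of (NAME .
  TICKETS), where TICKETS is nil when unspecified.

```elisp
(require 'cl-lib)

(defun my-tickets-resolve (vars)
  "Return VARS with every unspecified ticket count filled in.
VARS is an alist of (NAME . TICKETS) where TICKETS may be nil.
Unspecified entries get the total of all specified tickets, or 1
when no entry specifies any."
  (let* ((specified (cl-reduce #'+ (delq nil (mapcar #'cdr vars))))
         (default (if (> specified 0) specified 1)))
    (mapcar (lambda (v) (cons (car v) (or (cdr v) default))) vars)))

(defun my-tickets-draw (resolved &optional roll)
  "Return the name in RESOLVED that wins one round of the lottery.
ROLL is a number from 0 up to, but excluding, the total ticket count.
It is chosen at random when omitted."
  (let* ((total (cl-reduce #'+ (mapcar #'cdr resolved)))
         (n (or roll (random total))))
    ;; Walk the entries, treating each as a run of consecutive tickets.
    (cl-loop for (name . tickets) in resolved
             when (< n tickets) return name
             do (setq n (- n tickets)))))
```

  For example, `(my-tickets-resolve '((tag-notes . 10) (similar-notes .
  1)))' leaves both counts as they are, so `tag-notes' wins a round ten
  times as often as `similar-notes'.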

  The above is true of variables that are to be filled with a sequence
  of possible values.  A lot of LLM context filling is like this.  In
  the above example, `{{similar-notes}}' is a retrieval based on a
  similarity score.  It will continue to fill items from most similar to
  least similar, which is going to return almost everything the ekg app
  stores.  We want to retrieve only as needed.  Because of this, the
  `llm-prompt' module takes in /generators/ to supply each variable.
  However, a plain list is also acceptable, as is a single value.  Any
  single value will not enter into the ticket system, but rather be
  prefilled before any tickets are used.

  Values supplied in either the list or generators can be the values
  themselves, or conses.  If a cons, the value to fill is the `car'
  of the cons, and the `cdr' is the place to fill the new value, `front'
  or `back'.  The `front' is the default: new values will be appended to
  the end.  `back' will add new values to the start of the filled text
  for the variable instead.

  So, to illustrate with this example, here's how the prompt will be
  filled:

  1. The `{{tags}}' and `{{instructions}}' will be filled first.  This
     will happen regardless of size, before we check the context size,
     so the module assumes that these will be small and not blow up the
     context.
  2. Check the context size we want to use (`llm-prompt-default-max-pct'
     multiplied by `llm-chat-token-limit') and exit if exceeded.
  3. Run a lottery with all tickets and choose one of the remaining
     variables to fill.
  4. If the variable won't make the text too large, fill the variable
     with one entry retrieved from a supplied generator, otherwise
     ignore.  These values are not conses, so values will be appended to
     the end of the generated text for each variable (so a new variable
     generated for tags will append after other generated tags but
     before the subsequent "and" in the text).
  5. Go to step 2.

  The prompt can be filled two ways, one using a predefined prompt
  template (`llm-defprompt' and `llm-prompt-fill'), the other using a
  prompt template that is passed in (`llm-prompt-fill-text').

  ┌────
  │ (llm-defprompt my-prompt "My name is {{name}} and I'm here to say {{messages}}")
  │ 
  │ (llm-prompt-fill 'my-prompt my-llm-provider :name "Pat" :messages #'my-message-retriever)
  │ 
  │ (iter-defun my-message-retriever ()
  │   "Return the messages I like to say."
  │   (my-message-reset-messages)
  │   (while (my-has-next-message)
  │     (iter-yield (my-get-next-message))))
  └────

  Alternatively, you can just fill it directly:
  ┌────
  │ (llm-prompt-fill-text "Hi, I'm {{name}} and I'm here to say {{messages}}"
  │ 		      :name "John" :messages #'my-message-retriever)
  └────

  As you can see in the examples, the variable values are passed in with
  matching keys.


[ekg] <https://github.com/ahyatt/ekg>


6 Contributions
═══════════════

  If you are interested in creating a provider, please send a pull
  request, or open a bug.  This library is part of GNU ELPA, so any
  major provider that we include in this module needs to be written by
  someone with FSF papers.  However, you can always write a module and
  put it on a different package archive, such as MELPA.
