API Reference

Base URL: https://api.minima.sh. Base path: /v1. All request and response bodies are JSON. Interactive OpenAPI docs are served at https://api.minima.sh/docs.

Minima uses pass-through auth: present your own Mubit API key as a bearer token. There is no separate Minima key — Minima forwards the key to Mubit to read and write your task → model → outcome history on your behalf.

Authorization: Bearer mbt_<instance>_<keyid>_<secret>

A missing or invalid key returns 401. Your Mubit key determines which Mubit instance Minima reads and writes — it’s your data boundary. GET /v1/health is the only endpoint that works without a key (it returns service liveness only).

user_id and namespace are scoping fields within your Mubit instance, not auth boundaries — use them to partition recall and learning across teams, projects, or environments.
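As a minimal sketch of pass-through auth, the helper below builds (but does not send) a request that carries a Mubit key as a bearer token. The function name `build_request` and the placeholder key are illustrative assumptions, not part of the Minima API.

```python
import json
import urllib.request

BASE_URL = "https://api.minima.sh/v1"  # base URL + base path from this page


def build_request(path, api_key, body=None):
    """Build (but do not send) a request with pass-through bearer auth."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # urllib uses POST when data is present and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


# "mbt_example_key" is a placeholder; a real key has the mbt_... form shown above.
req = build_request("/recommend", "mbt_example_key", {"task": {"task": "hello"}})
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; scoping fields such as `namespace` go in the JSON body, not in the header.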

Errors are returned as application/problem+json (RFC 7807-style):

{ "type": "about:blank", "title": "No candidate models", "status": 422,
"detail": "no models match the supplied constraints" }

| Status | Title | When |
| --- | --- | --- |
| 400 | Invalid request | Request body fails validation. |
| 401 | Unauthorized | Missing or invalid Mubit key. |
| 422 | No candidate models | Constraints eliminated every catalog model. |
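A client might turn these problem documents into exceptions. The sketch below assumes the caller already has the status code, content type, and raw body in hand; `MinimaProblem` and `raise_for_problem` are hypothetical names introduced for illustration.

```python
import json


class MinimaProblem(Exception):
    """Hypothetical exception wrapping a problem+json error body."""

    def __init__(self, status, title, detail):
        super().__init__(f"{status} {title}: {detail}")
        self.status = status
        self.title = title
        self.detail = detail


def raise_for_problem(status_code, content_type, raw_body):
    """Raise MinimaProblem for 4xx/5xx responses; return None otherwise."""
    if status_code < 400:
        return None
    title, detail = "", ""
    # Only parse the body when the server labels it as a problem document.
    if content_type.startswith("application/problem+json"):
        problem = json.loads(raw_body)
        title = problem.get("title", "")
        detail = problem.get("detail", "")
    raise MinimaProblem(status_code, title, detail)
```

Branching on `status` (for example, relaxing constraints after a 422) is usually more robust than matching on the `title` text.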

POST /v1/recommend

Recommend a model for a single task.

Request body

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| task | TaskInput | required | The task to route (see below). |
| cost_quality_tradeoff | float 0–10 | 5.0 | 0 = cheapest acceptable, 10 = highest quality. Sets the quality threshold τ. |
| constraints | Constraints | {} | Hard limits on the candidate set (see below). |
| user_id | string \| null | null | Within-account actor label. Scopes recall. |
| namespace | string \| null | null | Within-account sub-scope (team / project / environment). Isolates recall and learning. |
| max_candidates | int 1–64 | 8 | Cap on candidates considered. |
| allow_llm_escalation | bool | true | Allow the cheap-LLM reasoner when evidence is thin. |
| explain | bool | true | Include evidence[] refs on each ranked model. |
| baseline_model_id | string \| null | null | The model you would have used without Minima. Powers the honest vs_declared baseline in GET /v1/savings. |

TaskInput

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| task | string | required | Raw task/prompt text; embedded by Mubit for recall. |
| task_type | enum \| null | null | code \| summarization \| extraction \| qa \| reasoning \| classification \| translation \| creative \| rag \| tool_use \| other. Heuristic-classified if omitted. |
| difficulty | enum \| null | null | trivial \| easy \| medium \| hard \| expert. |
| expected_input_tokens | int ≥ 0 \| null | null | Feeds the cost estimate; a per-task-type default is used if omitted. |
| expected_output_tokens | int ≥ 0 \| null | null | Feeds the cost estimate. |
| tags | string[] | [] | Propagated to Mubit env_tags (e.g. lang:python) for version-aware recall. |

Constraints

| Field | Type | Notes |
| --- | --- | --- |
| allowed_providers | string[] \| null | Whitelist by provider. |
| candidate_models | string[] \| null | Restrict to these model ids. |
| excluded_models | string[] \| null | Blacklist by model id. |
| max_cost_per_call | float ≥ 0 \| null | USD hard filter on estimated cost. |
| min_quality | float 0–1 \| null | Predicted-success floor; raises τ. |
| require_prompt_caching | bool | Keep only models that support prompt caching. |
| max_latency_ms | int > 0 \| null | Drop candidates whose observed latency (p75 from similar past outcomes) exceeds this. A model with no latency history is never excluded. |
| require_context_window | int > 0 \| null | Keep only models with at least this context window. |

Response (RecommendResponse)

| Field | Type | Notes |
| --- | --- | --- |
| recommendation_id | string | Quote this back to POST /v1/feedback. |
| recommended_model | RankedModel | The chosen model. |
| ranked | RankedModel[] | Every candidate, sorted by final score. |
| fallback_model | RankedModel \| null | A more reliable retry target. |
| confidence | float 0–1 | Overall confidence in the pick. |
| decision_basis | enum | memory \| prior \| llm: which path produced the pick. |
| threshold_used | float | The quality threshold τ applied. |
| classified_task_type | enum | Final task type used. |
| classified_difficulty | enum | Final difficulty used. |
| catalog_version | string | Catalog version that priced the candidates. |
| catalog_stale | bool | True when prices are older than the staleness window. |
| latency_ms | int | Minima-side recommendation latency. |
| warnings | string[] | See Warnings below. |
| selection_policy | enum | argmin (deterministic cheapest-eligible, the default) \| epsilon_softmax (account opted into bounded exploration within the eligible set). |

RankedModel

| Field | Type | Notes |
| --- | --- | --- |
| model_id | string | |
| provider | string | |
| predicted_success | float 0–1 | Probability the model clears the task. |
| est_cost_usd | float ≥ 0 | Estimated cost for this request, per the chosen cost basis. |
| est_cost_breakdown | object | Keys depend on the basis: {rescaled, obs_output_tokens}, {observed_avg}, or {input, output}. See Cost-basis tiers. |
| score | float | Final objective score; the sorting key. |
| rationale | string | Human-readable reason (tags cost as obs or est). |
| decision_basis | enum | Per-model basis: memory \| prior \| llm. |
| evidence | EvidenceRef[] | Recalled neighbors that informed this candidate (empty if explain=false). |
| supports_prompt_caching | bool | |
| context_window | int | |
| est_latency_ms | float \| null | Observed latency percentile from similar past outcomes; null without latency history. |
| latency_basis | string | e.g. observed_p75; empty without latency history. |

EvidenceRef

| Field | Type | Notes |
| --- | --- | --- |
| entry_id | string | Mubit QueryEvidence.id (used for outcome attribution). |
| reference_id | string \| null | Stable reference id. |
| model_id | string | The model this past outcome was about. |
| score | float | Retrieval similarity. |
| knowledge_confidence | float 0–1 | Mubit’s reliability estimate for the entry. |
| observed_success | float 0–1 | The recorded quality of that past outcome. |
| is_stale | bool | Whether the entry is marked stale. |

Warnings

| Warning | Meaning |
| --- | --- |
| cold_start | No recalled outcomes; prior-only. |
| recall_timeout | Mubit recall exceeded the timeout; prior-only. |
| memory_unavailable | Recall errored; prior-only. |
| prices_stale | Catalog prices older than the staleness window. |
| no_model_meets_threshold | No candidate cleared τ; recommended the highest-success one. |
| no_model_within_cost_budget | max_cost_per_call eliminated all candidates; constraint relaxed for ranking. |
| no_model_within_latency_budget | max_latency_ms eliminated every candidate with latency history; constraint relaxed for ranking. |
| escalation_suggested:<reason> | Escalation criteria met (thin_evidence, low_confidence, tie, conflict, wide_interval). |
| exploration_pick | The opt-in exploration policy sampled a different eligible model than the deterministic pick. |
| reasoner_consulted | The cheap-LLM reasoner was consulted and changed scores. |
| reasoner_failed | The reasoner errored or returned unusable output; deterministic result used. |
| reasoner_disabled | Escalation suggested but no reasoner is configured. |
| llm_classified | The reasoner refined an ambiguous task classification. |

Example request:
curl -s https://api.minima.sh/v1/recommend \
-H "authorization: Bearer $MUBIT_API_KEY" \
-H 'content-type: application/json' -d '{
"task": {"task": "Write a Python function that merges k sorted linked lists.",
"task_type": "code", "difficulty": "hard",
"expected_input_tokens": 180, "expected_output_tokens": 600,
"tags": ["lang:python"]},
"cost_quality_tradeoff": 3,
"constraints": {"min_quality": 0.8, "excluded_models": ["some-deprecated-model"]},
"namespace": "team-payments"
}' | jq
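On the client side, the response can be reduced to the few fields most callers act on. The following is a sketch under the assumption that the response has already been decoded into a dict with the documented field names; `summarize_recommendation` is a hypothetical helper.

```python
# Warnings this page documents as "prior-only" (no recalled memory was used).
PRIOR_ONLY_WARNINGS = {"cold_start", "recall_timeout", "memory_unavailable"}


def summarize_recommendation(resp):
    """Pull the fields a caller typically needs out of a RecommendResponse dict."""
    warnings = resp.get("warnings", [])
    fallback = resp.get("fallback_model")
    return {
        "recommendation_id": resp["recommendation_id"],
        "model_id": resp["recommended_model"]["model_id"],
        "fallback_model_id": fallback["model_id"] if fallback else None,
        "prior_only": any(w in PRIOR_ONLY_WARNINGS for w in warnings),
        # escalation_suggested carries its reason after the colon.
        "escalation_reasons": [
            w.split(":", 1)[1]
            for w in warnings
            if w.startswith("escalation_suggested:")
        ],
    }
```

Keeping `recommendation_id` alongside the chosen model makes it straightforward to report the outcome later through POST /v1/feedback.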

POST /v1/recommend/workflow

Recommend a model for each step of a multi-step workflow. Each step runs the same engine independently and gets its own recommendation_id for per-step feedback.

Request body

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| steps | WorkflowStep[] | required | The steps to route. |
| cost_quality_tradeoff | float 0–10 | 5.0 | Applied to every step. |
| constraints | Constraints | {} | Global constraints; each step may override. |
| user_id | string \| null | null | |
| namespace | string \| null | null | |
| allow_llm_escalation | bool | true | |

WorkflowStep

| Field | Type | Notes |
| --- | --- | --- |
| step_id | string | Caller-defined id (echoed in the response). |
| task | TaskInput | The step’s task. |
| constraints | Constraints \| null | Per-step override, merged over the global constraints. |
| depends_on | string[] | Declared dependencies (informational; steps are scored independently). |

Response

| Field | Type | Notes |
| --- | --- | --- |
| workflow_recommendation_id | string | Id for the whole workflow. |
| steps | StepRecommendation[] | {step_id, recommendation: RecommendResponse} per step. |
| total_est_cost_usd | float | Sum of recommended-model costs across steps. |
| total_est_cost_if_all_premium | float | Sum if each step used its most expensive candidate; this is the savings baseline. |
| confidence | float 0–1 | Mean step confidence. |

See the multi-step workflow example.
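A workflow call can be sketched as two small helpers: one that assembles the request body and one that reads the per-step picks and the estimated saving against the all-premium baseline. The helper names, step ids, and the tuple-based input shape are assumptions made for illustration.

```python
def build_workflow_request(steps, tradeoff=5.0, constraints=None, namespace=None):
    """Assemble a workflow body from (step_id, task_text, depends_on) tuples."""
    body = {
        "steps": [
            {"step_id": step_id, "task": {"task": text}, "depends_on": list(deps)}
            for step_id, text, deps in steps
        ],
        "cost_quality_tradeoff": tradeoff,
        "constraints": constraints or {},
    }
    if namespace is not None:
        body["namespace"] = namespace
    return body


def models_by_step(resp):
    """Map each step_id to the model id recommended for that step."""
    return {
        s["step_id"]: s["recommendation"]["recommended_model"]["model_id"]
        for s in resp["steps"]
    }


def premium_savings(resp):
    """Estimated USD saved versus running every step on its priciest candidate."""
    return resp["total_est_cost_if_all_premium"] - resp["total_est_cost_usd"]
```

Because steps are scored independently, `depends_on` only documents ordering for the caller; each step's own `recommendation_id` is what gets reported back as feedback.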


POST /v1/feedback

Report an outcome and close the learning loop. This reinforces the memories that drove the recommendation and records the realized cost and token history that powers the observed and rescaled cost-basis tiers.

Request body

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| recommendation_id | string | required | From a prior /recommend (or a workflow step). |
| chosen_model_id | string | required | The model you actually ran (may differ from the recommendation). |
| outcome | enum | required | success \| partial \| failure. |
| quality_score | float 0–1 \| null | null | Caller-supplied; no LLM judge. Defaults applied per outcome if omitted (0.9 / 0.5 / 0.1). A score that flatly contradicts the outcome label (e.g. failure at 0.95) is clamped into a consistent band with a quality_outcome_mismatch warning. |
| input_tokens | int ≥ 0 \| null | null | Realized input tokens; populate this to enable the rescaled cost tier. |
| output_tokens | int ≥ 0 \| null | null | Realized output tokens (captures reasoning/thinking); populate this for the rescaled tier. |
| actual_cost_usd | float ≥ 0 \| null | null | Realized $/call; enables the observed cost tier. |
| latency_ms | int ≥ 0 \| null | null | |
| verified_in_production | bool | false | Marks a real production outcome; gates lesson promotion. |
| notes | string \| null | null | |
| idempotency_key | string \| null | null | Dedupe key; derived from recommendation_id + model if omitted. |

Response

| Field | Type | Notes |
| --- | --- | --- |
| accepted | bool | false with a warning on failure. |
| record_id | string \| null | The Mubit id of the upserted outcome record. |
| reinforced_entry_ids | string[] | The neighbor entry ids credited. |
| updated_confidence | float \| null | Mubit’s updated knowledge_confidence for the primary entry. |
| reflection_triggered | bool | Whether reflection fired this call. |
| lesson_promoted | bool | Whether a durable lesson was promoted. |
| warnings | string[] | unknown_recommendation, memory_write_failed, reinforcement_failed, lesson_promotion_failed, quality_outcome_mismatch, late_feedback_no_attribution. |

Example request:
curl -s https://api.minima.sh/v1/feedback \
-H "authorization: Bearer $MUBIT_API_KEY" \
-H 'content-type: application/json' -d '{
"recommendation_id": "…",
"chosen_model_id": "claude-haiku-4-5",
"outcome": "success",
"quality_score": 0.95,
"input_tokens": 180, "output_tokens": 640, "actual_cost_usd": 0.0034,
"verified_in_production": true
}' | jq
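The feedback body can be assembled with light client-side validation. This is a sketch, not the service's own logic: it assumes the default scores 0.9 / 0.5 / 0.1 are listed in the same order as success / partial / failure, and it does not attempt to reproduce the server's clamping band, which this page does not specify. `build_feedback` and `effective_quality` are hypothetical names.

```python
# ASSUMPTION: the "0.9 / 0.5 / 0.1" defaults follow the order of the
# outcome labels "success | partial | failure".
DEFAULT_QUALITY = {"success": 0.9, "partial": 0.5, "failure": 0.1}

# Optional realized-usage fields from the request table.
REALIZED_FIELDS = ("input_tokens", "output_tokens", "actual_cost_usd", "latency_ms")


def build_feedback(recommendation_id, chosen_model_id, outcome,
                   quality_score=None, verified_in_production=False, **realized):
    """Assemble a feedback body, omitting fields the caller did not supply."""
    if outcome not in DEFAULT_QUALITY:
        raise ValueError(f"outcome must be one of {sorted(DEFAULT_QUALITY)}")
    if quality_score is not None and not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be within 0-1")
    unknown = set(realized) - set(REALIZED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    body = {
        "recommendation_id": recommendation_id,
        "chosen_model_id": chosen_model_id,
        "outcome": outcome,
        "verified_in_production": verified_in_production,
    }
    if quality_score is not None:
        body["quality_score"] = quality_score
    # Forward only the realized values that were actually measured.
    body.update({k: v for k, v in realized.items() if v is not None})
    return body


def effective_quality(outcome, quality_score=None):
    """Score presumably recorded when quality_score is omitted (see assumption above)."""
    return DEFAULT_QUALITY[outcome] if quality_score is None else quality_score
```

Supplying `input_tokens`, `output_tokens`, and `actual_cost_usd` whenever they are known is what makes the rescaled and observed cost tiers available for later recommendations.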

Returns the current model catalog (cost and capability priors).

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| provider | string | | Filter by provider (case-insensitive). |
| task_type | enum | | Keep only models with a capability prior for this task type. |
| max_cost | float | | Keep only models whose max(input, output) $/Mtok ≤ this. |
| include_stale | bool | true | Prefer fresh-priced models when false. |

Response: { models: ModelCard[], catalog_version, refreshed_at, stale }, with models sorted by input price.

ModelCard fields include: model_id, provider, display_name, input_cost_per_mtok, output_cost_per_mtok, cache_read_cost_per_mtok, supports_prompt_caching, context_window, max_output_tokens, capability_priors, capability_by_task_type, cost_source, cost_fetched_at, cost_stale, capability_source.
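To make the `provider` and `max_cost` filters concrete, here is a client-side sketch that applies the same rules to a list of ModelCard dicts. It mirrors the documented semantics (case-insensitive provider match, max of input and output price, sort by input price) but is not the server's implementation; `filter_models` is a hypothetical helper.

```python
def filter_models(models, provider=None, max_cost=None):
    """Filter ModelCard dicts the way the provider and max_cost params are described."""
    kept = []
    for card in models:
        if provider is not None and card["provider"].lower() != provider.lower():
            continue
        # max_cost compares against the larger of the two $/Mtok prices.
        peak = max(card["input_cost_per_mtok"], card["output_cost_per_mtok"])
        if max_cost is not None and peak > max_cost:
            continue
        kept.append(card)
    # The catalog response is described as sorted by input price.
    return sorted(kept, key=lambda c: c["input_cost_per_mtok"])
```

Note that a model with a cheap input price can still be dropped by `max_cost` when its output price exceeds the limit.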


Surfaces the rules Mubit has promoted for a namespace — the “why” behind routing patterns.

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| namespace | string | | Within-account sub-scope to read strategies for. |
| lesson_types | string[] | | Filter by lesson type. |
| max_strategies | int 1–50 | 5 | |

Response: { namespace, lane, strategies: Strategy[], count }, where each Strategy has strategy_id, description, supporting_lesson_count, avg_confidence, avg_reinforcement, dominant_lesson_type, dominant_scope, lesson_ids[].


GET /v1/savings

What Minima actually saved you. Every recommendation is logged with its counterfactual cost baselines and reconciled with the realized cost you report to /v1/feedback — this endpoint aggregates that ledger for your account.

Two baselines are always reported side by side, explicitly labeled: vs premium (the most expensive candidate that was scored for each request — generous; overstates savings if you’d never have used the premium model) and vs declared (your own stated default via baseline_model_id — honest, but only present on requests where you declared one). Estimated and realized figures are never mixed into one number.

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| namespace | string | | Restrict to one namespace. |
| days | float 0–365 | 30 | Lookback window. |
| group_by | enum | | cluster \| task_type \| lane; optional breakdown. |

Response

| Field | Type | Notes |
| --- | --- | --- |
| summary.estimated | object | n, cost_recommended_usd, cost_premium_usd, savings_vs_premium_usd, plus n_declared / cost_declared_usd / savings_vs_declared_usd over the declared-baseline subset. |
| summary.realized | object | Same shape over the reconciled subset (feedback received with actual_cost_usd), using your realized cost against the estimated baselines. |
| health | object | recommendations, feedback_coverage (share of recommendations that ever got feedback; this is the number that decides how much to trust realized), late_feedback_share, escalation_rate, exploration_share, epsilon_policy_share. |
| groups | array | Per-group_by key: the same summary + health. |

Example request:
curl -s "https://api.minima.sh/v1/savings?days=30&group_by=task_type" \
-H "authorization: Bearer $MUBIT_API_KEY" | jq '.summary, .health'
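One way to read the ledger is to turn the dollar figures into savings rates while keeping estimated and realized separate, as the endpoint itself does. The sketch below assumes a decoded `summary` and `health` dict; the helper names and the `min_coverage` cutoff of 0.5 are illustrative assumptions, not values defined by the API.

```python
def savings_rates(summary):
    """Fractional savings vs premium, reported separately for estimated and realized."""
    rates = {}
    for kind in ("estimated", "realized"):
        block = summary.get(kind) or {}
        premium = block.get("cost_premium_usd") or 0.0
        saved = block.get("savings_vs_premium_usd") or 0.0
        # None signals "no priced recommendations in this window".
        rates[kind] = saved / premium if premium > 0 else None
    return rates


def realized_is_trustworthy(health, min_coverage=0.5):
    """min_coverage is a hypothetical cutoff chosen by the caller."""
    return health.get("feedback_coverage", 0.0) >= min_coverage
```

The same calculation applies to the declared baseline by swapping in `cost_declared_usd` and `savings_vs_declared_usd`, which only cover requests that set `baseline_model_id`.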

GET /v1/calibration

Is predicted_success telling the truth for your traffic? Compares each recommendation’s predicted success against the realized outcome you reported, per task type.

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| namespace | string | | Restrict to one namespace. |
| days | float 0–365 | 30 | Lookback window. |

Response

| Field | Type | Notes |
| --- | --- | --- |
| reports | array | Per slice (first entry is global, then each task type): n, ece (expected calibration error; 0 is perfect), ece_shrunk (sparse slices pulled toward the global estimate so 3 feedbacks don’t read as a verdict), ece_quality (quality-score-weighted variant), and reliability bins (lo, hi, n, avg_predicted, avg_realized). |
| drift_flags | array | Sustained prediction drift per (cluster, model). direction is over_predicting (the model got worse than the evidence says; the expensive failure mode) or under_predicting. |
| health | object | Same routing-health rates as /v1/savings. |

Example request:
curl -s "https://api.minima.sh/v1/calibration?days=30" \
-H "authorization: Bearer $MUBIT_API_KEY" | jq '.reports[] | {slice_key, n, ece, ece_shrunk}'
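For intuition about `ece`, the standard expected calibration error can be recomputed from the reliability bins as the count-weighted mean gap between predicted and realized success. This sketch uses the textbook definition; Minima's exact computation (and the shrinkage behind `ece_shrunk`) is not specified on this page and may differ.

```python
def ece_from_bins(bins):
    """Textbook ECE: sum over bins of (n / N) * |avg_predicted - avg_realized|."""
    total = sum(b["n"] for b in bins)
    if total == 0:
        return 0.0
    weighted_gap = sum(
        b["n"] * abs(b["avg_predicted"] - b["avg_realized"]) for b in bins
    )
    return weighted_gap / total


# Hypothetical bins: the first over-predicts by 0.25, the second is exact.
example_bins = [
    {"lo": 0.5, "hi": 1.0, "n": 2, "avg_predicted": 0.75, "avg_realized": 0.5},
    {"lo": 0.0, "hi": 0.5, "n": 2, "avg_predicted": 0.5, "avg_realized": 0.5},
]
print(ece_from_bins(example_bins))  # (2 * 0.25 + 2 * 0.0) / 4 = 0.125
```

A bin where `avg_predicted` sits well above `avg_realized` corresponds to the over_predicting direction described for drift flags.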

GET /v1/health

Always returns 200; reports degraded state in the body. The only endpoint that doesn’t require a key (an unauthenticated probe gets service liveness; a key-bearing probe additionally confirms your account’s memory reachability).

{
"status": "ok",
"memory": {"reachable": true, "latency_ms": 12},
"catalog": {"version": "", "stale": false, "models": 42},
"version": "0.1.0"
}

status is degraded when the memory backend is unreachable. In that state /recommend still serves prior-only recommendations.
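Because the endpoint always returns 200, a monitor has to inspect the body instead of the status code. The sketch below classifies a decoded health body; it assumes that a liveness-only (unauthenticated) response omits the `memory` object, which this page does not state explicitly, and `routing_mode` is a hypothetical helper.

```python
def routing_mode(health):
    """Classify what /recommend can be expected to serve, given a health body."""
    if health.get("status") == "degraded":
        return "prior_only"
    memory = health.get("memory")
    if memory is None:
        # ASSUMPTION: liveness-only probes carry no memory section.
        return "unknown"
    return "memory" if memory.get("reachable") else "prior_only"
```

A result of "prior_only" is not an outage: recommendations are still served, but without recalled outcomes.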