Compare commits

...

1 Commit

Author  SHA1        Message             Date
Vaayne  1e8251a05e  add model catalogs  2025-07-06 21:27:27 +08:00
276 changed files with 11743 additions and 0 deletions


@@ -0,0 +1,47 @@
id: 01-ai/yi-large
canonical_slug: 01-ai/yi-large
hugging_face_id: ''
name: '01.AI: Yi Large'
type: chat
created: 1719273600
description: |-
The Yi Large model was designed by 01.AI with the following use cases in mind: knowledge search, data classification, human-like chatbots, and customer service.
It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French.
Check out the [launch announcement](https://01-ai.github.io/blog/01.ai-yi-large-llm-launch) to learn more.
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Yi
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000003'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- response_format
- structured_outputs
- logit_bias
- logprobs
- top_logprobs
model_provider: 01-ai
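
Each entry's pricing block quotes per-token prices as decimal strings (unit: 1 means per single token). A minimal sketch of turning an entry into a request cost, assuming PyYAML is installed and the entry lives at a hypothetical path; the token counts are illustrative:

```python
# A minimal sketch (assumptions: PyYAML is installed, and the entry is
# saved at the hypothetical path below). Prices are per-token decimal
# strings, so cast to float before multiplying.
import yaml

with open("catalogs/01-ai/yi-large.yaml") as f:   # hypothetical path
    model = yaml.safe_load(f)

pricing = model["pricing"]
prompt_tokens, completion_tokens = 1_200, 350     # illustrative counts

# unit: 1 means prices are quoted per single token.
unit = pricing.get("unit", 1)
cost = (prompt_tokens * float(pricing["prompt"])
        + completion_tokens * float(pricing["completion"])) / unit

print(f"{cost:.6f} {pricing['currency']}")        # 0.004650 USD here
```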


@@ -0,0 +1,42 @@
id: aetherwiing/mn-starcannon-12b
canonical_slug: aetherwiing/mn-starcannon-12b
hugging_face_id: aetherwiing/MN-12B-Starcannon-v2
name: 'Aetherwiing: Starcannon 12B'
type: chat
created: 1723507200
description: |-
Starcannon 12B v2 is a creative roleplay and story writing model, based on Mistral Nemo, using [nothingiisreal/mn-celeste-12b](/nothingiisreal/mn-celeste-12b) as a base, with [intervitens/mini-magnum-12b-v1.1](https://huggingface.co/intervitens/mini-magnum-12b-v1.1) merged in using the [TIES](https://arxiv.org/abs/2306.01708) method.
Although more similar to Magnum overall, the model remains very creative, with a pleasant writing style. It is recommended for people wanting more variety than Magnum, and yet more verbose prose than Celeste.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Mistral
instruct_type: chatml
pricing:
prompt: '0.0000008'
completion: '0.0000012'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: aetherwiing


@@ -0,0 +1,38 @@
id: ai21/jamba-1.6-large
canonical_slug: ai21/jamba-1.6-large
hugging_face_id: ai21labs/AI21-Jamba-Large-1.6
name: 'AI21: Jamba 1.6 Large'
type: chat
created: 1741905173
description: |-
AI21 Jamba Large 1.6 is a high-performance hybrid foundation model combining State Space Models (Mamba) with Transformer attention mechanisms. Developed by AI21, it excels in extremely long-context handling (256K tokens), demonstrates superior inference efficiency (up to 2.5x faster than comparable models), and supports structured JSON output and tool-use capabilities. It has 94 billion active parameters (398 billion total), optimized quantization support (ExpertsInt8), and multilingual proficiency in languages such as English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.
Usage of this model is subject to the [Jamba Open Model License](https://www.ai21.com/licenses/jamba-open-model-license).
context_length: 256000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.000002'
completion: '0.000008'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
model_provider: ai21


@@ -0,0 +1,38 @@
id: ai21/jamba-1.6-mini
canonical_slug: ai21/jamba-1.6-mini
hugging_face_id: ai21labs/AI21-Jamba-Mini-1.6
name: 'AI21: Jamba Mini 1.6'
type: chat
created: 1741905171
description: |-
AI21 Jamba Mini 1.6 is a hybrid foundation model combining State Space Models (Mamba) with Transformer attention mechanisms. With 12 billion active parameters (52 billion total), this model excels in extremely long-context tasks (up to 256K tokens) and achieves superior inference efficiency, outperforming comparable open models on tasks such as retrieval-augmented generation (RAG) and grounded question answering. Jamba Mini 1.6 supports multilingual tasks across English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, along with structured JSON output and tool-use capabilities.
Usage of this model is subject to the [Jamba Open Model License](https://www.ai21.com/licenses/jamba-open-model-license).
context_length: 256000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000002'
completion: '0.0000004'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
model_provider: ai21
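
Because every entry carries a supported_parameters list, the catalog can be queried mechanically, for example to find models that accept native tool calling, as the Jamba entries above do. A sketch under the assumption that the YAML files sit in a catalogs/ directory (the commit does not show the layout):

```python
# Sketch: list catalog entries whose supported_parameters include a
# given capability. The catalogs/ layout is an assumption.
from pathlib import Path
import yaml

def models_supporting(param: str, root: str = "catalogs") -> list[str]:
    hits = []
    for path in Path(root).rglob("*.yaml"):
        entry = yaml.safe_load(path.read_text())
        if param in entry.get("supported_parameters", []):
            hits.append(entry["id"])
    return sorted(hits)

print(models_supporting("tools"))
# e.g. ['ai21/jamba-1.6-large', 'ai21/jamba-1.6-mini', ...]
```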


@@ -0,0 +1,34 @@
id: aion-labs/aion-1.0-mini
canonical_slug: aion-labs/aion-1.0-mini
hugging_face_id: FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview
name: 'AionLabs: Aion-1.0-Mini'
type: chat
created: 1738697107
description: Aion-1.0-Mini is a 32B-parameter model distilled from DeepSeek-R1, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000007'
completion: '0.0000014'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
model_provider: aion-labs


@@ -0,0 +1,34 @@
id: aion-labs/aion-1.0
canonical_slug: aion-labs/aion-1.0
hugging_face_id: ''
name: 'AionLabs: Aion-1.0'
type: chat
created: 1738697557
description: Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is AionLabs' most powerful reasoning model.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.000004'
completion: '0.000008'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
model_provider: aion-labs


@@ -0,0 +1,32 @@
id: aion-labs/aion-rp-llama-3.1-8b
canonical_slug: aion-labs/aion-rp-llama-3.1-8b
hugging_face_id: ''
name: 'AionLabs: Aion-RP 1.0 (8B)'
type: chat
created: 1738696718
description: Aion-RP-Llama-3.1-8B ranks highest in the character-evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto in which LLMs evaluate each other's responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000002'
completion: '0.0000002'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
model_provider: aion-labs


@@ -0,0 +1,39 @@
id: alfredpros/codellama-7b-instruct-solidity
canonical_slug: alfredpros/codellama-7b-instruct-solidity
hugging_face_id: AlfredPros/CodeLlama-7b-Instruct-Solidity
name: 'AlfredPros: CodeLLaMa 7B Instruct Solidity'
type: chat
created: 1744641874
description: A fine-tuned 7-billion-parameter Code LLaMA Instruct model that generates Solidity smart contracts, trained with 4-bit QLoRA fine-tuning provided by the PEFT library.
context_length: 4096
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: alpaca
pricing:
prompt: '0.0000008'
completion: '0.0000012'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: alfredpros


@@ -0,0 +1,44 @@
id: all-hands/openhands-lm-32b-v0.1
canonical_slug: all-hands/openhands-lm-32b-v0.1
hugging_face_id: all-hands/openhands-lm-32b-v0.1
name: OpenHands LM 32B V0.1
type: chat
created: 1743613013
description: |-
OpenHands LM v0.1 is a 32B open-source coding model fine-tuned from Qwen2.5-Coder-32B-Instruct using reinforcement learning techniques outlined in SWE-Gym. It is optimized for autonomous software development agents and achieves strong performance on SWE-Bench Verified, with a 37.2% resolve rate. The model supports a 128K token context window, making it well-suited for long-horizon code reasoning and large codebase tasks.
OpenHands LM is designed for local deployment and runs on consumer-grade GPUs such as a single 3090. It enables fully offline agent workflows without dependency on proprietary APIs. This release is intended as a research preview, and future updates aim to improve generalizability, reduce repetition, and offer smaller variants.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000026'
completion: '0.0000034'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: all-hands


@@ -0,0 +1,48 @@
id: alpindale/goliath-120b
canonical_slug: alpindale/goliath-120b
hugging_face_id: alpindale/goliath-120b
name: Goliath 120B
type: chat
created: 1699574400
description: |-
A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.
Credits to
- [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit).
- [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.
#merge
context_length: 6144
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Llama2
instruct_type: airoboros
pricing:
prompt: '0.00001'
completion: '0.0000125'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- logit_bias
- top_k
- min_p
- seed
- top_a
model_provider: alpindale


@@ -0,0 +1,42 @@
id: alpindale/magnum-72b
canonical_slug: alpindale/magnum-72b
hugging_face_id: alpindale/magnum-72b-v1
name: Magnum 72B
type: chat
created: 1720656000
description: |-
From the maker of [Goliath](https://openrouter.ai/models/alpindale/goliath-120b), Magnum 72B is the first in a new family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.
The model is based on [Qwen2 72B](https://openrouter.ai/models/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: chatml
pricing:
prompt: '0.000004'
completion: '0.000006'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: alpindale


@@ -0,0 +1,39 @@
id: amazon/nova-lite-v1
canonical_slug: amazon/nova-lite-v1
hugging_face_id: ''
name: 'Amazon: Nova Lite 1.0'
type: chat
created: 1733437363
description: |-
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.
With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.
context_length: 300000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Nova
instruct_type: null
pricing:
prompt: '0.00000006'
completion: '0.00000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0.00009'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: amazon


@@ -0,0 +1,35 @@
id: amazon/nova-micro-v1
canonical_slug: amazon/nova-micro-v1
hugging_face_id: ''
name: 'Amazon: Nova Micro 1.0'
type: chat
created: 1733437237
description: Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Nova
instruct_type: null
pricing:
prompt: '0.000000035'
completion: '0.00000014'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: amazon


@@ -0,0 +1,41 @@
id: amazon/nova-pro-v1
canonical_slug: amazon/nova-pro-v1
hugging_face_id: ''
name: 'Amazon: Nova Pro 1.0'
type: chat
created: 1733436303
description: |-
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX).
Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and in analyzing financial documents.
**NOTE**: Video input is not supported at this time.
context_length: 300000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Nova
instruct_type: null
pricing:
prompt: '0.0000008'
completion: '0.0000032'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0.0012'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: amazon


@@ -0,0 +1,43 @@
id: anthracite-org/magnum-v2-72b
canonical_slug: anthracite-org/magnum-v2-72b
hugging_face_id: anthracite-org/magnum-v2-72b
name: Magnum v2 72B
type: chat
created: 1727654400
description: |-
From the maker of [Goliath](https://openrouter.ai/models/alpindale/goliath-120b), Magnum 72B is the seventh in a family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.
The model is based on [Qwen2 72B](https://openrouter.ai/models/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: chatml
pricing:
prompt: '0.000003'
completion: '0.000003'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- logit_bias
- top_k
- min_p
- seed
model_provider: anthracite-org


@@ -0,0 +1,44 @@
id: anthracite-org/magnum-v4-72b
canonical_slug: anthracite-org/magnum-v4-72b
hugging_face_id: anthracite-org/magnum-v4-72b
name: Magnum v4 72B
type: chat
created: 1729555200
description: |-
This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus).
The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct).
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: chatml
pricing:
prompt: '0.0000025'
completion: '0.000003'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
- logit_bias
- top_a
model_provider: anthracite-org


@@ -0,0 +1,34 @@
id: anthropic/claude-2:beta
canonical_slug: anthropic/claude-2
hugging_face_id: ''
name: 'Anthropic: Claude v2 (self-moderated)'
type: chat
created: 1700611200
description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
context_length: 200000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,34 @@
id: anthropic/claude-2.0:beta
canonical_slug: anthropic/claude-2.0
hugging_face_id: ''
name: 'Anthropic: Claude v2.0 (self-moderated)'
type: chat
created: 1690502400
description: Anthropic's flagship model. Superior performance on tasks that require complex reasoning. Supports hundreds of pages of text.
context_length: 100000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,34 @@
id: anthropic/claude-2.0
canonical_slug: anthropic/claude-2.0
hugging_face_id: ''
name: 'Anthropic: Claude v2.0'
type: chat
created: 1690502400
description: Anthropic's flagship model. Superior performance on tasks that require complex reasoning. Supports hundreds of pages of text.
context_length: 100000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,34 @@
id: anthropic/claude-2.1:beta
canonical_slug: anthropic/claude-2.1
hugging_face_id: ''
name: 'Anthropic: Claude v2.1 (self-moderated)'
type: chat
created: 1700611200
description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
context_length: 200000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,34 @@
id: anthropic/claude-2.1
canonical_slug: anthropic/claude-2.1
hugging_face_id: ''
name: 'Anthropic: Claude v2.1'
type: chat
created: 1700611200
description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
context_length: 200000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,34 @@
id: anthropic/claude-2
canonical_slug: anthropic/claude-2
hugging_face_id: ''
name: 'Anthropic: Claude v2'
type: chat
created: 1700611200
description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
context_length: 200000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000008'
completion: '0.000024'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,43 @@
id: anthropic/claude-3-haiku:beta
canonical_slug: anthropic/claude-3-haiku
hugging_face_id: ''
name: 'Anthropic: Claude 3 Haiku (self-moderated)'
type: chat
created: 1710288000
description: |-
Claude 3 Haiku is Anthropic's fastest and most compact model for
near-instant responsiveness. Quick and accurate targeted performance.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.00000025'
completion: '0.00000125'
input_cache_read: '0.00000003'
input_cache_write: '0.0000003'
request: '0'
image: '0.0004'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic
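
This is the first entry with non-empty cache pricing: input_cache_read and input_cache_write are billed instead of the normal prompt rate for cached and newly cached prompt tokens. A worked example with the Claude 3 Haiku rates above; the split of the prompt into fresh, cached, and newly written tokens is made up:

```python
# Worked example with the Claude 3 Haiku rates above; the split of the
# prompt into fresh / cached / newly-written tokens is made up.
PROMPT, CACHE_READ, CACHE_WRITE = 0.00000025, 0.00000003, 0.0000003

fresh, cached, written = 2_000, 50_000, 10_000

with_cache = fresh * PROMPT + cached * CACHE_READ + written * CACHE_WRITE
without = (fresh + cached + written) * PROMPT

print(f"${with_cache:.4f} with caching vs ${without:.4f} without")
# -> $0.0050 with caching vs $0.0155 without
```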


@@ -0,0 +1,43 @@
id: anthropic/claude-3-haiku
canonical_slug: anthropic/claude-3-haiku
hugging_face_id: ''
name: 'Anthropic: Claude 3 Haiku'
type: chat
created: 1710288000
description: |-
Claude 3 Haiku is Anthropic's fastest and most compact model for
near-instant responsiveness. Quick and accurate targeted performance.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.00000025'
completion: '0.00000125'
input_cache_read: '0.00000003'
input_cache_write: '0.0000003'
request: '0'
image: '0.0004'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3-opus:beta
canonical_slug: anthropic/claude-3-opus
hugging_face_id: ''
name: 'Anthropic: Claude 3 Opus (self-moderated)'
type: chat
created: 1709596800
description: |-
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000015'
completion: '0.000075'
input_cache_read: '0.0000015'
input_cache_write: '0.00001875'
request: '0'
image: '0.024'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3-opus
canonical_slug: anthropic/claude-3-opus
hugging_face_id: ''
name: 'Anthropic: Claude 3 Opus'
type: chat
created: 1709596800
description: |-
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000015'
completion: '0.000075'
input_cache_read: '0.0000015'
input_cache_write: '0.00001875'
request: '0'
image: '0.024'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3-sonnet:beta
canonical_slug: anthropic/claude-3-sonnet
hugging_face_id: ''
name: 'Anthropic: Claude 3 Sonnet (self-moderated)'
type: chat
created: 1709596800
description: |-
Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3-sonnet
canonical_slug: anthropic/claude-3-sonnet
hugging_face_id: ''
name: 'Anthropic: Claude 3 Sonnet'
type: chat
created: 1709596800
description: |-
Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3.5-haiku-20241022:beta
canonical_slug: anthropic/claude-3-5-haiku-20241022
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Haiku (2024-10-22) (self-moderated)'
type: chat
created: 1730678400
description: |-
Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries.
It does not support image inputs.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.0000008'
completion: '0.000004'
input_cache_read: '0.00000008'
input_cache_write: '0.000001'
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3.5-haiku-20241022
canonical_slug: anthropic/claude-3-5-haiku-20241022
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Haiku (2024-10-22)'
type: chat
created: 1730678400
description: |-
Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries.
It does not support image inputs.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.0000008'
completion: '0.000004'
input_cache_read: '0.00000008'
input_cache_write: '0.000001'
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3.5-haiku:beta
canonical_slug: anthropic/claude-3-5-haiku
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Haiku (self-moderated)'
type: chat
created: 1730678400
description: |-
Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.
This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.
This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022).
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.0000008'
completion: '0.000004'
input_cache_read: '0.00000008'
input_cache_write: '0.000001'
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-3.5-haiku
canonical_slug: anthropic/claude-3-5-haiku
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Haiku'
type: chat
created: 1730678400
description: |-
Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.
This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.
This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022).
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.0000008'
completion: '0.000004'
input_cache_read: '0.00000008'
input_cache_write: '0.000001'
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,47 @@
id: anthropic/claude-3.5-sonnet-20240620:beta
canonical_slug: anthropic/claude-3.5-sonnet-20240620
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Sonnet (2024-06-20) (self-moderated)'
type: chat
created: 1718841600
description: |-
Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
For the latest version (2024-10-23), check out [Claude 3.5 Sonnet](/anthropic/claude-3.5-sonnet).
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,47 @@
id: anthropic/claude-3.5-sonnet-20240620
canonical_slug: anthropic/claude-3.5-sonnet-20240620
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Sonnet (2024-06-20)'
type: chat
created: 1718841600
description: |-
Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
For the latest version (2024-10-23), check out [Claude 3.5 Sonnet](/anthropic/claude-3.5-sonnet).
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,45 @@
id: anthropic/claude-3.5-sonnet:beta
canonical_slug: anthropic/claude-3.5-sonnet
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Sonnet (self-moderated)'
type: chat
created: 1729555200
description: |-
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,45 @@
id: anthropic/claude-3.5-sonnet
canonical_slug: anthropic/claude-3.5-sonnet
hugging_face_id: ''
name: 'Anthropic: Claude 3.5 Sonnet'
type: chat
created: 1729555200
description: |-
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- top_k
- stop
model_provider: anthropic


@@ -0,0 +1,37 @@
id: anthropic/claude-3.7-sonnet:beta
canonical_slug: anthropic/claude-3-7-sonnet-20250219
hugging_face_id: ''
name: 'Anthropic: Claude 3.7 Sonnet (self-moderated)'
type: chat
created: 1740422110
description: "Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. \n\nClaude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.\n\nRead more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)"
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
model_provider: anthropic


@@ -0,0 +1,37 @@
id: anthropic/claude-3.7-sonnet:thinking
canonical_slug: anthropic/claude-3-7-sonnet-20250219
hugging_face_id: ''
name: 'Anthropic: Claude 3.7 Sonnet (thinking)'
type: chat
created: 1740422110
description: "Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. \n\nClaude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.\n\nRead more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)"
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
model_provider: anthropic


@@ -0,0 +1,39 @@
id: anthropic/claude-3.7-sonnet
canonical_slug: anthropic/claude-3-7-sonnet-20250219
hugging_face_id: ''
name: 'Anthropic: Claude 3.7 Sonnet'
type: chat
created: 1740422110
description: "Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. \n\nClaude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.\n\nRead more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)"
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
model_provider: anthropic
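
The Claude 3.7 entries are the first to list reasoning and include_reasoning in supported_parameters. A hypothetical sketch of passing them to an OpenAI-style chat-completions gateway; the endpoint URL, auth header, and the exact shape of the reasoning object are assumptions, not part of the catalog:

```python
# Hypothetical request sketch. "reasoning" / "include_reasoning" come
# from supported_parameters above; the endpoint URL, auth header, and
# the exact shape of the reasoning object are assumptions.
import json
import urllib.request

payload = {
    "model": "anthropic/claude-3.7-sonnet",
    "messages": [{"role": "user", "content": "Plan a database migration."}],
    "max_tokens": 2048,
    "reasoning": {"max_tokens": 1024},  # assumed: a token budget for thinking
    "include_reasoning": True,          # assumed: return the trace to the caller
}

req = urllib.request.Request(
    "https://gateway.example.com/v1/chat/completions",  # placeholder URL
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer <API_KEY>",
             "Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # not executed in this sketch
```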


@@ -0,0 +1,39 @@
id: anthropic/claude-opus-4
canonical_slug: anthropic/claude-4-opus-20250522
hugging_face_id: ''
name: 'Anthropic: Claude Opus 4'
type: chat
created: 1747931245
description: "Claude Opus 4 is benchmarked as the worlds best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation. \n\nRead more at the [blog post here](https://www.anthropic.com/news/claude-4)"
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- image
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000015'
completion: '0.000075'
input_cache_read: '0.0000015'
input_cache_write: '0.00001875'
request: '0'
image: '0.024'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
model_provider: anthropic


@@ -0,0 +1,42 @@
id: anthropic/claude-sonnet-4
canonical_slug: anthropic/claude-4-sonnet-20250522
hugging_face_id: ''
name: 'Anthropic: Claude Sonnet 4'
type: chat
created: 1747930371
description: |-
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios.
Read more at the [blog post here](https://www.anthropic.com/news/claude-4)
context_length: 200000
architecture:
modality: text+image->text
input_modalities:
- image
- text
output_modalities:
- text
tokenizer: Claude
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: '0.0000003'
input_cache_write: '0.00000375'
request: '0'
image: '0.0048'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
model_provider: anthropic


@@ -0,0 +1,40 @@
id: arcee-ai/arcee-blitz
canonical_slug: arcee-ai/arcee-blitz
hugging_face_id: arcee-ai/arcee-blitz
name: 'Arcee AI: Arcee Blitz'
type: chat
created: 1746470100
description: 'Arcee Blitz is a 24B-parameter dense model distilled from DeepSeek and built on the Mistral architecture for "everyday" chat. The distillation-plus-refinement pipeline trims compute while keeping DeepSeek-style reasoning, so Blitz punches above its weight on MMLU, GSM8K and BBH compared with other mid-size open models. With a default 128k context window and competitive throughput, it serves as a cost-efficient workhorse for summarization, brainstorming and light code help. Internally, Arcee uses Blitz as the default writer in Conductor pipelines when the heavier Virtuoso line is not required. Users therefore get near-70B quality at ~⅓ the latency and price.'
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.00000045'
completion: '0.00000075'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,42 @@
id: arcee-ai/caller-large
canonical_slug: arcee-ai/caller-large
hugging_face_id: ''
name: 'Arcee AI: Caller Large'
type: chat
created: 1746487869
description: 'Caller Large is Arcee''s specialist "function-calling" SLM built to orchestrate external tools and APIs. Instead of maximizing next-token accuracy, training focuses on structured JSON outputs, parameter extraction and multi-step tool chains, making Caller a natural choice for retrieval-augmented generation, robotic process automation or data-pull chatbots. It incorporates a routing head that decides when (and how) to invoke a tool versus answering directly, reducing hallucinated calls. The model is already the backbone of Arcee Conductor''s auto-tool mode, where it parses user intent, emits clean function signatures and hands control back once the tool response is ready. Developers thus gain an OpenAI-style function-calling UX without handing requests to a frontier-scale model.'
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.00000055'
completion: '0.00000085'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,40 @@
id: arcee-ai/coder-large
canonical_slug: arcee-ai/coder-large
hugging_face_id: ''
name: 'Arcee AI: Coder Large'
type: chat
created: 1746478663
description: 'Coder-Large is a 32B-parameter offspring of Qwen2.5-Instruct that has been further trained on permissively licensed GitHub, CodeSearchNet and synthetic bug-fix corpora. It supports a 32k context window, enabling multi-file refactoring or long diff review in a single call, and understands 30-plus programming languages with special attention to TypeScript, Go and Terraform. Internal benchmarks show 5-8 pt gains over CodeLlama-34B-Python on HumanEval and competitive BugFix scores thanks to a reinforcement pass that rewards compilable output. The model emits structured explanations alongside code blocks by default, making it suitable for educational tooling as well as production copilot scenarios. Cost-wise, Together AI prices it well below proprietary incumbents, so teams can scale interactive coding without runaway spend.'
context_length: 32768
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000005'
completion: '0.0000008'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,40 @@
id: arcee-ai/maestro-reasoning
canonical_slug: arcee-ai/maestro-reasoning
hugging_face_id: ''
name: 'Arcee AI: Maestro Reasoning'
type: chat
created: 1746481269
description: 'Maestro Reasoning is Arcee''s flagship analysis model: a 32B-parameter derivative of Qwen2.5-32B tuned with DPO and chain-of-thought RL for step-by-step logic. Compared to the earlier 7B preview, the production 32B release widens the context window to 128k tokens and doubles pass rate on MATH and GSM8K, while also lifting code completion accuracy. Its instruction style encourages structured "thought → answer" traces that can be parsed or hidden according to user preference. That transparency pairs well with audit-focused industries like finance or healthcare where seeing the reasoning path matters. In Arcee Conductor, Maestro is automatically selected for complex, multi-constraint queries that smaller SLMs bounce.'
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000009'
completion: '0.0000033'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,41 @@
id: arcee-ai/spotlight
canonical_slug: arcee-ai/spotlight
hugging_face_id: ''
name: 'Arcee AI: Spotlight'
type: chat
created: 1746481552
description: 'Spotlight is a 7-billion-parameter vision-language model derived from Qwen2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual-question-answering, and diagram-analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mockups need to be interpreted on the fly. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests.'
context_length: 131072
architecture:
modality: text+image->text
input_modalities:
- image
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.00000018'
completion: '0.00000018'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,42 @@
id: arcee-ai/virtuoso-large
canonical_slug: arcee-ai/virtuoso-large
hugging_face_id: ''
name: 'Arcee AI: Virtuoso Large'
type: chat
created: 1746478885
description: Virtuoso-Large is Arcee's top-tier general-purpose LLM at 72B parameters, tuned to tackle cross-domain reasoning, creative writing and enterprise QA. Unlike many 70B peers, it retains the 128k context inherited from Qwen2.5, letting it ingest books, codebases or financial filings wholesale. Training blended DeepSeek-R1 distillation, multi-epoch supervised fine-tuning and a final DPO/RLHF alignment stage, yielding strong performance on BIG-Bench-Hard, GSM8K and long-context Needle-In-Haystack tests. Enterprises use Virtuoso-Large as the "fallback" brain in Conductor pipelines when other SLMs flag low confidence. Despite its size, aggressive KV-cache optimizations keep first-token latency in the low-second range on 8×H100 nodes, making it a practical production-grade powerhouse.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.00000075'
completion: '0.0000012'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai


@@ -0,0 +1,42 @@
id: arcee-ai/virtuoso-medium-v2
canonical_slug: arcee-ai/virtuoso-medium-v2
hugging_face_id: arcee-ai/Virtuoso-Medium-v2
name: 'Arcee AI: Virtuoso Medium V2'
type: chat
created: 1746478434
description: 'Virtuoso-Medium-v2 is a 32B model distilled from DeepSeek-v3 logits and merged back onto a Qwen2.5 backbone, yielding a sharper, more factual successor to the original Virtuoso Medium. The team harvested ~1.1B logit tokens and applied "fusion-merging" plus DPO alignment, which pushed scores past Arcee-Nova-2024 and many 40B-plus peers on MMLU-Pro, MATH and HumanEval. With a 128k context and aggressive quantization options (from BF16 down to 4-bit GGUF), it balances capability with deployability on single-GPU nodes. Typical use cases include enterprise chat assistants, technical writing aids and medium-complexity code drafting where Virtuoso-Large would be overkill.'
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000005'
completion: '0.0000008'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: arcee-ai

View File

@@ -0,0 +1,24 @@
id: bytedance/doubao-embedding-text-240715
canonical_slug: bytedance/doubao-embedding-text-240715
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding Text (240715)'
description: |-
Doubao Embedding Large is the latest upgrade of ByteDance's semantic embedding model. Built on the Doubao language model, it has strong language understanding; it mainly targets vector-retrieval scenarios and supports both Chinese and English.
context_length: 4000
dimensions:
- 512
- 1024
- 2048
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.7'
unit: 1000000
currency: CNY
model_provider: bytedance
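
A note on reading pricing across this catalog: the USD entries elsewhere quote prices per single token (unit: 1), while these Doubao entries quote CNY per million tokens (unit: 1000000). Below is a minimal normalization sketch; the helper is illustrative, and only the field semantics come from the YAML above.

```python
# Illustrative helper (not part of the catalog): normalize a pricing
# block to cost per single token in the entry's own currency. The field
# names (prompt, unit, currency) mirror the YAML records above.
def prompt_price_per_token(pricing: dict) -> float:
    # 'unit' is the divisor the listed price applies to:
    # 1 = price is already per token, 1000000 = price per million tokens.
    return float(pricing["prompt"]) / float(pricing.get("unit", 1))

# Doubao Embedding Text (240715): 0.7 CNY per million tokens
doubao = {"prompt": "0.7", "unit": 1000000, "currency": "CNY"}
# Cohere Command R (further down): already quoted per token, in USD
command_r = {"prompt": "0.0000005", "unit": 1, "currency": "USD"}

print(prompt_price_per_token(doubao))     # ~7e-07 (CNY per token)
print(prompt_price_per_token(command_r))  # ~5e-07 (USD per token)
```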

View File

@@ -0,0 +1,25 @@
id: bytedance/doubao-embedding-large-text-240915
canonical_slug: bytedance/doubao-embedding-large-text-240915
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding Large Text (240915)'
description: |-
Doubao Embedding Large is the latest upgrade of ByteDance's semantic embedding model. Built on the Doubao language model, it has strong language understanding; it mainly targets vector-retrieval scenarios and supports both Chinese and English.
context_length: 4000
dimensions:
- 512
- 1024
- 2048
- 4096
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.7'
unit: 1000000
currency: CNY
model_provider: bytedance

View File

@@ -0,0 +1,24 @@
id: bytedance/doubao-embedding-text-240715
canonical_slug: bytedance/doubao-embedding-text-240715
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding'
description: |-
A semantic embedding model developed by ByteDance, mainly targeting vector-retrieval scenarios. It supports both Chinese and English, with a maximum context length of 4K. Vectors are 2048-dimensional, with reduced 512- and 1024-dimensional use supported.
context_length: 4000
dimensions:
- 512
- 1024
- 2048
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.5'
unit: 1000000
currency: CNY
model_provider: bytedance
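
The description above says the 2048-dimensional vectors support reduced use at 512 or 1024 dimensions, but not how that reduction is performed. If it follows the common Matryoshka-style convention of truncating to the first k components and re-normalizing (an assumption, not something this catalog states; the provider may instead return reduced vectors directly), a client-side sketch would look like this:

```python
# Assumption: reduced dimensions follow the Matryoshka-style convention
# (truncate to the first k components, then L2-normalize). The catalog
# only lists which dimensions exist, not the reduction method.
import math

def reduce_dims(vec: list[float], k: int) -> list[float]:
    if k not in (512, 1024, 2048):  # dims listed for this model
        raise ValueError(f"unsupported dimension {k}")
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```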

View File

@@ -0,0 +1,25 @@
id: bytedance/doubao-embedding-text-240715
canonical_slug: bytedance/doubao-embedding-text-240715
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding'
description: |-
A semantic embedding model developed by ByteDance, mainly targeting vector-retrieval scenarios. It supports both Chinese and English, with a maximum context length of 4K. Vectors are 2048-dimensional, with reduced 512- and 1024-dimensional use supported.
context_length: 4000
dimensions:
- 512
- 1024
- 2048
- 2560
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.5'
unit: 1000000
currency: CNY
model_provider: bytedance

View File

@@ -0,0 +1,24 @@
id: bytedance/doubao-embedding-vision-241215
canonical_slug: bytedance/doubao-embedding-vision-241215
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding Vision'
description: |-
Doubao-embedding-vision is a newly upgraded image-text multimodal embedding model, mainly targeting image-text multimodal vector retrieval. It supports image input as well as Chinese and English text input, with a maximum context length of 8K.
context_length: 8000
dimensions:
- 3072
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.7'
prompt_image: '1.8'
unit: 1000000
currency: CNY
model_provider: bytedance

View File

@@ -0,0 +1,25 @@
id: bytedance/doubao-embedding-vision-250328
canonical_slug: bytedance/doubao-embedding-vision-250328
type: embedding
hugging_face_id: null
name: 'ByteDance: Doubao Embedding Vision'
description: |-
Doubao-embedding-vision is a newly upgraded image-text multimodal embedding model, mainly targeting image-text multimodal vector retrieval. It supports image input as well as Chinese and English text input, with a maximum context length of 8K.
context_length: 8000
dimensions:
- 1024
- 2048
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Doubao
pricing:
prompt: '0.7'
prompt_image: '1.8'
unit: 1000000
currency: CNY
model_provider: bytedance

View File

@@ -0,0 +1,41 @@
id: bytedance/doubao-seed-1.6-flash
canonical_slug: bytedance/doubao-seed-1.6-flash
type: chat
hugging_face_id: ''
name: 'ByteDance: Doubao Seed 1.6 Flash'
created: 1738402289
description: A multimodal deep-thinking model with extremely fast inference; it supports both text and visual understanding. Its text understanding surpasses the previous-generation Lite series models, and its visual understanding matches competitors' Pro-series models.
context_length: 256000
architecture:
modality: text+image+video->text
input_modalities:
- text
- image
- video
output_modalities:
- text
tokenizer: Doubao
instruct_type: null
pricing:
prompt: '0.15'
completion: '1.5'
input_cache_read: '0.03'
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
currency: CNY
unit: 1000000
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
- structured_outputs
model_provider: bytedance

View File

@@ -0,0 +1,41 @@
id: bytedance/doubao-seed-1.6-thinking
canonical_slug: bytedance/doubao-seed-1.6-thinking
type: chat
hugging_face_id: ''
name: 'ByteDance: Doubao Seed 1.6 Thinking'
created: 1738402289
description: Substantially strengthened thinking ability; compared with the doubao 1.5-generation deep-reasoning models, it further improves core capabilities in coding, math, and logical reasoning, and supports visual understanding.
context_length: 256000
architecture:
modality: text+image+video->text
input_modalities:
- text
- image
- video
output_modalities:
- text
tokenizer: Doubao
instruct_type: null
pricing:
prompt: '0.8'
completion: '8.0'
input_cache_read: '0.16'
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
currency: CNY
unit: 1000000
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
- structured_outputs
model_provider: bytedance

View File

@@ -0,0 +1,41 @@
id: bytedance/doubao-seed-1.6
canonical_slug: bytedance/doubao-seed-1.6
type: chat
hugging_face_id: ''
name: 'ByteDance: Doubao Seed 1.6'
created: 1738402289
description: A brand-new multimodal deep-thinking model that supports three thinking modes (thinking, non-thinking, and auto). In non-thinking mode it improves substantially over the doubao-1.5-pro-32k-250115 model.
context_length: 256000
architecture:
modality: text+image+video->text
input_modalities:
- text
- image
- video
output_modalities:
- text
tokenizer: Doubao
instruct_type: null
pricing:
prompt: '0.8'
completion: '8.0'
input_cache_read: '0.16'
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
currency: CNY
unit: 1000000
supported_parameters:
- max_tokens
- temperature
- stop
- reasoning
- include_reasoning
- tools
- tool_choice
- top_p
- top_k
- structured_outputs
model_provider: bytedance

View File

@@ -0,0 +1,47 @@
id: cognitivecomputations/dolphin-mixtral-8x22b
canonical_slug: cognitivecomputations/dolphin-mixtral-8x22b
hugging_face_id: cognitivecomputations/dolphin-2.9.2-mixtral-8x22b
name: "Dolphin 2.9.2 Mixtral 8x22B \U0001F42C"
type: chat
created: 1717804800
description: |-
Dolphin 2.9 is designed for instruction following, conversational use, and coding. This model is a finetune of [Mixtral 8x22B Instruct](/models/mistralai/mixtral-8x22b-instruct). It features a 64k context length and was fine-tuned with a 16k sequence length using ChatML templates.
This model is a successor to [Dolphin Mixtral 8x7B](/models/cognitivecomputations/dolphin-mixtral-8x7b).
The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models).
#moe #uncensored
context_length: 16000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Mistral
instruct_type: chatml
pricing:
prompt: '0.0000009'
completion: '0.0000009'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- seed
- top_k
- min_p
- repetition_penalty
- logit_bias
model_provider: cognitivecomputations

View File

@@ -0,0 +1,41 @@
id: cohere/command-a
canonical_slug: cohere/command-a-03-2025
hugging_face_id: CohereForAI/c4ai-command-a-03-2025
name: 'Cohere: Command A'
type: chat
created: 1741894342
description: |-
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases.
Compared to other leading proprietary and open-weights models, Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks.
context_length: 256000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: null
pricing:
prompt: '0.0000025'
completion: '0.00001'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r-03-2024
canonical_slug: cohere/command-r-03-2024
hugging_face_id: ''
name: 'Cohere: Command R (03-2024)'
type: chat
created: 1709341200
description: |-
Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
Read the launch post [here](https://txt.cohere.com/command-r/).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.0000005'
completion: '0.0000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r-08-2024
canonical_slug: cohere/command-r-08-2024
hugging_face_id: ''
name: 'Cohere: Command R (08-2024)'
type: chat
created: 1724976000
description: |-
command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model.
Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.00000015'
completion: '0.0000006'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r-plus-04-2024
canonical_slug: cohere/command-r-plus-04-2024
hugging_face_id: ''
name: 'Cohere: Command R+ (04-2024)'
type: chat
created: 1712016000
description: |-
Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG).
It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r-plus-08-2024
canonical_slug: cohere/command-r-plus-08-2024
hugging_face_id: ''
name: 'Cohere: Command R+ (08-2024)'
type: chat
created: 1724976000
description: |-
command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.
Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.0000025'
completion: '0.00001'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r-plus
canonical_slug: cohere/command-r-plus
hugging_face_id: ''
name: 'Cohere: Command R+'
type: chat
created: 1712188800
description: |-
Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG).
It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.000003'
completion: '0.000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,45 @@
id: cohere/command-r
canonical_slug: cohere/command-r
hugging_face_id: ''
name: 'Cohere: Command R'
type: chat
created: 1710374400
description: |-
Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
Read the launch post [here](https://txt.cohere.com/command-r/).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.0000005'
completion: '0.0000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere
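
The tools and tool_choice entries in supported_parameters imply OpenAI-style function calling. Below is a minimal sketch of such a request; the gateway URL, the API-key placeholder, and the get_weather schema are illustrative assumptions, not part of this catalog.

```python
# Illustrative only: endpoint, key, and the get_weather tool are
# placeholders; the payload shape follows the common OpenAI-style schema
# implied by the 'tools' / 'tool_choice' parameters listed above.
import json
import requests

payload = {
    "model": "cohere/command-r",
    "messages": [{"role": "user", "content": "Weather in Paris today?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # placeholder gateway
    headers={"Authorization": "Bearer <API_KEY>"},
    json=payload,
)
print(json.dumps(resp.json(), indent=2))  # may contain a tool_calls entry
```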

View File

@@ -0,0 +1,42 @@
id: cohere/command-r7b-12-2024
canonical_slug: cohere/command-r7b-12-2024
hugging_face_id: ''
name: 'Cohere: Command R7B (12-2024)'
type: chat
created: 1734158152
description: |-
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.0000000375'
completion: '0.00000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,42 @@
id: cohere/command
canonical_slug: cohere/command
hugging_face_id: ''
name: 'Cohere: Command'
type: chat
created: 1710374400
description: |-
Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
context_length: 4096
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Cohere
instruct_type: null
pricing:
prompt: '0.000001'
completion: '0.000002'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- seed
- response_format
- structured_outputs
model_provider: cohere

View File

@@ -0,0 +1,49 @@
id: deepseek/deepseek-chat-v3-0324
canonical_slug: deepseek/deepseek-chat-v3-0324
hugging_face_id: deepseek-ai/DeepSeek-V3-0324
name: 'DeepSeek: DeepSeek V3 0324'
type: chat
created: 1742824755
description: |-
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs well across a variety of tasks.
context_length: 163840
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: DeepSeek
instruct_type: null
pricing:
prompt: '0.0000003'
completion: '0.00000088'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- logprobs
- top_logprobs
- seed
- min_p
model_provider: deepseek
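
With response_format and structured_outputs listed above, schema-constrained output can be requested. A sketch using the common OpenAI-style json_schema convention follows; the endpoint and the book schema are illustrative assumptions.

```python
# Illustrative request: the json_schema response_format is the common
# OpenAI-style convention suggested by 'response_format' and
# 'structured_outputs' above; the book schema is a made-up example.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # placeholder gateway
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "model": "deepseek/deepseek-chat-v3-0324",
        "messages": [{"role": "user", "content": "Suggest one sci-fi book."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "book",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "author": {"type": "string"},
                    },
                    "required": ["title", "author"],
                    "additionalProperties": False,
                },
            },
        },
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # JSON per schema
```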

View File

@@ -0,0 +1,49 @@
id: deepseek/deepseek-chat
canonical_slug: deepseek/deepseek-chat-v3
hugging_face_id: deepseek-ai/DeepSeek-V3
name: 'DeepSeek: DeepSeek V3'
type: chat
created: 1735241320
description: |-
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
context_length: 163840
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: DeepSeek
instruct_type: null
pricing:
prompt: '0.00000038'
completion: '0.00000089'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- logprobs
- top_logprobs
- seed
- min_p
model_provider: deepseek

View File

@@ -0,0 +1,41 @@
id: deepseek/deepseek-prover-v2
canonical_slug: deepseek/deepseek-prover-v2
hugging_face_id: deepseek-ai/DeepSeek-Prover-V2-671B
name: 'DeepSeek: DeepSeek Prover V2'
type: chat
created: 1746013094
description: DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from [DeepSeek-Prover-V1.5](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL). Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: DeepSeek
instruct_type: null
pricing:
prompt: '0.0000005'
completion: '0.00000218'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- seed
- top_k
- min_p
- repetition_penalty
- logit_bias
- response_format
model_provider: deepseek

View File

@@ -0,0 +1,45 @@
id: deepseek/deepseek-r1-0528-qwen3-8b
canonical_slug: deepseek/deepseek-r1-0528-qwen3-8b
hugging_face_id: deepseek-ai/deepseek-r1-0528-qwen3-8b
name: 'DeepSeek: Deepseek R1 0528 Qwen3 8B'
type: chat
created: 1748538543
description: |-
DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and smarter post-training techniques, pushing its reasoning and inference close to flagship models like O3 and Gemini 2.5 Pro.
It now tops math, programming, and logic leaderboards, showcasing a step-change in depth-of-thought.
The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235B “thinking” giant on AIME 2024.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: deepseek-r1
pricing:
prompt: '0.00000005'
completion: '0.0000001'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- presence_penalty
- frequency_penalty
- repetition_penalty
- top_k
- stop
- seed
- min_p
- logit_bias
model_provider: deepseek

View File

@@ -0,0 +1,51 @@
id: deepseek/deepseek-r1-0528
canonical_slug: deepseek/deepseek-r1-0528
hugging_face_id: deepseek-ai/DeepSeek-R1-0528
name: 'DeepSeek: R1 0528'
type: chat
created: 1748455170
description: |-
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model.
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: DeepSeek
instruct_type: deepseek-r1
pricing:
prompt: '0.0000005'
completion: '0.00000215'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
- logprobs
- top_logprobs
- tools
- tool_choice
- seed
- structured_outputs
model_provider: deepseek

View File

@@ -0,0 +1,55 @@
id: deepseek/deepseek-r1-distill-llama-70b
canonical_slug: deepseek/deepseek-r1-distill-llama-70b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
name: 'DeepSeek: R1 Distill Llama 70B'
type: chat
created: 1737663169
description: |-
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Llama3
instruct_type: deepseek-r1
pricing:
prompt: '0.0000001'
completion: '0.0000004'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- seed
- top_k
- stop
- frequency_penalty
- presence_penalty
- logit_bias
- logprobs
- top_logprobs
- min_p
- repetition_penalty
- tools
- tool_choice
- response_format
- structured_outputs
model_provider: deepseek

View File

@@ -0,0 +1,42 @@
id: deepseek/deepseek-r1-distill-llama-8b
canonical_slug: deepseek/deepseek-r1-distill-llama-8b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
name: 'DeepSeek: R1 Distill Llama 8B'
type: chat
created: 1738937718
description: "DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:\n\n- AIME 2024 pass@1: 50.4\n- MATH-500 pass@1: 89.1\n- CodeForces Rating: 1205\n\nThe model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.\n\nHugging Face:\n- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)\n- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)"
context_length: 32000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Llama3
instruct_type: deepseek-r1
pricing:
prompt: '0.00000004'
completion: '0.00000004'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- stop
- frequency_penalty
- presence_penalty
- seed
- top_k
- min_p
- repetition_penalty
- logit_bias
model_provider: deepseek

View File

@@ -0,0 +1,51 @@
id: deepseek/deepseek-r1-distill-qwen-1.5b
canonical_slug: deepseek/deepseek-r1-distill-qwen-1.5b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
name: 'DeepSeek: R1 Distill Qwen 1.5B'
type: chat
created: 1738328067
description: |-
DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks.
Other benchmark results include:
- AIME 2024 pass@1: 28.9
- AIME 2024 cons@64: 52.7
- MATH-500 pass@1: 83.9
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: deepseek-r1
pricing:
prompt: '0.00000018'
completion: '0.00000018'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
model_provider: deepseek

View File

@@ -0,0 +1,52 @@
id: deepseek/deepseek-r1-distill-qwen-14b
canonical_slug: deepseek/deepseek-r1-distill-qwen-14b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
name: 'DeepSeek: R1 Distill Qwen 14B'
type: chat
created: 1738193940
description: |-
DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/Qwen/Qwen2.5-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 69.7
- MATH-500 pass@1: 93.9
- CodeForces Rating: 1481
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
context_length: 64000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: deepseek-r1
pricing:
prompt: '0.00000015'
completion: '0.00000015'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- seed
- stop
- frequency_penalty
- presence_penalty
- top_k
- min_p
- repetition_penalty
- logit_bias
- response_format
model_provider: deepseek

View File

@@ -0,0 +1,43 @@
id: deepseek/deepseek-r1-distill-qwen-32b
canonical_slug: deepseek/deepseek-r1-distill-qwen-32b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
name: 'DeepSeek: R1 Distill Qwen 32B'
type: chat
created: 1738194830
description: |-
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 72.6
- MATH-500 pass@1: 94.3
- CodeForces Rating: 1691
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: deepseek-r1
pricing:
prompt: '0.00000012'
completion: '0.00000018'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- seed
- stop
- frequency_penalty
- presence_penalty
- top_k
- min_p
- repetition_penalty
- logit_bias
- response_format
model_provider: deepseek

View File

@@ -0,0 +1,35 @@
id: deepseek/deepseek-r1-distill-qwen-7b
canonical_slug: deepseek/deepseek-r1-distill-qwen-7b
hugging_face_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
name: 'DeepSeek: R1 Distill Qwen 7B'
type: chat
created: 1748628237
description: DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while keeping inference costs low.
context_length: 131072
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: deepseek-r1
pricing:
prompt: '0.0000001'
completion: '0.0000002'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- seed
model_provider: deepseek

View File

@@ -0,0 +1,53 @@
id: deepseek/deepseek-r1
canonical_slug: deepseek/deepseek-r1
hugging_face_id: deepseek-ai/DeepSeek-R1
name: 'DeepSeek: R1'
type: chat
created: 1737381095
description: |-
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
MIT licensed: Distill & commercialize freely!
context_length: 128000
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: DeepSeek
instruct_type: deepseek-r1
pricing:
prompt: '0.00000045'
completion: '0.00000215'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- stop
- frequency_penalty
- presence_penalty
- seed
- top_k
- min_p
- logit_bias
- top_logprobs
- response_format
- structured_outputs
- logprobs
- repetition_penalty
- tools
- tool_choice
model_provider: deepseek
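
R1's "fully open reasoning tokens" pair with the reasoning and include_reasoning parameters listed above. A sketch of retrieving the chain-of-thought alongside the answer follows; the response field layout shown is OpenRouter's documented convention and should be verified against current docs.

```python
# Illustrative: 'include_reasoning' asks the gateway to return the
# model's reasoning tokens alongside the answer. The exact response
# field ('reasoning' on the message) may change; verify before use.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "model": "deepseek/deepseek-r1",
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
        "include_reasoning": True,
    },
)
msg = resp.json()["choices"][0]["message"]
print(msg.get("reasoning"))  # open chain-of-thought tokens
print(msg["content"])        # final answer
```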

View File

@@ -0,0 +1,39 @@
id: eleutherai/llemma_7b
canonical_slug: eleutherai/llemma_7b
hugging_face_id: EleutherAI/llemma_7b
name: 'EleutherAI: Llemma 7b'
type: chat
created: 1744643225
description: Llemma 7B is a language model for mathematics. It was initialized with Code Llama 7B weights, and trained on the Proof-Pile-2 for 200B tokens. Llemma models are particularly strong at chain-of-thought mathematical reasoning and using computational tools for mathematics, such as Python and formal theorem provers.
context_length: 4096
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Other
instruct_type: code-llama
pricing:
prompt: '0.0000008'
completion: '0.0000012'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: eleutherai

View File

@@ -0,0 +1,44 @@
id: eva-unit-01/eva-llama-3.33-70b
canonical_slug: eva-unit-01/eva-llama-3.33-70b
hugging_face_id: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
name: EVA Llama 3.33 70B
type: chat
created: 1734377303
description: |
EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on mixture of synthetic and natural data.
It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.
This model was built with Llama by Meta.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Llama3
instruct_type: llama3
pricing:
prompt: '0.000004'
completion: '0.000006'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: eva-unit-01

View File

@@ -0,0 +1,42 @@
id: eva-unit-01/eva-qwen-2.5-32b
canonical_slug: eva-unit-01/eva-qwen-2.5-32b
hugging_face_id: EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
name: EVA Qwen2.5 32B
type: chat
created: 1731104847
description: |-
EVA Qwen2.5 32B is a roleplaying/storywriting specialist model. It's a full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data.
It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: chatml
pricing:
prompt: '0.0000026'
completion: '0.0000034'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: eva-unit-01

View File

@@ -0,0 +1,42 @@
id: eva-unit-01/eva-qwen-2.5-72b
canonical_slug: eva-unit-01/eva-qwen-2.5-72b
hugging_face_id: EVA-UNIT-01/EVA-Qwen2.5-72B-v0.1
name: EVA Qwen2.5 72B
type: chat
created: 1732210606
description: |-
EVA Qwen2.5 72B is a roleplay and storywriting specialist model. It's a full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data.
It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.
context_length: 16384
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Qwen
instruct_type: chatml
pricing:
prompt: '0.000004'
completion: '0.000006'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- repetition_penalty
- top_k
- min_p
- seed
model_provider: eva-unit-01

View File

@@ -0,0 +1,42 @@
id: google/gemini-2.0-flash-001
canonical_slug: google/gemini-2.0-flash-001
hugging_face_id: ''
name: 'Google: Gemini 2.0 Flash'
type: chat
created: 1738769413
description: Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- text
- image
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.0000001'
completion: '0.0000004'
input_cache_read: '0.000000025'
input_cache_write: '0.0000001833'
request: '0'
image: '0.0000258'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,42 @@
id: google/gemini-2.0-flash-lite-001
canonical_slug: google/gemini-2.0-flash-lite-001
hugging_face_id: ''
name: 'Google: Gemini 2.0 Flash Lite'
type: chat
created: 1740506212
description: Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- text
- image
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.000000075'
completion: '0.0000003'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,44 @@
id: google/gemini-2.5-flash-lite-preview-06-17
canonical_slug: google/gemini-2.5-flash-lite-preview-06-17
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash Lite Preview 06-17'
type: chat
created: 1750173831
description: 'Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. '
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- file
- image
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.0000001'
completion: '0.0000004'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google

View File

@@ -0,0 +1,44 @@
id: google/gemini-2.5-flash-preview-05-20:thinking
canonical_slug: google/gemini-2.5-flash-preview-05-20
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash Preview 05-20 (thinking)'
type: chat
created: 1747761924
description: "Gemini 2.5 Flash May 20th Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nNote: This model is available in two variants: thinking and non-thinking. The output pricing varies significantly depending on whether the thinking capability is active. If you select the standard variant (without the \":thinking\" suffix), the model will explicitly avoid generating thinking tokens. \n\nTo utilize the thinking capability and receive thinking tokens, you must choose the \":thinking\" variant, which will then incur the higher thinking-output pricing. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning)."
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- image
- text
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000015'
completion: '0.0000035'
input_cache_read: '0.0000000375'
input_cache_write: '0.0000002333'
request: '0'
image: '0.0006192'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google
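
As the description explains, variant selection here is driven entirely by the model id suffix. Below is a sketch of calling the thinking variant with a reasoning-token cap; the payload follows the reasoning-tokens docs linked in the description, and the cap value is chosen arbitrarily.

```python
# Illustrative: the ':thinking' suffix selects the variant that emits
# thinking tokens (billed at the higher completion rate above); the
# 'reasoning' object mirrors the max-tokens-for-reasoning docs linked
# in the description. Verify the payload shape against current docs.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "model": "google/gemini-2.5-flash-preview-05-20:thinking",
        "messages": [{"role": "user", "content": "Plan a 3-step proof."}],
        "reasoning": {"max_tokens": 1024},  # cap on thinking tokens
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```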

View File

@@ -0,0 +1,44 @@
id: google/gemini-2.5-flash-preview-05-20
canonical_slug: google/gemini-2.5-flash-preview-05-20
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash Preview 05-20'
type: chat
created: 1747761924
description: "Gemini 2.5 Flash May 20th Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nNote: This model is available in two variants: thinking and non-thinking. The output pricing varies significantly depending on whether the thinking capability is active. If you select the standard variant (without the \":thinking\" suffix), the model will explicitly avoid generating thinking tokens. \n\nTo utilize the thinking capability and receive thinking tokens, you must choose the \":thinking\" variant, which will then incur the higher thinking-output pricing. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning)."
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- image
- text
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000015'
completion: '0.0000006'
input_cache_read: '0.0000000375'
input_cache_write: '0.0000002333'
request: '0'
image: '0.0006192'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google

View File

@@ -0,0 +1,39 @@
id: google/gemini-2.5-flash-preview:thinking
canonical_slug: google/gemini-2.5-flash-preview-04-17
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash Preview 04-17 (thinking)'
type: chat
created: 1744914667
description: "Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nNote: This model is available in two variants: thinking and non-thinking. The output pricing varies significantly depending on whether the thinking capability is active. If you select the standard variant (without the \":thinking\" suffix), the model will explicitly avoid generating thinking tokens. \n\nTo utilize the thinking capability and receive thinking tokens, you must choose the \":thinking\" variant, which will then incur the higher thinking-output pricing. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning)."
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- image
- text
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000015'
completion: '0.0000035'
input_cache_read: '0.0000000375'
input_cache_write: '0.0000002333'
request: '0'
image: '0.0006192'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- tools
- tool_choice
- stop
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,39 @@
id: google/gemini-2.5-flash-preview
canonical_slug: google/gemini-2.5-flash-preview-04-17
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash Preview 04-17'
type: chat
created: 1744914667
description: "Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nNote: This model is available in two variants: thinking and non-thinking. The output pricing varies significantly depending on whether the thinking capability is active. If you select the standard variant (without the \":thinking\" suffix), the model will explicitly avoid generating thinking tokens. \n\nTo utilize the thinking capability and receive thinking tokens, you must choose the \":thinking\" variant, which will then incur the higher thinking-output pricing. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning)."
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- image
- text
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000015'
completion: '0.0000006'
input_cache_read: '0.0000000375'
input_cache_write: '0.0000002333'
request: '0'
image: '0.0006192'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- tools
- tool_choice
- stop
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,44 @@
id: google/gemini-2.5-flash
canonical_slug: google/gemini-2.5-flash
hugging_face_id: ''
name: 'Google: Gemini 2.5 Flash'
type: chat
created: 1750172488
description: "Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning)."
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- file
- image
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.0000003'
completion: '0.0000025'
input_cache_read: '0.000000075'
input_cache_write: '0.0000003833'
request: '0'
image: '0.001238'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google

View File

@@ -0,0 +1,43 @@
id: google/gemini-2.5-pro-exp-03-25
canonical_slug: google/gemini-2.5-pro-exp-03-25
hugging_face_id: ''
name: 'Google: Gemini 2.5 Pro Experimental'
type: chat
created: 1742922099
description: |-
This model has been deprecated by Google in favor of the [paid Preview model](/google/gemini-2.5-pro-preview).

Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- text
- image
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0'
completion: '0'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- tools
- tool_choice
- stop
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,40 @@
id: google/gemini-2.5-pro-preview-05-06
canonical_slug: google/gemini-2.5-pro-preview-03-25
hugging_face_id: ''
name: 'Google: Gemini 2.5 Pro Preview 05-06'
type: chat
created: 1746578513
description: Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- text
- image
- file
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000125'
completion: '0.00001'
input_cache_read: '0.00000031'
input_cache_write: '0.000001625'
request: '0'
image: '0.00516'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- tools
- tool_choice
- stop
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,45 @@
id: google/gemini-2.5-pro-preview
canonical_slug: google/gemini-2.5-pro-preview-06-05
hugging_face_id: ''
name: 'Google: Gemini 2.5 Pro Preview 06-05'
type: chat
created: 1749137257
description: |
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- file
- image
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000125'
completion: '0.00001'
input_cache_read: '0.00000031'
input_cache_write: '0.000001625'
request: '0'
image: '0.00516'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google

View File

@@ -0,0 +1,44 @@
id: google/gemini-2.5-pro
canonical_slug: google/gemini-2.5-pro
hugging_face_id: ''
name: 'Google: Gemini 2.5 Pro'
type: chat
created: 1750169544
description: Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
context_length: 1048576
architecture:
modality: text+image->text
input_modalities:
- file
- image
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000125'
completion: '0.00001'
input_cache_read: '0.00000031'
input_cache_write: '0.000001625'
request: '0'
image: '0.00516'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- tools
- tool_choice
- max_tokens
- temperature
- top_p
- reasoning
- include_reasoning
- structured_outputs
- response_format
- stop
- frequency_penalty
- presence_penalty
- seed
model_provider: google

View File

@@ -0,0 +1,46 @@
id: google/gemini-flash-1.5-8b
canonical_slug: google/gemini-flash-1.5-8b
hugging_face_id: ''
name: 'Google: Gemini 1.5 Flash 8B'
type: chat
created: 1727913600
description: |-
Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
[Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/).
Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
context_length: 1000000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.0000000375'
completion: '0.00000015'
input_cache_read: '0.00000001'
input_cache_write: '0.0000000583'
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- tools
- tool_choice
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,48 @@
id: google/gemini-flash-1.5
canonical_slug: google/gemini-flash-1.5
hugging_face_id: ''
name: 'Google: Gemini 1.5 Flash'
type: chat
created: 1715644800
description: |-
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
#multimodal
context_length: 1000000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.000000075'
completion: '0.0000003'
input_cache_read: '0.00000001875'
input_cache_write: '0.0000001583'
request: '0'
image: '0.00004'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
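  # The `image` field is assumed to be USD per input image: a request carrying three
  # images would add 3 * 0.00004 = $0.00012 on top of the per-token costs above.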
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- tools
- tool_choice
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,57 @@
id: google/gemini-pro-1.5
canonical_slug: google/gemini-pro-1.5
hugging_face_id: ''
name: 'Google: Gemini 1.5 Pro'
type: chat
created: 1712620800
description: |-
Google's latest multimodal model, supports image and video[0] in text or chat prompts.
Optimized for language tasks including:
- Code generation
- Text generation
- Text editing
- Problem solving
- Recommendations
- Information extraction
- Data extraction or generation
- AI agents
Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
* [0]: Video input is not available through OpenRouter at this time.
context_length: 2000000
architecture:
modality: text+image->text
input_modalities:
- text
- image
output_modalities:
- text
tokenizer: Gemini
instruct_type: null
pricing:
prompt: '0.00000125'
completion: '0.000005'
input_cache_read: ''
input_cache_write: ''
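  # Empty cache-price fields appear wherever cached-input pricing is not published for
  # a model; that reading is inferred from the pattern across this catalog, not from a
  # documented schema rule.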
request: '0'
image: '0.0006575'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- tools
- tool_choice
- seed
- response_format
- structured_outputs
model_provider: google

View File

@@ -0,0 +1,45 @@
id: google/gemma-2-27b-it
canonical_slug: google/gemma-2-27b-it
hugging_face_id: google/gemma-2-27b-it
name: 'Google: Gemma 2 27B'
type: chat
created: 1720828800
description: |-
Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini).
Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
context_length: 8192
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: gemma
pricing:
prompt: '0.0000008'
completion: '0.0000008'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- top_k
- repetition_penalty
- logit_bias
- min_p
- response_format
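# Sampler semantics (assumed to follow the common llama.cpp-style definitions):
# min_p keeps only tokens whose probability is at least min_p times that of the top
# token, and repetition_penalty > 1 down-weights tokens already present in the context.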
model_provider: google

View File

@@ -0,0 +1,45 @@
id: google/gemma-2-9b-it
canonical_slug: google/gemma-2-9b-it
hugging_face_id: google/gemma-2-9b-it
name: 'Google: Gemma 2 9B'
type: chat
created: 1719532800
description: |-
Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
context_length: 8192
architecture:
modality: text->text
input_modalities:
- text
output_modalities:
- text
tokenizer: Gemini
instruct_type: gemma
pricing:
prompt: '0.0000002'
completion: '0.0000002'
input_cache_read: ''
input_cache_write: ''
request: '0'
image: '0'
web_search: '0'
internal_reasoning: '0'
unit: 1
currency: USD
supported_parameters:
- max_tokens
- temperature
- top_p
- stop
- frequency_penalty
- presence_penalty
- response_format
- top_logprobs
- logprobs
- logit_bias
- seed
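# `logprobs` / `top_logprobs` are assumed to follow the usual OpenAI-style semantics:
# e.g. top_logprobs: 5 returns the five most likely alternatives for each generated token.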
model_provider: google

Some files were not shown because too many files have changed in this diff.