May 26, 2026

New LLM Providers: GitHub Models, Fireworks AI, Cerebras

Sanjay Senthilkumar

Three new LLM providers added — GitHub Models (multi-model via PAT), Fireworks AI (fastest open-model inference), and Cerebras (2000+ tok/s wafer-scale hardware).

Neural Inverse now supports 20 LLM providers. Three new providers ship today — all OpenAI-compatible, all zero-config beyond an API key.

GitHub Models

Access 40+ models from a single GitHub Personal Access Token. GPT-4.1, DeepSeek-R1, Llama 4, Grok-3, Mistral — all from one credential you already have.

Endpoint: https://models.github.ai/inference
Auth: GitHub PAT with models:read scope
Free tier: Available (rate-limited)
Default models: openai/gpt-4.1, openai/gpt-4.1-mini, openai/o4-mini, deepseek/deepseek-r1, meta/llama-4-scout-17b-16e-instruct, xai/grok-3-mini

Fireworks AI

Fastest open-model inference available. Native function calling for Power Mode agents. Sub-second latency on 70B+ models.

Endpoint: https://api.fireworks.ai/inference/v1
Auth: API key
Default models: llama-v3p3-70b-instruct, deepseek-r1, qwen3-235b-a22b, gemma-4-31b-it, gpt-oss-120b

Cerebras

Wafer-scale inference hardware generating 2000+ tokens per second. Makes autocomplete and inline edit feel instant.

Endpoint: https://api.cerebras.ai/v1
Auth: API key
Free tier: Available
Default models: llama3.1-8b, gpt-oss-120b, qwen-3-235b-a22b-instruct-2507

Setup

Open Settings > Neural Inverse > LLM Providers
Select the provider
Enter your API key
Select a model for each feature (Chat, Autocomplete, Ctrl+K, Power Mode)

All keys stay local. No proxy.

Was this page helpful?

PreviousWorkflow Composer

NextFirmware: Hardware Instruments, RTT, and Full MCU Coverage

New LLM Providers: GitHub Models, Fireworks AI, Cerebras

GitHub Models

Fireworks AI

Cerebras

Setup

Related