New LLM Providers: GitHub Models, Fireworks AI, Cerebras
Three new LLM providers added — GitHub Models (multi-model via PAT), Fireworks AI (fastest open-model inference), and Cerebras (2000+ tok/s wafer-scale hardware).
Neural Inverse now supports 20 LLM providers. Three new providers ship today — all OpenAI-compatible, all zero-config beyond an API key.
GitHub Models
Access 40+ models from a single GitHub Personal Access Token. GPT-4.1, DeepSeek-R1, Llama 4, Grok-3, Mistral — all from one credential you already have.
- Endpoint:
https://models.github.ai/inference - Auth: GitHub PAT with
models:readscope - Free tier: Available (rate-limited)
- Default models:
openai/gpt-4.1,openai/gpt-4.1-mini,openai/o4-mini,deepseek/deepseek-r1,meta/llama-4-scout-17b-16e-instruct,xai/grok-3-mini
Fireworks AI
Fastest open-model inference available. Native function calling for Power Mode agents. Sub-second latency on 70B+ models.
- Endpoint:
https://api.fireworks.ai/inference/v1 - Auth: API key
- Default models:
llama-v3p3-70b-instruct,deepseek-r1,qwen3-235b-a22b,gemma-4-31b-it,gpt-oss-120b
Cerebras
Wafer-scale inference hardware generating 2000+ tokens per second. Makes autocomplete and inline edit feel instant.
- Endpoint:
https://api.cerebras.ai/v1 - Auth: API key
- Free tier: Available
- Default models:
llama3.1-8b,gpt-oss-120b,qwen-3-235b-a22b-instruct-2507
Setup
- Open Settings > Neural Inverse > LLM Providers
- Select the provider
- Enter your API key
- Select a model for each feature (Chat, Autocomplete, Ctrl+K, Power Mode)
All keys stay local. No proxy.
Related
- LLM Providers Documentation
- BYOLLM Contributing Guide
- GitHub Issues: #53, #55, #56
Copyright 2026 Neural Inverse Inc.