Model Management

Run any model, anywhere — directly from the IDE.

Neural Inverse ships a full model management layer inside the IDE. Install base models with one click, deploy private GPU endpoints to your own AWS or Azure account, and see the live status of every model in your environment — all without opening a terminal or leaving the editor.


Quick Install

Open Agent Manager (Cmd+Alt+A) → Models tab. Pick a model, click Install. That's it.

Neural Inverse detects your local Ollama instance automatically and streams the download directly into the card — no terminal, no manual ollama pull.
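For readers curious what happens behind the Install button, here is a minimal sketch of detecting a local Ollama server and streaming a download through Ollama's HTTP API. It is not Neural Inverse's actual implementation. The function names are hypothetical, and the sketch assumes Ollama's default address (localhost:11434) and its documented /api/tags and /api/pull endpoints.

```python
import json
import urllib.request
from typing import Iterator, Optional, Tuple

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed)


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error
        return False


def progress_percent(event: dict) -> Optional[float]:
    """Turn one streamed pull event into a percentage, if it carries byte counts."""
    total = event.get("total")
    if not total:
        return None  # status-only events such as "verifying sha256 digest"
    return round(100.0 * event.get("completed", 0) / total, 1)


def pull_model(name: str, base_url: str = OLLAMA_URL) -> Iterator[Tuple[str, Optional[float]]]:
    """Yield (status, percent) pairs while Ollama downloads the model."""
    body = json.dumps({"model": name}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # the response is one JSON object per line
            if line.strip():
                event = json.loads(line)
                yield event.get("status", ""), progress_percent(event)
```

A caller would check ollama_is_running() first, then iterate over pull_model("qwen2.5-coder:7b") and update a progress bar from each pair.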

Category	Available models
Code	Qwen2.5-Coder 7B / 14B / 32B, DeepSeek-Coder V2, CodeLlama 13B / 34B, Codestral
Chat	Llama 3.3 70B, Llama 3.2 3B, Phi-4 14B, Gemma 3 12B, Mistral 7B
Reasoning	DeepSeek-R1 7B / 14B / 32B, QwQ 32B
Multimodal	LLaVA 13B, LLaVA-Phi-3, BakLLaVA

Once downloaded, the model is auto-detected and registered in your provider settings. No manual configuration.

Cloud Deployment

Deploy a private vLLM endpoint to your own AWS or Azure account in a few clicks. Your data never touches Neural Inverse infrastructure.

What Neural Inverse provisions for you:

GPU instance (your choice of type and region)
Security group that admits traffic from your IP address only
vLLM installed and started as a systemd service with auto-restart
A generated API key so your endpoint never accepts unauthenticated requests
IMDSv2 enforced on EC2 (no metadata SSRF)
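The items above map onto a handful of concrete settings. The sketch below is illustrative only and is not the wizard's real code. It builds a single-IP ingress rule and IMDSv2 options in the dictionary shapes that the EC2 API expects (for example via boto3's authorize_security_group_ingress and run_instances), generates an API key, and renders a systemd unit for vLLM. The helper names, the port (8000), and the model ID are assumptions.

```python
import ipaddress
import secrets


def ingress_rule(client_ip: str, port: int = 8000) -> dict:
    """Security-group rule admitting a single IPv4 address on the vLLM port."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{addr}/32"}],  # /32 means exactly one address
    }


# EC2 MetadataOptions that require session tokens, i.e. IMDSv2 only.
IMDSV2_REQUIRED = {"HttpTokens": "required", "HttpEndpoint": "enabled"}


def vllm_unit(model: str, api_key: str, port: int = 8000) -> str:
    """Render a systemd unit that runs vLLM and restarts it whenever it exits."""
    return "\n".join([
        "[Unit]",
        "Description=vLLM OpenAI-compatible server",
        "After=network-online.target",
        "",
        "[Service]",
        "ExecStart=/usr/bin/env vllm serve "
        f"{model} --host 0.0.0.0 --port {port} --api-key {api_key}",
        "Restart=always",
        "RestartSec=5",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


api_key = secrets.token_urlsafe(32)  # random, URL-safe secret
unit_text = vllm_unit("Qwen/Qwen2.5-Coder-7B-Instruct", api_key)
```

Passing the key on the command line keeps the sketch short. A production setup would more likely load it from a root-only environment file so it does not appear in the process list.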

The wizard shows a live provisioning timeline: each step as it happens, the elapsed time, and an Abort button if you need to cancel. Any deployment stuck in provisioning for more than 20 minutes is moved to an error state the next time the IDE starts.

Deployments Tab

A live view of everything running in your model environment.

Local providers — Ollama, vLLM, LM Studio detected automatically via health checks every 30 seconds
Cloud deployments — all your AWS/Azure GPU instances with status, cost/hour, and quick actions
Live updates — status changes appear in real time without manual refresh
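As a rough picture of how periodic health checks like these can work, the sketch below probes each local provider and reports only the statuses that changed since the last poll. It is an assumption-laden illustration, not the IDE's code. The ports are each tool's usual default, the probe paths are a guess at a cheap endpoint to hit, and all names are hypothetical.

```python
import time
import urllib.request
from typing import Callable, Dict, Optional

HEALTH_CHECK_INTERVAL_SECONDS = 30  # interval stated on this page

# Usual default ports; the paths are assumed "list models" endpoints.
PROBES: Dict[str, str] = {
    "ollama": "http://localhost:11434/api/tags",
    "vllm": "http://localhost:8000/v1/models",
    "lmStudio": "http://localhost:1234/v1/models",
}


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # refused, timed out, or non-2xx HTTP error
        return False


def check_providers(probe: Callable[[str], bool] = http_ok) -> Dict[str, bool]:
    """Probe every known provider once."""
    return {name: probe(url) for name, url in PROBES.items()}


def changed(previous: Dict[str, bool], current: Dict[str, bool]) -> Dict[str, bool]:
    """Providers whose status differs from the previous poll."""
    return {n: up for n, up in current.items() if previous.get(n) != up}


def watch(on_change: Callable[[Dict[str, bool]], None],
          rounds: Optional[int] = None) -> None:
    """Poll on the fixed interval and report only the differences."""
    previous: Dict[str, bool] = {}
    done = 0
    while rounds is None or done < rounds:
        current = check_providers()
        diff = changed(previous, current)
        if diff:
            on_change(diff)
        previous = current
        done += 1
        time.sleep(HEALTH_CHECK_INTERVAL_SECONDS)
```

Reporting only the differences is what lets a UI update cards in place rather than redraw the whole list on every poll.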

Auto-Configuration

When Neural Inverse detects a new deployment, it automatically configures the matching provider in your LLM settings — but only if you haven't already set it up.

Skips configuration if you've set a custom endpoint, API key, or models list
Shows a notification with OK, Undo, and Don't auto-configure options
Dismissal is remembered permanently per provider
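The skip rules above amount to a small predicate. The sketch below is one way to express it, under the assumption that provider settings look roughly like the hypothetical ProviderSettings class here. The did_fill_in_provider_settings field stands in for the _didFillInProviderSettings flag mentioned in the FAQ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ProviderSettings:
    """Hypothetical shape of one provider's entry in the LLM settings."""
    endpoint: Optional[str] = None   # custom endpoint, if the user set one
    api_key: Optional[str] = None    # custom API key, if the user set one
    models: List[str] = field(default_factory=list)  # custom models list
    did_fill_in_provider_settings: bool = False  # mirrors _didFillInProviderSettings


def should_auto_configure(provider: str,
                          settings: ProviderSettings,
                          dismissed: Set[str]) -> bool:
    """Decide whether a newly detected deployment may configure this provider."""
    if provider in dismissed:
        return False  # the user chose "Don't auto-configure" for this provider
    if settings.endpoint or settings.api_key or settings.models:
        return False  # any custom value means the user already set it up
    return not settings.did_fill_in_provider_settings
```

For example, should_auto_configure("vllm", ProviderSettings(), set()) returns True, while the same call with "vllm" in the dismissed set returns False.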

Enterprise

Neural Inverse Enterprise adds organization-wide model governance on top of the open source layer:

Private cloud model registry

Deploy models once to your org's private cloud or network and share the endpoint across every developer on the team. No one needs to manage their own instance.

Single deployment, team-wide access
Endpoint credentials managed centrally — developers never handle raw cloud keys
Works within your existing VPC / private network — no public internet exposure required

Policy enforcement

Define which models and providers developers are allowed to use. Unapproved providers are blocked at the IDE level.

Allowlist models by provider, parameter count, or capability type
Block cloud providers not approved by your infosec team
Policies propagate to all clients within 30 seconds of a change
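To make the allowlist idea concrete, here is a minimal sketch of a client-side policy check covering the three criteria listed above: provider, parameter count, and capability type. The Model and Policy shapes are hypothetical and are not the Enterprise policy schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Model:
    name: str
    provider: str      # e.g. "ollama", "vllm", "lmStudio"
    params_b: float    # parameter count in billions
    capability: str    # "code", "chat", "reasoning", or "multimodal"


@dataclass(frozen=True)
class Policy:
    allowed_providers: FrozenSet[str]
    allowed_capabilities: FrozenSet[str]
    max_params_b: Optional[float] = None  # None means no size limit


def is_allowed(model: Model, policy: Policy) -> bool:
    """A model passes only if every criterion in the policy accepts it."""
    if model.provider not in policy.allowed_providers:
        return False
    if model.capability not in policy.allowed_capabilities:
        return False
    if policy.max_params_b is not None and model.params_b > policy.max_params_b:
        return False
    return True


policy = Policy(
    allowed_providers=frozenset({"ollama", "vllm"}),
    allowed_capabilities=frozenset({"code", "chat"}),
    max_params_b=34,
)
```

With this policy, Qwen2.5-Coder 7B on Ollama passes, while Llama 3.3 70B is rejected for size and any model from an unlisted provider is rejected outright.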

Local model governance

IT defines an approved local model catalog. Developers install from the approved list only — no arbitrary ollama pull from the internet.

Curated catalog visible to developers; unapproved models hidden
Audit log of every model install and usage across the org
Works with Ollama, vLLM, and LM Studio

Centralized credential management

Cloud credentials (AWS, Azure) are stored and rotated at the org level. Developers connect to shared deployments without ever seeing raw access keys.

Usage telemetry

Token counts per developer and per project
Model usage breakdown by team
Cost attribution per deployment

SSO-gated deployment access

Only authorized roles can provision or terminate cloud GPU instances. Enforce via your existing SSO provider.

Talk to Sales: sales@neuralinverse.com

FAQ

Does this require a Neural Inverse account?

No. Quick Install and cloud deployment work entirely with your own Ollama install and your own AWS/Azure account. Neural Inverse does not proxy any traffic.

What cloud providers are supported for GPU deployment?

AWS EC2 and Azure VM today. GCP is on the roadmap.

What happens if provisioning fails mid-way?

The wizard shows a live error log. You can retry or abort. Any deployment stuck in provisioning for more than 20 minutes is automatically moved to an error state on the next IDE start, so it doesn't block the UI.
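The 20-minute rule can be pictured as a startup sweep over saved deployments. The following is a sketch under the assumption that each deployment records a status and a start time. The Deployment class and the status strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

STUCK_AFTER = timedelta(minutes=20)  # threshold stated in the answer above


@dataclass
class Deployment:
    id: str
    status: str            # e.g. "provisioning", "running", "error"
    started_at: datetime   # when provisioning began (timezone-aware)


def recover_stuck(deployments: List[Deployment], now: datetime) -> List[str]:
    """Mark long-running provisioning jobs as errors; return the affected IDs."""
    recovered = []
    for d in deployments:
        if d.status == "provisioning" and now - d.started_at > STUCK_AFTER:
            d.status = "error"
            recovered.append(d.id)
    return recovered


# Example: run once when the IDE starts.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
saved = [
    Deployment("a", "provisioning", now - timedelta(minutes=21)),
    Deployment("b", "provisioning", now - timedelta(minutes=5)),
    Deployment("c", "running", now - timedelta(hours=1)),
]
stuck_ids = recover_stuck(saved, now)
```

Only deployment "a" changes state here. "b" is still within the window, and "c" is not provisioning at all.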

Can I use my own vLLM or Ollama endpoint instead of deploying a new one?

Yes. Neural Inverse auto-detects any Ollama, vLLM, or LM Studio instance running on localhost. For a remote endpoint, add it manually under Settings → LLM Providers.

How does auto-config interact with settings I've already set?

Auto-config only runs if the provider has no custom endpoint, no API key, and _didFillInProviderSettings is false. If you've touched any setting for that provider, auto-config skips it entirely.

What is the Enterprise pricing for model management?

Model governance and the private cloud registry are part of Neural Inverse Enterprise. Talk to sales for pricing.

LLM Providers (BYOLLM) — connect cloud providers
Model Management Docs — full reference
Contributing: Model Management — architecture guide for OSS contributors
Enterprise — full Enterprise feature set
