Model Management

Neural Inverse includes a built-in model management layer so you never have to leave the IDE to get a model running. From the Agent Manager (Cmd+Alt+A), you can install base models locally in one click, deploy private GPU endpoints to AWS or Azure, and see the live status of every model available in your environment.


Quick Install (Simple Mode)

The Models tab opens in Simple mode by default. It shows a curated grid of base models organized by category — Code, Chat, Reasoning, and Multimodal.

Each card shows:

  • Model name and originating org (e.g. Meta, Mistral AI, Google)
  • Parameter count badge (7B, 13B, 70B, etc.)
  • One-click Install button that uses Ollama under the hood
  • Live download progress bar while pulling

How it works

Simple mode requires Ollama to be running locally. Neural Inverse detects it automatically. If Ollama is not running, a status pill at the top of the screen shows its state and links to the install page.

When you click Install, Neural Inverse calls the Ollama pull API (POST /api/pull) and streams the download progress directly into the card UI. No terminal window required.
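
As a rough illustration of that streaming call, the sketch below posts to /api/pull and converts each streamed JSON line into a progress percentage. It assumes the response fields that Ollama documents for pull (status, total, completed) and the default port 11434. The names pull_model, progress_events, and OLLAMA_URL are hypothetical and are not Neural Inverse internals.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama on its default port


def progress_events(lines: Iterable[bytes]) -> Iterator[Tuple[str, Optional[float]]]:
    """Turn streamed JSON lines into (status, percent) pairs.

    percent is None for status-only updates that carry no byte counts.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        total, done = event.get("total"), event.get("completed")
        has_counts = bool(total) and done is not None
        percent = round(100 * done / total, 1) if has_counts else None
        yield event.get("status", ""), percent


def pull_model(model: str) -> None:
    """Call POST /api/pull and print progress as it streams in."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields one JSON line per progress update.
        for status, percent in progress_events(response):
            print(status if percent is None else f"{status}: {percent}%")
```

With Ollama running, pull_model("llama3.2:3b") would print one line per update. Keeping the parser separate from the HTTP call means the same function could drive a progress bar instead of print.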

Once installed, the model is auto-detected and registered in your Ollama provider settings — no manual configuration needed.

Curated models

Category | Models available
Code | Qwen2.5-Coder 7B/14B/32B, DeepSeek-Coder V2, CodeLlama 13B/34B, Codestral
Chat | Llama 3.3 70B, Llama 3.2 3B, Phi-4 14B, Gemma 3 12B, Mistral 7B
Reasoning | DeepSeek-R1 7B/14B/32B, QwQ 32B
Multimodal | LLaVA 13B, Llava-Phi-3, BakLLaVA

Switching to Advanced mode

Click Advanced / Marketplace in the top-right of the Models tab to open the full model marketplace with search, filters, provider selection, and cloud deployment options.


Model Marketplace (Advanced Mode)

The marketplace gives you access to the full model catalog across all providers. Use it when the curated list doesn't have what you need.

Sidebar controls:

  • Search box with live filtering
  • Provider filter chips (Ollama, vLLM, HuggingFace, Cloud)
  • Category filters (Code, Chat, Vision, Embedding, etc.)

Model detail pane:

  • Full description, parameter counts, quantization options
  • Provider compatibility
  • Install / Deploy button

From the marketplace you can also trigger cloud deployments directly — select a model, choose Deploy to Cloud, and the wizard opens.


Cloud Deployment

Neural Inverse can provision a private GPU instance on AWS or Azure running vLLM — fully within your own cloud account. Your data never touches Neural Inverse infrastructure.

Prerequisites

Before deploying you need stored cloud credentials:

  1. Open Agent Manager → Settings → Cloud Credentials
  2. Select AWS or Azure and enter your credentials
  3. Neural Inverse validates the format and tests connectivity before storing

Credentials are encrypted via the OS secret store (ISecretStorageService) — never written to disk in plaintext.

Deployment wizard

The cloud deploy wizard walks through:

Step | What happens
Instance selection | Pick GPU type, vCPUs, RAM, storage. Cost/hour shown per option.
Region | Choose deployment region. Region format is validated to prevent misconfiguration.
Model | Confirm the model to serve.
Deploy | Neural Inverse runs the provisioning sequence and streams live timeline logs.

The provisioning sequence runs in a dedicated terminal:

  1. Creates a security group with IP-restricted inbound rules (your IP only)
  2. Launches the instance with IMDSv2 enforced
  3. Installs CUDA, Python, and vLLM via cloud-init
  4. Starts vLLM as a systemd service with auto-restart and a generated API key
  5. Polls the health endpoint until it responds (up to 15 minutes, with retry counter)

You can Abort at any time during provisioning. The wizard shows elapsed time and the current step.
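
The final polling step can be sketched as a loop with a deadline, an attempt counter, and an abort hook. This is a minimal sketch, not the wizard's actual code. It assumes the endpoint exposes vLLM's GET /health route, and the 10-second interval is an illustrative choice; only the 15-minute ceiling comes from the text above. The names wait_until_healthy and vllm_is_healthy are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def vllm_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True when GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: not ready yet


def wait_until_healthy(
    probe: Callable[[], bool],
    max_wait_s: float = 15 * 60,   # ceiling taken from the provisioning steps
    interval_s: float = 10.0,      # assumption: the real interval is not documented
    should_abort: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[bool, int]:
    """Poll probe() until it succeeds, the user aborts, or time runs out.

    Returns (healthy, attempts) so a caller can display a retry counter.
    """
    deadline = clock() + max_wait_s
    attempts = 0
    while clock() < deadline and not should_abort():
        attempts += 1
        if probe():
            return True, attempts
        sleep(interval_s)
    return False, attempts
```

A caller might run wait_until_healthy(lambda: vllm_is_healthy("http://203.0.113.10:8000")), where the address is a placeholder. Injecting sleep and clock keeps the loop testable without waiting in real time.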

Deployment states

Status | Meaning
provisioning | Instance is being created
running | Endpoint is live and healthy
unreachable | Health check failed (may be a transient network issue)
stopping | Teardown in progress
stopped | Instance terminated
error | Provisioning failed (see log for details)
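
One way to model these states in code is an enumeration plus a small rule for how a health check moves a deployment between running and unreachable. The status strings come from the table above. The transition rule and the names DeploymentStatus and after_health_check are assumptions made for illustration; the actual state machine is not described here.

```python
from enum import Enum


class DeploymentStatus(str, Enum):
    PROVISIONING = "provisioning"
    RUNNING = "running"
    UNREACHABLE = "unreachable"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"


def after_health_check(current: DeploymentStatus, healthy: bool) -> DeploymentStatus:
    """Assumed rule: only live deployments react to a health check result."""
    live = {DeploymentStatus.RUNNING, DeploymentStatus.UNREACHABLE}
    if current not in live:
        return current  # e.g. a stopped instance stays stopped
    return DeploymentStatus.RUNNING if healthy else DeploymentStatus.UNREACHABLE
```

Under this rule a transient network failure flips running to unreachable, and the next successful check flips it back without touching the other states.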

Security

  • API keys are randomly generated (32-byte hex) and stored encrypted
  • Security groups restrict port 8000 to your current IP at deploy time
  • IMDSv2 is enforced on all EC2 instances, which mitigates SSRF attacks against the instance metadata endpoint
  • Azure tenant IDs are URI-encoded; region names are validated against an allowlist
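
The key generation and input checks above could look roughly like the following sketch. It reads "32-byte hex" as 32 random bytes encoded as 64 hex characters, which is an interpretation. The AWS region pattern and the short Azure allowlist are illustrative assumptions, and all function names are hypothetical.

```python
import re
import secrets
from urllib.parse import quote

# Assumed pattern for AWS region names such as "us-east-1" or "us-gov-west-1".
AWS_REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+){1,2}-\d")

# Illustrative subset only; the real allowlist is not given in this page.
AZURE_REGIONS = frozenset({"eastus", "eastus2", "westus2", "westeurope", "northeurope"})


def new_api_key() -> str:
    """Generate 32 random bytes from the OS CSPRNG, hex-encoded (64 characters)."""
    return secrets.token_hex(32)


def is_valid_region(provider: str, region: str) -> bool:
    """Reject region strings that could smuggle extra characters into a request."""
    if provider == "aws":
        return AWS_REGION_RE.fullmatch(region) is not None
    if provider == "azure":
        return region in AZURE_REGIONS
    return False


def tenant_path_segment(tenant_id: str) -> str:
    """Percent-encode a tenant ID so it cannot alter the URL path or query."""
    return quote(tenant_id, safe="")
```

Passing safe="" matters here: by default quote leaves "/" unescaped, which would let a crafted tenant ID add path segments.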

Deployments Tab

The Deployments tab in Agent Manager gives you a live view of everything running in your model environment.

Local providers

The top section shows auto-detected local providers:

Provider | Default port | Detection
Ollama | 11434 | Health check every 30s
vLLM | 8000 | Health check every 30s
LM Studio | 1234 | Health check every 30s

Each row shows: status badge (Running / Stopped), endpoint URL, list of loaded models, and last-checked timestamp.
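
A single detection pass over these default ports might be sketched as below. The probe paths are assumptions: /api/tags is Ollama's model listing route (it lists installed models, which may differ from what is currently loaded), and /v1/models is the OpenAI-compatible listing that vLLM and LM Studio serve. The names PROVIDERS, http_get_json, and detect are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# name -> (default port from the table above, assumed path that lists models)
PROVIDERS = {
    "Ollama": (11434, "/api/tags"),
    "vLLM": (8000, "/v1/models"),
    "LM Studio": (1234, "/v1/models"),
}


def http_get_json(url: str) -> Optional[dict]:
    """Fetch and decode JSON, or return None if the server is down or replies oddly."""
    try:
        with urllib.request.urlopen(url, timeout=2.0) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def detect(fetch: Callable[[str], Optional[dict]] = http_get_json) -> dict:
    """Probe each provider once and build one row per provider."""
    report = {}
    for name, (port, path) in PROVIDERS.items():
        url = f"http://localhost:{port}"
        payload = fetch(url + path)
        if payload is None:
            report[name] = {"status": "Stopped", "endpoint": url, "models": []}
            continue
        # Ollama uses "models"[].name; OpenAI-style servers use "data"[].id.
        entries = payload.get("models") or payload.get("data") or []
        models = [entry.get("name") or entry.get("id") for entry in entries]
        report[name] = {"status": "Running", "endpoint": url, "models": models}
    return report
```

Calling detect() on a timer every 30 seconds would approximate the health check cadence listed in the table.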

Cloud deployments

The bottom section lists all cloud deployments from your account with their current status, GPU type, model, region, cost/hour, and quick-action buttons (Open endpoint, Stop, Delete).

Live updates

The Deployments tab listens to the DeploymentRegistryService event bus. When a local provider comes up or goes down, or when a cloud deployment changes status, the UI updates in real time without requiring a manual refresh.
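
The listener pattern behind this kind of live update can be sketched in a few lines. The class below is a hypothetical stand-in, not the real DeploymentRegistryService, whose interface is not documented here. It only shows the idea of notifying subscribers when a status actually changes.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]  # receives (deployment_id, new_status)


class DeploymentRegistry:
    """Hypothetical stand-in for an event-emitting deployment registry."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> Callable[[], None]:
        """Register a listener; the returned function unsubscribes it."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def set_status(self, deployment_id: str, status: str) -> None:
        """Record a status and notify listeners only when it changed."""
        if self._status.get(deployment_id) == status:
            return
        self._status[deployment_id] = status
        # Iterate over a copy so a listener may unsubscribe during dispatch.
        for listener in list(self._listeners):
            listener(deployment_id, status)
```

A UI component would subscribe when it mounts and call the returned function when it closes, so repeated health checks with an unchanged result cause no redraws.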


Auto-Configuration

When Neural Inverse detects a new deployment (local or cloud), it can automatically configure the corresponding provider in your LLM settings — but only if the provider is currently unconfigured.

Rules:

  • If you have already set a custom endpoint or API key, auto-config will not overwrite it
  • If the provider has _didFillInProviderSettings = true, auto-config skips it
  • A notification appears with OK, Undo, and Don't auto-configure options
  • Selecting "Don't auto-configure" dismisses future auto-config for that provider permanently (stored per profile)
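
Taken together, the rules above amount to a guard that runs before any settings are written. A minimal sketch follows, assuming a simplified settings record. Apart from the mirrored _didFillInProviderSettings flag, the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderSettings:
    custom_endpoint: Optional[str] = None
    api_key: Optional[str] = None
    did_fill_in_provider_settings: bool = False  # mirrors _didFillInProviderSettings
    auto_config_dismissed: bool = False          # set by "Don't auto-configure"


def should_auto_configure(settings: ProviderSettings) -> bool:
    """Return True only for a provider the user has never configured or opted out of."""
    if settings.auto_config_dismissed:
        return False  # the user opted out permanently for this provider
    if settings.did_fill_in_provider_settings:
        return False  # settings were already filled in
    # Never overwrite a custom endpoint or API key.
    return not (settings.custom_endpoint or settings.api_key)
```

For example, should_auto_configure(ProviderSettings()) is True, while a record with api_key set returns False.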

What gets configured:

Deployment type | What is set
Local (Ollama/vLLM/LM Studio) | Available models registered; provider enabled
Cloud (vLLM endpoint) | Endpoint URL, API key, and model registered; provider enabled

Auto-config rules are stored and can be reviewed or reverted from the Deployments tab.


Enterprise

Model Management for Teams is available on the Enterprise plan. View pricing or contact sales.

Neural Inverse Enterprise adds organization-wide model management:

  • Private cloud model registry — deploy models once, share endpoints across your entire engineering team within your org's private cloud and network
  • Policy enforcement — define which models developers can use, block unapproved providers at the org level
  • Centralized credential management — cloud credentials stored and rotated at the org level; developers never handle raw AWS/Azure keys
  • Local model governance — IT can define approved local model lists; developers can install from the approved catalog only
  • Usage telemetry — token counts, model usage by developer, cost attribution per team or project
  • SSO-gated deployment access — only authorized roles can provision or terminate cloud GPU instances

Contact sales@neuralinverse.com for early access.


