AI CLI Integration with PAL MCP
Complete guide to setting up Codex CLI, Gemini CLI, GitHub Copilot CLI, and Claude Desktop with PAL MCP Server for multi-model orchestration with your local Ollama models.
Supported CLIs
| CLI | Config Location | Format |
|---|---|---|
| Claude Desktop | %APPDATA%\Claude\claude_desktop_config.json | JSON |
| Codex CLI | ~/.codex/config.toml | TOML |
| Gemini CLI | ~/.gemini/settings.json | JSON |
| Copilot CLI | ~/.copilot/mcp-config.json | JSON |
Why PAL MCP?
Pairing any of these CLIs with the PAL MCP Server gives you:
- ✅ Multi-model orchestration - Route tasks to the best model automatically
- ✅ Local-first - All models run on your RTX 5090 via Ollama
- ✅ Consensus workflows - Compare responses across multiple models
- ✅ Advanced tools - Code review, debugging, security audits, etc.
- ✅ Web search - Integrated with your SearXNG instance
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Codex CLI Workflow │
│ │
│ User: "Review this code for security issues" │
│ │
│ 1. Codex CLI → PAL MCP Server (via conda env) │
│ 2. PAL → Ollama (localhost:11434) │
│ 3. PAL orchestrates: qwen2.5-coder:32b-5090 │
│ 4. Optional: Consensus with deepseek-r1:32b-5090 │
│ 5. Codex synthesizes final response │
└─────────────────────────────────────────────────────────────┘
Prerequisites
- Ollama running on localhost:11434 with models installed
- PAL MCP Server installed and configured
- Conda environment (if using conda setup) or Python venv
- Codex CLI installed (npm install -g @openai/codex-cli)
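Before touching any config, it helps to confirm the first prerequisite directly. A quick check (the /api/tags endpoint is Ollama's standard model listing; the models shown should match the profiles you configure later):
# Verify Ollama is up and list the installed models
ollama list
curl http://localhost:11434/api/tags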
Installation
Step 1: Install Codex CLI
# Using npm
npm install -g @openai/codex-cli
# Verify installation
codex --version
# Output: codex-cli 0.77.0
Step 2: Choose Your PAL MCP Setup Method
We support two methods: Conda (recommended if you use conda) or Virtual Environment (standard Python).
Method A: Conda Setup (Recommended)
Why Conda?
- ✅ Environment isolation (no conflicts with system Python)
- ✅ Easier dependency management (conda update)
- ✅ Same environment can be shared with Claude Desktop
- ✅ Better for Windows users (no PATH issues)
1. Create Conda Environment
# Create environment
conda create -n pal-mcp python=3.11 -y
# Activate and install dependencies
conda activate pal-mcp
pip install anthropic openai google-generativeai requests mcp
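Before wiring this environment into any CLI, a quick sanity check, assuming the pip install above succeeded (conda run is the same launcher the configs below use):
# Confirm the MCP SDK imports inside the new environment
conda run -n pal-mcp python -c "import mcp; print('mcp OK')"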
2. Add to Claude Desktop (Optional)
If you also use Claude Desktop, add to %APPDATA%\Claude\claude_desktop_config.json:
{
"mcpServers": {
"pal": {
"command": "C:\\Users\\YOUR_USERNAME\\anaconda3\\condabin\\conda.bat",
"args": [
"run",
"-n",
"pal-mcp",
"--no-capture-output",
"python",
"C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"
]
}
}
}
3. Create Codex Configuration
Save as C:\Users\YOUR_USERNAME\.codex\config.toml:
# Codex CLI Configuration for Windows + Conda
# Location: C:\Users\YOUR_USERNAME\.codex\config.toml
# ============================================
# Model Providers Configuration
# ============================================
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses" # Fixes deprecation warning
# ============================================
# Model Profiles
# ============================================
# Quick reference:
# -p qwen-code → qwen2.5-coder:32b-5090 (19GB, best local coding)
# -p deepseek → deepseek-r1:32b-5090 (19GB, reasoning with CoT)
# -p fast → qwen2.5:3b (2GB, quick queries)
# -p medium → qwen2.5-coder:14b (9GB, balanced)
[profiles.qwen-code]
model_provider = "ollama"
model = "qwen2.5-coder:32b-5090"
[profiles.deepseek]
model_provider = "ollama"
model = "deepseek-r1:32b-5090"
[profiles.fast]
model_provider = "ollama"
model = "qwen2.5:3b"
[profiles.medium]
model_provider = "ollama"
model = "qwen2.5-coder:14b"
# ============================================
# MCP Servers Configuration - Using Conda
# ============================================
[mcp_servers.pal]
command = "C:\\Users\\YOUR_USERNAME\\anaconda3\\condabin\\conda.bat"
args = [
"run",
"-n",
"pal-mcp",
"--no-capture-output",
"python",
"C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"
]
# Timeouts - from official PAL guide
startup_timeout_sec = 300 # 5 minutes for conda env activation
tool_timeout_sec = 1200 # 20 minutes for tool execution
[mcp_servers.pal.env]
OLLAMA_BASE_URL = "http://localhost:11434"
DEFAULT_MODEL = "auto" # Let PAL auto-select best model
# ============================================
# Features Configuration
# ============================================
[features]
web_search_request = true # Enable web search for PAL's apilookup
# ============================================
# Default Settings
# ============================================
[defaults]
provider = "ollama"
model = "qwen2.5-coder:32b-5090"
Important: Replace YOUR_USERNAME with your actual Windows username!
4. Test the Setup
# Start Codex
codex
# Inside Codex, test MCP connection
/mcp
# Should show:
# • pal
# • Status: enabled
# • Tools: analyze, chat, codereview, consensus, debug, etc.
# Test PAL integration
> Use pal to list available models
# Test a simple query
> Write a hello world in Python
Method B: Virtual Environment Setup
Why Virtual Environment?
- ✅ Standard Python approach (no conda needed)
- ✅ Lighter weight
- ✅ Faster startup (no conda activation overhead)
1. Create Virtual Environment
# Clone PAL MCP Server
cd C:\Users\YOUR_USERNAME\code
git clone https://github.com/BeehiveInnovations/pal-mcp-server
cd pal-mcp-server
# Create virtual environment
python -m venv .pal_venv
# Activate and install dependencies
.\.pal_venv\Scripts\Activate.ps1
pip install anthropic openai google-generativeai requests mcp
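As with the conda method, a quick sanity check, assuming the install above succeeded; this calls the same venv interpreter the config below points at:
# Confirm the MCP SDK imports from the venv's interpreter
.\.pal_venv\Scripts\python.exe -c "import mcp; print('mcp OK')"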
2. Create Codex Configuration
Save as C:\Users\YOUR_USERNAME\.codex\config.toml:
# Codex CLI Configuration (Virtual Environment)
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"
# ... (same profiles as conda version) ...
# ============================================
# MCP Servers Configuration - Using venv
# ============================================
[[mcp_servers]]
name = "pal"
transport = "stdio"
command = "C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\.pal_venv\\Scripts\\python.exe"
args = ["C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"]
tool_timeout_sec = 1200
[mcp_servers.env]  # attaches to the "pal" server defined above ([[mcp_servers]] is array syntax, so no "pal" key here)
OLLAMA_BASE_URL = "http://localhost:11434"
DEFAULT_MODEL = "auto"
[features]
web_search_request = true
[defaults]
provider = "ollama"
model = "qwen2.5-coder:32b-5090"
Key Differences from Conda:
- Uses [[mcp_servers]] syntax (array) instead of [mcp_servers.pal] (table)
- Includes transport = "stdio" explicitly
- Points directly to the Python executable in the venv
- No conda.bat wrapper
- No startup_timeout_sec (venv starts instantly)
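Whichever syntax you use, you can verify the file is valid TOML before launching Codex. A minimal check, assuming Python 3.11+ (which ships the tomllib module):
# Parse ~/.codex/config.toml and fail loudly on syntax errors
python -c "import tomllib, pathlib; tomllib.loads(pathlib.Path.home().joinpath('.codex/config.toml').read_text()); print('config.toml parses')"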
Configuration Explained
Model Providers
[model_providers.ollama]
base_url = "http://localhost:11434/v1" # Ollama's OpenAI-compatible API
wire_api = "responses" # New API format (fixes deprecation warning)
Model Profiles
Profiles let you switch models with the -p flag:
codex -p fast # Uses qwen2.5:3b (2GB VRAM)
codex -p deepseek # Uses deepseek-r1:32b-5090 (19GB VRAM)
codex -p qwen-code # Uses qwen2.5-coder:32b-5090 (19GB VRAM)
Add profiles for all your installed models:
[profiles.custom-model]
model_provider = "ollama"
model = "your-model:tag"
MCP Server Timeouts
startup_timeout_sec = 300 # Conda env activation can take time
tool_timeout_sec = 1200 # PAL operations can be long (consensus, etc.)
Why these values?
- Conda activation: ~10-30s (we allow 5 minutes for safety)
- PAL tools (consensus, codereview): 5-20 minutes depending on complexity
Features
[features]
web_search_request = true # Required for PAL's apilookup tool
What this enables:
- PAL can fetch latest API documentation
- Web search integration with SearXNG (if configured)
- Real-time information lookup
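For example, with the flag enabled you can ask PAL to look up current documentation from inside Codex; the prompt wording here is illustrative, not a fixed syntax:
# Inside an interactive Codex session
codex
> Use pal apilookup to find the latest Ollama API documentation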
Usage Examples
Basic Chat
# Interactive mode
codex
> Write a function to parse JSON
# One-off query
codex chat "explain async/await in Python"
Using PAL Tools
codex
> Use pal codereview to review src/auth.py
> Use pal thinkdeep with deepseek-r1:32b-5090 to debug this memory leak
> Use pal consensus with fast and medium to evaluate this API design
Model Switching
# Quick queries with fast model
codex -p fast
> What's new in Python 3.13?
# Deep analysis with reasoning model
codex -p deepseek
> Analyze the time complexity of this algorithm
# Default coding model
codex
> Refactor this class to use dependency injection
Comparison: Conda vs Virtual Environment
| Feature | Conda | Virtual Environment |
|---|---|---|
| Startup speed | ~10-30s | ~1-2s |
| Environment isolation | Excellent | Good |
| Dependency management | conda update | pip install -U |
| Windows compatibility | Excellent | Good |
| Disk space | ~500MB | ~200MB |
| Shared with Claude Desktop | ✅ Yes | ❌ No |
| Complexity | Medium | Low |
| Recommended for | Windows users, conda users | Linux/macOS, Python developers |
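If you are unsure which interpreter a given config will actually launch, both can be checked directly; the paths match the configs above (replace YOUR_USERNAME as usual):
# Conda: print the interpreter that conda run resolves to
conda run -n pal-mcp python -c "import sys; print(sys.executable)"
# venv: ask the interpreter the venv config points at
C:\Users\YOUR_USERNAME\code\pal-mcp-server\.pal_venv\Scripts\python.exe -c "import sys; print(sys.executable)"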
Common Issues We Fixed
During setup, we encountered and fixed several issues:
Issue 1: wire_api Deprecation Warning
Problem:
⚠ Support for the "chat" wire API is deprecated
Solution:
Set wire_api = "responses" in model provider:
[model_providers.ollama]
wire_api = "responses" # Not "chat"
Issue 2: Invalid history.persistence Value
Problem:
Error: unknown variant `local`, expected `save-all` or `none`
Solution:
[history]
persistence = "save-all" # Not "local"
Issue 3: Invalid tui.notifications Syntax
Problem:
Error: data did not match any variant of untagged enum Notifications
Solution: Remove the broken notifications config entirely. The default works fine:
# DON'T use this (broken):
# [tui]
# notifications = { types = ["agent-turn-complete"] }
# Just omit it - defaults work fine
Issue 4: Terminal Display Garbled (Character-by-Character Output)
Problem: Each token appears on a separate line, with "Worked for Xs" between them:
• name
─ Worked for 13s ──────
• ":
─ Worked for 13s ──────
• "
Root Cause: Terminal width detection fails in MinGW, Git Bash, and the default PowerShell console
Solutions (Try in Order):
A. Use Windows Terminal (BEST FIX)
# Install Windows Terminal
winget install Microsoft.WindowsTerminal
# Run Codex in it
wt codex
B. Increase PowerShell Window Width
- Right-click PowerShell title bar → Properties
- Layout tab → Window Size → Width: 120 (or more)
- Restart PowerShell and try codex again
C. Run in WSL (if you have it)
# In WSL
codex
# Terminal detection works better in WSL
D. Use Chat Mode Instead of TUI
# Skip the broken TUI entirely
codex chat "your question here"
E. Force Environment Variable
$env:COLUMNS = 120
$env:LINES = 40
codex
Issue 5: Web Search Deprecated Warning
Problem:
⚠ `tools.web_search` is deprecated. Use `[features].web_search_request`
Solution:
# OLD (deprecated)
[tools]
web_search = true
# NEW (correct)
[features]
web_search_request = true
Issue 6: Slow PowerShell Startup (2732ms Profile Load)
Problem:
Loading personal and system profiles took 2732ms.
Root Cause: Your PowerShell profile ($PROFILE) has slow initialization (conda, Oh My Posh, etc.)
Solutions:
Option A: Skip Profile When Running Codex
powershell -NoProfile -Command "codex"
Option B: Create a Fast Alias
Add to your $PROFILE:
# Short alias for codex (your profile is already loaded in the current session)
function cx { codex.exe $args }
Option C: Optimize Your PowerShell Profile
# Find what's slow
Measure-Command { . $PROFILE }
# Comment out slow sections (conda init, Oh My Posh, etc.)
Troubleshooting
PAL MCP not connecting
# Check if conda env exists
conda env list | Select-String "pal-mcp"
# Test manual activation
conda activate pal-mcp
python C:\Users\YOUR_USERNAME\code\pal-mcp-server\server.py
Models not appearing
# Verify Ollama is running
ollama list
# Check PAL can reach Ollama
curl http://localhost:11434/api/tags
Wire API deprecation warning
Make sure you have wire_api = "responses" in your config:
[model_providers.ollama]
wire_api = "responses" # Not "chat"
Slow PowerShell startup (2732ms)
This is your PowerShell profile loading, not Codex. To skip:
powershell -NoProfile -Command "codex"
Or create an alias in your profile:
function cx { codex.exe $args }
Advanced Configuration
Add More Ollama Models
[profiles.uncensored]
model_provider = "ollama"
model = "dolphin3:8b-5090"
[profiles.vision]
model_provider = "ollama"
model = "NeuralNexusLab/CodeXor:12b" # Supports vision
Mix Local + Cloud Models
# Add OpenAI provider
[model_providers.openai]
name = "OpenAI"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
[profiles.gpt4]
model_provider = "openai"
model = "gpt-4"
Then in PAL env:
[mcp_servers.pal.env]
OLLAMA_BASE_URL = "http://localhost:11434"
OPENAI_API_KEY = "sk-..." # For cloud models
Restrict PAL to Specific Models
[mcp_servers.pal.env]
OLLAMA_ALLOWED_MODELS = "qwen2.5-coder:32b-5090,deepseek-r1:32b-5090"
Disable PAL Tools You Don't Use
[mcp_servers.pal.env]
DISABLED_TOOLS = "analyze,refactor,testgen,secaudit,docgen,tracer"
Integration with This Repo's Setup
If you followed the main setup in this repo, you already have:
- ✅ Ollama installed with -5090 optimized models
- ✅ SearXNG running on localhost:4000
- ✅ Open WebUI with web search configured
Codex CLI complements Open WebUI:
| Feature | Open WebUI | Codex CLI |
|---|---|---|
| Interface | Browser | Terminal |
| Multi-model | Manual dropdown | Automatic routing via PAL |
| Web search | SearXNG injection | Native + PAL orchestration |
| Workflows | Single model | Multi-model consensus |
| Use case | Chat, experimentation | Coding, automation, CI/CD |
Recommended workflow:
- Use Open WebUI for interactive chat and experimentation
- Use Codex CLI for coding tasks, code review, automation
Files Reference
| File | Purpose |
|---|---|
| ~/.codex/config.toml | Codex CLI configuration |
| pal-mcp-server/.env | PAL environment variables (optional) |
| custom_models.json | Model definitions for PAL (from this repo) |
| ~/.codex/history.jsonl | Session history |
Next Steps
- Try PAL tools: /mcp in Codex to see all available tools
- Set up aliases: See setup-codex-aliases.ps1 in this repo
- Integrate with CI/CD: Use codex chat in GitHub Actions (see the one-off example below)
- Explore consensus workflows: Compare multiple models on the same task
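For the CI/CD item above, the simplest pattern is a one-shot, non-interactive call; the prompt wording is illustrative:
# One-shot invocation suitable for scripts or CI jobs
codex chat "Use pal consensus with fast and medium to evaluate this API design"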
Gemini CLI Setup (Bonus)
Google's Gemini CLI also supports MCP! The setup is nearly identical to Codex.
Installation
# Install Gemini CLI
npm install -g @google/generative-ai-cli
# Verify installation
gemini --version
Configuration
Gemini uses ~/.gemini/settings.json instead of a TOML file:
Copy the template:
cp gemini-settings.json ~/.gemini/settings.json
Edit and replace YOUR_USERNAME with your actual Windows username.
Gemini settings.json structure:
{
"mcpServers": {
"pal": {
"command": "C:\\Users\\YOUR_USERNAME\\anaconda3\\condabin\\conda.bat",
"args": [
"run",
"-n",
"pal-mcp",
"--no-capture-output",
"python",
"C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"
],
"env": {
"OLLAMA_BASE_URL": "http://localhost:11434",
"DEFAULT_MODEL": "auto"
}
}
},
"security": {
"auth": {
"selectedType": "oauth-personal"
}
}
}
Usage
# Launch Gemini CLI
gemini
# Check MCP servers
gemini mcp list
# Use PAL tools (similar to Codex)
gemini "Use pal to list available models"
Key Differences: Codex vs Gemini
| Feature | Codex CLI | Gemini CLI |
|---|---|---|
| Config format | TOML | JSON |
| Config location | ~/.codex/config.toml | ~/.gemini/settings.json |
| Model profiles | Built-in (via config) | Set with -m flag |
| Provider | OpenAI (Ollama via OSS mode) | Google Gemini (cloud) |
| MCP support | ✅ Yes | ✅ Yes |
| Local models | ✅ Via Ollama | ❌ Cloud only (but PAL uses Ollama) |
| Web search | Native feature flag | Not built-in (use PAL) |
| Best for | Local-first, coding | Gemini-specific features |
Recommendation: Use Codex CLI for local-first workflows with Ollama. Use Gemini CLI if you want Google's latest models alongside your local Ollama models via PAL.
GitHub Copilot CLI Setup
GitHub Copilot CLI (gh copilot) can also use PAL MCP Server for multi-model orchestration.
Installation
# Install GitHub Copilot CLI extension
gh extension install github/copilot-cli
# Verify installation
gh copilot --version
# Output: @github/copilot-cli version 0.0.374
Configuration
Create ~/.copilot/mcp-config.json:
Conda Setup (recommended):
{
"mcpServers": {
"pal": {
"type": "local",
"command": "C:\\Users\\YOUR_USERNAME\\anaconda3\\condabin\\conda.bat",
"tools": ["*"],
"args": [
"run",
"-n",
"pal-mcp",
"--no-capture-output",
"python",
"C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"
],
"env": {
"OLLAMA_BASE_URL": "http://localhost:11434",
"DEFAULT_MODEL": "auto"
}
}
}
}
Python venv Setup:
{
"mcpServers": {
"pal": {
"type": "local",
"command": "C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\.pal_venv\\Scripts\\python.exe",
"tools": ["*"],
"args": [
"C:\\Users\\YOUR_USERNAME\\code\\pal-mcp-server\\server.py"
],
"env": {
"OLLAMA_BASE_URL": "http://localhost:11434",
"DEFAULT_MODEL": "auto"
}
}
}
}
The "tools": ["*"] field is REQUIRED for Copilot CLI. Without it, you'll get:
Failed to start MCP Servers
This is different from Claude Desktop and Gemini, which don't require this field.
Usage
# Launch Copilot CLI
gh copilot
# Allow PAL tools
gh copilot --allow-tool pal:*
# Use PAL for multi-model consensus
gh copilot "Use pal to get a consensus on the best approach"
CLI Comparison Matrix
| Feature | Claude Desktop | Codex CLI | Gemini CLI | Copilot CLI |
|---|---|---|---|---|
| Config format | JSON | TOML | JSON | JSON |
| Config location | %APPDATA%\Claude\* | ~/.codex/* | ~/.gemini/* | ~/.copilot/* |
| Requires tools array | No | No | No | Yes |
| Best for | Chat UI | Local coding | Gemini models | GitHub integration |
| MCP support | ✅ | ✅ | ✅ | ✅ |
Related Documentation
- Multi-Agent Workflows - Run multiple CLIs together (Coder + Observer pattern)
- PAL MCP Server
- Codex CLI Documentation
- Gemini CLI Documentation
- GitHub Copilot CLI
- Model Selection Guide