Does anyone have experience using a local LLM with MCP to control ERPNext?
I feel that most LLMs — even the big ones — are too dumb to use the provided tools properly. It kind of defeats the whole purpose of having the AI if I have to explain every single step in detail.
Does anyone have recommendations for models that actually handle tool usage well? Or is there perhaps a dataset I could fine-tune on?
I’m happy to share my own experiences in return.
For reference, I’m running an RTX 4090, so I should have more than enough horsepower to run a decent-sized model.
Yeah, I’ve tinkered a bit with local LLM + MCP (Model Context Protocol) setups for ERPNext automation, so I can share both my experience and some ideas to get better tool usage.
You’re right — most LLMs, even the high-end ones, tend to either under-use tools or misuse them unless you spoon-feed the steps. That’s partly because general-purpose models aren’t trained to act like a reliable “agent” out of the box — they’re trained to chat, not to robustly call structured tools.
1. Models That Handle Tools Well (Locally)
Here’s what I’ve found for RTX 4090 running local inference:
| Model | Strength for MCP / Tool Use | VRAM Req | Notes |
|---|---|---|---|
| Mistral-Nemo-Instruct-2407 | Very good reasoning + compact context-window handling | ~15 GB (16-bit) | Tends to follow JSON tool schemas without much fuss. |
| LLaMA 3.1 70B Instruct (quantized to Q4_K_M) | Strong reasoning, better at multi-step tool chains | ~12 GB (Q4), ~40 GB FP16 | Needs good prompting; slower than smaller models. |
| Nous-Hermes 2 Mistral 7B | Surprisingly obedient to function-calling patterns | ~8 GB | Lower hallucination rate for ERPNext API workflows. |
| Deepseek-Coder-V2 16B | Excellent at ERPNext Python/JS code generation for tools | ~16 GB | Combine with Mistral for hybrid reasoning+coding. |
If you want plug-and-play MCP + local model, Ollama or LM Studio are easiest to run with your GPU and to hook into MCP servers.
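To make this concrete, here is a minimal sketch of what a tool-enabled request to Ollama's `/api/chat` endpoint looks like. The `get_purchase_invoice` tool is a hypothetical ERPNext wrapper, not something Ollama ships; the payload just shows the OpenAI-style `tools` field Ollama accepts:

```python
import json

# Hypothetical ERPNext tool definition, in the schema format Ollama's
# /api/chat endpoint accepts in its "tools" field.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_purchase_invoice",
            "description": "Fetch a Purchase Invoice from ERPNext by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "invoice_id": {
                        "type": "string",
                        "description": "e.g. PINV-0001",
                    },
                },
                "required": ["invoice_id"],
            },
        },
    }
]

# Request body you would POST to http://localhost:11434/api/chat
payload = {
    "model": "mistral-nemo",
    "messages": [{"role": "user", "content": "Show me invoice PINV-0001"}],
    "tools": tools,
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

The model's reply then carries a `tool_calls` entry instead of prose, which your MCP server executes and feeds back as a `tool` message.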
2. Why They Fail at Tools
Even strong models fail in 3 common ways:
- Over-responding — returns chatty explanations instead of structured tool calls.
- Under-chaining — fails to combine multiple tool calls to reach the final goal.
- Schema drift — produces invalid JSON or mismatched field names.
This happens because:
- Training data lacks ERPNext-style API + MCP usage examples.
- Models optimize for sounding right, not completing the task with minimal steps.
3. How to Improve Tool Use
a. Prompt Engineering
Use a strict system message in MCP:
```
You are an autonomous ERPNext operator.
Only use provided tools to take actions.
Never explain steps, just execute them.
If a tool is missing, ask for it.
```
And enforce JSON-only responses with a parser that rejects bad output.
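A minimal sketch of such a parser, assuming the single-object `{"name": ..., "arguments": ...}` call format used below (a real guardrail would also check `arguments` against each tool's schema, e.g. with `jsonschema` or `pydantic`):

```python
import json


def parse_tool_call(raw: str) -> dict:
    """Accept only a single JSON object with exactly 'name' and 'arguments'."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    if set(call) != {"name", "arguments"}:
        raise ValueError("expected exactly the keys 'name' and 'arguments'")
    if not isinstance(call["arguments"], dict):
        raise ValueError("'arguments' must be an object")
    return call
```

A chatty response like `"Sure! Here's the call: ..."` fails `json.loads` immediately, which is exactly what you want before triggering a retry.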
b. Few-shot Examples
Feed examples of correct ERPNext API / MCP calls:
```json
{"name": "get_purchase_invoice", "arguments": {"invoice_id": "PINV-0001"}}
```
Show both simple and multi-step scenarios so it learns to chain.
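For the multi-step case, a few-shot target might look like this (the tool names here are hypothetical, for illustration only):

```json
[
  {"name": "list_purchase_orders", "arguments": {"status": "Overdue", "older_than_days": 60}},
  {"name": "update_purchase_order", "arguments": {"po_id": "PUR-ORD-0001", "status": "Closed"}}
]
```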
c. Fine-tuning / LoRA
You could fine-tune on:
- ERPNext REST API calls and responses.
- MCP conversation logs where you correct tool misuse.
- Public datasets like ToolBench + ERPNext-specific samples.
QLoRA fine-tuning on your 4090 works well — you only need a few thousand well-curated examples to make a big difference in tool obedience.
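A sketch of how you might turn corrected MCP logs into training records. The `messages` layout is an assumption: most SFT tooling (Axolotl, TRL, etc.) accepts a chat-style list like this, but adjust the keys to your trainer's template:

```python
import json


def to_sft_record(user_request: str, tool_calls: list) -> dict:
    """Turn one corrected MCP interaction into a supervised fine-tuning record.

    The assistant target is the exact tool-call JSON, one call per line, so
    the model learns to emit structured calls instead of prose.
    """
    return {
        "messages": [
            {"role": "system",
             "content": "You are an autonomous ERPNext operator."},
            {"role": "user", "content": user_request},
            {"role": "assistant",
             "content": "\n".join(
                 json.dumps(c, separators=(",", ":")) for c in tool_calls
             )},
        ]
    }


record = to_sft_record(
    "Fetch invoice PINV-0001",
    [{"name": "get_purchase_invoice",
      "arguments": {"invoice_id": "PINV-0001"}}],
)
# Append records like this, one per line, to a .jsonl file for QLoRA training.
```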
4. My Working Setup
- Ollama running Mistral-Nemo-Instruct locally.
- Custom MCP server exposing ERPNext REST and whitelisted DB queries.
- Preloaded prompt context with:
- ERPNext doctype → API method mapping.
- Tool usage examples.
- Guardrail layer that validates tool JSON and retries up to 3 times if invalid.
This now lets me say:
“Close all overdue Purchase Orders older than 60 days”
and the model just loops through MCP calls without me handholding it.
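The guardrail/retry layer is the piece that makes this hands-off. A minimal skeleton, assuming `generate` and `validate` are stand-ins for your model client and schema check:

```python
import json

MAX_RETRIES = 3


def call_with_guardrail(generate, validate, prompt: str) -> dict:
    """Retry the model up to MAX_RETRIES times until it emits valid tool JSON.

    generate(prompt) returns the raw model output; validate(call) raises
    ValueError on a bad call. This is only the retry skeleton, not an agent.
    """
    last_error = None
    for _ in range(MAX_RETRIES):
        raw = generate(prompt)
        try:
            call = json.loads(raw)
            validate(call)
            return call
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc
            # Feed the error back so the model can self-correct on retry.
            prompt = (f"{prompt}\n\nPrevious output was invalid "
                      f"({exc}). Respond with JSON only.")
    raise RuntimeError(f"model failed after {MAX_RETRIES} attempts: {last_error}")
```

Feeding the validation error back into the prompt is what makes the retries effective: the model usually fixes malformed JSON on the second attempt.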
Thank you very much for your reply!
Could you perhaps share for which purposes or application areas you found MCP to be most suitable, and with which commands you achieved the best results?
It feels like you can do almost anything with MCP. My hope is to use it to help new, inexperienced ERPNext users get started more easily — to bridge the gap between knowing what they want to do but not knowing where or what to click in the system.
I tried my best by choosing different LLMs, using pre-prompts, few-shot examples, etc., but I don’t think the technology is quite there yet. Maybe I’m trying to do something that’s not really feasible — like pasting in some information about a resistor and expecting the LLM to create a new item out of it. I need to spoon-feed the AI to get usable results, be patient, and know exactly what I’m doing. Often, it’s still faster to complete most tasks myself.

Perhaps if there were a well-curated training set of user prompts and MCP commands specifically for ERPNext/Frappe, this technology would work better. For now, I cannot use it reliably in a production environment.