
Ollama Modelfile Parameters Explained: A Complete Guide to Creating Custom Models

In the last article, we covered how to get models up and running. But there’s been a nagging issue—responses can be all over the place.

Here’s the thing: when I ask llama3.2 a simple coding question, sometimes I get a concise three-line answer, other times it writes me an entire essay. Crank the temperature up to 0.8, and it gets “creative”; dial it down to 0.1, and it becomes robotic, like it’s reciting from a textbook. Worse yet, every conversation requires re-setting the system prompt—copy-pasting until you’re sick of it.

Then I discovered Ollama’s Modelfile. Simply put, it’s a way to write a “personality resume” for your model—configure once, and it sticks forever. This article consolidates all the pitfalls I’ve encountered and tuning lessons I’ve learned, including optimization advice for 10 core parameters and 4 battle-tested templates you can use right away.

If you haven’t installed Ollama yet, I recommend checking out the previous getting-started article first. This is advanced content, assuming you already know how to use ollama run.

What is a Modelfile and Why You Need One

Think of Modelfile as a “configuration blueprint” for your model—similar to Dockerfile conceptually. You tell Ollama which model to use as a base, what parameters to set, what system prompt to include, then give it a name. After that, every time you call it by that name, all configurations activate automatically.

Simply put, it solves three pain points:

Pain Point 1: Repetitive Setup Every Time

You’ve been there: open terminal, ollama run llama3.2, type a system prompt. Next day, do it again. Third day, repeat… annoying, right? Modelfile locks in these configurations—set once, valid forever.

Pain Point 2: Unstable Output Style

The same model with different parameter settings produces wildly different outputs. A code assistant needs stable output; creative writing needs diverse exploration. You can’t possibly remember “oh right, this task needs temperature 0.3, that one needs 0.8”—Modelfile saves your presets directly.

Pain Point 3: Model Variant Management

You want a “code review llama3.2,” a “writing assistant llama3.2,” a “JSON output llama3.2.” What do you do? Copy the model three times? No need. Use Modelfile to create three named variants—under the hood, it’s still the same model file, just configured differently.

The basic workflow is just three steps:

# 1. Create a Modelfile
echo 'FROM llama3.2
SYSTEM "You are a code review expert"' > Modelfile

# 2. Use ollama create to generate a new model
ollama create my-coder -f Modelfile

# 3. Run it directly
ollama run my-coder

It's that simple. Now let's dive into what you can put inside a Modelfile.

Modelfile Structure and 8 Core Instructions

Modelfile syntax is straightforward: comments use #, instructions start with uppercase keywords. Like this:

# This is a comment
FROM llama3.2
PARAMETER temperature 0.8
SYSTEM "You are a helpful assistant"

The whole file contains two things: comments and instructions. Instructions come in 8 types—here’s the complete overview:

| Instruction | Purpose | Required? | When to Use |
| --- | --- | --- | --- |
| FROM | Specify base model | Required | Must be in every file |
| PARAMETER | Set inference parameters | Optional | Adjust temperature, context, etc. |
| TEMPLATE | Prompt template | Optional | Customize conversation format |
| SYSTEM | System message | Optional | Define role and behavior |
| ADAPTER | Load LoRA adapter | Optional | For model fine-tuning |
| LICENSE | License declaration | Optional | Required when publishing models |
| MESSAGE | Pre-set conversation history | Optional | Few-shot examples |
| REQUIRES | Version requirement | Optional | For specific version features |

Honestly, 90% of daily use only needs FROM, PARAMETER, and SYSTEM. The others? You can learn them when specific needs arise.

Three Ways to Use FROM Instruction

FROM is the only required instruction, with three syntax options:

Option 1: Use Model Name (Most Common)

FROM llama3.2
FROM llama3.2:3b
FROM mistral:latest

Just use a model name from the Ollama library. The part after the colon is a version tag; omit it to get latest.

Option 2: Use Local GGUF File

FROM ./my-model.gguf

If you’ve downloaded a GGUF format model file elsewhere, point directly to it.

Option 3: Use Safetensors Directory

FROM ./my-safetensors-dir

This is less common, typically for models downloaded from Hugging Face in their original format.

Alright, basics covered. Now for the main event—PARAMETER parameters.

PARAMETER Deep Dive

This is the most valuable part of the entire article. I’ve compiled all the pitfalls I’ve encountered while tuning parameters, giving you a configuration table you can copy directly.

First, the complete parameter list:

| Parameter | Default | Type | What It Does | How to Tune |
| --- | --- | --- | --- | --- |
| temperature | 0.8 | float | Controls randomness; higher = more "creative" | Code: 0.3, Creative: 1.0 |
| num_ctx | 2048 | int | Context window size | Long docs: 4096-8192 |
| top_k | 40 | int | Only select from top K probable words | Usually don't touch unless output is chaotic |
| top_p | 0.9 | float | Nucleus sampling, controls diversity | Use with temperature |
| min_p | 0.0 | float | Filter out low-probability words | High-quality output: 0.05 |
| seed | 0 | int | Fix random seed for reproducible output | Testing: 42 or any fixed value |
| stop | none | string | Stop generating when this appears | Multiple stops can be stacked |
| num_predict | -1 | int | Max output length, -1 = unlimited | Limit output: 100-500 |
| repeat_penalty | 1.1 | float | Penalize repetitive content | Long-form: increase to 1.5 |
| repeat_last_n | 64 | int | Check last N words for repetition | Use with repeat_penalty |

Let me dive into the most important parameters in detail.

temperature: Creativity vs. Stability

This is the easiest parameter to understand. High temperature (like 1.0) makes the model “let loose,” choosing lower-probability but more creative words. Low temperature (like 0.1) makes it conservative, only selecting highest-probability words for more stable output.

I’ve tested llama3.2’s responses to the same question at different temperatures:

Q: How do you read a file in Python?

  • temperature 0.1: Output like a textbook, only the standard answer
  • temperature 0.5: Adds practical tips like “watch out for encoding issues”
  • temperature 0.8: Might discuss different methods for different scenarios, even give examples
  • temperature 1.0: Answers all over the place, sometimes goes off-topic

My experience:

  • Coding, technical Q&A: around 0.3, need stability
  • Creative writing, brainstorming: 0.8-1.0, need surprises
  • JSON output, fixed format: 0.1-0.2, need precision
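Locking one of these presets in takes a single PARAMETER line. As a minimal sketch of a coding-focused variant (the value follows my experience above, not an official recommendation):

```
# Modelfile for a stability-first coding variant
FROM llama3.2
PARAMETER temperature 0.3
```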

num_ctx: Context Window

This parameter determines how much content the model can “remember.” Default is 2048 tokens, roughly 1500-2000 Chinese characters.

Want it to read a long article and summarize? 2048 might not be enough. Having a long conversation and it suddenly forgets earlier content? Probably num_ctx is too small.

Important note: Increasing num_ctx consumes more memory. In my llama3.2 tests, going from 2048 to 8192 more than doubled memory usage. If your machine only has 8GB RAM, 4096 is about the limit.

My experience:

  • Short conversations, simple Q&A: 2048 default is fine
  • Code review, technical discussion: 4096 is comfortable
  • Long document processing, fiction writing: 8192 (if you have the memory)
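In a Modelfile, that's again one line. A sketch for a long-document setup (the 8192 assumes you actually have the RAM, per the warning above):

```
# Modelfile for long-document work -- watch your memory usage
FROM llama3.2
PARAMETER num_ctx 8192
```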

stop: Stop Sequences

This parameter is handy—tell the model “stop when you see this.”

For example, if you want JSON output but worry about verbose additions:

PARAMETER stop "\n\n"
PARAMETER stop "```"

The model will stop when it sees two line breaks or code block markers, producing cleaner output.

You can stack multiple stops—many people don’t know this.

repeat_penalty: Preventing Verbose Repetition

Models have a quirk: they tend to repeat the same phrase. repeat_penalty punishes repetitive content.

Default is 1.1, which I find insufficient. When generating long articles, I usually set it to 1.3-1.5, effectively reducing fluff like “as mentioned earlier” or “in conclusion.”
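Since repeat_last_n controls how far back the penalty looks, the two are usually tuned together. A sketch for long-form writing (these values are my starting point, not gospel):

```
# Penalize repetition more aggressively, and look further back for it
PARAMETER repeat_penalty 1.4
PARAMETER repeat_last_n 128
```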

Parameter Comparison for Different Scenarios

I’ve summarized optimal configurations for four common scenarios—copy away:

| Scenario | temperature | num_ctx | Other Suggestions |
| --- | --- | --- | --- |
| Code Assistant | 0.3 | 4096 | stop "```", seed 42 (for reproducibility) |
| Creative Writing | 1.0 | 2048 | top_p 0.95, repeat_penalty 1.5 |
| Technical Q&A | 0.5 | 4096 | min_p 0.05 (filter low-quality words) |
| JSON Output | 0.1 | 2048 | stop "\n\n", stop "```" |

At this point, you should have a solid grasp of each parameter. Next up: hands-on time—four Modelfile templates you can use directly.

Practical Examples: 4 Complete Modelfiles

Theory done, let’s see code. All four templates have been tested—you can copy, paste, and run them.

Example 1: Role-Playing—Pig Bajie Assistant

This is a fun little model I made, imitating Pig Bajie’s speech style:

# Pig Bajie Assistant Modelfile
FROM llama3.2

SYSTEM """You are Pig Bajie from Journey to the West. Speak in a humorous, down-to-earth style.
When answering questions:
- Occasionally complain about the master being too naggy
- Get excited when food is mentioned
- Refer to yourself as "Old Pig" (俺老猪)
- When facing difficulties, say "Let's just split up" (散伙算了)"""

PARAMETER temperature 0.8
PARAMETER num_ctx 2048

Why this configuration:

Temperature set to 0.8 lets Pig Bajie speak with more “personality,” not too rigid. The SYSTEM includes specific behavioral rules—get excited about food, use “Old Pig” for self-reference—these details make the character more vivid.

Usage:

ollama create pig-bajie -f Modelfile
ollama run pig-bajie

Try asking: “How do I learn programming well?” and see how it responds.

Example 2: Professional Assistant—Python Code Review

This is my daily work configuration, specifically for reviewing code:

# Python Code Review Assistant Modelfile
FROM llama3.2:3b

SYSTEM """You are a senior Python developer. When reviewing code, focus on:
1. Type safety—potential type errors
2. Exception handling—edge cases coverage
3. Performance bottlenecks—unnecessary loops or redundant computations
4. Security vulnerabilities—sensitive data exposure

Response format:
Issue → Impact → Suggestion → Code Example"""

PARAMETER temperature 0.3
PARAMETER num_ctx 8192
PARAMETER seed 42

Why this configuration:

Temperature 0.3 ensures stable output—code review doesn’t need “creative flair.” num_ctx at 8192 because code files can be long. seed 42 for reproducibility—same question gets same advice, convenient for comparison testing.

Usage effect: I throw in a few hundred lines of Python code, and it gives a clear review report in the “Issue → Impact → Suggestion → Example” format.

Example 3: Structured Output—JSON Format

If you need to feed model output to other programs, JSON format is most convenient:

# JSON Output Assistant Modelfile
FROM llama3.2

SYSTEM """Your output must be valid JSON format.
Analysis result format:
{"result": "analysis content", "confidence": 0-100, "tags": ["tag1", "tag2"]}

Do not output anything else. Do not add code block markers."""

PARAMETER temperature 0.1
PARAMETER num_ctx 2048
PARAMETER stop "\n\n"
PARAMETER stop "```"

MESSAGE user Analyze the security risks in this code
MESSAGE assistant {"result": "SQL injection risk detected, user input not filtered", "confidence": 85, "tags": ["security", "SQL"]}

Why this configuration:

Temperature 0.1 for maximum stability—JSON format can’t tolerate any deviation. stop parameters filter out line breaks and code block markers, preventing extraneous output. The MESSAGE section is a few-shot example—telling the model “output should look like this.”

I use this configuration in automated workflows: throw error logs at the model, it outputs structured analysis, then programs handle it automatically.

Example 4: Long Context—Document Summary

When processing long articles, the context window needs to be large enough:

# Document Summary Assistant Modelfile
FROM llama3.2

SYSTEM """You are a document summarization expert. Output requirements:
- No more than 5 key points
- Each point under 50 characters
- Extract core viewpoints first, then add details
- Output in English"""

PARAMETER temperature 0.5
PARAMETER num_ctx 8192
PARAMETER num_predict 300

Why this configuration:

Temperature 0.5—summaries need stability, but not too rigid (too low makes summaries read like chronological logs). num_ctx 8192 ensures it can handle long documents. num_predict 300 limits output length—summaries shouldn’t be longer than the original.

I use this for technical articles: throw in a 3000-word piece, it gives me 5 points, each under 50 characters—much more efficient to read.

Summary

These four templates cover common use cases. You can modify SYSTEM content or adjust parameters based on your needs—Modelfile changes just require re-running ollama create, debugging cost is low.

TEMPLATE and MESSAGE Advanced Topics

The previous examples didn’t use TEMPLATE because this parameter is an “advanced feature”—you only need it when you want fine-grained control over conversation format.

TEMPLATE’s Go Template Syntax

Ollama uses Go’s template syntax with three key variables:

  • {{ .System }} — your SYSTEM content
  • {{ .Prompt }} — user input
  • {{ .Response }} — model output (to define output format)

A simple example:

FROM llama3.2

TEMPLATE """{{ .System }}

User Question: {{ .Prompt }}

Answer: {{ .Response }}"""

SYSTEM "You are a technical expert"

This TEMPLATE defines the conversation structure: SYSTEM content first, then user question, finally the model’s answer.

Honestly, most cases don’t need custom TEMPLATE—Ollama’s default template works fine. When do you need to change it?

Scenario 1: Integrating with Other Tools

You’re connecting Ollama to a chat system with specific input format requirements. Use TEMPLATE for adaptation.

Scenario 2: Special Conversation Format

You want conversations to have “prefixes”—like starting every sentence with [AI] or [USER]—TEMPLATE can do that.
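As an illustrative sketch of that prefixed format (the template variables are Ollama's; the [USER]/[AI] prefixes are made up for this example):

```
FROM llama3.2

TEMPLATE """{{ .System }}
[USER] {{ .Prompt }}
[AI] {{ .Response }}"""
```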

MESSAGE: Pre-set Conversation History

MESSAGE’s purpose is “showing the model some examples.” The JSON output example earlier used it:

MESSAGE user Analyze the security risks in this code
MESSAGE assistant {"result": "SQL injection risk detected", "confidence": 85}

This tells the model: “When the user asks this kind of question, you should answer like this”—the concept of few-shot learning.

You can pre-set multiple conversation turns:

MESSAGE user Hello
MESSAGE assistant Hi, how can I help you?
MESSAGE user How's the weather?
MESSAGE assistant I don't know real-time weather. I suggest checking a weather app.

When the model starts, it “remembers” these conversations, and new conversations continue in this style.

How to View a Model’s Modelfile

If you want to see how a particular model’s Modelfile is written, use this command:

ollama show --modelfile llama3.2

The output is lengthy, containing all the model’s default configurations. Copy it, modify, and you can create your customized version.

This command is particularly useful—when you find a model that works well (like a custom version someone shared), use ollama show to export its Modelfile and learn how they configured it.

Common Problems and Pitfall Guide

Parameter tuning is inevitably accompanied by pitfalls. Here are several typical issues I’ve encountered—consider this your mine-clearing guide.

Problem 1: Temperature Too Low, Output Like Recitation

Some people think lower temperature is better—more stable output, right? I thought so too initially, setting my code assistant’s temperature to 0.05.

The result? The model answered questions like reciting standard answers, zero flexibility. Asked “how to read files in Python,” it gave me three methods, but each was textbook-standard, no practical tips.

My lesson: Temperature isn’t “the lower the better.” Code review is fine at 0.3; below 0.2 becomes robotic. You do want low for precise output (like JSON), but for daily conversation, there’s no need.

Problem 2: num_ctx Too Large, Memory Explodes

One time on a whim, I set num_ctx to 16384—wanted to handle ultra-long documents. After a few minutes, the system started swapping furiously, the whole machine froze.

My machine only has 16GB RAM. llama3.2:3b itself takes about 2GB, num_ctx from 2048 to 16384 pushed memory usage over 8GB. Add other programs…

Lesson: num_ctx isn’t something you just crank up. Based on your memory:

  • 8GB RAM: num_ctx max 4096
  • 16GB RAM: num_ctx can go to 8192
  • 32GB+: Only then can you try 16384

Problem 3: Mixing Up SYSTEM and MESSAGE

These two instructions serve different purposes, easily confused:

  • SYSTEM: Permanent role definition, carried in every conversation
  • MESSAGE: Pre-set conversation history, equivalent to few-shot examples

For example: you want the model to play a code review expert. Write the role definition in SYSTEM, write a few Q&A examples in MESSAGE.

Many people only write SYSTEM, no MESSAGE, resulting in the model “knowing it’s a code review expert” but not knowing how to answer specific questions. After adding a few MESSAGE examples, output quality improved noticeably.
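A minimal sketch of the two working together (the Q&A pair here is illustrative, not from a real session):

```
# SYSTEM defines the role; MESSAGE shows what a good answer looks like
SYSTEM "You are a Python code review expert. Answer in the format: Issue → Suggestion."

MESSAGE user Is `except: pass` okay in production code?
MESSAGE assistant Issue: it silently swallows every exception. Suggestion: catch specific exceptions and log them.
```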

Problem 4: How to Update After Creation

Created a model with Modelfile, want to change configuration?

Simple—re-run ollama create with the same name to overwrite:

# First creation
ollama create my-coder -f Modelfile

# Want to change config? Modify Modelfile, then run again
ollama create my-coder -f Modelfile

Ollama directly overwrites the same-name model, no need to delete first. Of course, if you don’t want to overwrite, just use a different name:

ollama create my-coder-v2 -f Modelfile

Problem 5: How to Reproduce Someone Else’s Modelfile

See a custom model someone shared that works well, want to learn how they configured it:

# First pull that model
ollama pull someone-elses-model

# Export its Modelfile
ollama show --modelfile someone-elses-model > learned-modelfile

# View, learn, modify
cat learned-modelfile

I use this command often—learned quite a few parameter tuning tricks from community custom models.

Conclusion

After all this, Modelfile’s core logic boils down to one sentence: solidify configurations to avoid manual adjustments every time.

Here are three things you can do right now:

1. Create a Role-Playing Model

Try the Pig Bajie assistant template, or swap in a character you like (like Doraemon, Iron Man). Run a few conversation rounds and feel how the temperature parameter affects output style.

2. Compare Outputs at Different Parameters

Same question, run it at temperature 0.3 and 0.8, see the difference. I’ve done this experiment many times—always discover something new.
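One low-friction way to run this experiment is inside the interactive REPL, which lets you change parameters on the fly with /set instead of rebuilding a model each time (a sketch of a session; check /? in your Ollama version for the exact syntax):

```
$ ollama run llama3.2
>>> /set parameter temperature 0.3
>>> How do you read a file in Python?
...
>>> /set parameter temperature 0.8
>>> How do you read a file in Python?
```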

3. Find a Scenario You Use Often

Code review, document summary, creative writing—pick one you’ll use daily, adapt the template above to your needs. After a few rounds of debugging, you’ll have your own custom model.

In the next article, I’ll cover Ollama API integration—how to connect local models to your programs using the OpenAI-compatible interface. Once Modelfile is configured, API calls become more stable—no need to pass parameters in code, just use the custom model directly.

Questions? Leave a comment—maybe the pitfalls I’ve hit can help you dodge a few.

Create Ollama Custom Model

Configure and create a custom model using Modelfile

⏱️ Estimated time: 10 min

1. Step 1: Create Modelfile

   Create a text file named Modelfile with basic configuration:

   • FROM llama3.2 (specify base model)
   • SYSTEM "Your system prompt" (define role)
   • PARAMETER temperature 0.3 (set temperature parameter)

2. Step 2: Generate Custom Model

   Run the creation command in terminal:

   ```bash
   ollama create my-model -f Modelfile
   ```

   my-model is the name you give your model; pick any name you like.

3. Step 3: Run and Test

   Run the created model directly:

   ```bash
   ollama run my-model
   ```

   Test a few questions and observe whether the output meets expectations.

4. Step 4: Iterate and Optimize

   If the output isn't ideal:

   • Adjust the temperature parameter (code: 0.3 / creative: 0.8)
   • Modify the SYSTEM prompt
   • Add MESSAGE examples

   After changes, re-run `ollama create my-model -f Modelfile` to overwrite.

FAQ

What's the difference between Modelfile and setting parameters directly?
Modelfile solidifies configuration—create once, permanent effect. Setting parameters directly (like passing during ollama run) requires repetition every time, and can't save complex configurations like SYSTEM prompts.
What should I set temperature to?
Choose based on scenario:

• Code review, technical Q&A: around 0.3, stable output
• Creative writing, brainstorming: 0.8-1.0, more diversity
• JSON output, fixed format: 0.1-0.2, ensure precision

Below 0.2 becomes robotic; above 1.0 might go off-topic.
How much memory does increasing num_ctx use?
Varies by model size and num_ctx value, rough guide:

• 8GB RAM: num_ctx max 4096
• 16GB RAM: num_ctx can reach 8192
• 32GB+: Can try 16384

Recommend adjusting gradually based on actual memory.
How do I view an existing model's Modelfile?
Use the command `ollama show --modelfile model-name` to export the model's complete Modelfile configuration for learning or modification.
What's the difference between SYSTEM and MESSAGE?
They serve different purposes:

• SYSTEM: Defines model role and behavior rules, applies to every conversation
• MESSAGE: Pre-set conversation history, used for few-shot examples

Recommend using both—SYSTEM defines role, MESSAGE gives a few Q&A examples.
How do I modify parameters after creation?
After modifying the Modelfile, re-run `ollama create model-name -f Modelfile` to overwrite the same-name model without deleting. To keep the original version, use a different model name.

12 min read · Published on: Apr 5, 2026 · Modified on: Apr 5, 2026
