AI Reference Guide

As organisations explore how to boost productivity using Artificial Intelligence, understanding the basics of how Large Language Models (LLMs) work is a crucial first step. This AI Reference Guide introduces key terms, concepts, and technical variables to help you make informed decisions when considering the adoption of AI in your workplace.

Whether you're evaluating local deployment options, planning integrations with internal systems, or simply curious about how models like ChatGPT function under the hood, this guide will give you a clear, jargon-free foundation.

You can read below or download a PDF copy for printing and sharing with your team.

Common Terms & Concepts

  • Model: A mathematical system trained on data to perform tasks like generating text, translating languages, or answering questions. In AI, a language model learns patterns in language so it can predict what comes next in a sentence.
  • Training / Fine‑Tuning:
    • Training is the initial process of teaching an AI model by feeding it massive amounts of data so it can learn patterns.
    • Fine-tuning happens after initial training: the model is further adjusted using smaller, specialised datasets to improve performance on specific tasks (e.g., medical chatbots, legal assistants).
  • Inference: The process of using a trained model to generate predictions or responses. For example, when you ask ChatGPT a question, it performs inference to produce the answer.
  • Parameter Count: The number of trainable weights in a model (e.g., 7B = 7 billion). These parameters determine how well the model can learn and generalise from data.
  • Token: A piece of text (a word or subword) that the model processes as a unit. For example, “cat” might be one token, while “predictable” could be broken into several subword tokens (see the tokenisation sketch after this list).
  • Context Window (Token Limit): The maximum length of input + output the model can handle at once (e.g., 128K tokens). Longer windows allow models to handle bigger documents or longer conversations.
  • Accuracy / Benchmarks: Performance on standardised tasks like MMLU (Massive Multitask Language Understanding), HellaSwag (commonsense reasoning), HumanEval (code generation), etc. Higher scores generally indicate stronger real-world performance.
  • Quantization: A technique that reduces model size and speeds up inference by storing weights in lower-precision numbers (e.g., INT8 instead of FP16), making it easier to run large models on smaller hardware.
  • RAG: RAG (Retrieval-Augmented Generation) enhances an LLM by retrieving relevant documents or chunks from your knowledge base and injecting them into the prompt before generating a response, automating the supply of relevant context to the model.
  • MCP: MCP (Model Context Protocol) is an open protocol that standardises how applications provide context to LLMs. MCP gives you a repeatable and structured format to inject context into the model.
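
To see tokenisation in practice, here is a minimal sketch using the open-source tiktoken library. This is one tokenizer among many; each model family ships its own vocabulary, so the same text can produce different token counts on different models:

```python
# pip install tiktoken
import tiktoken

# Load a widely used tokenizer vocabulary.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["cat", "predictable", "The cat sat on the mat."]:
    tokens = enc.encode(text)
    # A word may map to one token or be split into several subword tokens.
    print(f"{text!r} -> {len(tokens)} token(s): {tokens}")
```

Counting tokens this way is also how you estimate how much of a model's context window (e.g., 128K tokens) a document or conversation will consume.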

Open‑Source LLM Comparison

| Model | Params | Benchmarks | Possible Hardware |
| --- | --- | --- | --- |
| LLaMA 3.1 (Meta) | 8B / 70B / 405B | MMLU: 87.3% (405B), 82.0% (70B), 69.4% (8B); HumanEval: 89.0% (405B) | 8B: Mac Mini M4 (24GB), RTX 4070; 70B: Mac Studio M3 Ultra (128GB), 2x RTX 4090; 405B: enterprise GPU clusters |
| Qwen 3 (Alibaba) | 1B – 235B (22B active MoE) | MMLU: 64.3% (8B), strong multilingual, coding excellence | 8B: Mac Mini M4 (24GB), RTX 4060 Ti; 22B: Mac Studio M3 Max (64GB), RTX 4080; 235B: Mac Studio M3 Ultra (512GB), multi-GPU workstation |
| DeepSeek-V3 (DeepSeek) | 671B total / 37B active | MMLU: 88.5%, MATH: 90.2%, HumanEval: 82.6%, state-of-the-art performance | 37B active: Mac Studio M3 Ultra (128GB), 2x RTX 4090; full model: enterprise clusters |
| Mistral Large 2 (Mistral AI) | 123B (dense) | MMLU: ~84.0%, strong instruction following, 128K context | Mac Studio M3 Ultra (256GB), 4x RTX 4090, H100 |
| DeepSeek-R1 (DeepSeek) | 671B total / 37B active | MMLU: 90.8%, MATH: 97.3%, reasoning specialist competitive with OpenAI o1 | 37B active: Mac Studio M3 Ultra (128GB), 2x RTX 4090; full model: enterprise clusters |
| Kimi K2 (Moonshot AI) | 1T total / 32B active (MoE) | LiveCodeBench: 53.7%, SWE-bench: 65.8%, GPT-4-class performance | 32B active: Mac Studio M3 Ultra (128GB), RTX 4090; full model: multi-GPU clusters |

GPU Size & Hardware Requirements

| Model Size | Precision | Memory Needed | Mac Options | PC/GPU Options | Performance Notes |
| --- | --- | --- | --- | --- | --- |
| 3–7B | INT4 | ~3.5–4 GB | Mac Mini M4 (16GB), MacBook Pro M4 (16GB) | RTX 3060, RTX 4060 | Budget-friendly, excellent Mac performance |
| 3–7B | FP16 | ~14–16 GB | Mac Mini M4 (24GB), Mac Studio M3 Max (32GB) | RTX 4090, RTX 5090 | High-end consumer setup |
| 8–13B | INT4 | ~6.5–7 GB | Mac Mini M4 (32GB), Mac Studio M3 Max (64GB) | RTX 4070, RTX 5070 | Good balance of cost/performance |
| 8–13B | FP16 | ~26–28 GB | Mac Studio M3 Max (64GB), Mac Studio M3 Ultra (128GB) | 2x RTX 4090, RTX 5090 | Professional workstation level |
| 20–30B | INT4 | ~15–20 GB | Mac Studio M3 Max (64GB), Mac Studio M3 Ultra (128GB) | RTX 4090, A6000 | High-end workstation |
| 20–30B | FP16 | ~60–65 GB | Mac Studio M3 Ultra (128GB), Mac Studio M3 Ultra (256GB) | 4x RTX 4090, A6000, H100 | High-memory workstation/server |
| 65–70B | INT4 | ~35–42 GB | Mac Studio M3 Ultra (128GB), Mac Studio M3 Ultra (256GB) | A6000 (48GB), H100 | Great Mac performance at this size |
| 65–70B | FP16 | ~140–150 GB | Mac Studio M3 Ultra (256GB), Mac Studio M3 Ultra (512GB) | 4x A100, 2x H100 | Mac now viable for 70B at FP16 |
| 120–200B | INT4 | ~60–100 GB | Mac Studio M3 Ultra (256GB), Mac Studio M3 Ultra (512GB) | 2x H100, 4x A6000 | Mac competitive for large models |
| 405B+ | INT4 | ~200+ GB | Mac Studio M3 Ultra (512GB) for smaller 405B variants | 4x H100 (80GB) | Enterprise clusters preferred |
| 405B+ | FP16 | ~800+ GB | N/A | 8x H100 (80GB) | Enterprise clusters / cloud GPUs |

Note: Quantization (e.g., INT4) can reduce memory needs dramatically: a 70B model drops from ~140–150 GB at FP16 to roughly 35–42 GB at INT4.
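
As a back-of-envelope check, weight memory is roughly parameter count × bytes per parameter (FP16 = 2 bytes, INT8 = 1 byte, INT4 = 0.5 bytes), plus runtime overhead. A minimal sketch in Python (the function name is our own, for illustration):

```python
def estimate_weights_gb(params_billion: float, bits: int) -> float:
    """Rough memory for model weights alone: params x (bits / 8) bytes.

    Real deployments need extra headroom for the KV cache, activations,
    and runtime overhead, often 10-30% on top of this figure.
    """
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9  # decimal GB

for params, bits in [(7, 16), (7, 4), (70, 16), (70, 4), (405, 4)]:
    label = "FP16" if bits == 16 else f"INT{bits}"
    print(f"{params}B @ {label} ~= {estimate_weights_gb(params, bits):.0f} GB")
# Output matches the table above: 7B FP16 ~14 GB, 70B INT4 ~35 GB, etc.
```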

FAQs

1. What’s the difference between an LLM and general AI?

A Large Language Model (LLM) is a type of AI trained to understand and generate human language. It excels at tasks like writing, summarising, and answering questions.
Artificial Intelligence (AI) is a broader field that includes LLMs but also covers vision, robotics, decision-making systems, etc.

2. Do LLMs think or understand like humans?

No. LLMs generate text based on statistical patterns learned from massive datasets. They don’t have intentions, self-awareness, or true understanding—but they often appear intelligent due to the quality of their training data.

3. How do I connect to my organisation's documents and knowledge?

You can connect to your organisation's documents by combining Retrieval-Augmented Generation (RAG), which retrieves relevant internal content at runtime, with the Model Context Protocol (MCP), which standardises how applications supply that context to the model. This setup enables grounded, auditable answers from local AI systems without sending data to the cloud.
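
As an illustration, here is a minimal sketch of the RAG flow in Python. The embed() and generate() helpers are hypothetical placeholders for whatever local embedding model and LLM runtime you deploy (e.g., via llama.cpp or Ollama):

```python
import numpy as np

# Hypothetical placeholders: wire these to your local embedding model
# and LLM runtime.
def embed(text: str) -> np.ndarray: ...
def generate(prompt: str) -> str: ...

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question: str, chunks: list[str], k: int = 3) -> str:
    # 1. Retrieve: rank document chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    # 2. Augment: inject the top-k chunks into the prompt as grounding context.
    context = "\n\n".join(ranked[:k])
    # 3. Generate: the model answers from the retrieved content.
    return generate(
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```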

4. Where do you find datasets for training?

Training datasets are often sourced from public internet data such as webpages, books, scientific articles, GitHub code, and forums. Common sources include the Hugging Face Hub and corpora such as The Pile, Common Crawl, and OpenWebText. For fine-tuning, organisations may use curated internal data or domain-specific corpora.
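
For example, a public corpus can be pulled from the Hugging Face Hub with the datasets library (the corpus name below is illustrative; streaming avoids downloading the full dataset up front):

```python
# pip install datasets
from datasets import load_dataset

# Stream a public web-text corpus record by record.
ds = load_dataset("openwebtext", split="train", streaming=True)

for example in ds:
    print(example["text"][:200])  # first 200 characters of the first record
    break
```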

5. Why are some LLMs good at some tasks and not others?

Performance varies based on a model’s training data, architecture, and number of parameters. LLMs trained on diverse, high-quality datasets tend to generalise well. Others may specialise—for example, coding models are often fine-tuned on code. Larger models typically perform better but can be less efficient or harder to deploy.

Keep Your Data Local. Unlock the Power of AI.

Local AI gives your organisation full control over how you deploy powerful language models — privately, securely, and without sending data to the cloud.

If you're ready to explore a custom Local AI setup that meets your security, performance, and compliance needs, we'd love to hear from you.
