Mac Mini‑Powered AI: 5‑User Local Model Success

(Drafted with the help of our own Local AI solution)
Source: Audio Recording
On the morning of 7 August 2025, David Burkett sat down with the WorkingMouse AI R&D team to revisit our Local AI Hypothesis and see what a small team can actually do on a single piece of consumer hardware. The interview showed that the 32 GB Unified‑Memory Mac Mini is more than a toy: it can handle five concurrent users. It also covered some of the hurdles the team overcame along the way.
"The Mac Mini is only a 32 GB Unified‑Memory Mac Mini so it has about 21 GB allocated for VRAM… we got up to five different
concurrent inferences happening at the same time with relatively good speed for the price point.”
- Aidem Lambert.
That line echoed the recommendation in our AI Reference Guide, which suggests the same machine for a five‑user setup. In other words, the hardware we’re championing in our blog post “AI You Own vs. AI You Rent: What’s Better for Business?” is now proven in a real‑world pilot.
Bringing the Model to Life
The discussion jumped straight to the models themselves. Aiden explained that they’re not just loading any old LLM; they’re using open‑weight models, a 120 B‑parameter beast and a 20 B‑parameter sibling, all running locally. “We’re actually able to get the 20 B‑parameter model loaded and running on the Mac Mini flawlessly.”
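If you want to poke at a setup like this yourself, here is a minimal sketch of querying a locally served model. It assumes the Ollama runtime the team settled on (see Challenges below), which serves a local HTTP API; the gpt-oss:20b model tag is an assumption for illustration, not something named in the interview.

```python
import json
import urllib.request

# Ollama serves a local HTTP API on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send one prompt to a locally hosted model and return its reply.

    The "gpt-oss:20b" tag is an assumption for illustration; substitute
    whichever 20 B open-weight model you have pulled onto the machine.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    # Nothing in this exchange ever leaves the local network.
    print(ask("Summarise the Local AI Hypothesis in one sentence."))
```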
Aiden added that the key to keeping these models useful for a niche team is the Mixture‑of‑Experts (MoE) architecture. “The beauty of the way they do the MoEs is you actually don’t need as large a pre‑training data set… you could get away with ... 1,000 well‑curated data points.”
This means a small team can inject domain knowledge without having to retrain from scratch, a huge win for speed and data privacy.
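To make the “1,000 well‑curated data points” idea concrete, a curated domain dataset is often just a JSON Lines file of prompt/response pairs. The field names, file name, and example content below are illustrative conventions, not a format prescribed in the interview.

```python
import json

# A curated domain dataset can be as simple as ~1,000 prompt/response
# pairs stored as JSON Lines. The records below are invented examples;
# real entries would capture your team's actual domain knowledge.
examples = [
    {
        "prompt": "What hardware do we recommend for a five-user local AI setup?",
        "response": "A 32 GB Unified-Memory Mac Mini, with roughly 21 GB usable as VRAM.",
    },
    {
        "prompt": "Why do we prefer local models for client work?",
        "response": "Prompts and documents never leave the internal network, so nothing is shared externally or used for training.",
    },
]

with open("domain_examples.jsonl", "w", encoding="utf-8") as outfile:
    for example in examples:
        outfile.write(json.dumps(example, ensure_ascii=False) + "\n")
```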
Interface & Collaboration
Both interviewees agreed that a user‑friendly front end is essential. “Open Web UI has been really easy to set up and run on top of this, and that gives us a heap of power to enable anybody to access all of these models that we’ve deployed into the Mac Mini.”
They also highlighted the privacy guarantee: data never leaves the internal network. “…those searches aren't getting shared with anyone outside the organisation nor used for training.” This is a selling point for companies that have regulatory or competitive concerns.
Challenges
The first hiccup came when scaling the local AI stack to five users: “We ran into some issues with LM Studio and having multiple users concurrently message the same models,” said Aiden.
LM Studio, while great for quick prototyping, was built around a single‑user architecture. When a second or third person tried to fire a prompt, the server would either throttle the request or return an error. For a small team that needed to collaborate in real time, that was a deal‑breaker.
The fix:
- Swap the backend – They dropped LM Studio’s built‑in server and switched to Ollama, which is designed from the ground up to handle many simultaneous connections to the same model.
- Set a hard concurrency limit – Using an environment variable, they configured Ollama to allow exactly five users to talk to a model at once (see the sketch below). This keeps the Mac Mini’s 21 GB of VRAM from being overloaded while still delivering respectable speed.
- Tested and confirmed – Once the switch was in place, the team verified that five people could run inferences in parallel, each getting a responsive reply without contention.
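As a sketch of the second and third fixes: Ollama reads the OLLAMA_NUM_PARALLEL environment variable at server start‑up to cap concurrent requests per model, and a handful of threads is enough to confirm that five prompts really do run side by side. The value of five matches the interview; the model tag is again an illustrative assumption.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Start the server with the concurrency cap the team described, e.g.:
#   OLLAMA_NUM_PARALLEL=5 ollama serve
# OLLAMA_NUM_PARALLEL is Ollama's knob for parallel requests per model.

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gpt-oss:20b"  # illustrative tag; use whatever model you serve

def ask(prompt: str) -> str:
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Five threads stand in for five people messaging the same model at once.
prompts = [f"Reply 'ready' if you can hear user {i}." for i in range(1, 6)]
with ThreadPoolExecutor(max_workers=5) as pool:
    for i, reply in enumerate(pool.map(ask, prompts), start=1):
        print(f"user {i}: {reply[:60]!r}")
```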
With this tweak, the Mac Mini became a true five‑user local AI hub, matching the recommendation in the WorkingMouse AI Reference Guide and turning the initial concurrency roadblock into a smooth, multi‑user experience.
Looking Forward
While the interview focused on hardware and model deployment, the next logical step is to understand how organisations are actually using AI. “We’re creating an AI survey for organisations to participate in and get an understanding of their AI landscape within their organisations,” said David Burkett. The survey will collect insights on use cases, maturity, and pain points, feeding back into our local‑AI strategy and the AI Reference Guide.
In addition, OpenAI and Google have just released their first large open‑source models, available on huggingface.com. WorkingMouse has also ordered a Mac Studio, arriving next week, which will be used to experiment with running local AI for the entire company!
Takeaway
- Hardware: A single Mac Mini can serve five users, matching our reference guide’s recommendation.
- Models: Open‑weight LLMs (120 B & 20 B) run locally with no data leakage.
- Fine‑tuning: MoE lets you add domain knowledge with only ~1,000 examples.
- Interface: Open Web UI + Ollama gives a slick, collaborative experience.
- Next step: A company‑wide AI survey will shape future guidance.
We’re excited to continue refining our Local AI approach and to share the survey results once they’re ready. Meanwhile, if you’re curious about setting up a local‑AI stack like the one discussed, drop by for a chat using the link below.