Control — A local-first AI app for iPhone that runs entirely on-device

I wanted to understand what AI actually feels like when everything runs locally on a phone. No accounts, no cloud dependency, no hidden server processing — just open-source language models running entirely on-device.

Control became part privacy experiment, part technical exploration into the real limits of local AI. The app runs fully offline, keeps conversations on-device, and strips the experience down to what small models can realistically support today. Built in Cursor.

Problem

I wanted to build an on-device AI app using open-source models, partly for privacy and partly to learn firsthand what small local models can and can't do. Most AI apps required accounts, sent data to servers, and offered no transparency about what happened to your conversations. There was no simple, free option that ran entirely on your phone with no account, no data collection, and no network dependency once a model was downloaded. I also wanted to understand what building with these models actually felt like: what works, what breaks, and where the ceiling is when everything runs locally.

Process

I started with much bigger ambitions: speech input, reactive avatar animations that responded to voice, and full PDF comprehension. Hitting the limits of on-device models forced repeated pivots. Speech-to-text was too unreliable at small model sizes. PDF processing became summary-based extraction using Vision and PDFKit rather than full document understanding, at some cost to accuracy. The animated avatars needed more processing overhead than the device could spare while running inference. The core inference layer itself needed an Objective-C++ bridge to llama.cpp, with streaming token generation and manual memory management. Across 234 commits, the scope narrowed from an ambitious multimodal assistant into something focused and honest about what these models can actually do on a phone.
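To make the PDF trade-off concrete, here is a minimal sketch of what summary-based extraction could look like. It is not the app's actual code: the function names and the character budget are assumptions for illustration, and the Vision OCR pass for scanned pages is left out. The idea is to read each page's text layer with PDFKit, then keep only as much text as a small model's context window can plausibly handle before asking for a summary.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Joins page texts in order until a character budget is reached.
/// The budget is a stand-in for the model's limited context window (assumed value).
func excerpt(fromPages pages: [String], characterBudget: Int) -> String {
    var result = ""
    for page in pages {
        let trimmed = page.trimmingCharacters(in: .whitespacesAndNewlines)
        if trimmed.isEmpty { continue }  // skip blank pages
        let separator = result.isEmpty ? "" : "\n"
        let remaining = characterBudget - result.count - separator.count
        if remaining <= 0 { break }      // budget exhausted, later pages are dropped
        result += separator + String(trimmed.prefix(remaining))
    }
    return result
}

#if canImport(PDFKit)
/// Reads the text layer of every page. Pages without a text layer yield "";
/// those would need OCR (for example with Vision), which is not shown here.
func pageTexts(ofPDFAt url: URL) -> [String] {
    guard let document = PDFDocument(url: url) else { return [] }
    return (0..<document.pageCount).map { index in
        document.page(at: index)?.string ?? ""
    }
}
#endif

/// Hypothetical prompt wrapper; the wording is illustrative only.
func summaryPrompt(for excerpt: String) -> String {
    "Summarize the following document excerpt in a few sentences:\n\n" + excerpt
}
```

Everything past the budget is simply dropped, which is where the accuracy loss comes from: the model summarizes an excerpt rather than the whole document.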

Solution

A text-only iOS app that runs open-source LLMs entirely on-device using llama.cpp. It supports models from Google (Gemma 3), Meta (Llama 3.2), Hugging Face (SmolLM2), Alibaba (Qwen3), and Microsoft (Phi-3), all downloaded directly from Hugging Face and stored locally. The app is 8.9 MB before model downloads. Conversations stay on your phone, reset within 24 hours, and are never transmitted anywhere. Features include PDF upload for on-device summaries, customizable colors and avatars, Metal shader visual effects, iOS Shortcuts integration for automation workflows, and support for English, French, and Spanish. Runs on iPhone and iPad. Built natively in Swift, with IBM Plex Mono throughout and a dark-only minimal interface. Free, with no subscriptions, and open-sourced under Apache 2.0.
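One plausible reading of the 24-hour reset is a retention window applied to stored conversations. The sketch below assumes that reading; `StoredConversation`, `retentionWindow`, and `unexpired` are hypothetical names, and the write-up does not describe the app's real storage model.

```swift
import Foundation

/// Hypothetical record type used only for this illustration.
struct StoredConversation {
    let id: UUID
    let createdAt: Date
    var messages: [String]
}

/// 24 hours in seconds, matching the retention period stated above.
let retentionWindow: TimeInterval = 24 * 60 * 60

/// Returns only the conversations younger than the retention window.
/// `now` is injectable so the behavior can be checked with fixed dates.
func unexpired(_ conversations: [StoredConversation],
               now: Date = Date()) -> [StoredConversation] {
    conversations.filter { now.timeIntervalSince($0.createdAt) < retentionWindow }
}
```

Running a filter like this at launch and whenever the app returns to the foreground would keep anything older than a day from lingering on the device.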

Outcome

Live on the App Store at version 1.3. The project gave me a grounded understanding of what small on-device models are capable of and where they fall short. Gemma 3 and Qwen3 handle conversational tasks well, but anything requiring deep reasoning or long context breaks down fast. The constraints shaped the product into something deliberately minimal: no server costs, no user data to protect, and a genuine privacy story that's verifiable because the full source code is public. Open-sourced on GitHub with 234 commits across the full build.