mlxstudio
Native macOS app that runs open-source AI models on Apple Silicon. The killer feature: connect multiple Macs over WiFi to split model layers across machines. A Mac Mini handles layers 0-39, a MacBook Pro takes 40-79. Hits 42.8 tok/s on Llama 3 70B from consumer hardware. 10 MB download.
Retrospective
The Good
The multi-node setup actually works over WiFi. Two Macs split the model's layers automatically: the Mac Mini runs layers 0-39, the MacBook Pro runs 40-79. Getting 42.8 tok/s on a 70B model from consumer hardware felt like a breakthrough, and the app itself is a 10 MB download, not 200.
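The automatic split amounts to a contiguous partition of transformer layers across nodes. A minimal sketch of that idea, with `partition_layers` as a hypothetical helper rather than mlxstudio's actual code:

```python
def partition_layers(num_layers: int, num_nodes: int) -> list[range]:
    """Split num_layers into contiguous chunks, one per node,
    front-loading any remainder onto the earlier nodes."""
    base, extra = divmod(num_layers, num_nodes)
    parts, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

# An 80-layer model split across two Macs:
splits = partition_layers(80, 2)
print([(r.start, r.stop - 1) for r in splits])  # [(0, 39), (40, 79)]
```

In practice you would weight the split by each machine's memory and compute rather than dividing evenly, but the contiguous-chunk structure is the same.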
The Bad
WiFi latency between nodes adds up. Token generation is sequential, so every token pays the layer-to-layer handoff over the network; that handoff, not the compute, is the bottleneck. Wired Ethernet roughly doubles throughput, but almost nobody has their Macs cabled together.
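A back-of-envelope model shows why the link dominates. All the numbers below are illustrative assumptions, not measurements: per-token compute and per-hop latency are guesses chosen to land near the observed ~43 tok/s.

```python
def tokens_per_sec(compute_ms: float, hop_ms: float, hops: int = 1) -> float:
    """Sequential generation: each token pays its compute time plus one
    network handoff per inter-node boundary (here, one boundary)."""
    return 1000.0 / (compute_ms + hops * hop_ms)

compute_ms = 11.0  # assumed per-token compute across both Macs
wifi_ms = 12.0     # assumed WiFi latency for one activation handoff
wired_ms = 0.5     # assumed wired-Ethernet handoff latency

print(round(tokens_per_sec(compute_ms, wifi_ms), 1))   # 43.5
print(round(tokens_per_sec(compute_ms, wired_ms), 1))  # 87.0
```

Under these assumptions the handoff costs more than the forward pass itself, and dropping it to sub-millisecond wired latency roughly doubles throughput, which matches what we saw.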
The Ugly
Model compatibility is a moving target. Every new MLX release changes the quantization format slightly. We've had to re-download and re-convert models three times since launch.
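The re-conversion itself is a one-liner with the mlx-lm tooling; the pain is doing it for every model, every time. A sketch of the workflow, with the model path purely illustrative and pinning left to your package manager of choice:

```shell
# Re-convert a model after an MLX update so the on-disk quantization
# format matches the installed runtime.
pip install -U mlx mlx-lm
python -m mlx_lm.convert \
    --hf-path meta-llama/Meta-Llama-3-70B-Instruct \
    --mlx-path ./llama3-70b-4bit \
    -q --q-bits 4
```

Pinning the mlx and mlx-lm versions in a lockfile at least makes the breakage opt-in: you only re-convert when you choose to upgrade.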