Edge LLM Inference


This PWA offers a fully offline inference runtime with a chat model fine-tuned from Cerebras-GPT 111M (M for million).
For comparison, Llama 2 is available in 7B, 13B, and 70B sizes (B for billion). GPT-3.5 Turbo is reported to have about 20B parameters, and GPT-4 is rumored to have about 1.76T (T for trillion), though OpenAI has not confirmed either figure.
This model is very small in comparison, so it tends to hallucinate a lot and gets confused easily. Still, it’s quite amazing that we can run it on a phone.
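
To put the size difference in concrete terms, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The precisions listed are assumptions for illustration (the project does not state which precision its ONNX export uses), and the estimate ignores activations and runtime overhead.

```typescript
// Rough weight-storage estimate: parameter count × bytes per parameter.
// Assumption: these precisions are illustrative; the project's actual
// export precision is not stated.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Returns the approximate size of the weights in megabytes (1 MB = 1e6 bytes).
function weightSizeMB(paramCount: number, precision: Precision): number {
  return (paramCount * BYTES_PER_PARAM[precision]) / 1e6;
}

const CEREBRAS_GPT_111M = 111e6; // 111 million parameters
const LLAMA_2_7B = 7e9; // smallest Llama 2 variant

const sizes = {
  fp32: weightSizeMB(CEREBRAS_GPT_111M, "fp32"), // 444 MB
  int8: weightSizeMB(CEREBRAS_GPT_111M, "int8"), // 111 MB
};

// How many times larger the smallest Llama 2 is, by parameter count.
const llamaRatio = LLAMA_2_7B / CEREBRAS_GPT_111M; // roughly 63
```

Even at full 32-bit precision the weights fit in a few hundred megabytes, which is why a phone browser can plausibly hold the whole model.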

This PoC works as a game where the goal is to guess the secret word through conversation with the model. If the model says the secret word in its reply, you win the game.
The model runs 100% on this device. Once the model is loaded, you can disable your internet connection and it’ll continue working.
For the best experience, open it in Safari, tap the Share button, and choose “Add to Home Screen”.
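
The win condition described above could be checked with something like the following. This is only a sketch, not the project's actual code: the function names are hypothetical, and it assumes a case-insensitive whole-word match is the intended rule.

```typescript
// Escape characters that have special meaning in a regular expression,
// so the secret word is always matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical win check: true when the model's reply contains the secret
// word as a whole word, ignoring case.
function replyContainsSecret(reply: string, secretWord: string): boolean {
  const pattern = new RegExp(`\\b${escapeRegExp(secretWord)}\\b`, "i");
  return pattern.test(reply);
}

// Example usage with a made-up secret word:
const won = replyContainsSecret("I really love Broccoli!", "broccoli");
const notYet = replyContainsSecret("Broccolini is a different vegetable.", "broccoli");
```

Matching on word boundaries avoids counting a longer word that merely contains the secret as a win, although a looser substring match would also be a defensible choice for a game.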

Model: LaMini-Cerebras-111M
Inference: transformers.js
Runtime: ONNX
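
As a rough illustration of how these pieces might fit together, the sketch below wraps a text-generation function and extracts the model's reply. It is written under several assumptions: the generator is injected rather than imported so the snippet stays self-contained, the result shape mirrors what a transformers.js text-generation pipeline returns, and the helper names `extractReply` and `chatReply` are hypothetical.

```typescript
// Assumed result shape of a text-generation call: one entry per sequence.
type GenerationResult = { generated_text: string }[];
type TextGenerator = (
  prompt: string,
  options?: { max_new_tokens?: number },
) => Promise<GenerationResult>;

// In the app, the generator would presumably come from transformers.js, e.g.
//   const generator = await pipeline("text-generation", "<model id>");
// The exact model id and package version used by the project are not stated.

// Pull the reply out of the raw output. Causal language models often echo
// the prompt at the start of the generated text, so strip it when present.
function extractReply(prompt: string, output: GenerationResult): string {
  const full = output.length > 0 ? output[0].generated_text : "";
  return full.startsWith(prompt) ? full.slice(prompt.length).trim() : full.trim();
}

// Run one chat turn: generate a bounded number of new tokens, then clean up.
async function chatReply(generator: TextGenerator, userMessage: string): Promise<string> {
  const output = await generator(userMessage, { max_new_tokens: 64 });
  return extractReply(userMessage, output);
}
```

Injecting the generator keeps the game logic independent of the inference library, so it can be exercised with a stub while the real model is still loading.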

PS: This project was created as a gift for a digital secret not-Santa gift exchange. Hence the image on the final screen and the obsession with broccoli.


Svelte, Transformers.js, ONNX