`llama.cpp` is an open-source library implemented in pure C/C++ to maximize efficiency for running large language models (LLMs) locally. It is used by tools like Ollama to run models such as the 8-billion-parameter Llama 3 model on a laptop. The library focuses solely on inference and does not support training or fine-tuning LLMs [1].
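To make the inference-only workflow concrete, here is a minimal sketch of generating a completion from a locally stored, quantized GGUF model through the `llama-cpp-python` bindings, which wrap the llama.cpp C API. This is an illustration, not part of the cited chapter: the bindings package, the model filename, and the parameter values shown are assumptions for the example.

```python
# Minimal sketch: local inference on a GGUF model via the llama-cpp-python
# bindings for llama.cpp. The model path is a placeholder and must point to
# a model file you have already downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,  # context window size (example value)
)

# Plain text completion; the result is an OpenAI-style response dict.
result = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(result["choices"][0]["text"])
```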
[1] Sebastian Raschka, [*Build a Large Language Model (From Scratch)*, Chapter 7](https://livebook.manning.com/raschka/chapter-7)