Hugging Face Model Distillation reduces the size of LLMs while preserving most of their accuracy by training a smaller "student" model to mimic the outputs of a larger "teacher" model, which makes the resulting models well suited to edge deployments.

https://huggingface.co/docs/optimum/model-distillation
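
To make the idea concrete, below is a minimal sketch of the core distillation objective in plain PyTorch: the student is trained against a blend of a temperature-softened KL term (matching the teacher's output distribution) and ordinary cross-entropy on the hard labels. This is an illustration only, not the Optimum API; the `distillation_loss` helper, the temperature, and the `alpha` weighting are assumed hyperparameters chosen for the example.

```python
# Minimal knowledge-distillation sketch in plain PyTorch (illustrative, not the Optimum API).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (mimic the teacher) with hard-label cross-entropy."""
    # Soften both distributions with the temperature, then match them via KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy usage: random logits for a batch of 4 examples over a 10-class output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In practice the teacher logits come from a frozen forward pass of the large model, and only the student's parameters are updated with this loss.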