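Hugging Face Text Generation Inference (TGI) is a toolkit for deploying and serving large language models (LLMs) with high throughput. It exposes an HTTP API for text generation and supports popular open-source architectures such as Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5. Encoder-only models such as BERT are not text-generation models and are outside its scope.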

https://huggingface.co/docs/text-generation-inference
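
As a rough illustration of the HTTP API, the sketch below sends a prompt to a TGI server's `/generate` endpoint using only the Python standard library. It assumes a server is already running at `http://localhost:8080`; that address, the helper names `build_generate_request` and `generate`, and the default of 20 new tokens are illustrative choices, not part of TGI itself.

```python
import json
import urllib.request

# Assumption: a TGI server is already running and reachable at this address.
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=20, base_url=TGI_URL):
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, base_url=TGI_URL, timeout=30.0):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The /generate response is a JSON object with a "generated_text" field.
    return body["generated_text"]
```

With a server running, `generate("What is deep learning?")` would return the model's completion as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.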