Thanks for the great article. Is aibrix similar to Dynamo?
Hey TND,
I've just checked AIBrix's system diagram - it looks similar to me, as it also aims to distribute LLM inference across a cluster. I'm not sure whether every component does exactly the same thing, but I see AIBrix has:
- An API gateway (similar to Dynamo's)
- LLM engines distributed across pods (similar to Dynamo's prefill/decode workers)
- Writes to a distributed cache (similar to Dynamo's NIXL interface and KV Memory Manager)
- A control-plane autoscaler (similar to Dynamo's Event Plane)
I don't see AIBrix doing disaggregated prefill/decode exactly the way Dynamo does, but it does seem to have smart distributed KV cache handling of its own.
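To make the comparison concrete, here's a minimal sketch of what disaggregated prefill/decode means: the prompt is processed once by a prefill worker, the resulting KV cache is handed off through a shared store (the role NIXL-style transfers play in Dynamo), and a separate decode worker generates tokens from it. This is purely illustrative - all class and function names are hypothetical, not actual AIBrix or Dynamo APIs.

```python
# Illustrative sketch of disaggregated prefill/decode - NOT AIBrix or
# Dynamo code. All names here are hypothetical.

class KVStore:
    """Stands in for a distributed KV cache shared between worker pools."""
    def __init__(self):
        self._cache = {}

    def put(self, request_id, kv):
        self._cache[request_id] = kv

    def get(self, request_id):
        # Hand the cache off to the decode side.
        return self._cache.pop(request_id)


def prefill_worker(request_id, prompt, store):
    # Process the whole prompt once, producing per-token KV state,
    # then publish it to the shared store instead of decoding locally.
    kv = [f"kv({tok})" for tok in prompt.split()]
    store.put(request_id, kv)


def decode_worker(request_id, store, max_new_tokens):
    # Fetch the prefilled KV cache and generate tokens one step at a time,
    # extending the cache with each new token.
    kv = store.get(request_id)
    out = []
    for i in range(max_new_tokens):
        out.append(f"tok{i}")
        kv.append(f"kv(tok{i})")
    return out


store = KVStore()
prefill_worker("req-1", "hello distributed world", store)
tokens = decode_worker("req-1", store, max_new_tokens=3)
print(tokens)  # ['tok0', 'tok1', 'tok2']
```

The point of the split is that prefill is compute-bound and decode is memory-bound, so separate worker pools can be sized and scheduled independently - the KV cache handoff is what makes that separation possible.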
Thanks for mentioning it - I'll dig a bit deeper into AIBrix; maybe I'll write about it in a future article.
Thanks so much, Alex. Your blog is really valuable for us. Keep up the great work!
Great article man!
Thanks, glad it helped, man! I've got two more on the way as an extension to this one - I'll cover vLLM and SGLang, stay tuned :P