This was a very informative post!
Glad to hear that, Are! Here to help 🔥
Hey Alex, thanks for the post! One question:
Do you know the best practices for scaling and deploying ONNXRuntime? I recently tried scaling it with Ray Serve, which automatically creates replicas of my service, but even though I allocate a fixed number of CPU cores to each replica, they all suffer from contention. Is this not the way to go, and should I instead run one deployment per machine? On CPU, it seems to use every core it can access, despite being limited / pinned to certain CPU cores.