Weight quantization using GPTQ and BitsAndBytes. Parallelism techniques, KV caching, Flash Attention, and Speculative Decoding.
Informative Article. Keep writing!!
Hey Arpan,
Thank you for your feedback. I'm happy to hear that!