QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

noneabove1182@sh.itjust.works · 1 year ago


justynasty · edit-2 1 year ago

deleted by creator

rufus@discuss.tchncs.de · edit-2 1 year ago

Thank you very much for your explanation. I can follow that one, and it is exactly the important difference. In my words: they figured out a way to improve the maths and make the calculations faster, by reducing the dimensionality of an important matrix multiplication.

But there is another important aspect to it. They keep the quantized property after fine-tuning, which QLoRA doesn’t. With QLoRA the merged weights have to go through another (lossy) quantization after the fact, so QA-LoRA ends up a bit more precise.
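To make both points concrete, here is a minimal NumPy sketch of how I understand the idea. It is not the paper's code: the sizes, the min-max quantizer, the sum-pooling and all variable names are my own assumptions, and the LoRA scaling factor is left out. The adapter only sees one pooled value per quantization group (so its input is smaller), and because of that the learned update is constant within each group and can be folded into the per-group zero points while the integer weights stay untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
d_in, d_out, group, rank, bits = 8, 4, 4, 2, 4
n_groups = d_in // group

# Group-wise min-max quantization along the input dimension:
# W[i, j] ~ scale[g(i), j] * q[i, j] + zero[g(i), j]
W = rng.normal(size=(d_in, d_out))
Wg = W.reshape(n_groups, group, d_out)
zero = Wg.min(axis=1)                              # (n_groups, d_out)
scale = (Wg.max(axis=1) - zero) / (2**bits - 1)    # (n_groups, d_out)
q = np.round((Wg - zero[:, None, :]) / scale[:, None, :]).astype(np.int64)


def dequant(q, scale, zero):
    """Rebuild a float (d_in, d_out) matrix from integers, scales and zero points."""
    return (scale[:, None, :] * q + zero[:, None, :]).reshape(d_in, d_out)


def pool(x):
    """Sum the inputs inside each group: (batch, d_in) -> (batch, n_groups)."""
    return x.reshape(-1, n_groups, group).sum(axis=2)


# Adapter whose first matrix has n_groups rows instead of d_in rows.
A = rng.normal(size=(n_groups, rank))
B = rng.normal(size=(rank, d_out))

x = rng.normal(size=(3, d_in))

# Forward pass during fine-tuning: frozen quantized weights plus adapter.
y_adapter = x @ dequant(q, scale, zero) + pool(x) @ A @ B

# Merge: the adapter only shifts the per-group zero points; q is unchanged.
zero_merged = zero + A @ B
y_merged = x @ dequant(q, scale, zero_merged)

print(np.allclose(y_adapter, y_merged))
```

With a plain LoRA adapter (first matrix of shape `(d_in, rank)`), the update differs per row inside a group, so it cannot be absorbed into the zero points and the merged matrix would need to be quantized again.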

Your explanation got me on track to figure it out. Thanks. I wrote another longer reply to noneabove1182. I’m not going to repeat everything, but I think I’m satisfied now.