LoRA (Low-Rank Adaptation)
To tailor general-purpose models to specific tasks, we need to fine-tune them. This usually means retraining all of the model's parameters, and as models grow larger, that becomes increasingly expensive and resource-intensive.

To address this, Parameter-Efficient Fine-Tuning (PEFT) methods were developed, and one of them is LoRA (Low-Rank Adaptation). LoRA trains far fewer parameters while keeping the model architecture unchanged, which makes fine-tuning much cheaper (a minimal sketch of its low-rank update appears at the end of this section). However, it often doesn't perform as well as full fine-tuning (FT), where all parameters are retrained. But why?

NVIDIA and HKUST researchers, inspired by the concept of Weight Normalization, compared LoRA and FT. Weight Normalization separates a weight matrix into two components: magnitude (how large the change is) and direction (where in parameter space the change happens). This separation makes it possible to see how each method updates the model's weights, allowing a more detailed comparison of the flexibility and precision of the adjustments LoRA and FT make.
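To make this concrete, here is a minimal sketch of such a magnitude/direction decomposition and of how one might use it to compare pretrained and fine-tuned weights. The column-wise norm, the toy tensors, and the `decompose` helper are illustrative assumptions, not code from the paper:

```python
import torch

def decompose(W: torch.Tensor):
    """Split a weight matrix into magnitude and direction,
    in the spirit of Weight Normalization: W = m * (V / ||V||)."""
    # Column-wise L2 norms act as the magnitude component (one scalar per column).
    m = W.norm(p=2, dim=0, keepdim=True)   # shape (1, k)
    # Normalizing each column yields unit-norm directions in parameter space.
    direction = W / m                      # shape (d, k)
    return m, direction

# Toy stand-ins: a "pretrained" weight and a perturbed "fine-tuned" weight.
W = torch.randn(8, 4)
W_ft = W + 0.1 * torch.randn(8, 4)

m0, d0 = decompose(W)
m1, d1 = decompose(W_ft)

# How much did magnitudes change, vs. how far did directions rotate?
delta_magnitude = (m1 - m0).abs().mean()
delta_direction = 1 - torch.cosine_similarity(d0, d1, dim=0).mean()
print(f"avg magnitude change: {delta_magnitude:.4f}")
print(f"avg direction change: {delta_direction:.4f}")
```

Plotting these two quantities against each other across layers and training steps is the kind of analysis that reveals whether a method changes magnitude and direction together or independently.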
They discovered that the two methods update the model in different ways. LoRA updates the model proportionally, changing magnitude and direction consistently with each other. Full fine-tuning, on the other hand, shows more complex behavior: it can make subtle changes in direction while making large changes in magnitude, or vice versa. This flexibility allows FT to adapt more precisely to a task.

Because LoRA lacks this flexibility, it may not be able to make the same precise adjustments as FT. Full fine-tuning, however, still requires retraining all parameters, which is computationally intensive.
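For reference, here is the promised sketch of the low-rank update LoRA applies to a frozen weight matrix. Initializing A randomly and B at zero follows the original LoRA recipe; the layer sizes, rank, and alpha scaling here are illustrative choices, not values from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: y = W x + (B A) x."""
    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # The pretrained weight stays frozen; only A and B are trained.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # up-projection, zero at init
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B @ A has rank <= `rank`, so the update lives in a low-rank subspace.
        return x @ self.weight.T + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters")  # ~12k of ~600k
```

Because B starts at zero, the layer behaves exactly like the pretrained model at the start of fine-tuning, and only about 2% of the parameters in this toy layer are ever trained; the trade-off is that every update is confined to that rank-8 subspace.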