DoRA (Weight-Decomposed Low-Rank Adaptation)
Because LoRA lacks this flexibility, it cannot always make the same precise adjustments as full fine-tuning (FT). Full fine-tuning, however, requires retraining all of a model's parameters, which is computationally intensive.
To bridge the gap between efficiency and performance, NVIDIA and HKUST researchers developed a new method called DoRA (Weight-Decomposed Low-Rank Adaptation). DoRA separates the pre-trained weights into magnitude and direction, just as in their analysis, and fine-tunes both parts. This makes fine-tuning easier and more stable.
DoRA has been tested on a range of tasks, from NLP to vision-language tasks, and it consistently outperforms LoRA without adding extra inference time.
How does DoRA work?
DoRA uses LoRA for the directional updates to keep things efficient but improves the model’s overall learning ability, making it behave more like full fine-tuning.
- Decomposing weights: DoRA starts by splitting each pre-trained weight into two parts:
    - Magnitude: how large or strong the weight is.
    - Direction: which way the weight points, i.e. how it changes during training.
- DoRA then fine-tunes the magnitude and the direction separately:
    - The magnitude is fine-tuned directly and can be adjusted during training.
    - The direction is handled with LoRA, because it is by far the larger component. LoRA breaks the directional update into smaller, low-rank pieces without adding many trainable parameters.
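The decomposition above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the layer sizes, rank, and variable names (`W0`, `m`, `A`, `B`) are all illustrative, and the magnitude is taken per column as in the paper's formulation.

```python
# Minimal sketch of DoRA's weight decomposition (illustrative, not official code).
# The frozen pre-trained weight W0 supplies the initial direction; its column
# norms become the trainable magnitude m, and LoRA matrices A and B update
# only the direction.
import torch

torch.manual_seed(0)
d_out, d_in, r = 8, 16, 4            # output dim, input dim, LoRA rank (toy sizes)

W0 = torch.randn(d_out, d_in)        # frozen pre-trained weight
m = W0.norm(dim=0, keepdim=True)     # trainable magnitude: one norm per column
A = torch.randn(r, d_in) * 0.01      # LoRA "down" matrix (trainable)
B = torch.zeros(d_out, r)            # LoRA "up" matrix, zero-initialised (trainable)

def dora_weight(W0, m, A, B):
    """Recombine magnitude and (LoRA-updated) direction into an effective weight."""
    V = W0 + B @ A                              # direction with low-rank update
    V_unit = V / V.norm(dim=0, keepdim=True)    # normalise columns to pure direction
    return m * V_unit                           # rescale by the learned magnitudes

W = dora_weight(W0, m, A, B)
# With B initialised to zero, the recombined weight equals W0 exactly,
# so training starts from the pre-trained model.
print(torch.allclose(W, W0))  # True
```

Note that only `m`, `A`, and `B` would receive gradients; `W0` stays frozen, which is what keeps the method parameter-efficient.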
After fine-tuning, DoRA merges the updates back into the original weights, so inference is as efficient as before, with no extra computation at prediction time.
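The merge step can be sketched as follows. The shapes and names here are hypothetical and continue the magnitude/direction idea: after training, the learned pieces are folded into a single dense matrix once, so the deployed layer is an ordinary matmul with no DoRA-specific overhead.

```python
# Sketch of folding trained DoRA parameters into one dense weight (toy sizes).
import torch

torch.manual_seed(0)
d_out, d_in, r = 8, 16, 4
W0 = torch.randn(d_out, d_in)          # frozen pre-trained weight
m  = torch.rand(1, d_in) + 0.5         # learned magnitudes (one per column)
A  = torch.randn(r, d_in) * 0.01       # learned LoRA matrices
B  = torch.randn(d_out, r) * 0.01

# Fold once after training: magnitude times unit direction.
V = W0 + B @ A
W_merged = m * (V / V.norm(dim=0, keepdim=True))

x = torch.randn(3, d_in)
y = x @ W_merged.T                     # a single matmul, same cost as the base layer
print(y.shape)  # torch.Size([3, 8])
```

Because the merged matrix has the same shape as the original weight, serving infrastructure needs no changes, which is the "no extra inference time" property mentioned above.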
The benefits of DoRA
Since DoRA was designed to address the limitations of LoRA, let's look at its advantages:
- Simplifies the task: by splitting the weight updates into magnitude and direction, DoRA makes fine-tuning easier and more stable.
- More efficient: it uses LoRA only for the directional updates, which keeps the number of trainable parameters small; focusing on direction alone simplifies LoRA's job.
- Performs closer to full fine-tuning: DoRA shows a learning capacity closer to full fine-tuning (which retrains all parameters), without the heavy computational cost.
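To give a rough sense of the parameter savings behind the efficiency claim, here is some back-of-the-envelope arithmetic for a single weight matrix. The 4096×4096 size and rank 16 are illustrative numbers, not figures from the paper:

```python
# Rough trainable-parameter counts for one 4096x4096 weight (illustrative numbers).
d_out = d_in = 4096
r = 16

full_ft = d_out * d_in          # full fine-tuning trains every entry of the matrix
lora    = r * (d_in + d_out)    # LoRA trains two low-rank matrices, A and B
dora    = lora + d_in           # DoRA adds just one magnitude value per column

print(full_ft)  # 16777216
print(lora)     # 131072
print(dora)     # 135168
```

Under these assumptions DoRA trains well under 1% of the full parameter count, only marginally more than LoRA, which is why it keeps LoRA's efficiency while behaving more like full fine-tuning.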