TY - JOUR
T1 - Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks
AU - Sohn, Jy-yong
AU - Kwon, Dohyun
AU - An, Seoyeon
AU - Lee, Kangwook
N1 - Publisher Copyright:
© 2024 Proceedings of Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons (m) needed to arbitrarily change N labels among K samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network f and a neural network g (with m neurons) designed for fine-tuning. When g is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that N samples can be fine-tuned with m = Θ(N) neurons for 2-layer networks, and with m = Θ(√N) neurons for 3-layer networks, no matter how large K is. Our results recover the known memorization capacity results when N = K as a special case.
AB - Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons (m) needed to arbitrarily change N labels among K samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network f and a neural network g (with m neurons) designed for fine-tuning. When g is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that N samples can be fine-tuned with m = Θ(N) neurons for 2-layer networks, and with m = Θ(√N) neurons for 3-layer networks, no matter how large K is. Our results recover the known memorization capacity results when N = K as a special case.
UR - https://www.scopus.com/pages/publications/85212217778
M3 - Conference article
AN - SCOPUS:85212217778
SN - 2640-3498
VL - 244
SP - 3264
EP - 3278
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 40th Conference on Uncertainty in Artificial Intelligence, UAI 2024
Y2 - 15 July 2024 through 19 July 2024
ER -