Intro
LLMs are remarkably good at generating and understanding text, yet we still know little about how their internal layers process information. Previous work typically identifies "important" layers only after a model has been fine-tuned on a particular dataset, making these findings inherently post-hoc and dataset-specific. But are critical layers an intrinsic property of the model, independent of specific data? If so, can we predict a model's future training behavior from its current state alone?
To investigate these questions, we adopt a different approach: we analyze off-the-shelf (pre-fine-tuned) models and show that certain layers are intrinsically easier to adapt during subsequent fine-tuning. We further demonstrate that each layer's Representation Dynamics reliably predicts its behavior in subsequent training steps, regardless of the dataset used.
Data-oblivious Critical Layers & Representation Dynamics
Critical Layers Identified during Supervised Fine-Tuning
We identify critical layers during Supervised Fine-Tuning (SFT) by substituting each layer of the fine-tuned model with its counterpart from the pre-fine-tuned model, then measuring how much of the SFT loss reduction is undone by each substitution. A high value indicates that the layer is more sensitive during fine-tuning.
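For concreteness, here is a minimal PyTorch sketch of this substitution probe. It assumes Llama-style Hugging Face checkpoints whose transformer blocks live in `model.model.layers` and whose forward pass returns a `.loss` when labels are in the batch; the function name and the layer path are our assumptions, not details from the original work.

```python
import torch

@torch.no_grad()
def layer_substitution_probe(finetuned, pretrained, batch, num_layers):
    """For each layer l, swap the fine-tuned block for its pre-fine-tuned
    counterpart and record the resulting change in SFT loss. A large
    increase means layer l was critical to fine-tuning.

    Assumes both models expose blocks at model.model.layers (Llama-style)
    and that `batch` contains labels so the forward pass returns a loss."""
    base_loss = finetuned(**batch).loss.item()
    deltas = []
    for l in range(num_layers):
        saved = finetuned.model.layers[l]                       # keep fine-tuned block
        finetuned.model.layers[l] = pretrained.model.layers[l]  # revert layer l
        swapped_loss = finetuned(**batch).loss.item()
        deltas.append(swapped_loss - base_loss)                 # sensitivity of layer l
        finetuned.model.layers[l] = saved                       # restore
    return deltas
```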


We find that the same model exhibits a very similar pattern in these loss curves across different datasets (high values in the middle layers, low values in the last layer), indicating that the critical layers are determined by the pre-fine-tuned model and are independent of the fine-tuning dataset.
Representation Dynamics of the Pre-fine-tuned Models
Centered Kernel Alignment (CKA) is a popular metric for measuring the similarity between two representation spaces. We use it to quantify how the representation space changes at layer $\ell$ relative to its neighboring layers, via the average CKA value, denoted $\delta^{\ell}$. A smaller $\delta^{\ell}$ indicates a greater representation shift at layer $\ell$ relative to its neighbors; we call the layers with the largest shifts change-point layers.
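As one concrete reading of "average CKA with neighboring layers," the sketch below computes linear CKA (the standard formulation from Kornblith et al., 2019) and averages it over the immediate neighbors $\ell-1$ and $\ell+1$; the exact neighborhood used in the original analysis is our assumption.

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape
    (n_samples, dim), following Kornblith et al. (2019):
    CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F),
    with both matrices mean-centered per feature."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).norm() ** 2
    return (hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())).item()

def delta(hidden_states, l):
    """Average CKA between layer l and its immediate neighbors
    (our assumed neighborhood). Smaller delta^l = larger shift."""
    vals = [linear_cka(hidden_states[l], hidden_states[m])
            for m in (l - 1, l + 1) if 0 <= m < len(hidden_states)]
    return sum(vals) / len(vals)
```

Here `hidden_states` would be a list of per-layer activation matrices, e.g. collected from a forward pass with hidden-state outputs enabled.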


We also observe that these CKA patterns and the change-point layers are independent of the data used to compute CKA, and are instead determined by the pre-fine-tuned model state.
Connecting Critical Layers with Representation Dynamics
The figure above also reveals an interesting phenomenon: the average CKA $\delta^{\ell}$ is negatively correlated with the tracked loss value after SFT. This trend is consistent across different models and datasets, with a strongly negative correlation coefficient.
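To make this claim testable, a short sketch: given the per-layer $\delta^{\ell}$ values (e.g. from the CKA sketch above) and the per-layer tracked losses (e.g. from the substitution probe), the correlation can be computed with scipy. The arrays below are hypothetical placeholders for illustration, not results from the paper.

```python
from scipy.stats import pearsonr

# Hypothetical per-layer values, for illustration only:
# delta_l from the CKA sketch, layer_loss from the substitution probe.
delta_l    = [0.95, 0.90, 0.62, 0.55, 0.70, 0.88, 0.97]
layer_loss = [0.10, 0.15, 0.48, 0.55, 0.40, 0.18, 0.08]

r, p = pearsonr(delta_l, layer_loss)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")  # a strongly negative r supports the claim
```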
