All the articles with the tag "rlhf".
A base model just predicts the next token. Alignment turns it into an assistant that follows instructions and refuses harmful requests.