All the articles with the tag "dpo".
A base model just predicts tokens. Alignment turns it into an assistant that follows instructions and refuses harmful ones.