Title | Authors | Year | Reading Date | Summary | Notes | Topic |
---|---|---|---|---|---|---|
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafael Rafailov et al. | 2023 | 05.11.24 | Wrote a largish summary on DPO | | RL |
A Survey of Reinforcement Learning from Human Feedback | Timo Kaufmann et al. | 2024 | 15.11.24 | Wrote a large summary on RLHF | Still WIP | RL |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | Lei Huang et al. | 2023 | 18.11.24 | Wrote a large summary on Hallucination and Hallucination Causes | Still WIP | Hallucination |
A Mathematical Framework for Transformer Circuits | Nelson Elhage et al. | 2021 | 21.11.24 | Wrote a large summary on transformer circuits | Still WIP | Mech Interp |
Data-Driven Sentence Simplification: Survey and Benchmark | Alva-Manchego et al. | 2020 | 22.11.24 | Focused mainly on Chapter 3, on how human assessment should be done; wrote a report on Human Assessment for Text Simplification. | The rest is mostly about corpora and the older ways in which TS was done. | Text Simplification |
The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification | Alva-Manchego et al. | 2021 | 22.11.24 | An extensive evaluation of different simplification metrics, how they perform, and how they correlate with human judgements. Larger reports are in Human Assessment for Text Simplification and Automatic Evaluation of Simplicity. | Focused mainly on the results and the introduction; the experimental setting wasn’t really useful for current projects. | Text Simplification |
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | Liu et al. | 2025 | 23.01.25 | A new method for hallucination detection. They compute a score indicating how likely the generation is a hallucination by doing two more passes: one with the tokens that contribute most to the last token in the sequence (the top 2/3) and one with the remaining 1/3. They then compute ROUGE-L for each pass's output against the original generation and use the difference between the two scores (top 2/3 minus bottom third) as the hallucination score (a sketch of this scoring step follows the table). | The way the “contribution” score is calculated could probably be improved. | Hallucination Detection |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Lin et al. | 2021 | 05.02.25 | Read to prepare for Pesaresi Seminar | | Hallucination |
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Ferrando et al. | 2024 | 06.02.25 | Read to prepare for Pesaresi Seminar | | Hallucination Detection |
DoLa: Decoding by Contrasting Layers Improves Factuality in LLMs | Chuang et al. | 2024 | 06.02.25 | Read to prepare for Pesaresi Seminar | | Hallucination |
Position-Aware Automatic Circuit Discovery | Haklay et al. | 2025 | 11.03.25 | | | Mech Interp |
Causal Abstractions of Neural Networks | Geiger et al. | 2021 | 22.03.25 | They create causal, tree-like models of neural network behaviour. They align the nodes of the causal model with specific neurons of the network and perform interventions: they observe how the causal model's output changes when certain values are changed, then intervene in the network to check whether its output changes in the same way. If it does across a number of samples, the causal model causally abstracts the network (a toy sketch of this interchange-intervention check follows the table). | | Causal Abstraction |
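
A minimal sketch of the AGSER-style scoring step from the attention-guided self-reflection row, under loose assumptions: `generate` is any callable mapping a prompt string to a completion, and `contributions` holds one attention-based contribution score per prompt token (both are stand-ins, not the paper's actual interface). ROUGE-L is implemented directly via longest common subsequence.

```python
def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    a, b = candidate.split(), reference.split()
    if not a or not b:
        return 0.0
    # Longest common subsequence via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(a)][len(b)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(a), lcs / len(b)
    return 2 * prec * rec / (prec + rec)


def hallucination_score(prompt_tokens, contributions, original_answer, generate):
    """Two extra passes: attentive (top 2/3 by contribution) vs. remaining 1/3 tokens."""
    order = sorted(range(len(prompt_tokens)), key=lambda i: contributions[i], reverse=True)
    cut = (2 * len(prompt_tokens)) // 3
    top = " ".join(prompt_tokens[i] for i in sorted(order[:cut]))   # top 2/3, original order
    rest = " ".join(prompt_tokens[i] for i in sorted(order[cut:]))  # bottom 1/3
    ans_top = generate(top)    # extra pass 1: attentive tokens only
    ans_rest = generate(rest)  # extra pass 2: remaining tokens only
    # The row above uses the difference of the two ROUGE-L scores,
    # each computed against the original generation, as the score.
    return rouge_l(ans_top, original_answer) - rouge_l(ans_rest, original_answer)


# Dummy usage with a stand-in generator (a real setup would query the LLM):
score = hallucination_score(
    ["the", "capital", "of", "france", "is"],
    [0.1, 0.9, 0.2, 0.8, 0.3],
    "paris",
    generate=lambda prompt: "paris" if "capital" in prompt else "rome",
)
print(score)  # 1.0: the attentive-token pass reproduces the original answer
```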
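A toy illustration of the interchange-intervention check described in the causal abstraction row. The network, the causal model, and the neuron alignment are all invented for the example, not the paper's actual models: the causal model's intermediate variable V1 is aligned to the network's hidden unit h1, and we check that patching h1 with its value from a source input changes the network's output exactly as patching V1 changes the causal model's.

```python
import itertools

def network(x, y, z, patch=None):
    """Tiny 'network': h1 = x AND y, output = h1 OR z.
    `patch` optionally overrides the hidden unit h1 (an intervention)."""
    h1 = (x and y) if patch is None else patch
    return h1 or z

def causal_model(x, y, z, patch=None):
    """High-level causal model with one intermediate variable V1 = x AND y."""
    v1 = (x and y) if patch is None else patch
    return v1 or z

def h1_value(x, y, z):
    """Read off the value of the aligned neuron h1 under a given input."""
    return x and y

def abstracts(inputs):
    """For every (base, source) pair, patch h1 in the network with its value
    from the source input, patch V1 in the causal model the same way, and
    require the two outputs to agree."""
    for base, source in itertools.product(inputs, repeat=2):
        net_out = network(*base, patch=h1_value(*source))
        cm_out = causal_model(*base, patch=(source[0] and source[1]))
        if net_out != cm_out:
            return False
    return True

inputs = list(itertools.product([False, True], repeat=3))
print(abstracts(inputs))  # True: the causal model abstracts this toy network
```

Here agreement holds by construction because the alignment is exact; with a real network, h1 would be a learned activation and the check would pass or fail over sampled inputs, which is the "for a number of samples" condition in the row above.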