Title | Authors | Year | Reading Date | Summary | Notes | Topic |
---|---|---|---|---|---|---|
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafael Rafailov et al. | 2023 | 05.11.24 | Wrote a largish summary on DPO | | RL |
A Survey of Reinforcement Learning from Human Feedback | Timo Kaufmann et al. | 2024 | 15.11.24 | Wrote a large summary on RLHF | Still WIP | RL |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | Lei Huang et al. | 2023 | 18.11.24 | Wrote a large summary on hallucination and its causes | Still WIP | Hallucination |
A Mathematical Framework for Transformer Circuits | Nelson Elhage et al. | 2021 | 21.11.24 | Wrote a large summary of the paper | Still WIP | Mech Interp |
Data-Driven Sentence Simplification: Survey and Benchmark | Alva-Manchego et al. | 2020 | 22.11.24 | Focused mainly on Chapter 3, which covers how human assessment should be done. Report: Human Assessment for Text Simplification | The rest mostly covers corpora and older approaches to TS. | Text Simplification |
The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification | Alva-Manchego et al. | 2021 | 22.11.24 | An extensive evaluation of different simplification metrics: how they perform and how well they correlate with human judgments. Longer reports: Human Assessment for Text Simplification and Automatic Evaluation of Simplicity | Focused mainly on the results and the introduction. The experimental setup was not very useful for current projects. | Text Simplification |
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | Liu et al. | 2025 | 23.01.25 | A new method for hallucination detection. It computes a score indicating how likely a generation is to be a hallucination by running two additional passes: one on the 2/3 of the input tokens that contribute most to the last token in the sequence, and one on the remaining 1/3. It then computes ROUGE-L between each new generation and the original answer, and uses the difference between the ROUGE-L score for the top 2/3 and the score for the bottom 1/3 as the hallucination score. | The way the “contribution” score is calculated could probably be improved. | Hallucination Detection |
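
The score described in the last row can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-token `contributions` are taken as given (how the paper derives them from attention is not reproduced here), `generate` is a hypothetical callable standing in for the LLM, ROUGE-L is a plain whitespace-token LCS F1 computed against the original answer, and which sign of the score signals a hallucination is left to the paper.

```python
from typing import Callable, List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on whitespace tokens (assumption: no stemming or normalization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def split_by_contribution(
    tokens: List[str], contributions: List[float], top_fraction: float = 2 / 3
) -> Tuple[List[str], List[str]]:
    """Split tokens into the top fraction by contribution and the rest, keeping original order."""
    k = round(len(tokens) * top_fraction)
    ranked = sorted(range(len(tokens)), key=lambda i: contributions[i], reverse=True)
    top = set(ranked[:k])
    attentive = [t for i, t in enumerate(tokens) if i in top]
    rest = [t for i, t in enumerate(tokens) if i not in top]
    return attentive, rest


def attention_split_score(
    query_tokens: List[str],
    contributions: List[float],
    original_answer: str,
    generate: Callable[[str], str],  # hypothetical stand-in for an LLM call
) -> float:
    """ROUGE-L(original, top-2/3 pass) minus ROUGE-L(original, bottom-1/3 pass)."""
    attentive, rest = split_by_contribution(query_tokens, contributions)
    answer_top = generate(" ".join(attentive))
    answer_rest = generate(" ".join(rest))
    return rouge_l_f1(original_answer, answer_top) - rouge_l_f1(original_answer, answer_rest)
```

For a quick offline check, `generate` can be replaced by an echo function such as `lambda q: q`, so the two passes simply return the two token subsets.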