| Title | Authors | Year | Reading Date | Summary | Notes | Topic |
|---|---|---|---|---|---|---|
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafael Rafailov et al. | 2023 | 05.11.24 | Wrote a largish summary on DPO. | | RL |
| A Survey of Reinforcement Learning from Human Feedback | Timo Kaufmann et al. | 2024 | 15.11.24 | Wrote a large summary on RLHF. | Still WIP | RL |
| A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | Lei Huang et al. | 2023 | 18.11.24 | Wrote a large summary on Hallucination and Hallucination Causes. | Still WIP | Hallucination |
| A Mathematical Framework for Transformer Circuits | Nelson Elhage et al. | 2021 | 21.11.24 | Wrote a large summary on A Mathematical Framework for Transformer Circuits. | Still WIP | Mech Interp |
| Data-Driven Sentence Simplification: Survey and Benchmark | Alva-Manchego et al. | 2020 | 22.11.24 | Focused mainly on chapter 3, which covers how human assessment should be done. Report: Human Assessment for Text Simplification. | The rest is mostly about corpora and the older ways in which TS was done. | Text Simplification |
| The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification | Alva-Manchego et al. | 2021 | 22.11.24 | An extensive evaluation of different simplification metrics, how they perform, and how they correlate with human judges. Bigger reports: Human Assessment for Text Simplification and Automatic Evaluation of Simplicity. | Focused mainly on the results and the introduction. The experimental setting wasn't really useful for current projects. | Text Simplification |
| Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | Liu et al. | 2025 | 23.01.25 | New method for hallucination detection. It computes a score indicating how likely a generation is to be a hallucination by running two extra passes: one with the tokens that contribute most to the last token in the sequence (the top 2/3) and one with the remaining 1/3. It then computes ROUGE-L for the two passes and uses the difference between the ROUGE-L of the top 2/3 pass and that of the bottom-third pass as the hallucination score. | The way the "contribution" score is calculated could probably be improved. | Hallucination Detection |
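
The scoring idea in the last entry (Liu et al.) can be sketched roughly as below. This is a minimal sketch under several assumptions, not the paper's implementation: the per-token contribution scores are taken as given, the two extra model passes are not shown (their outputs are passed in as token lists), each pass's output is compared against the original answer (the notes do not say what ROUGE-L is computed against), and ROUGE-L is taken as the LCS-based F1 over whitespace tokens. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_contribution(
    tokens: Sequence[str], scores: Sequence[float], top_fraction: float = 2 / 3
) -> Tuple[List[str], List[str]]:
    """Split tokens into the top contributors and the rest, keeping input order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = round(len(tokens) * top_fraction)
    # Indices of the k highest-scoring tokens.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top_idx = set(ranked[:k])
    top = [t for i, t in enumerate(tokens) if i in top_idx]
    rest = [t for i, t in enumerate(tokens) if i not in top_idx]
    return top, rest


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def rouge_gap(
    original_answer: Sequence[str],
    top_answer: Sequence[str],
    rest_answer: Sequence[str],
) -> float:
    """ROUGE-L of the top-2/3 pass minus ROUGE-L of the bottom-1/3 pass.

    Assumption: both passes are scored against the original answer.
    """
    return rouge_l_f1(top_answer, original_answer) - rouge_l_f1(
        rest_answer, original_answer
    )
```

In use, `split_by_contribution` would produce the two reduced inputs, the model would be run once on each, and `rouge_gap` would be applied to the resulting answers. How the sign of the gap maps to "hallucinated" versus "not hallucinated" should be checked against the paper.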