Paper reading log. Each entry lists: Title, Authors, Year, Reading Date, Summary, Notes, Topic.

Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Authors: Rafael Rafailov et al.
Year: 2023
Reading Date: 05.11.24
Summary: Wrote a largish summary on DPO.
Topic: RL

Title: A Survey of Reinforcement Learning from Human Feedback
Authors: Timo Kaufmann et al.
Year: 2024
Reading Date: 15.11.24
Summary: Wrote a large summary on RLHF.
Notes: Still WIP.
Topic: RL

Title: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Authors: Lei Huang et al.
Year: 2023
Reading Date: 18.11.24
Summary: Wrote a large summary on hallucination and its causes.
Notes: Still WIP.
Topic: Hallucination

Title: A Mathematical Framework for Transformer Circuits
Authors: Nelson Elhage et al.
Year: 2021
Reading Date: 21.11.24
Summary: Wrote a large summary on A Mathematical Framework for Transformer Circuits.
Notes: Still WIP.
Topic: Mech Interp

Title: Data-Driven Sentence Simplification: Survey and Benchmark
Authors: Alva-Manchego et al.
Year: 2020
Reading Date: 22.11.24
Summary: Focused mainly on chapter 3, on how human assessment should be done. Report: Human Assessment for Text Simplification.
Notes: The rest is mostly on corpora and the older ways in which TS was done.
Topic: Text Simplification

Title: The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
Authors: Alva-Manchego et al.
Year: 2021
Reading Date: 22.11.24
Summary: An extensive evaluation of different simplification metrics, how they perform, and how well they correlate with human judgements. Larger reports are in Human Assessment for Text Simplification and Automatic Evaluation of Simplicity.
Notes: Focused mainly on the results and the introduction. The experimental setting wasn't really useful for current projects.
Topic: Text Simplification

Title: Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
Authors: Liu et al.
Year: 2025
Reading Date: 23.01.25
Summary: A new method for hallucination detection. It scores how likely a generation is to be a hallucination by running two extra passes: one over the tokens with the highest contribution to the last token in the sequence (the top 2/3) and one over the remaining 1/3. ROUGE-L is then computed between each re-generation and the original output, and the difference between the two scores (top 2/3 minus bottom 1/3) is used as the hallucination score; see the sketch after this entry.
Notes: The way the "contribution" score is calculated could probably be improved.
Topic: Hallucination Detection
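
A minimal sketch of the scoring procedure as summarized above, not the paper's exact formulation. It assumes per-token contribution scores (e.g. attention mass onto the final position) are already available, and `generate` is a hypothetical wrapper around the model's decoding call; the 2/3 vs. 1/3 split, the ROUGE-L comparison against the original answer, and the sign convention all follow the entry's description.

```python
from typing import Callable, List


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings via longest common subsequence over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    # LCS length by dynamic programming.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if c[i - 1] == r[j - 1] else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    prec, rec = lcs / len(c), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)


def hallucination_score(
    prompt_tokens: List[str],
    contributions: List[float],      # assumed per-token contribution to the final position
    original_answer: str,
    generate: Callable[[str], str],  # hypothetical LLM generation wrapper
    top_fraction: float = 2 / 3,
) -> float:
    """Difference of ROUGE-L scores between the two re-generations and the original answer."""
    # Rank prompt tokens by contribution and split into attentive (top 2/3) and non-attentive (bottom 1/3).
    order = sorted(range(len(prompt_tokens)), key=lambda i: contributions[i], reverse=True)
    k = int(len(order) * top_fraction)
    attentive = sorted(order[:k])        # keep original token order within each subset
    non_attentive = sorted(order[k:])

    # Two extra passes, one prompt built from each token subset.
    answer_top = generate(" ".join(prompt_tokens[i] for i in attentive))
    answer_bottom = generate(" ".join(prompt_tokens[i] for i in non_attentive))

    # Compare each re-generation with the original answer; the gap is the hallucination score.
    return rouge_l_f1(answer_top, original_answer) - rouge_l_f1(answer_bottom, original_answer)
```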

Title: TruthfulQA: Measuring How Models Mimic Human Falsehoods
Authors: Lin et al.
Year: 2021
Reading Date: 05.02.25
Summary: Read to prepare for the Pesaresi Seminar.
Topic: Hallucination

Title: Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Authors: Ferrando et al.
Year: 2024
Reading Date: 06.02.25
Summary: Read to prepare for the Pesaresi Seminar.
Topic: Hallucination Detection

Title: DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Authors: Chuang et al.
Year: 2024
Reading Date: 06.02.25
Summary: Read to prepare for the Pesaresi Seminar.
Topic: Hallucination

Title: Position-aware Automatic Circuit Discovery
Authors: Haklay et al.
Year: 2025
Reading Date: 11.03.25

Title: Causal Abstractions of Neural Networks
Authors: Geiger et al.
Year: 2021
Reading Date: 22.03.25
Summary: They build causal, tree-like models of neural network behaviour. Nodes of the causal model are aligned to specific neurons of the network, and interventions are run on both: they observe how the causal model's output changes when certain values are changed, then intervene on the aligned neurons and check whether the network's output changes in the same way. If it does across a number of samples, the causal model can be said to causally abstract the network; see the sketch after this entry.
Topic: Causal Abstraction
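
A minimal sketch of the agreement check described above, not the paper's full construction. The interfaces run_causal, read_causal, run_network, read_network, and the alignment mapping are hypothetical and assumed for illustration; the returned value is the fraction of sampled interchange-style interventions on which the patched network agrees with the intervened causal model.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical interfaces, assumed for illustration:
#   run_causal(inputs, overrides) -> causal-model output with some intermediate variables fixed.
#   read_causal(inputs)           -> dict of intermediate causal-variable values on this input.
#   run_network(inputs, patches)  -> network output with activations at given locations overwritten.
#   read_network(inputs)          -> dict mapping aligned locations to their activations on this input.

def interchange_intervention_accuracy(
    run_causal: Callable[[Any, Dict[str, Any]], Any],
    read_causal: Callable[[Any], Dict[str, Any]],
    run_network: Callable[[Any, Dict[str, Any]], Any],
    read_network: Callable[[Any], Dict[str, Any]],
    alignment: Dict[str, str],   # causal-variable name -> network location it is aligned to
    dataset: List[Any],
    n_samples: int = 200,
) -> float:
    """Fraction of sampled interventions where the patched network agrees with the intervened causal model."""
    hits = 0
    for _ in range(n_samples):
        base, source = random.choice(dataset), random.choice(dataset)
        var = random.choice(list(alignment))

        # Intervene on the causal model: force `var` to the value it takes on the source input.
        causal_out = run_causal(base, {var: read_causal(source)[var]})

        # Mirror the intervention in the network: patch the aligned activation from the source run.
        loc = alignment[var]
        net_out = run_network(base, {loc: read_network(source)[loc]})

        hits += int(net_out == causal_out)
    return hits / n_samples
```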