📚 Michele's Notes

            • 24.10.14 (1) Dialing in The Steam & Co. on Wacaco Picopresso
            • 24.10.15 (1) Dialing in The Steam & Co. on Wacaco Picopresso
            • 24.10.15 (2) Dialing in The Steam & Co. on Wacaco Picopresso
            • 24.10.16 (1) Daily Coffee with The Steam & Co. on Wacaco Picopresso
            • 24.10.16 (2) Daily Coffee with The Steam & Co. on Wacaco Picopresso
            • 24.10.16 (3) Daily Coffee with The Steam & Co. on Wacaco Picopresso
            • 24.10.21 (1) Dialing in La Esmeralda on Wacaco Picopresso
            • 24.10.21 (2) Dialing in La Esmeralda on Wacaco Picopresso
            • 24.10.21 (3) Dialing in La Esmeralda on Wacaco Picopresso
            • 24.10.27 (1) Dialing in La Esmeralda on La Pavoni Europiccola
            • 24.10.28 (1) Dialing in Filter C on the Sworkdesign Dripper
            • 24.10.28 (1) Dialing in La Esmeralda on La Pavoni Europiccola
            • 24.11.01 (1) Dialing Dambi Uddo on La Pavoni Europiccola
            • 24.11.01 (2) Dialing Dambi Uddo on La Pavoni Europiccola
            • 24.11.01 (1) Daily Coffee with Filter C on the Sworkdesign Dripper
            • 24.11.03 (1) Daily Coffee Dambi Uddo on La Pavoni Europiccola
            • 24.12.29 (1) Daily Coffee Coopchebi on La Pavoni Europiccola
            • 25.01.01 (1) Daily Coffee Coopchebi on La Pavoni Europiccola
            • 25.01.01 (2) Daily Coffee Coopchebi on La Pavoni Europiccola
            • 25.01.03 Dialing In (1) Salaverria on La Pavoni Europiccola
            • 25.01.03 Dialing In (2) Salaverria on La Pavoni Europiccola
            • 25.01.09 Daily Espresso (1) Salaverria on La Pavoni Europiccola
            • 25.01.10 Dialing In (1) Coopchebi on Wacaco Picopresso
            • 25.01.10 Dialing In (2) Coopchebi on Wacaco Picopresso
            • 25.01.11 Daily Espresso (1) Salaverria on La Pavoni Europiccola
            • 25.01.11 Dialing In (2) Los Bellotos on La Pavoni Europiccola
            • 25.01.11 Dialing In (3) Los Bellotos on La Pavoni Europiccola
          • 2024.10.01 The Steam & Co.
          • 2024.10.04 La Esmeralda
          • 2024.10.23 Dambi Uddo
          • 2024.10.23 Filter C
          • 2024.11.13 Coopchebi
          • 2024.11.13 Dambi Uddo
          • 2024.12.12 Salaverria
          • 2025.01.08 Coopchebi
          • 2025.01.08 Los Bellotos
        • Coopchebi
        • Dambi Uddo
        • Filter C
        • La Esmeralda
        • Los Bellotos
        • Salaverria
        • The Steam & Co.
            • La Pavoni Double Basket
            • La Pavoni Lever 51mm IMS Competition Double Filter Basket
            • La Pavoni Single Basket
            • Wacaco Picopresso 18g Basket
          • Bottomless Portafilter
          • Coffee Puck Screen
          • IMS Pavoni Lever Precision Shower Screen 54mm
          • La Pavoni Lever 51.6 mm Adjustable Leveler
          • Orea Negotiator tool - Neon Green +
          • Pällo Grouphead Brush ++
          • Pesado 58.5 — WDT Clump Crusher - Raya+
          • Rubber Knock Box Black ++
          • Tamper
          • Wacaco Picopresso Needle Dispersion Tool
          • Walnut Tamper
          • La Pavoni Europiccola
          • Sworkdesign Dripper
          • Wacaco Picopresso
          • 1ZPresso JX-Pro
          • DF64 Gen 2
          • Brewista Smart Scale II
        • Espresso with Dambi Uddo, 1ZPresso Jx-Pro on La Pavoni Europiccola
        • Espresso with La Esmeralda, 1ZPresso Jx-Pro on La Pavoni Europiccola
        • Espresso with The Steam & Co., 1ZPresso Jx-Pro on Wacaco Picopresso
          • Testing Resistance of a Coffee Machine
          • USB Charging A and C
            • Group Head Temperature
          • Espresso Making
          • Extraction Theory
          • Making Coffee with Sworkdesign Dripper - 1
          • Making Coffee with Sworkdesign Dripper - 2
      • Coffee
        • Cheesecake agli Agrumi
          • Capitulum Primum
          • Exercitia Latina
          • Grammatica Latina
          • Pensa
          • Capitulum Secundum
          • Excercitia Latina
          • Grammatica Latina
          • Pensa
          • Grammatica Latina
          • Pensa
          • Capitulum Primum
          • Exercitia Latina
        • A Mathematical Framework for Transformers Circuits
        • Negative Results for SAEs on Downstream Tasks
        • Quantifying context mixing in transformers
        • Sparse Autoencoders
        • Direct Preference Optimization
        • Reinforcement Learning
        • Reinforcement Learning with Human Feedback
        • Functions
        • Composition of Linear Maps
        • Sets
            • Controllable Text Simplification with Deep Reinforcement Learning
          • Controlled Text Generation
          • Fine-tuning Approaches
          • Post-Processing Approaches
          • Retrain or Refactoring Approaches
          • Hallucination
          • Hallucination Causes
          • Automatic Evaluation of Simplicity
          • Human Assessment for Text Simplification
          • Learning Simplifications for Specific Target Audiences
          • Optimizing Statistical Machine Translation for Text Simplification
            • 24.11.27 - Malvina Nissim - The Language Factor
            • 24.11.27 - Valerio Basile - Modeling and Evaluation for Perspectivist NLP
            • 24.12.06 - Dieuwke Hupkes - Generalization in LLMs and Beyond
            • 25.02.24 - Takeaki Uno -
            • 25.04.04 - Outlier Dimension in LLMs and multimodal-LLMs. Mechanisms for Task Adaptation and Factual Recall - William Rudman
            • 25.03.24 - Neural Propagation in the Framework of Cognidynamics - Marco Gori
            • 25.03.24 - On Continual Learnings and the Dynamics of Forgetting - Tinne Tuytelaars
            • 25.03.25 - New avenues in Long-Sequence Processing without Attention - Antonio Orvieto
            • 25.03.25 - Toward the Post-dataset Eta. Embracing Data Streams - Alexei (Alyosha) Efros
            • 2025.02.04 - Lecture 1
            • 2025.02.06 - Lecture 2
            • 2025.01.08 - Lecture 1
            • 2025.01.08 - Lecture 2
            • 2025.04.30 - Lecture 3
          • Speech - English
          • Speech - Italian
          • Paper
          • Tweets to read
        • Index of Papers
          • @keskarCTRLConditionalTransformer2019
          • A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data
          • A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models
          • A Survey of Reinforcement Learning from Human Feedback
          • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
          • A Survey on Hallucination in Large Vision-Language Models
          • Attention is All you Need
          • Controllable Generation from Pre-trained Language Models via Inverse Prompting
          • Controllable Text Simplification with Deep Reinforcement Learning
          • Data-Driven Sentence Simplification: Survey and Benchmark
          • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
          • Evaluating Text-To-Text Framework for Topic and Style Classification of Italian texts
          • Fine-tuning Language Models for Factuality
          • Fine-Tuning Language Models from Human Preferences
          • Finetuned Language Models Are Zero-Shot Learners
          • GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
          • Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
          • Learning Simplifications for Specific Target Audiences
          • Optimizing Statistical Machine Translation for Text Simplification
          • Quantifying Attention Flow in Transformers
          • Quantifying Context Mixing in Transformers
          • Scaling Instruction-Finetuned Language Models
          • Survey of Hallucination in Natural Language Generation
          • Technical Report: Auxiliary Tuning and its Application to Conditional Text Generation
          • The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
          • Training language models to follow instructions with human feedback
        • Love Hotel
        • Contextualized Adventures - Jobe Bittman
        • Players make your world spin - Mike Breault
      • Python Runnable Code Block
      • Untitled
    Home

    ❯

    Scientific Literature References

    ❯

    literature notes

    ❯

    GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

    GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

    May 06, 20251 min read

    • year : 2023
    • authors : Jie Huang, Jiangshan Hao, Rongshun Juan, Randy Gomez, Keisuke Nakamura, Guangliang Li
    • repository :
    • proceedings : 2023 IEEE International Conference on Robotics and Automation (ICRA)
    • journal :
    • volume :
    • issue :
    • publisher :
    • doi : 10.1109/ICRA48891.2023.10160939
    • Abstract : Generative adversarial imitation learning (GAIL) — a general model-free imitation learning method, allows robots to directly learn policies from expert trajectories in large environments. However, GAIL shares the limitation of other imitation learning methods that they can seldom surpass the performance of demonstrations. In this paper, to address the limit of GAIL, we propose GAN-based interactive reinforcement learning (GAIRL) from demonstrations and human evaluative feedback, by combining the advantages of GAIL and interactive reinforcement learning. We test GAIRL in six physics-based control tasks, ranging from simple low-dimensional control tasks — Cart Pole, Mountain Car and Lunar Lander, to difficult high-dimensional tasks — Inverted Double Pendulum, Hopper and HalfCheetah. Our results suggest that, the GAIRL agent can generally surpass the performance of demonstrations in both low-dimensional and high-dimensional tasks and get an optimal or close to optimal policy.
    • research

    GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

    Huang et al_2023_GAN-Based Interactive Reinforcement Learning from Demonstration and Human.pdf

    Notes


    Graph View

    • GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
    • Notes

    Backlinks

    • Reinforcement Learning with Human Feedback

    Created with Quartz v4.4.0 © 2025

    • GitHub
    • Discord Community