Controllable Text Simplification with Deep Reinforcement Learning

@yanamotoControllableTextSimplification2022

The reward function is based on sentence-level difficulty and builds on a previous controllable text simplification model.

The method therefore uses two models: a difficulty estimator and a simplification model. The first estimates the difficulty of a generated sentence; the second is trained with RL to minimize the difference between the estimated and the target difficulty.

Difficulty Estimator

The difficulty estimator is a BERT regression model whose loss function is the mean squared error (MSE) between the target difficulties $g = (g_{1}, g_{2}, \ldots, g_{N})$ and the estimated difficulties $\hat{g} = (\hat{g}_{1}, \hat{g}_{2}, \ldots, \hat{g}_{N})$:

$$L = \frac{1}{N} \sum_{n=1}^{N} (g_{n} - \hat{g}_{n})^{2}$$

where $N$ is the batch size.
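The estimator's training objective can be sketched in plain Python. This covers only the loss computation and assumes the BERT regressor has already produced one scalar difficulty per sentence; the function name `difficulty_mse` and the batch values are hypothetical. In a PyTorch setup, the same quantity is what `torch.nn.functional.mse_loss` returns with its default `reduction="mean"`.

```python
from typing import Sequence


def difficulty_mse(targets: Sequence[float], estimates: Sequence[float]) -> float:
    """MSE between target difficulties g and estimated difficulties g_hat.

    Mirrors L = (1/N) * sum_n (g_n - g_hat_n)^2, where N is the batch size.
    """
    if len(targets) != len(estimates) or len(targets) == 0:
        raise ValueError("targets and estimates must be non-empty and equally long")
    n = len(targets)  # N, the batch size
    return sum((g - g_hat) ** 2 for g, g_hat in zip(targets, estimates)) / n


# Hypothetical batch of N = 3 sentences: gold difficulties vs. regressor outputs
print(difficulty_mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```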

Simplification Model

The simplification model is a sequence-to-sequence model based on the Transformer @vaswaniAttentionAllYou2017. The target difficulty level is specified in the input sentence through a special token (e.g. difficulty level “3” is specified as the special token `<3>`).
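A minimal sketch of how the control token could be attached to the input. The note does not say where in the sentence the token goes, so prepending it is an assumption, as are the helper names and the example level range `1..4`. With a Hugging Face tokenizer, such tokens would typically be registered via `add_special_tokens` (followed by `resize_token_embeddings` on the model) so that they are never split into sub-words.

```python
from typing import List


def difficulty_token(level: int) -> str:
    # Token format following the note's example: level 3 -> "<3>"
    return f"<{level}>"


def add_difficulty_token(sentence: str, level: int) -> str:
    # Assumption: the control token is prepended to the source sentence
    return f"{difficulty_token(level)} {sentence}"


def control_vocabulary(levels: range) -> List[str]:
    # Tokens to register as special tokens so the tokenizer keeps them intact
    return [difficulty_token(level) for level in levels]


print(add_difficulty_token("The committee deferred its decision.", 3))
print(control_vocabulary(range(1, 5)))  # hypothetical levels 1..4
```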
The model is trained in two steps:

During the pretraining step, the model is trained with a cross-entropy loss, which stabilizes the subsequent RL step. Let $x$ be a complex source sentence and $y = (y_{1}, y_{2}, \ldots, y_{M})$ a simple target sentence of length $M$; the loss function is defined as:

$$L_{c} = -\frac{1}{M} \sum_{m=1}^{M} \log p(y_{m} \mid y_{<m}, x)$$
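Under teacher forcing, $L_c$ reduces to the negative mean log-probability of the reference tokens. The sketch below assumes the model's per-token probabilities $p(y_m \mid y_{<m}, x)$ have already been gathered into a list; `sequence_cross_entropy` is a hypothetical name. For a single sentence, PyTorch's `torch.nn.functional.cross_entropy` computes the same average directly from logits.

```python
import math
from typing import Sequence


def sequence_cross_entropy(token_probs: Sequence[float]) -> float:
    """L_c = -(1/M) * sum_m log p(y_m | y_<m, x) for one target sentence.

    token_probs[m] is the probability the model assigns to the reference
    token y_m given the reference prefix y_<m and the source x.
    """
    if len(token_probs) == 0:
        raise ValueError("the target sentence must contain at least one token")
    m = len(token_probs)  # M, the target length
    return -sum(math.log(p) for p in token_probs) / m


# Hypothetical 3-token target: -(ln 0.5 + ln 0.25 + ln 1.0) / 3 equals ln 2
print(sequence_cross_entropy([0.5, 0.25, 1.0]))
```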

During the reinforcement learning step, the pretrained model is further trained with the REINFORCE algorithm. The reward is computed from the estimated difficulty of the sentence generated by the simplification model and the target difficulty assigned to the input sentence: the smaller the difference between these difficulties, the larger the reward.

The difficulty estimator receives the sentence generated by the simplification model and outputs its estimated difficulty $\hat{g}$. From this and the target difficulty $g$, the squared error is computed:

$$e = (g - \hat{g})^{2}$$

This error is then transformed into a reward by linear normalization:

$$r = \frac{r_{\max} - r_{\min}}{e_{\min} - e_{\max}} (e - e_{\max}) + r_{\min}$$

where $r_{\max}$ and $r_{\min}$ are the upper and lower bounds of the reward, and $e_{\max}$ and $e_{\min}$ are the maximum and minimum values of the error $e$. The map sends $e = e_{\max}$ to $r = r_{\min}$ and $e = e_{\min}$ to $r = r_{\max}$. Finally, the reward $r$ weights the cross-entropy loss defined above:

$$L_{r} = -r \cdot \frac{1}{M} \sum_{m=1}^{M} \log p(y_{m} \mid y_{<m}, x)$$

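The reward computation and the reward-weighted loss can be sketched as three small functions. Assumptions: the function names, the example difficulty values, and the bounds $e_{\min} = 0$, $e_{\max} = 4$, $r_{\min} = 0$, $r_{\max} = 1$ are hypothetical (the note does not say how the error bounds are obtained), and, as is standard for REINFORCE, the log-probabilities are those of the sentence sampled from the simplification model rather than of a reference.

```python
import math
from typing import Sequence


def squared_error(target: float, estimate: float) -> float:
    # e = (g - g_hat)^2
    return (target - estimate) ** 2


def normalize_reward(e: float, e_min: float, e_max: float,
                     r_min: float, r_max: float) -> float:
    # Linear map with e_max -> r_min and e_min -> r_max,
    # so a smaller difficulty error yields a larger reward.
    if e_min == e_max:
        raise ValueError("e_min and e_max must differ")
    return (r_max - r_min) / (e_min - e_max) * (e - e_max) + r_min


def reward_weighted_loss(reward: float, token_log_probs: Sequence[float]) -> float:
    # L_r = -r * (1/M) * sum_m log p(y_m | y_<m, x).
    # The reward is a plain constant here, so in an autograd framework the
    # gradient would flow only through the log-probabilities (REINFORCE).
    if len(token_log_probs) == 0:
        raise ValueError("the generated sentence must contain at least one token")
    return -reward * sum(token_log_probs) / len(token_log_probs)


# Hypothetical values: target level 3, estimator output 2.5,
# error bounds [0, 4], reward bounds [0, 1].
e = squared_error(3.0, 2.5)  # 0.25
r = normalize_reward(e, e_min=0.0, e_max=4.0, r_min=0.0, r_max=1.0)  # 0.9375
loss = reward_weighted_loss(r, [math.log(0.5), math.log(0.25)])
print(e, r, loss)
```

Because the normalization is linear, it only rescales the error into the reward range: an error outside $[e_{\min}, e_{\max}]$ would produce a reward outside $[r_{\min}, r_{\max}]$, so the bounds need to cover every error that can occur.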
Results
Experimental results show that the method scores highly in both automatic evaluation @xuOptimizingStatisticalMachine2016 and human evaluation of the simplified texts.

📚 Michele's Notes
