@yanamotoControllableTextSimplification2022
The reward function is based on sentence-level difficulty and is applied on top of a previous controllable text simplification model.
There are two models: a difficulty estimator and a simplification model. The first estimates the difficulty of the generated sentence; the second is trained through RL to minimize the difference between the estimated and target difficulties.
Difficulty Estimator
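The difficulty estimator is a BERT regression model whose loss function is the mean squared error (MSE) between the target difficulty $d_i$ and the estimated difficulty $\hat{d}_i$:
$$L_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left(d_i - \hat{d}_i\right)^2$$
where $N$ is the batch size.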
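As a rough illustration of the estimator's objective, the following sketch computes the MSE over one batch in plain Python. The function name `mse_loss` and the difficulty values are made up for illustration; the actual estimator would compute this loss on BERT regression outputs.

```python
def mse_loss(targets, estimates):
    """Mean squared error between target and estimated difficulties over a batch."""
    if len(targets) != len(estimates):
        raise ValueError("targets and estimates must have the same length")
    n = len(targets)  # batch size
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / n


# Hypothetical batch of three sentences (values invented for illustration).
target_difficulties = [3.0, 5.0, 8.0]
estimated_difficulties = [2.5, 5.0, 9.0]
loss = mse_loss(target_difficulties, estimated_difficulties)
```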
Simplification Model
The simplification model is a sequence-to-sequence model based on @vaswaniAttentionAllYou2017, where the target difficulty level is specified through a special token in the input sentence (e.g. difficulty level “3” is specified as the special token <3>).
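A minimal sketch of how such a control token could be attached to the input. The helper `add_difficulty_token` is hypothetical, and placing the token at the start of the sentence is an assumption; the notes only say that the token appears in the input sentence.

```python
def add_difficulty_token(sentence: str, level: int) -> str:
    """Attach a difficulty control token such as <3> to a source sentence."""
    # Assumption: the token is prepended; the position is not stated in the notes.
    return f"<{level}> {sentence}"


tagged = add_difficulty_token("The committee postponed the vote.", 3)
```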
The model is trained in two steps:
- during the pretraining step, a cross-entropy loss is used to stabilize the RL. Let $X$ be a complex source sentence and $Y = (y_1, \dots, y_T)$ be a simple target sentence of length $T$; the loss function is defined as:
  $$L_{\mathrm{CE}} = -\sum_{t=1}^{T} \log P(y_t \mid y_{<t}, X)$$
- during the reinforcement learning step, the pretrained model is reinforced using the REINFORCE algorithm. The reward is computed from the estimated difficulty of the sentence generated by the simplification model and the target difficulty assigned to the input sentence: a smaller difference between these difficulties results in a larger reward. The difficulty estimator receives the sentence generated by the simplification model and outputs the estimated difficulty $\hat{d}$. From this and the target difficulty $d$, the squared error $e = (d - \hat{d})^2$ is calculated. This error is then transformed into a reward by applying a normalization technique:
  $$R = R_{\min} + (R_{\max} - R_{\min}) \, \frac{e_{\max} - e}{e_{\max} - e_{\min}}$$
where $R_{\max}$ and $R_{\min}$ are the upper and lower bounds of the reward and, similarly, $e_{\max}$ and $e_{\min}$ are the maximum and minimum values of the error $e$. Finally, this reward is used to weight the cross-entropy loss of the pretraining step, denoted $L_{\mathrm{CE}}$:
$$L_{\mathrm{RL}} = R \cdot L_{\mathrm{CE}}$$
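The two training steps can be sketched in plain Python as follows. This is an illustrative reading of the description above, not the authors' implementation: the function names, the inverted min-max form of the normalization, the clamping of the error, and all numeric values are assumptions.

```python
import math


def cross_entropy(token_probs):
    """Negative log-likelihood of a sentence, given the model probability of each token."""
    return -sum(math.log(p) for p in token_probs)


def reward(error, e_min, e_max, r_min, r_max):
    """Map an error in [e_min, e_max] to a reward in [r_min, r_max].

    Assumed form: inverted min-max normalization, so a smaller error
    yields a larger reward.
    """
    if e_max <= e_min:
        raise ValueError("e_max must be greater than e_min")
    # Assumption: errors outside the expected range are clamped to it.
    error = min(max(error, e_min), e_max)
    return r_min + (r_max - r_min) * (e_max - error) / (e_max - e_min)


def reward_weighted_loss(token_probs, estimated, target, e_min, e_max, r_min, r_max):
    """Cross-entropy of the generated sentence, weighted by the difficulty reward."""
    error = (estimated - target) ** 2  # squared difficulty error
    return reward(error, e_min, e_max, r_min, r_max) * cross_entropy(token_probs)


# Hypothetical bounds: errors in [0, 16], rewards in [0, 1].
r_exact = reward(0.0, 0.0, 16.0, 0.0, 1.0)   # estimated difficulty matches the target
r_worst = reward(16.0, 0.0, 16.0, 0.0, 1.0)  # largest expected error
loss = reward_weighted_loss([0.5, 0.5], estimated=3.0, target=5.0,
                            e_min=0.0, e_max=16.0, r_min=0.0, r_max=1.0)
```

In an actual training loop the token probabilities would come from the simplification model's output distribution over the sampled sentence, and the loss would be backpropagated through them; here they are plain floats so that the arithmetic is easy to follow.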
Results
Experimental results show that the method scores highly in both automatic evaluation @xuOptimizingStatisticalMachine2016 and human evaluation of the simplified texts.