Struct renforce::trainer::PolicyGradient
pub struct PolicyGradient<F: Float, G: GradientDescAlgo<F>> { /* fields omitted */ }
A variation of the Vanilla Policy Gradient algorithm.
Instead of using a baseline, rewards are normalized to mean 0 and variance 1.
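As a rough illustration of that normalization step, the following stand-alone Rust sketch rescales a batch of discounted returns to mean 0 and variance 1 before they weight the policy gradient. This shows the general technique only; it is not the crate's internal code, and the function name is made up.

// Illustrative sketch only: normalize a batch of returns to mean 0 and
// variance 1, playing the role a learned baseline would otherwise play.
fn normalize_returns(returns: &mut [f64]) {
    if returns.is_empty() {
        return;
    }
    let n = returns.len() as f64;
    let mean = returns.iter().sum::<f64>() / n;
    let var = returns.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-8); // guard against zero variance
    for r in returns.iter_mut() {
        *r = (*r - mean) / std;
    }
}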
Methods
impl<G: GradientDescAlgo<f64>> PolicyGradient<f64, G>
fn default(grad_desc: G) -> PolicyGradient<f64, G>
Creates a PolicyGradient with default parameter values and the given gradient descent algorithm
impl<F: Float, G: GradientDescAlgo<F>> PolicyGradient<F, G>
fn new(grad_desc: G,
gamma: f64,
lr: F,
iters: usize,
eval_period: TimePeriod)
-> PolicyGradient<F, G>
Constructs a new PolicyGradient from the given parameters; see the usage sketch after the setter methods below
fn gamma(self, gamma: f64) -> PolicyGradient<F, G>
Updates the gamma field of self
fn lr(self, lr: F) -> PolicyGradient<F, G>
Updates the lr field of self
fn iters(self, iters: usize) -> PolicyGradient<F, G>
Updates the iters field of self
fn eval_period(self, eval_period: TimePeriod) -> PolicyGradient<F, G>
Updates the eval_period field of self
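A hypothetical construction sketch: MyGradDesc stands in for any user-supplied type implementing GradientDescAlgo<f64> and is not provided by this documentation, and the parameter values are arbitrary.

use renforce::trainer::PolicyGradient;

// `MyGradDesc` is a placeholder for some GradientDescAlgo<f64> implementor.
let grad_desc = MyGradDesc::new();

// Start from the defaults, then override individual fields with the
// builder-style setters listed above.
let trainer = PolicyGradient::default(grad_desc)
    .gamma(0.99) // discount factor
    .lr(0.05)    // learning rate
    .iters(100); // number of training iterations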
Trait Implementations
impl<F: Debug + Float, G: Debug + GradientDescAlgo<F>> Debug for PolicyGradient<F, G>
impl<F: Float, S: Space, A: FiniteSpace, G, T> EpisodicTrainer<S, A, T> for PolicyGradient<F, G> where T: Agent<S, A> + LogDiffFunc<S, A, F>,
G: GradientDescAlgo<F>
fn train_step(&mut self,
agent: &mut T,
env: &mut Environment<State=S, Action=A>)
Trains the agent using one "episode" worth of exploration
fn train(&mut self, agent: &mut T, env: &mut Environment<State=S, Action=A>)
Trains the agent to perform well in the environment, potentially acting out multiple episodes
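A hypothetical training sketch, assuming trainer was built as in the construction sketch above, agent implements Agent<S, A> + LogDiffFunc<S, A, f64>, env implements Environment with matching State and Action types, and the EpisodicTrainer trait is in scope; none of these bindings are defined here.

// One "episode" worth of exploration followed by a single policy update:
trainer.train_step(&mut agent, &mut env);

// Or let the trainer run its full configured number of iterations:
trainer.train(&mut agent, &mut env);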