Struct renforce::trainer::PolicyGradient

pub struct PolicyGradient<F: Float, G: GradientDescAlgo<F>> { /* fields omitted */ }

A variation of the Vanilla Policy Gradient algorithm.

Instead of using a baseline, rewards are normalized to mean 0 and variance 1.
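As an illustration of that normalization step, here is a minimal, self-contained Rust sketch (not the crate's internal code) that rescales a slice of discounted returns to mean 0 and variance 1:

// Minimal sketch of reward normalization, not REnforce's actual implementation.
// Discounted returns are shifted and scaled to mean 0 and variance 1 before
// they weight the policy-gradient update.
fn normalize_rewards(rewards: &mut [f64]) {
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-8); // guard against division by zero
    for r in rewards.iter_mut() {
        *r = (*r - mean) / std;
    }
}

fn main() {
    let mut returns = vec![1.0, 2.0, 3.0, 4.0];
    normalize_rewards(&mut returns);
    println!("{:?}", returns); // mean ~0, variance ~1
}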

Methods

impl<G: GradientDescAlgo<f64>> PolicyGradient<f64, G>

Creates a PolicyGradient with default parameter values and the given action space and gradient descent algorithm.

impl<F: Float, G: GradientDescAlgo<F>> PolicyGradient<F, G>

Constructs a new PolicyGradient with the given parameters.

Updates the gamma field of self.

Updates the lr field of self.

Updates the iters field of self.

Updates the eval_period field of self.
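The setters above suggest a builder-style configuration. As a purely hypothetical usage sketch (the method signatures, argument types, and chaining behavior are not shown on this page and are assumptions, not the crate's confirmed API), configuring a trainer might look like:

// Hypothetical sketch: assumes consuming, chainable setters named after the
// fields they update; treat every call below as an assumption.
let trainer = PolicyGradient::default(/* action space, gradient descent algorithm */)
    .gamma(0.99)      // discount factor
    .lr(0.05)         // learning rate
    .iters(100)       // number of training iterations
    .eval_period(10); // how often the agent is evaluated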

Trait Implementations

impl<F: Debug + Float, G: Debug + GradientDescAlgo<F>> Debug for PolicyGradient<F, G>

Formats the value using the given formatter.

impl<F: Float, S: Space, A: FiniteSpace, G, T> EpisodicTrainer<S, A, T> for PolicyGradient<F, G>
    where T: Agent<S, A> + LogDiffFunc<S, A, F>,
          G: GradientDescAlgo<F>

Trains the agent using one "episode's" worth of exploration.

Trains the agent to perform well in its environment, potentially acting out multiple episodes.
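As a rough end-to-end sketch (the method names, argument order, and the agent/environment values here are assumptions drawn from the two descriptions above, not confirmed signatures):

// Hypothetical training sketch; train_step/train and the &mut arguments are
// assumptions based on the method descriptions above.
// `agent` is assumed to implement Agent<S, A> and LogDiffFunc<S, A, F>;
// `env` is the environment the agent explores.
trainer.train_step(&mut agent, &mut env); // one "episode" of exploration and a single update
trainer.train(&mut agent, &mut env);      // full training run over multiple episodes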