Struct renforce::trainer::PolicyGradient
pub struct PolicyGradient<F: Float, G: GradientDescAlgo<F>> { /* fields omitted */ }
A variation of the Vanilla Policy Gradient algorithm.
Instead of using a baseline, rewards are normalized to mean 0 and variance 1.
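As a rough illustration of that normalization step, the following stand-alone Rust sketch rescales a batch of discounted returns to mean 0 and variance 1 before they weight the policy gradient. This shows the general technique only; it is not the crate's internal code, and the function name is made up.

// Illustrative sketch only: normalize a batch of returns to mean 0 and
// variance 1, playing the role a learned baseline would otherwise play.
fn normalize_returns(returns: &mut [f64]) {
    if returns.is_empty() {
        return;
    }
    let n = returns.len() as f64;
    let mean = returns.iter().sum::<f64>() / n;
    let var = returns.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-8); // guard against zero variance
    for r in returns.iter_mut() {
        *r = (*r - mean) / std;
    }
}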
Methods
impl<G: GradientDescAlgo<f64>> PolicyGradient<f64, G>
fn default(grad_desc: G) -> PolicyGradient<f64, G>
Creates a PolicyGradient with default parameter values and the given gradient descent algorithm
impl<F: Float, G: GradientDescAlgo<F>> PolicyGradient<F, G>
fn new(grad_desc: G,
gamma: f64,
lr: F,
iters: usize,
eval_period: TimePeriod)
-> PolicyGradient<F, G>
Constructs a new PolicyGradient from the given parameters; see the usage sketch after the setter methods below
fn gamma(self, gamma: f64) -> PolicyGradient<F, G>
Updates the gamma field of self
fn lr(self, lr: F) -> PolicyGradient<F, G>
Updates the lr field of self
fn iters(self, iters: usize) -> PolicyGradient<F, G>
Updates the iters field of self
fn eval_period(self, eval_period: TimePeriod) -> PolicyGradient<F, G>
Updates the eval_period field of self
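A hypothetical construction sketch: MyGradDesc stands in for any user-supplied type implementing GradientDescAlgo<f64> and is not provided by this documentation, and the parameter values are arbitrary.

use renforce::trainer::PolicyGradient;

// `MyGradDesc` is a placeholder for some GradientDescAlgo<f64> implementor.
let grad_desc = MyGradDesc::new();

// Start from the defaults, then override individual fields with the
// builder-style setters listed above.
let trainer = PolicyGradient::default(grad_desc)
    .gamma(0.99) // discount factor
    .lr(0.05)    // learning rate
    .iters(100); // number of training iterations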
Trait Implementations
impl<F: Debug + Float, G: Debug + GradientDescAlgo<F>> Debug for PolicyGradient<F, G>
impl<F: Float, S: Space, A: FiniteSpace, G, T> EpisodicTrainer<S, A, T> for PolicyGradient<F, G> where T: Agent<S, A> + LogDiffFunc<S, A, F>,
G: GradientDescAlgo<F>
fn train_step(&mut self,
agent: &mut T,
env: &mut Environment<State=S, Action=A>)
Trains the agent using one "episode" worth of exploration
fn train(&mut self, agent: &mut T, env: &mut Environment<State=S, Action=A>)
Trains the agent to perform well in the environment, potentially acting out multiple episodes
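A hypothetical training sketch, assuming trainer was built as in the construction sketch above, agent implements Agent<S, A> + LogDiffFunc<S, A, f64>, env implements Environment with matching State and Action types, and the EpisodicTrainer trait is in scope; none of these bindings are defined here.

// One "episode" worth of exploration followed by a single policy update:
trainer.train_step(&mut agent, &mut env);

// Or let the trainer run its full configured number of iterations:
trainer.train(&mut agent, &mut env);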