Struct renforce::agent::PolicyAgent

pub struct PolicyAgent<F: Float, S: Space, A: FiniteSpace, D: DifferentiableFunc<S, A, F>> {
    pub log_func: D,
    // some fields omitted
}

Policy Agent

Explicitly stores a stochastic policy as the softmax of the output of a differentiable function.
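To make the softmax step concrete, here is a minimal sketch of a softmax with temperature over a slice of weights. It does not use the renforce API: `softmax` is a hypothetical helper written for illustration, and the max-subtraction trick is a common numerical-stability choice, not something this page confirms the crate does.

```rust
/// Softmax with temperature: p_i = exp(w_i / temp) / sum_j exp(w_j / temp).
/// Hypothetical helper for illustration only; not part of renforce.
fn softmax(weights: &[f64], temp: f64) -> Vec<f64> {
    // Subtract the maximum weight before exponentiating so large weights do not
    // overflow. This leaves the resulting probabilities unchanged.
    let max = weights.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = weights.iter().map(|w| ((w - max) / temp).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}
```

Under this definition a higher temperature flattens the distribution towards uniform, while a temperature close to zero concentrates the probability on the largest weight.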

Fields

log_func: D

The function this agent uses to calculate the weights passed into the softmax.

Methods

impl<S: Space, A: FiniteSpace, D: DifferentiableFunc<S, A, f64>> PolicyAgent<f64, S, A, D>

Creates a new PolicyAgent whose softmax uses a temperature of 1.0.

impl<F: Float, S: Space, A: FiniteSpace, D: DifferentiableFunc<S, A, F>> PolicyAgent<F, S, A, D>

Creates a new PolicyAgent with the given parameters.

Updates the temp field (the softmax temperature) of self.

Returns the temperature used by this agent.

Calculates the derivative of the log of this function.

Trait Implementations

impl<F: Debug + Float, S: Debug + Space, A: Debug + FiniteSpace, D: Debug + DifferentiableFunc<S, A, F>> Debug for PolicyAgent<F, S, A, D> where A::Element: Debug

Formats the value using the given formatter.

impl<F: Clone + Float, S: Clone + Space, A: Clone + FiniteSpace, D: Clone + DifferentiableFunc<S, A, F>> Clone for PolicyAgent<F, S, A, D> where A::Element: Clone

Returns a copy of the value.

Performs copy-assignment from source.

impl<F: Float, S: Space, A: FiniteSpace, D> ParameterizedFunc<F> for PolicyAgent<F, S, A, D> where D: DifferentiableFunc<S, A, F>

Returns the number of parameters used by the function

Returns the parameters used by the function

Changes the parameters used by the function

impl<F: Float, S: Space, A: FiniteSpace, D> LogDiffFunc<S, A, F> for PolicyAgent<F, S, A, D> where D: DifferentiableFunc<S, A, F>

The gradient of the log of the output with respect to the parameters
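As a rough illustration of what a log-gradient looks like for a softmax policy, the sketch below computes the gradient of log p(action) with respect to the softmax weights themselves, which is (1[b == action] - p_b) / temp for each weight index b. Both function names are hypothetical and nothing here calls renforce. How the agent chains this through the parameters of its wrapped function is not shown on this page, so the sketch stops at the weights.

```rust
/// Softmax with temperature. Hypothetical helper, not part of renforce.
fn softmax(weights: &[f64], temp: f64) -> Vec<f64> {
    // Shift by the maximum weight for numerical stability.
    let max = weights.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = weights.iter().map(|w| ((w - max) / temp).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

/// Gradient of log softmax(weights)[action] with respect to each weight.
/// Entry b equals (indicator(b == action) - p_b) / temp.
/// Hypothetical helper, not part of renforce.
fn log_softmax_grad(weights: &[f64], temp: f64, action: usize) -> Vec<f64> {
    let probs = softmax(weights, temp);
    probs
        .iter()
        .enumerate()
        .map(|(b, p)| {
            let indicator = if b == action { 1.0 } else { 0.0 };
            (indicator - p) / temp
        })
        .collect()
}
```

The entries of this gradient always sum to zero, because adding the same constant to every weight leaves the softmax probabilities unchanged.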

impl<F: Float, S: Space, A: FiniteSpace, D> Agent<S, A> for PolicyAgent<F, S, A, D> where D: DifferentiableFunc<S, A, F>

Returns the actions the agent should perform in the given state
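One plausible way to turn a stochastic policy into an action is to sample an index from the softmax probabilities. The sketch below assumes that approach, since this page does not say how the agent selects its action. `sample_index` is a hypothetical helper, and the uniform draw `u` is passed in by the caller so that the example needs no random-number crate.

```rust
/// Picks an index from a discrete distribution by inverse-CDF sampling.
/// `probs` should sum to 1 and `u` should be a uniform draw in [0, 1).
/// Hypothetical helper for illustration only; not part of renforce.
fn sample_index(probs: &[f64], u: f64) -> usize {
    let mut cumulative = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cumulative += p;
        // Return the first index whose cumulative probability exceeds the draw.
        if u < cumulative {
            return i;
        }
    }
    // Fall back to the last index if rounding left the total just below `u`.
    probs.len().saturating_sub(1)
}
```

With a finite action space, the returned index would then be mapped to the corresponding action.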