Value iteration
IntervalMDP.value_iteration — Function
value_iteration(problem::Problem; callback=nothing)
Solve minimizing/maximizing optimistic/pessimistic specification problems using value iteration for interval Markov processes.
It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).
Examples
using IntervalMDP

prob1 = IntervalProbabilities(;
lower = [
0.0 0.5
0.1 0.3
0.2 0.1
],
upper = [
0.5 0.7
0.6 0.5
0.7 0.3
],
)
prob2 = IntervalProbabilities(;
lower = [
0.1 0.2
0.2 0.3
0.3 0.4
],
upper = [
0.6 0.6
0.5 0.5
0.4 0.4
],
)
prob3 = IntervalProbabilities(;
lower = [0.0; 0.0; 1.0],
upper = [0.0; 0.0; 1.0]
)
transition_probs = [prob1, prob2, prob3]
initial_state = 1
mdp = IntervalMarkovDecisionProcess(transition_probs, initial_state)
terminal_states = [3]
time_horizon = 10
prop = FiniteTimeReachability(terminal_states, time_horizon)
spec = Specification(prop, Pessimistic, Maximize)
problem = Problem(mdp, spec)
V, k, residual = value_iteration(problem)
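As a minimal sketch of the callback mechanism described above, the function below simply prints the iteration count and the maximum value; the printed quantities are illustrative, not part of the API.
# Hypothetical callback: report progress at each iteration.
print_progress(V, k) = println("iteration ", k, ": max value = ", maximum(V))
V, k, residual = value_iteration(problem; callback = print_progress)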
IntervalMDP.control_synthesis — Function
control_synthesis(problem::Problem; callback=nothing)
Compute the optimal control strategy for the given problem (system + specification). If the specification is finite time, then the strategy is time-varying, and the returned strategy is in step order (i.e., the first element of the returned vector is the strategy for the first time step). If the specification is infinite time, then the strategy is stationary and only a single vector of length num_states(system) is returned.
It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).
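A minimal usage sketch, reusing the problem from the value_iteration example above; the exact set of return values is an assumption here (the synthesized strategy alongside the value iteration outputs).
# Assumed return values: the strategy plus the value iteration results.
strategy, V, k, residual = control_synthesis(problem)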
IntervalMDP.StationaryStrategy — Type
StationaryStrategy
A stationary strategy is a strategy that is the same for all time steps.
IntervalMDP.TimeVaryingStrategy — Type
TimeVaryingStrategy
A time-varying strategy is a strategy that may vary over time. Since the strategy must be stored for each time step, it has finite length and thus only applies to finite time specifications whose time horizon equals the length of the strategy.
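As an illustration, a sketch reusing the strategy synthesized in the control_synthesis example above; per-step indexing is an assumption for exposition, not a documented API.
# Assumed indexing: one decision vector per time step for a time-varying
# strategy, synthesized for a finite-time specification.
decisions_step1 = strategy[1]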
Bellman update
IntervalMDP.bellman — Function
bellman(V, prob; upper_bound = false)
Compute the robust Bellman update with the value function V and the interval probabilities prob, upper or lower bounding the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument: if upper_bound == true, an upper bound is computed, and if upper_bound == false, a lower bound is computed.
Examples
using IntervalMDP, SparseArrays

prob = IntervalProbabilities(;
lower = sparse_hcat(
SparseVector(15, [4, 10], [0.1, 0.2]),
SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
),
upper = sparse_hcat(
SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
),
)
Vprev = collect(1:15)
Vcur = bellman(Vprev, prob; upper_bound = false)
This function will construct a workspace object and an output vector. For a hot-loop, it is more efficient to use bellman! and pass in pre-allocated objects.
[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.
IntervalMDP.bellman! — Function
bellman!(workspace, strategy_cache, Vres, V, prob, stateptr; upper_bound = false, maximize = true)
Compute the robust Bellman update in place with the value function V and the interval probabilities prob, upper or lower bounding the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument: if upper_bound == true, an upper bound is computed, and if upper_bound == false, a lower bound is computed.
The output is constructed in the input Vres and returned. The workspace object is also modified, and, depending on its type, the strategy cache may be modified as well. See construct_workspace and construct_strategy_cache for more details on how to pre-allocate the workspace and strategy cache.
Examples
using IntervalMDP, SparseArrays

prob = IntervalProbabilities(;
lower = sparse_hcat(
SparseVector(15, [4, 10], [0.1, 0.2]),
SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
),
upper = sparse_hcat(
SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
),
)
V = collect(1:15)
workspace = construct_workspace(prob)
strategy_cache = construct_strategy_cache(NoStrategyConfig())
Vres = similar(V)
Vres = bellman!(workspace, strategy_cache, Vres, V, prob; upper_bound = false, maximize = true)
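A hot-loop sketch of the pre-allocation pattern this docstring recommends, reusing the workspace, cache, and prob defined above; the buffer swap between iterations is an illustrative convention, not prescribed by the API.
# Reuse the pre-allocated workspace and strategy cache across iterations.
function run_bellman_loop!(workspace, strategy_cache, V, Vres, prob; iterations = 10)
    for k in 1:iterations
        Vres = bellman!(workspace, strategy_cache, Vres, V, prob; upper_bound = false, maximize = true)
        V, Vres = Vres, V   # swap buffers so the new values feed the next update
    end
    return V
end

V = run_bellman_loop!(workspace, strategy_cache, collect(1.0:15.0), zeros(15), prob)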
[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.
IntervalMDP.construct_workspace — Function
construct_workspace(mp::IntervalMarkovProcess)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::IntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::OrthogonalIntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::MixtureIntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
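For example, reusing the mdp from the value iteration example at the top of this page, the workspace could be pre-allocated once (a sketch):
# Pre-allocate once, then reuse across Bellman updates in a hot-loop.
workspace = construct_workspace(mdp)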
IntervalMDP.construct_strategy_cache — Function
construct_strategy_cache(mp::Union{IntervalProbabilities, IntervalMarkovProcess}, config::AbstractStrategyConfig)
Construct a strategy cache from a configuration for a given interval Markov process. The resulting cache type depends on the configuration, and the device on which the strategy is stored depends on the device of the Markov process.
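A sketch following the signature above, using the mdp from the value iteration example and assuming a zero-argument constructor for the configuration, as with NoStrategyConfig() earlier:
# Cache that records a stationary strategy during value iteration.
strategy_cache = construct_strategy_cache(mdp, StationaryStrategyConfig())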
IntervalMDP.GivenStrategyConfig — Type
GivenStrategyConfig
A configuration for a strategy cache where a given strategy is applied.
IntervalMDP.NoStrategyConfig — Type
NoStrategyConfig
A configuration for a strategy cache that does not store policies. See construct_strategy_cache for more details on how to construct the cache from the configuration.
IntervalMDP.StationaryStrategyConfig — Type
StationaryStrategyConfig
A configuration for a strategy cache that stores stationary policies. Note that the strategy is updated at each iteration of the value iteration algorithm if a new choice is strictly better than the previous one. See [1, Section 4.3] for more details on why this is necessary. See construct_strategy_cache for more details on how to construct the cache from the configuration.
[1] Forejt, Vojtěch, et al. "Automated verification techniques for probabilistic systems." Formal Methods for Eternal Networked Software Systems: 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, June 13-18, 2011. Advanced Lectures 11 (2011): 53-113.
IntervalMDP.TimeVaryingStrategyConfig — Type
TimeVaryingStrategyConfig
A configuration for a strategy cache that stores time-varying policies. See construct_strategy_cache for more details on how to construct the cache from the configuration.