Value iteration

IntervalMDP.value_iterationFunction
value_iteration(problem::Problem; callback=nothing)

Solve minimizes/mazimizes optimistic/pessimistic specification problems using value iteration for interval Markov processes.

It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).

Examples

prob1 = IntervalProbabilities(;
    lower = [
        0.0 0.5
        0.1 0.3
        0.2 0.1
    ],
    upper = [
        0.5 0.7
        0.6 0.5
        0.7 0.3
    ],
)

prob2 = IntervalProbabilities(;
    lower = [
        0.1 0.2
        0.2 0.3
        0.3 0.4
    ],
    upper = [
        0.6 0.6
        0.5 0.5
        0.4 0.4
    ],
)

prob3 = IntervalProbabilities(;
    lower = [0.0; 0.0; 1.0],
    upper = [0.0; 0.0; 1.0]
)

transition_probs = [prob1, prob2, prob3]
initial_state = 1
mdp = IntervalMarkovDecisionProcess(transition_probs, initial_state)

terminal_states = [3]
time_horizon = 10
prop = FiniteTimeReachability(terminal_states, time_horizon)
spec = Specification(prop, Pessimistic, Maximize)
problem = Problem(mdp, spec)
V, k, residual = value_iteration(problem)
source
IntervalMDP.control_synthesisFunction
control_synthesis(problem::Problem; callback=nothing)

Compute the optimal control strategy for the given problem (system + specification). If the specification is finite time, then the strategy is time-varying, with the returned strategy being in step order (i.e., the first element of the returned vector is the strategy for the first time step). If the specification is infinite time, then the strategy is stationary and only a single vector of length num_states(system) is returned.

It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).

source
IntervalMDP.TimeVaryingStrategyType
TimeVaryingStrategy

A time-varying strategy is a strategy that may vary over time. Since we need to store the strategy for each time step, the strategy is finite, and thus only applies to finite time specifications, of the same length as the strategy.

source

Bellman update

IntervalMDP.bellmanFunction
bellman(V, prob; upper_bound = false)

Compute robust Bellman update with the value function V and the interval probabilities prob that upper or lower bounds the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument. That is, if upper_bound == true then an upper bound is computed and if upper_bound == false then a lower bound is computed.

Examples

prob = IntervalProbabilities(;
    lower = sparse_hcat(
        SparseVector(15, [4, 10], [0.1, 0.2]),
        SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
    ),
    upper = sparse_hcat(
        SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
        SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
    ),
)

Vprev = collect(1:15)
Vcur = bellman(Vprev, prob; upper_bound = false)
Note

This function will construct a workspace object and an output vector. For a hot-loop, it is more efficient to use bellman! and pass in pre-allocated objects.

[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.

source
IntervalMDP.bellman!Function
bellman!(workspace, strategy_cache, Vres, V, prob, stateptr; upper_bound = false, maximize = true)

Compute in-place robust Bellman update with the value function V and the interval probabilities prob that upper or lower bounds the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument. That is, if upper_bound == true then an upper bound is computed and if upper_bound == false then a lower bound is computed.

The output is constructed in the input Vres and returned. The workspace object is also modified, and depending on the type, the strategy cache may be modified as well. See construct_workspace and construct_strategy_cache for more details on how to pre-allocate the workspace and strategy cache.

Examples

prob = IntervalProbabilities(;
    lower = sparse_hcat(
        SparseVector(15, [4, 10], [0.1, 0.2]),
        SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
    ),
    upper = sparse_hcat(
        SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
        SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
    ),
)

V = collect(1:15)
workspace = construct_workspace(prob)
strategy_cache = construct_strategy_cache(NoStrategyConfig())
Vres = similar(V)

Vres = bellman!(workspace, strategy_cache, Vres, V, prob; upper_bound = false, maximize = true)

[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.

source
IntervalMDP.construct_workspaceFunction
construct_workspace(mp::IntervalMarkovProcess)

Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.

The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.

source
construct_workspace(prob::IntervalProbabilities)

Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.

The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.

source
construct_workspace(prob::OrthogonalIntervalProbabilities)

Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.

The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.

source
construct_workspace(prob::OrthogonalIntervalProbabilities)

Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.

The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.

source
construct_workspace(prob::MixtureIntervalProbabilities)

Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.

The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.

source
IntervalMDP.construct_strategy_cacheFunction
construct_strategy_cache(mp::Union{IntervalProbabilities, IntervalMarkovProcess}, config::AbstractStrategyConfig)

Construct a strategy cache from a configuration for a given interval Markov process. The resuling cache type depends on the configuration and the device to store the strategy depends on the device of the Markov process.

source
IntervalMDP.StationaryStrategyConfigType
StationaryStrategyConfig

A configuration for a strategy cache that stores stationary policies. Note that the strategy is updated at each iteration of the value iteration algorithm, if a new choice is strictly better than the previous one. See [1, Section 4.3] for more details why this is necessary. See construct_strategy_cache for more details on how to construct the cache from the configuration.

[1] Forejt, Vojtěch, et al. "Automated verification techniques for probabilistic systems." Formal Methods for Eternal Networked Software Systems: 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, June 13-18, 2011. Advanced Lectures 11 (2011): 53-113.

source