Value iteration
IntervalMDP.value_iteration — Function
value_iteration(problem::Problem; callback=nothing)
Solve minimizing/maximizing optimistic/pessimistic specification problems using value iteration for interval Markov processes.
It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).
Examples
using IntervalMDP

prob1 = IntervalProbabilities(;
lower = [
0.0 0.5
0.1 0.3
0.2 0.1
],
upper = [
0.5 0.7
0.6 0.5
0.7 0.3
],
)
prob2 = IntervalProbabilities(;
lower = [
0.1 0.2
0.2 0.3
0.3 0.4
],
upper = [
0.6 0.6
0.5 0.5
0.4 0.4
],
)
prob3 = IntervalProbabilities(;
lower = [0.0; 0.0; 1.0],
upper = [0.0; 0.0; 1.0]
)
transition_probs = [prob1, prob2, prob3]
initial_state = 1
mdp = IntervalMarkovDecisionProcess(transition_probs, initial_state)
terminal_states = [3]
time_horizon = 10
prop = FiniteTimeReachability(terminal_states, time_horizon)
spec = Specification(prop, Pessimistic, Maximize)
problem = Problem(mdp, spec)
V, k, residual = value_iteration(problem)
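As a minimal sketch of the callback mechanism described above, the function below simply prints the iteration count and the maximum value; the printed quantities are illustrative, not part of the API.
# Hypothetical callback: report progress at each iteration.
print_progress(V, k) = println("iteration ", k, ": max value = ", maximum(V))
V, k, residual = value_iteration(problem; callback = print_progress)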
IntervalMDP.control_synthesis — Function
control_synthesis(problem::Problem; callback=nothing)
Compute the optimal control strategy for the given problem (system + specification). If the specification is finite time, then the strategy is time-varying, and the returned strategy is in step order (i.e., the first element of the returned vector is the strategy for the first time step). If the specification is infinite time, then the strategy is stationary and only a single vector of length num_states(system) is returned.
It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).
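A minimal usage sketch, reusing the problem from the value_iteration example above; the exact set of return values is an assumption here (the synthesized strategy alongside the value iteration outputs).
# Assumed return values: the strategy plus the value iteration results.
strategy, V, k, residual = control_synthesis(problem)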
IntervalMDP.StationaryStrategy — Type
StationaryStrategy
A stationary strategy is a strategy that is the same for all time steps.
IntervalMDP.TimeVaryingStrategy — Type
TimeVaryingStrategy
A time-varying strategy is a strategy that may vary over time. Since the strategy must be stored for each time step, it has finite length and thus only applies to finite time specifications whose time horizon equals the length of the strategy.
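As an illustration, a sketch reusing the strategy synthesized in the control_synthesis example above; per-step indexing is an assumption for exposition, not a documented API.
# Assumed indexing: one decision vector per time step for a time-varying
# strategy, synthesized for a finite-time specification.
decisions_step1 = strategy[1]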
Bellman update
IntervalMDP.bellman — Function
bellman(V, prob; upper_bound = false)
Compute the robust Bellman update with the value function V and the interval probabilities prob, upper or lower bounding the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument: if upper_bound == true, an upper bound is computed, and if upper_bound == false, a lower bound is computed.
Examples
using IntervalMDP, SparseArrays

prob = IntervalProbabilities(;
lower = sparse_hcat(
SparseVector(15, [4, 10], [0.1, 0.2]),
SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
),
upper = sparse_hcat(
SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
),
)
Vprev = collect(1:15)
Vcur = bellman(Vprev, prob; upper_bound = false)
This function will construct a workspace object and an output vector. For a hot-loop, it is more efficient to use bellman! and pass in pre-allocated objects.
[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.
IntervalMDP.bellman! — Function
bellman!(workspace, strategy_cache, Vres, V, prob, stateptr; upper_bound = false, maximize = true)
Compute the robust Bellman update in place with the value function V and the interval probabilities prob, upper or lower bounding the expectation of the value function V via O-maximization [1]. Whether the expectation is maximized or minimized is determined by the upper_bound keyword argument: if upper_bound == true, an upper bound is computed, and if upper_bound == false, a lower bound is computed.
The output is constructed in the input Vres and returned. The workspace object is also modified, and, depending on its type, the strategy cache may be modified as well. See construct_workspace and construct_strategy_cache for more details on how to pre-allocate the workspace and strategy cache.
Examples
using IntervalMDP, SparseArrays

prob = IntervalProbabilities(;
lower = sparse_hcat(
SparseVector(15, [4, 10], [0.1, 0.2]),
SparseVector(15, [5, 6, 7], [0.5, 0.3, 0.1]),
),
upper = sparse_hcat(
SparseVector(15, [1, 4, 10], [0.5, 0.6, 0.7]),
SparseVector(15, [5, 6, 7], [0.7, 0.5, 0.3]),
),
)
V = collect(1:15)
workspace = construct_workspace(prob)
strategy_cache = construct_strategy_cache(NoStrategyConfig())
Vres = similar(V)
Vres = bellman!(workspace, strategy_cache, Vres, V, prob; upper_bound = false, maximize = true)
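A hot-loop sketch of the pre-allocation pattern this docstring recommends, reusing the workspace, cache, and prob defined above; the buffer swap between iterations is an illustrative convention, not prescribed by the API.
# Reuse the pre-allocated workspace and strategy cache across iterations.
function run_bellman_loop!(workspace, strategy_cache, V, Vres, prob; iterations = 10)
    for k in 1:iterations
        Vres = bellman!(workspace, strategy_cache, Vres, V, prob; upper_bound = false, maximize = true)
        V, Vres = Vres, V   # swap buffers so the new values feed the next update
    end
    return V
end

V = run_bellman_loop!(workspace, strategy_cache, collect(1.0:15.0), zeros(15), prob)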
[1] M. Lahijanian, S. B. Andersson and C. Belta, "Formal Verification and Synthesis for Discrete-Time Stochastic Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031-2045, Aug. 2015, doi: 10.1109/TAC.2015.2398883.
IntervalMDP.construct_workspace — Function
construct_workspace(mp::IntervalMarkovProcess)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::IntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::OrthogonalIntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
construct_workspace(prob::MixtureIntervalProbabilities)
Construct a workspace for computing the Bellman update, given a value function. If the Bellman update is used in a hot-loop, it is more efficient to use this function to preallocate the workspace and reuse across iterations.
The workspace type is determined by the type and size of the transition probability matrix, as well as the number of threads available.
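For example, reusing the mdp from the value iteration example at the top of this page, the workspace could be pre-allocated once (a sketch):
# Pre-allocate once, then reuse across Bellman updates in a hot-loop.
workspace = construct_workspace(mdp)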
IntervalMDP.construct_strategy_cache — Function
construct_strategy_cache(mp::Union{IntervalProbabilities, IntervalMarkovProcess}, config::AbstractStrategyConfig)
Construct a strategy cache from a configuration for a given interval Markov process. The resulting cache type depends on the configuration, and the device on which the strategy is stored depends on the device of the Markov process.
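A sketch following the signature above, using the mdp from the value iteration example and assuming a zero-argument constructor for the configuration, as with NoStrategyConfig() earlier:
# Cache that records a stationary strategy during value iteration.
strategy_cache = construct_strategy_cache(mdp, StationaryStrategyConfig())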
IntervalMDP.GivenStrategyConfig — Type
GivenStrategyConfig
A configuration for a strategy cache where a given strategy is applied.
IntervalMDP.NoStrategyConfig — Type
NoStrategyConfig
A configuration for a strategy cache that does not store policies. See construct_strategy_cache for more details on how to construct the cache from the configuration.
IntervalMDP.StationaryStrategyConfig — Type
StationaryStrategyConfig
A configuration for a strategy cache that stores stationary policies. Note that the strategy is updated at each iteration of the value iteration algorithm if a new choice is strictly better than the previous one. See [1, Section 4.3] for more details on why this is necessary. See construct_strategy_cache for more details on how to construct the cache from the configuration.
[1] Forejt, Vojtěch, et al. "Automated verification techniques for probabilistic systems." Formal Methods for Eternal Networked Software Systems: 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, June 13-18, 2011. Advanced Lectures 11 (2011): 53-113.
IntervalMDP.TimeVaryingStrategyConfig — Type
TimeVaryingStrategyConfig
A configuration for a strategy cache that stores time-varying policies. See construct_strategy_cache for more details on how to construct the cache from the configuration.