mchmm
mchmm is a Python package implementing Markov chains and Hidden Markov models in pure NumPy and SciPy. It can also visualize Markov chains.
Installation
PyPI
pip install mchmm
GitHub
git clone https://github.com/maximtrp/mchmm.git
cd mchmm
pip install . --user
Tutorials
Discrete Markov chains
Initializing a Markov chain from some data:
>>> import mchmm as mc
>>> a = mc.MarkovChain().from_data('AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC')
Now, we can look at the observed transition frequency matrix:
>>> a.observed_matrix
array([[ 7., 18.,  7.],
       [19.,  5., 29.],
       [ 5., 30.,  3.]])
And the observed transition probability matrix:
>>> a.observed_p_matrix
array([[0.21875   , 0.5625    , 0.21875   ],
       [0.35849057, 0.09433962, 0.54716981],
       [0.13157895, 0.78947368, 0.07894737]])
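The probability matrix above is consistent with dividing each row of the frequency matrix by its row total (32, 53 and 38 transitions out of A, B and C). A minimal NumPy sketch of that normalization, which reproduces the numbers but is not necessarily mchmm's internal code:

```python
import numpy as np

# Observed transition frequency matrix from above (rows and columns: A, B, C).
observed = np.array([[7.0, 18.0, 7.0],
                     [19.0, 5.0, 29.0],
                     [5.0, 30.0, 3.0]])

# Divide every row by its total so that each row becomes a probability distribution.
row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1): [[32], [53], [38]]
observed_p = observed / row_totals
```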
You can visualize your Markov chain. First, build a directed graph with the graph_make() method of the MarkovChain object. Then render() it.
>>> graph = a.graph_make(
...     format="png",
...     graph_attr=[("rankdir", "LR")],
...     node_attr=[("fontname", "Roboto bold"), ("fontsize", "20")],
...     edge_attr=[("fontname", "Iosevka"), ("fontsize", "12")]
... )
>>> graph.render()
Here is the result:
[Figure: rendered directed graph of the Markov chain]
Pandas can help us annotate columns and rows:
>>> import pandas as pd
>>> pd.DataFrame(a.observed_matrix, index=a.states, columns=a.states, dtype=int)
    A   B   C
A   7  18   7
B  19   5  29
C   5  30   3
Viewing the expected transition frequency matrix:
>>> a.expected_matrix
array([[ 8.06504065, 13.78861789, 10.14634146],
       [13.35772358, 22.83739837, 16.80487805],
       [ 9.57723577, 16.37398374, 12.04878049]])
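These expected frequencies agree with the standard independence formula for a contingency table, E[i, j] = row_total[i] * col_total[j] / N, where N = 123 is the number of observed transitions. A minimal NumPy sketch that recomputes them from the observed matrix (it mirrors the numbers, not necessarily mchmm's internal code):

```python
import numpy as np

observed = np.array([[7.0, 18.0, 7.0],
                     [19.0, 5.0, 29.0],
                     [5.0, 30.0, 3.0]])

# Expected count for cell (i, j) if the next state were independent of the current one.
row_totals = observed.sum(axis=1)  # transitions out of A, B, C: [32, 53, 38]
col_totals = observed.sum(axis=0)  # transitions into A, B, C: [31, 53, 39]
expected = np.outer(row_totals, col_totals) / observed.sum()
```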
Calculating an Nth-order transition probability matrix:
>>> a.n_order_matrix(a.observed_p_matrix, order=2)
array([[0.2782854 , 0.34881028, 0.37290432],
       [0.1842357 , 0.64252707, 0.17323722],
       [0.32218957, 0.21081868, 0.46699175]])
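The second-order matrix above is consistent with the two-step transition probabilities of the chain, that is, the product P @ P of the observed probability matrix P with itself. A minimal sketch using numpy.linalg.matrix_power; that n_order_matrix computes exactly this internally is an assumption, though the numbers match:

```python
import numpy as np

# Observed transition probability matrix: row-normalized frequency counts.
p = np.array([[7, 18, 7], [19, 5, 29], [5, 30, 3]], dtype=float)
p /= p.sum(axis=1, keepdims=True)

# Probability of going from state i to state j in exactly two steps.
p2 = np.linalg.matrix_power(p, 2)
```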
Carrying out a chi-squared test:
>>> a.chisquare(a.observed_matrix, a.expected_matrix, axis=None)
Power_divergenceResult(statistic=47.89038802624337, pvalue=1.0367838347591701e-07)
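With axis=None, scipy.stats.chisquare treats all nine cells as one flattened sample, so the statistic is Pearson's sum of (O - E)^2 / E over the whole matrix, with 9 - 1 = 8 degrees of freedom. A NumPy sketch that reproduces the statistic above:

```python
import numpy as np

observed = np.array([[7, 18, 7], [19, 5, 29], [5, 30, 3]], dtype=float)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Pearson's chi-squared statistic summed over every cell (the axis=None case).
statistic = ((observed - expected) ** 2 / expected).sum()

# scipy.stats.chisquare uses k - 1 - ddof degrees of freedom; here k = 9 and ddof = 0.
dof = observed.size - 1
```

The p-value then follows from the chi-squared survival function, for example scipy.stats.chi2.sf(statistic, dof).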
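Finally, let’s simulate a Markov chain given our data. The seeds are drawn at random, so the simulated sequence may differ between runs.
>>> import numpy as np
>>> ids, states = a.simulate(10, start='A', seed=np.random.randint(0, 10, 10))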
>>> ids
array([0, 2, 1, 0, 2, 1, 0, 2, 1, 0])
>>> states
array(['A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A'], dtype='<U1')
>>> "".join(states)
'ACBACBACBA'
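The simulation idea can be sketched in plain NumPy: at each step, one multinomial trial over the current state's row of the probability matrix picks the next state. This is a hedged approximation of what simulate() does (its reference says it is based on scipy.stats.multinomial). It swaps in NumPy's Generator.multinomial and a single integer seed instead of mchmm's per-step seed array, and simulate_chain is a hypothetical helper, not part of mchmm:

```python
import numpy as np

states = np.array(["A", "B", "C"])
p = np.array([[7, 18, 7], [19, 5, 29], [5, 30, 3]], dtype=float)
p /= p.sum(axis=1, keepdims=True)  # row-normalized transition probabilities


def simulate_chain(p, states, n, start, seed=None):
    """Hypothetical helper: simulate n states of a Markov chain, starting at a named state."""
    rng = np.random.default_rng(seed)
    ids = np.empty(n, dtype=int)
    ids[0] = int(np.flatnonzero(states == start)[0])  # index of the named start state
    for i in range(1, n):
        # One multinomial trial over the current row; the position of the 1 is the next state.
        draw = rng.multinomial(1, p[ids[i - 1]])
        ids[i] = int(np.argmax(draw))
    return ids, states[ids]


ids, names = simulate_chain(p, states, 10, start="A", seed=42)
```

Passing the same integer seed to numpy.random.default_rng makes the sequence reproducible, unlike drawing the seeds at random.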
mchmm.MarkovChain
- class mchmm.MarkovChain(states: Optional[Union[list, numpy.ndarray]] = None, obs: Optional[Union[list, numpy.ndarray]] = None, obs_p: Optional[Union[list, numpy.ndarray]] = None)
Bases: object
- __init__(states: Optional[Union[list, numpy.ndarray]] = None, obs: Optional[Union[list, numpy.ndarray]] = None, obs_p: Optional[Union[list, numpy.ndarray]] = None)
Discrete Markov Chain.
- Parameters
states (Optional[Union[numpy.ndarray, list]]) – List of state names.
obs (Optional[Union[numpy.ndarray, list]]) – Observed transition frequency matrix.
obs_p (Optional[Union[numpy.ndarray, list]]) – Observed transition probability matrix.
- _transition_matrix(seq: Optional[Union[str, numpy.ndarray, list]] = None, states: Optional[Union[str, numpy.ndarray, list]] = None) → numpy.ndarray
Calculate a transition frequency matrix.
- Parameters
seq (Optional[Union[str, list, numpy.ndarray]]) – Observations sequence.
states (Optional[Union[str, list, numpy.ndarray]]) – List of states.
- Returns
matrix – Transition frequency matrix.
- Return type
numpy.ndarray
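To illustrate what a transition frequency matrix counts, here is a minimal NumPy sketch that tallies consecutive pairs of symbols. transition_counts is a hypothetical helper, not mchmm's private method, and it assumes states are ordered as sorted unique symbols, which agrees with the A, B, C ordering in the tutorial. Applied to the tutorial sequence, it yields the same counts as observed_matrix there:

```python
import numpy as np

seq = "AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC"


def transition_counts(seq):
    """Hypothetical helper: count transitions between consecutive symbols of seq."""
    arr = np.array(list(seq))
    # Sorted unique states and, for every position, the index of its state.
    states, idx = np.unique(arr, return_inverse=True)
    counts = np.zeros((len(states), len(states)))
    # Each consecutive pair (idx[t], idx[t + 1]) adds one to cell [from, to].
    np.add.at(counts, (idx[:-1], idx[1:]), 1)
    return states, counts


states, counts = transition_counts(seq)
```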
- chisquare(obs: Optional[numpy.ndarray] = None, exp: Optional[numpy.ndarray] = None, **kwargs) → Tuple[Union[float, numpy.ndarray], Union[float, numpy.ndarray]]
Wrapper function for carrying out a chi-squared test using the scipy.stats.chisquare function.
- Parameters
obs (numpy.ndarray) – Observed transition frequency matrix.
exp (numpy.ndarray) – Expected transition frequency matrix.
kwargs (optional) – Keyword arguments passed to scipy.stats.chisquare.
- Returns
chisq (float or numpy.ndarray) – Chi-squared test statistic.
p (float or numpy.ndarray) – p-value of the test.
- from_data(seq: Union[str, numpy.ndarray, list]) → object
Infer a Markov chain from data. The states and the frequency and probability matrices are calculated automatically and assigned as instance attributes.
- Parameters
seq (Union[str, np.ndarray, list]) – Sequence of events. A string or an array-like object exposing the array interface and containing strings or ints.
- Returns
MarkovChain – Trained MarkovChain class instance.
- Return type
object
- graph_make(*args, **kwargs) → graphviz.dot.Digraph
Make a directed graph of a Markov chain using graphviz.
- Parameters
args (optional) – Arguments passed to the underlying graphviz.Digraph class.
kwargs (optional) – Keyword arguments passed to the underlying graphviz.Digraph class.
- Returns
graph – Digraph object with its own methods.
- Return type
graphviz.dot.Digraph
Note
Use the graphviz.dot.Digraph.render method to write the graph to a file.
- n_order_matrix(mat: Optional[numpy.ndarray] = None, order: int = 2) → numpy.ndarray
Create an Nth-order transition probability matrix.
- Parameters
mat (numpy.ndarray, optional) – Observed transition probability matrix.
order (int, optional) – Order of transition probability matrix to return. Default is 2.
- Returns
x – Nth order transition probability matrix.
- Return type
numpy.ndarray
- prob_to_freq_matrix(mat: Optional[numpy.ndarray] = None, row_totals: Optional[numpy.ndarray] = None) → numpy.ndarray
Calculate a transition frequency matrix given a transition probability matrix and row totals. This method is meant for calculating a frequency matrix from an Nth-order transition probability matrix.
- Parameters
mat (numpy.ndarray, optional) – Transition probability matrix.
row_totals (numpy.ndarray, optional) – Row totals of transition frequency matrix.
- Returns
x – Transition frequency matrix.
- Return type
numpy.ndarray
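The description suggests that this conversion scales each row of the probability matrix by the matching row total. A sketch under that assumption, using the tutorial's matrices; applying the same scaling to a second-order matrix gives frequencies that keep the original row totals:

```python
import numpy as np

observed = np.array([[7, 18, 7], [19, 5, 29], [5, 30, 3]], dtype=float)
row_totals = observed.sum(axis=1)  # [32, 53, 38]
p = observed / row_totals[:, None]

# Assumed conversion: multiply each probability row by its row total.
freq = p * row_totals[:, None]

# The same scaling applied to a 2nd-order probability matrix.
freq2 = np.linalg.matrix_power(p, 2) * row_totals[:, None]
```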
- simulate(n: int, tf: Optional[numpy.ndarray] = None, states: Optional[Union[list, numpy.ndarray]] = None, start: Optional[Union[str, int]] = None, ret: str = 'both', seed: Optional[Union[list, numpy.ndarray]] = None) → Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
Markov chain simulation based on scipy.stats.multinomial.
- Parameters
n (int) – Number of states to simulate.
tf (numpy.ndarray, optional) – Transition frequency matrix. If None, observed_matrix instance attribute is used.
states (Optional[Union[np.ndarray, list]]) – State names. If None, states instance attribute is used.
start (Optional[Union[str, int]]) – Event to begin with. If an integer is passed, the state is chosen by index. If a string is passed, the state is chosen by name. If the string ‘random’ is passed, a random state is taken. If left unspecified (None), the event with the maximum probability is chosen.
ret (str, optional) – What to return: ‘indices’ for state indices, ‘states’ for state names, or ‘both’ (default) for both.
seed (Optional[Union[list, numpy.ndarray]]) – Random seeds (of size n) used to draw random variates. Passed to scipy.stats.multinomial.
- Returns
x (numpy.ndarray) – Sequence of state indices. Returned if ret is set to ‘indices’ or ‘both’.
y (numpy.ndarray, optional) – Sequence of state names. Returned if ret is set to ‘states’ or ‘both’.
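The rules for start amount to a small lookup from the argument to a state index. The sketch below uses a hypothetical helper, resolve_start, which is not part of mchmm; in particular, reading "maximum probability" as "the state with the largest row total in the frequency matrix" is an assumption:

```python
import numpy as np


def resolve_start(start, states, tf, rng=None):
    """Hypothetical helper: map a `start` value to a state index per the rules above."""
    states = np.asarray(states)
    if start is None:
        # Assumption: "maximum probability" means the state whose row total in the
        # transition frequency matrix `tf` is largest (the most frequent origin state).
        return int(np.argmax(tf.sum(axis=1)))
    if isinstance(start, str) and start == "random":
        rng = np.random.default_rng() if rng is None else rng
        return int(rng.integers(len(states)))  # any state, uniformly
    if isinstance(start, str):
        matches = np.flatnonzero(states == start)  # choose by name
        if matches.size == 0:
            raise ValueError(f"unknown state name: {start!r}")
        return int(matches[0])
    index = int(start)  # choose by index
    if not 0 <= index < len(states):
        raise IndexError(f"state index out of range: {index}")
    return index
```

With the tutorial's frequency matrix, None would resolve to B, the state with the most outgoing transitions (53).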