EM Algorithm
Implements the EM algorithm for the mixture model.
Using the class models.LymphMixture and its methods, this module provides
functions to compute the expectation and maximization steps of the EM algorithm.
- lymixture.em.RNG = Generator(PCG64)
Random number generator for reproducibility.
- lymixture.em.expectation(model: LymphMixture, params: dict[str, float], *, log: bool = False) → ndarray
Compute the expected values of the latent variables of the model, given the params.
This marks the E-step of the famous EM algorithm. The returned expected values are also often called responsibilities.
If log is set to True, the function returns the logarithm of the responsibilities.
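As a rough illustration of what an E-step computes, here is a minimal NumPy sketch that turns per-patient, per-component log-likelihoods into responsibilities, normalizing in log space for numerical stability. It assumes those log-likelihoods are already available as an array; expectation_sketch and its arguments are hypothetical and not the lymixture API. The log mixture coefficients may be given per component or per patient (for example one row per patient's subgroup), which is an assumption about how subgroup-specific coefficients could be passed in.

```python
import numpy as np


def expectation_sketch(
    log_llh: np.ndarray, log_weights: np.ndarray, *, log: bool = False
) -> np.ndarray:
    """Hypothetical E-step: responsibilities from per-component log-likelihoods.

    log_llh: shape (n_patients, n_components), log p(x_i | theta_k).
    log_weights: shape (n_components,) or (n_patients, n_components), log pi_k.
    """
    # Joint log-probability of each patient belonging to each component.
    log_joint = log_llh + log_weights
    # Normalize every row in log space; logaddexp.reduce is a stable log-sum-exp.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    log_resps = log_joint - log_norm
    return log_resps if log else np.exp(log_resps)


# Two patients, two components, equal mixture coefficients.
llh = np.array([[0.2, 0.6], [0.5, 0.5]])
resps = expectation_sketch(np.log(llh), np.log([0.5, 0.5]))
```

Here the first patient gets responsibilities 0.25 and 0.75, because the joint probabilities 0.1 and 0.3 are divided by their sum 0.4.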
- lymixture.em.init_callback() → Callable
Return a function that logs the optimization progress.
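A progress-logging callback is usually a closure that keeps an iteration counter between calls. The sketch below shows that pattern; make_progress_callback, log_every and the injectable log function are hypothetical names, and the message format is an assumption rather than what init_callback actually prints.

```python
import logging
from typing import Callable

import numpy as np

logger = logging.getLogger(__name__)


def make_progress_callback(
    log_every: int = 1, log: Callable[[str], None] = logger.info
) -> Callable[[np.ndarray], None]:
    """Hypothetical factory returning a callback that logs optimizer progress."""
    iteration = 0  # state kept in the closure across calls

    def callback(xk: np.ndarray) -> None:
        nonlocal iteration
        iteration += 1
        # Only emit a message every `log_every` iterations.
        if iteration % log_every == 0:
            log(f"iteration {iteration}: params = {np.round(xk, 4)}")

    return callback
```

A function like this can be passed as callback= to scipy.optimize.minimize, which calls it after each iteration with the current parameter vector.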
- lymixture.em.maximization(model: LymphMixture, log_resps: ndarray, parallelize: bool = True, method: str = 'Powell') → dict[str, float]
Maximize the model params given the expectation of the latent variables.
This is the M-step corresponding to expectation() in the EM algorithm. It first maximizes the mixture coefficients analytically and then optimizes the model parameters of all components sequentially.
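The two stages can be sketched on a toy problem. In a plain (non-hierarchical) mixture the closed-form update of the mixture coefficients is the mean responsibility per component; each component's parameters are then fitted one after another by maximizing the responsibility-weighted log-likelihood with scipy's Powell method. The single-parameter Bernoulli components and all function names below are hypothetical stand-ins for the lymph-model components, and lymixture's handling of subgroup-specific coefficients may differ.

```python
import numpy as np
from scipy.optimize import minimize


def update_mixture_coefs(log_resps: np.ndarray) -> np.ndarray:
    """Closed-form weight update for a plain mixture: mean responsibility per component."""
    return np.exp(log_resps).mean(axis=0)


def fit_bernoulli_component(x: np.ndarray, weights: np.ndarray) -> float:
    """Fit one toy Bernoulli component to its responsibility-weighted data."""

    def neg_weighted_llh(p: np.ndarray) -> float:
        # Clip so the logarithms stay finite at the bounds.
        q = np.clip(p[0], 1e-9, 1.0 - 1e-9)
        return -float(np.sum(weights * (x * np.log(q) + (1.0 - x) * np.log1p(-q))))

    res = minimize(
        neg_weighted_llh,
        x0=[0.5],
        method="Powell",
        bounds=[(0.0, 1.0)],
        options={"xtol": 1e-8, "ftol": 1e-10},
    )
    return float(res.x[0])


def maximization_sketch(x: np.ndarray, log_resps: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    coefs = update_mixture_coefs(log_resps)  # analytic step first
    resps = np.exp(log_resps)
    # Then optimize the components sequentially, each weighted by its responsibilities.
    params = np.array(
        [fit_bernoulli_component(x, resps[:, k]) for k in range(resps.shape[1])]
    )
    return coefs, params
```

For this toy model the weighted maximum has the closed form sum(r * x) / sum(r), which makes the numerical optimizer easy to check; the real components have no such shortcut, which is why a derivative-free optimizer like Powell is useful there.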
- lymixture.em.log_prob_fn_fixed_mixture(theta: Sequence[float], model: LymphMixture) → float
Compute the model’s log-prob, given its params, excluding mixture coefficients.
This function calculates the log-probability of a mixture model based on the provided parameters (theta), assuming that the mixture coefficients remain fixed. It ensures that the parameter values lie within the valid range [0, 1] and returns negative infinity (-inf) if any parameter is out of bounds.
- Returns:
The log-probability of the model if the parameters are valid, or -inf if any parameter is out of bounds.
- Return type:
float
Note
This function neither modifies nor includes the mixture coefficients in theta; they are assumed to remain unchanged. The _set_params function is used to update the model parameters before the likelihood is computed.
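The bounds check and the separation between component parameters and fixed mixture coefficients can be sketched as follows. ToyMixture is a hypothetical two-component Bernoulli mixture, not the LymphMixture class, and its set_params method only plays the role that _set_params plays in the real function.

```python
from typing import Sequence

import numpy as np


class ToyMixture:
    """Hypothetical Bernoulli mixture; not the LymphMixture class."""

    def __init__(self, x: Sequence[float], mixture_coefs: Sequence[float]) -> None:
        self.x = np.asarray(x, dtype=float)
        self.mixture_coefs = np.asarray(mixture_coefs, dtype=float)  # held fixed
        self.probs = np.full(len(self.mixture_coefs), 0.5)

    def set_params(self, theta: np.ndarray) -> None:
        # Only component parameters are set; the mixture coefficients are untouched.
        self.probs = np.asarray(theta, dtype=float)

    def log_likelihood(self) -> float:
        # Per-patient likelihood under each component, shape (n_patients, n_components).
        lik = self.x[:, None] * self.probs + (1.0 - self.x[:, None]) * (1.0 - self.probs)
        return float(np.sum(np.log(lik @ self.mixture_coefs)))


def log_prob_fixed_mixture_sketch(theta: Sequence[float], model: ToyMixture) -> float:
    theta = np.asarray(theta, dtype=float)
    # Any parameter outside [0, 1] gets zero prior mass.
    if np.any((theta < 0.0) | (theta > 1.0)):
        return -np.inf
    model.set_params(theta)
    return model.log_likelihood()
```

With x = [1, 0], equal coefficients and theta = [0.8, 0.4], the two patients have mixture likelihoods 0.6 and 0.4, so the result is log(0.6) + log(0.4).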
- lymixture.em.log_prob_fn(theta: Sequence[float], model: LymphMixture) → float
Compute the log-probability of the model given its parameters.
This function returns the log-probability of the provided mixture model based on the given parameter values (theta). It ensures that the parameters stay within the predefined bounds (0 to 1). If any parameter is out of bounds, the function returns negative infinity (-inf).
Note
The theta array includes the mixture parameters, which are not sampled from a simplex. This behavior could be extended to enforce simplex constraints if required.
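One way such a simplex extension could look is sketched below. It assumes a hypothetical layout in which theta holds the K component parameters followed by K-1 free mixture coefficients, with the last coefficient implied as one minus their sum; this is not necessarily how lymixture orders theta. Under that layout the [0, 1] box check alone does not stop the free coefficients from summing to more than one, so the extra check is what confines them to the simplex. The toy Bernoulli likelihood stands in for the real model.

```python
from typing import Sequence

import numpy as np


def log_prob_with_simplex_sketch(
    theta: Sequence[float], x: Sequence[float], n_components: int
) -> float:
    """Hypothetical log-prob of a Bernoulli mixture with an explicit simplex check."""
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    if theta.size != 2 * n_components - 1:
        raise ValueError("expected K component parameters and K-1 mixture coefficients")
    # Box constraint, as in the documented function.
    if np.any((theta < 0.0) | (theta > 1.0)):
        return -np.inf
    probs = theta[:n_components]
    free_coefs = theta[n_components:]
    # Simplex constraint: the implied last coefficient must be non-negative.
    if free_coefs.sum() > 1.0:
        return -np.inf
    coefs = np.append(free_coefs, 1.0 - free_coefs.sum())
    lik = x[:, None] * probs + (1.0 - x[:, None]) * (1.0 - probs)
    return float(np.sum(np.log(lik @ coefs)))
```

For two components the box constraint on the single free coefficient already implies the simplex; the additional check only matters from three components on.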
- lymixture.em.sample_fixed_mixture(model: LymphMixture, steps: int = 100, latent: DataFrame | None = None, filename: str = 'chain_fixed_mix.hdf5', *, continue_sampling: bool = False) → tuple[HDFBackend, ndarray]
Sample the parameters of a mixture model, excluding mixture coefficients.
This function performs MCMC sampling of the parameters of a mixture model while keeping the mixture coefficients fixed. It allows specifying the latent parameters and offers the option to either start a new sampling session or continue sampling (continue_sampling) from an existing HDF5 backend file named filename.
Note
The model’s responsibilities (resps) and mixture coefficients are updated based on the provided or computed latent parameters. The mixture coefficients are then held fixed during sampling.
The function initializes an emcee.EnsembleSampler with the fixed-mixture log-probability function (log_prob_fn_fixed_mixture) and uses multiprocessing to parallelize sampling.
- lymixture.em.sample_model_params(model: LymphMixture, steps: int = 100, latent: DataFrame | None = None, filename: str = 'chain_fixed_latent.hdf5', *, continue_sampling: bool = False) → tuple[HDFBackend, ndarray]
Sample the parameters of a mixture model given expectations of latent variables.
This function performs Markov chain Monte Carlo (MCMC) sampling of the parameters of the provided mixture model. It allows setting the latent parameters and offers the option to either start sampling from scratch or continue sampling (continue_sampling) from a previous state stored in an HDF5 file named filename.
Note
The model’s responsibilities (resps) and mixture coefficients are updated based on the provided or computed latent parameters. The function initializes an emcee.EnsembleSampler for MCMC sampling and uses a multiprocessing pool to parallelize the computations.
- lymixture.em.complete_latent_likelihood(theta: Sequence[float], model: LymphMixture) → float
Compute the complete-data log-likelihood of the mixture model, given latent variables.
This function evaluates the log-likelihood of the mixture model using a provided set of latent variable assignments (theta). The assignments are set as the responsibilities (resps) of the model before the likelihood is computed.
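For a mixture, the complete-data log-likelihood is the sum over patients i and components k of z_ik (log pi_k + log p(x_i | theta_k)), where z_ik are the latent assignments. The sketch below evaluates that formula from precomputed per-component log-likelihoods; the function name and array layout are assumptions, not the lymixture implementation. With one-hot z it is the likelihood of a hard assignment; with soft responsibilities it becomes the expected complete-data log-likelihood that EM maximizes.

```python
import numpy as np


def complete_data_log_llh(z: np.ndarray, log_llh: np.ndarray, log_weights: np.ndarray) -> float:
    """Hypothetical complete-data log-likelihood.

    z: (n_patients, n_components) assignments, one-hot or soft.
    log_llh: (n_patients, n_components) per-component log-likelihoods.
    log_weights: (n_components,) or (n_patients, n_components) log mixture coefficients.
    """
    # sum_i sum_k z_ik * (log pi_k + log p(x_i | theta_k))
    return float(np.sum(z * (log_llh + log_weights)))


llh = np.log([[0.2, 0.6], [0.5, 0.25]])
z = np.array([[0.0, 1.0], [1.0, 0.0]])  # patient 0 -> component 1, patient 1 -> component 0
value = complete_data_log_llh(z, llh, np.log([0.5, 0.5]))
```

Here value equals log(0.5 * 0.6) + log(0.5 * 0.5) = log(0.3) + log(0.25).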
- lymixture.em.mh_latent_sampler_per_patient_2_component(model: LymphMixture, temp: float | None = None) → tuple[DataFrame, float]
Perform per-patient Metropolis-Hastings sampling of the latent variables for 2 components.
This function implements a basic Metropolis-Hastings (MH) sampler that updates the latent variables (responsibilities) of a mixture model for individual patients. It swaps the latent variable assignments between the two components, evaluates the log-acceptance ratio, and accepts or rejects the proposed changes based on the Metropolis criterion.
It returns the latent variable responsibilities before the sampling step and the log-probability of the model before the sampling step.
Note
The sampler works by proposing a swap of responsibilities between the two components for each patient and calculating the acceptance ratio using the patient-specific mixture likelihoods.
Accepted swaps are recorded in the latent variable matrix under the header accepted_position. The current and new log-probabilities are computed using the provided log_prob_fn.
This function is designed for the full AIP algorithm but is not used due to its long computation times.
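A vectorized sketch of one sweep of such swap proposals: for every patient, swapping the two columns of a one-hot assignment is proposed, and the Metropolis criterion compares that patient's joint log-probability under the proposed and the current component. The function name, the one-hot array layout and the way temp enters (dividing the log-acceptance ratio) are assumptions for illustration; the documented function works on the model's DataFrame of responsibilities instead.

```python
from typing import Optional

import numpy as np


def mh_swap_sweep_sketch(
    z: np.ndarray,
    log_llh: np.ndarray,
    log_weights: np.ndarray,
    rng: np.random.Generator,
    temp: Optional[float] = None,
) -> tuple[np.ndarray, int]:
    """Hypothetical per-patient swap sweep for a 2-component mixture.

    z: (n_patients, 2) one-hot assignments; log_llh: (n_patients, 2);
    log_weights: (2,) or (n_patients, 2) log mixture coefficients.
    """
    temp = 1.0 if temp is None else temp
    contrib = log_llh + log_weights          # per-patient joint log-probabilities
    current = np.sum(z * contrib, axis=1)
    proposed_z = z[:, ::-1]                  # swap the two components
    proposed = np.sum(proposed_z * contrib, axis=1)
    log_ratio = (proposed - current) / temp  # tempered log-acceptance ratio
    accept = np.log(rng.uniform(size=len(z))) < log_ratio
    new_z = np.where(accept[:, None], proposed_z, z)
    return new_z, int(accept.sum())
```

Because each patient's term in the complete-data log-likelihood depends only on its own assignment, the per-patient acceptance ratios can be computed independently and in one vectorized step.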
- lymixture.em.aip_sampling_algorithm(model: LymphMixture, ip_rounds: int = 4000, n_steps_params: int = 1, temperature_schedule: Callable[[int], float] | None = None, params_filename: str = '../../params_samples.hdf5') → dict[str, list]
Perform Alternating Iterative Posterior (AIP) sampling for a mixture model.
This function alternates between sampling the latent variables and the model parameters to approximate the posterior distribution of a mixture model. The AIP algorithm combines Metropolis-Hastings (MH) sampling of the latent variables with a parameter sampler initialized with emcee. It is computationally intensive and may take a long time to converge, and is therefore only used for toy problems.
- Returns:
A dictionary containing:
  - "params_samples" (list): Samples of model parameters.
  - "latent_samples" (list): Samples of latent variables.
  - "complete_likelihoods" (list): Complete-data log-likelihoods across iterations.
  - "incomplete_likelihoods" (list): Incomplete-data log-likelihoods across iterations.
  - "number_of_swaps" (list): Number of swaps in the latent variables between iterations.
- Return type:
dict[str, list]
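The alternation can be illustrated with a heavily simplified toy sketch: a two-component Bernoulli mixture with fixed, equal mixture coefficients, where each round first proposes a swap of every patient's assignment (MH) and then runs n_steps_params random-walk Metropolis steps on the component parameters given the new assignments. Everything here is an assumption for illustration: a plain random-walk step replaces the emcee sampler, the temperature divides the latent log-acceptance ratio, nothing is written to params_filename, and only the returned keys mirror the documented dictionary.

```python
from typing import Callable, Optional

import numpy as np


def bernoulli_log_llh(x: np.ndarray, probs: np.ndarray) -> np.ndarray:
    # (n_patients, n_components): log p(x_i | p_k) for binary x_i
    return x[:, None] * np.log(probs) + (1.0 - x[:, None]) * np.log1p(-probs)


def complete_log_llh(z: np.ndarray, x: np.ndarray, probs: np.ndarray, log_w: np.ndarray) -> float:
    if np.any((probs <= 0.0) | (probs >= 1.0)):
        return -np.inf
    return float(np.sum(z * (bernoulli_log_llh(x, probs) + log_w)))


def toy_aip(
    x,
    ip_rounds: int = 200,
    n_steps_params: int = 1,
    temperature_schedule: Optional[Callable[[int], float]] = None,
    step: float = 0.05,
    seed: int = 0,
) -> dict[str, list]:
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, k = len(x), 2
    log_w = np.log(np.full(k, 1.0 / k))     # mixture coefficients held fixed here
    probs = np.array([0.3, 0.7])            # initial component parameters
    z = np.eye(k)[rng.integers(k, size=n)]  # random one-hot initial assignments
    keys = ("params_samples", "latent_samples", "complete_likelihoods",
            "incomplete_likelihoods", "number_of_swaps")
    out: dict[str, list] = {key: [] for key in keys}
    for r in range(ip_rounds):
        temp = 1.0 if temperature_schedule is None else temperature_schedule(r)
        # 1) Latent step: propose swapping every patient's assignment (MH).
        contrib = bernoulli_log_llh(x, probs) + log_w
        proposed = z[:, ::-1]
        log_ratio = (np.sum(proposed * contrib, axis=1) - np.sum(z * contrib, axis=1)) / temp
        accept = np.log(rng.uniform(size=n)) < log_ratio
        z = np.where(accept[:, None], proposed, z)
        # 2) Parameter step: random-walk Metropolis on probs given the new z.
        current = complete_log_llh(z, x, probs, log_w)
        for _ in range(n_steps_params):
            candidate = probs + step * rng.normal(size=k)
            cand_lp = complete_log_llh(z, x, candidate, log_w)
            if np.log(rng.uniform()) < cand_lp - current:
                probs, current = candidate, cand_lp
        # Incomplete-data log-likelihood marginalizes over the assignments.
        joint = bernoulli_log_llh(x, probs) + log_w
        incomplete = float(np.sum(np.logaddexp.reduce(joint, axis=1)))
        out["params_samples"].append(probs.copy())
        out["latent_samples"].append(z.copy())
        out["complete_likelihoods"].append(current)
        out["incomplete_likelihoods"].append(incomplete)
        out["number_of_swaps"].append(int(accept.sum()))
    return out
```

Since each patient's complete-data term is one summand of the log-sum-exp in the incomplete-data term, the recorded complete likelihood never exceeds the incomplete one for the same parameters, which is a cheap sanity check on a run.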