EM Algorithm

Implements the EM algorithm for the mixture model.

Using the class models.LymphMixture and its methods, this module provides functions to compute the expectation and maximization steps of the EM algorithm.

lymixture.em.RNG = Generator(PCG64) at 0x72253FF8EDC0

Random number generator for reproducibility.

lymixture.em.expectation(model: LymphMixture, params: dict[str, float], *, log: bool = False) → ndarray

Compute expected value of latent model variables given the params.

This is the E-step of the EM algorithm. The returned expected values are often called responsibilities.

If log is set to True, the function returns the logarithm of the responsibilities.
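The E-step can be illustrated independently of the library. The following is a minimal sketch, not the lymixture implementation: it assumes a hypothetical nested list log_llhs, where log_llhs[i][k] is the log-likelihood of patient i under component k, and a hypothetical list mix_coefs of strictly positive mixture coefficients. The helper name log_responsibilities is also made up for illustration.

```python
import math


def log_responsibilities(log_llhs, mix_coefs):
    """Return log-responsibilities, one row per patient (hypothetical helper)."""
    result = []
    for row in log_llhs:
        # Unnormalized log-posterior of each component: log(pi_k) + log p(x_i | k).
        joint = [math.log(pi) + llh for pi, llh in zip(mix_coefs, row)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(joint)
        log_norm = peak + math.log(sum(math.exp(j - peak) for j in joint))
        result.append([j - log_norm for j in joint])
    return result


log_llhs = [[-1.0, -3.0], [-2.5, -0.5]]  # made-up values
log_resps = log_responsibilities(log_llhs, mix_coefs=[0.5, 0.5])
# Exponentiating recovers the responsibilities, analogous to log=False.
resps = [[math.exp(value) for value in row] for row in log_resps]
```

Working in log-space avoids underflow when the component likelihoods are very small, which is a plausible reason for offering a log option.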

lymixture.em.init_callback() → Callable

Return a function that logs the optimization progress.

lymixture.em.maximization(model: LymphMixture, log_resps: ndarray, parallelize: bool = True, method: str = 'Powell') → dict[str, float]

Maximize model params given expectation of latent variables.

This is the M-step of the EM algorithm, the counterpart to expectation(). It first maximizes the mixture coefficients analytically and then optimizes the model parameters of all components sequentially.
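For a plain mixture model, the analytic update of the mixture coefficients is the average responsibility of each component. The sketch below shows that textbook update only. It is an assumption that the library's update has this form, it ignores any subgroup structure the model may have, and the helper name update_mixture_coefs is hypothetical.

```python
import math


def update_mixture_coefs(log_resps):
    """Average the responsibilities over patients (hypothetical helper)."""
    num_patients = len(log_resps)
    num_components = len(log_resps[0])
    return [
        sum(math.exp(row[k]) for row in log_resps) / num_patients
        for k in range(num_components)
    ]


# Made-up log-responsibilities for three patients and two components.
log_resps = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
    [math.log(0.7), math.log(0.3)],
]
mix_coefs = update_mixture_coefs(log_resps)
```

The remaining model parameters have no such closed form, which is consistent with the numerical method argument in the signature.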

lymixture.em.log_prob_fn_fixed_mixture(theta: Sequence[float], model: LymphMixture) → float

Compute the model’s log-prob, given its params, excluding mixture coefficients.

This function calculates the log-probability of a mixture model based on the provided parameters (theta), assuming that mixture coefficients remain fixed. It ensures that the parameter values are within the valid range [0, 1], and returns negative infinity (-inf) if any parameter is out of bounds.

Returns:

The log-probability of the model if parameters are valid, or -inf if parameters are out of bounds.

Return type:

float

Note

  • This function does not modify or include mixture coefficients in theta; these are assumed to remain unchanged.

  • The _set_params function is used to update the model parameters before computing the likelihood.
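The bounds check described here follows a common pattern for log-probability functions passed to samplers. The following sketch shows only that pattern. The names bounded_log_prob and toy_log_llh are hypothetical, and the toy likelihood is an arbitrary stand-in for the model's actual likelihood.

```python
import math


def bounded_log_prob(theta, log_llh_fn):
    """Return log_llh_fn(theta) inside the unit cube, else -inf."""
    # NaN values fail the comparison and are therefore rejected as well.
    if any(not 0.0 <= value <= 1.0 for value in theta):
        return -math.inf
    return log_llh_fn(theta)


def toy_log_llh(theta):
    # Arbitrary smooth function, used only to make the sketch runnable.
    return sum(math.log(0.5 * p + 0.25) for p in theta)


inside = bounded_log_prob([0.2, 0.9], toy_log_llh)
outside = bounded_log_prob([0.2, 1.3], toy_log_llh)  # 1.3 is out of bounds
```

Returning -inf gives out-of-bounds proposals zero probability, so a sampler rejects them without the likelihood ever being evaluated.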

lymixture.em.log_prob_fn(theta: Sequence[float], model: LymphMixture) → float

Compute the log-probability of the model given its parameters.

This function returns the log-probability of the provided mixture model based on the given parameter values (theta). It ensures that parameters stay within predefined bounds (0 to 1). If any parameter is out of bounds, the function returns negative infinity (-inf).

Note

The theta array includes mixture parameters, which are not sampled from a simplex. This behavior could be extended to enforce simplex constraints if required.

lymixture.em.sample_fixed_mixture(model: LymphMixture, steps: int = 100, latent: DataFrame | None = None, filename: str = 'chain_fixed_mix.hdf5', *, continue_sampling: bool = False) → tuple[HDFBackend, ndarray]

Sample the parameters of a mixture model, excluding mixture coefficients.

This function performs MCMC sampling for the parameters of a mixture model while keeping the mixture coefficients fixed. Latent parameters can be passed via latent. The function either starts a new sampling session or, if continue_sampling is set, resumes from an existing HDF5 backend file named filename.

Note

  • The model’s responsibilities (resps) and mixture coefficients are updated based on the provided or computed latent parameters.

  • Mixture coefficients are fixed during the sampling process.

  • The function initializes an emcee.EnsembleSampler with a fixed mixture coefficient log-probability function (log_prob_fn_fixed_mixture) and uses multiprocessing to parallelize sampling.

lymixture.em.sample_model_params(model: LymphMixture, steps: int = 100, latent: DataFrame | None = None, filename: str = 'chain_fixed_latent.hdf5', *, continue_sampling: bool = False) → tuple[HDFBackend, ndarray]

Sample the parameters of a mixture model given expectations of latent variables.

This function performs Markov Chain Monte Carlo (MCMC) sampling of the parameters of a provided mixture model. Latent parameters can be passed via latent. The function either starts sampling from scratch or, if continue_sampling is set, resumes from a previous state stored in an HDF5 file named filename.

Note

  • The model’s responsibilities (resps) and mixture coefficients are updated based on the provided or computed latent parameters.

  • The function initializes an emcee.EnsembleSampler for MCMC sampling and uses a multiprocessing pool to parallelize the computations.

lymixture.em.complete_latent_likelihood(theta: Sequence[float], model: LymphMixture) → float

Compute the complete data log-likelihood of the mixture model, given latent variables.

This function evaluates the log-likelihood of the mixture model using a provided set of latent variable assignments (theta). The assignments are set as the responsibilities (resps) of the model before computing the likelihood.
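As a point of reference, the textbook complete data log-likelihood of a mixture weights each patient's per-component term by its latent assignment. The sketch below computes that quantity from plain lists. It is an assumption that the library uses this exact form (for example, whether the mixture coefficient term is included), and all names in it are hypothetical.

```python
import math


def complete_data_log_llh(resps, log_llhs, mix_coefs):
    """Sum resps[i][k] * (log pi_k + log p(x_i | k)) over patients and components."""
    total = 0.0
    for resp_row, llh_row in zip(resps, log_llhs):
        for resp, llh, pi in zip(resp_row, llh_row, mix_coefs):
            if resp > 0.0:  # zero-weight terms contribute nothing
                total += resp * (math.log(pi) + llh)
    return total


resps = [[1.0, 0.0], [0.0, 1.0]]  # hard assignments, made up
log_llhs = [[-1.0, -3.0], [-2.5, -0.5]]  # made-up log-likelihoods
value = complete_data_log_llh(resps, log_llhs, mix_coefs=[0.5, 0.5])
```

With hard assignments as above, only the assigned component of each patient contributes to the sum.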

lymixture.em.mh_latent_sampler_per_patient_2_component(model: LymphMixture, temp: float | None = None) → tuple[DataFrame, float]

Perform Metropolis-Hastings for latent variables per-patient for 2 components.

This function implements a basic Metropolis-Hastings (MH) sampler to update the latent variables (responsibilities) of a mixture model for individual patients. It swaps the latent variable assignments for two components, evaluates the log-acceptance ratio, and accepts or rejects the proposed changes based on the Metropolis criterion.

It returns the latent variable responsibilities and the log-probability of the model, both as they were before the sampling step.

Note

  • The sampler works by proposing a swap of responsibilities between two components for each patient and calculating the acceptance ratio using the patient-specific mixture likelihoods.

  • Accepted swaps are updated in the latent variable matrix under the header accepted_position.

  • The current and new log-probabilities are computed using the provided log_prob_fn.

  • This function is designed for a full AIP algorithm but is not used due to long computation times.
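The swap proposal and Metropolis criterion described above can be sketched without the library. The code below is not the lymixture implementation: it assumes hard 0/1 assignments, hypothetical per-patient log-likelihoods log_llhs[i][k], and a tempering rule that divides the log-acceptance ratio by temp. How the library actually applies temp is not stated here.

```python
import math
import random


def mh_swap_step(assignments, log_llhs, mix_coefs, rng, temp=1.0):
    """One sweep of swap proposals over all patients (hypothetical helper).

    assignments[i] is 0 or 1, the component patient i currently belongs to.
    """
    new_assignments = list(assignments)
    num_swaps = 0
    for i, current in enumerate(assignments):
        proposed = 1 - current  # swap to the other component
        log_current = math.log(mix_coefs[current]) + log_llhs[i][current]
        log_proposed = math.log(mix_coefs[proposed]) + log_llhs[i][proposed]
        log_accept = (log_proposed - log_current) / temp
        # Metropolis criterion: accept with probability min(1, exp(log_accept)).
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_accept:
            new_assignments[i] = proposed
            num_swaps += 1
    return new_assignments, num_swaps


rng = random.Random(42)
log_llhs = [[-1.0, -30.0], [-40.0, -0.5]]  # made-up values
assignments, num_swaps = mh_swap_step([1, 0], log_llhs, [0.5, 0.5], rng)
```

Because the swap proposal is symmetric, the acceptance ratio reduces to the ratio of the two patient-specific posterior terms, so no proposal correction is needed.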

lymixture.em.aip_sampling_algorithm(model: LymphMixture, ip_rounds: int = 4000, n_steps_params: int = 1, temperature_schedule: Callable[[int], float] | None = None, params_filename: str = '../../params_samples.hdf5') → dict[str, list]

Perform Alternating Iterative Posterior (AIP) sampling for a mixture model.

This function alternates between sampling latent variables and model parameters to approximate the posterior distribution of a mixture model. The AIP algorithm integrates Metropolis-Hastings (MH) sampling for latent variables and a parameter sampler initialized with emcee. It is computationally intensive and may take a long time to converge, so it is only used for toy problems.

Returns:

A dictionary containing:

  • "params_samples" (list): Samples of model parameters.

  • "latent_samples" (list): Samples of latent variables.

  • "complete_likelihoods" (list): Complete data log-likelihoods across iterations.

  • "incomplete_likelihoods" (list): Incomplete data log-likelihoods across iterations.

  • "number_of_swaps" (list): Number of swaps in latent variables between iterations.

Return type:

dict[str, list]