Posterior Sampling Likelihoods¶
Class providing likelihood functions for Bayesian posterior sampling simulations—the log-probability \(\ln\mathcal{L}(\boldsymbol{\theta})\) that the observational data would be observed given model parameters \(\boldsymbol{\theta}\). Implementations compare Galacticus model predictions with observational constraints (e.g.stellar mass functions, sizes, colors) and return the log-likelihood and its variance. The likelihood is combined with the prior from modelParameterClass objects to drive the posterior sampler toward the best-fit parameter region.
Methods¶
evaluate→double precisionEvaluate the log-likelihood \(\ln\mathcal{L}\) at the current
simulationStategiven the active and inactive model parameter lists, returning the log-likelihood value and optionally its variance for stochastic estimators.class (posteriorSampleStateClass ), intent(inout) :: simulationStatetype (modelParameterList ), intent(inout), dimension(:) :: modelParametersActive_, modelParametersInactive_class (posteriorSampleConvergenceClass), intent(inout) :: simulationConvergencedouble precision , intent(in ) :: temperature, logLikelihoodCurrent, logPriorCurrent, logPriorProposedreal , intent(inout) :: timeEvaluatedouble precision , intent( out), optional :: logLikelihoodVariancelogical , intent(inout), optional :: forceAcceptance
willEvaluate→logicalReturns true if the likelihood will be evaluated for this state.
class (posteriorSampleStateClass ), intent(inout) :: simulationStatetype (modelParameterList ), intent(in ), dimension(:) :: modelParameters_class (posteriorSampleConvergenceClass), intent(inout) :: simulationConvergencedouble precision , intent(in ) :: temperature, logLikelihoodCurrent, logPriorCurrent, logPriorProposed
functionChanged→voidRespond to possible changes in the likelihood function.
restore→voidLog a report on convergence state to the given file unit.
double precision, intent(in ), dimension(:) :: simulationStatedouble precision, intent(in ) :: logLikelihood
posteriorSampleLikelihoodBaseParameters¶
A posterior sampling likelihood class which allows arbitrary modification of a base parameter object.
Methods
initialize— Initialize pointers into the base parameters object.update— Update values in the base parameters object.
posteriorSampleLikelihoodGalaxyPopulation¶
A posterior sampling likelihood class which evaluates the likelihood of Galacticus galaxy formation model outputs against observational constraints, supporting parallelized model evaluation across MPI process groups. The number of collaborative MPI groups is set by [countCollaborativeGroups], with analysis storage controlled by [storeResults].
Parameters
[baseParametersFileName](string) — The path to the XML parameter file that provides the base configuration for each Galacticus model evaluation, to which parameter changes from the posterior sampler are then applied.[outputAnalyses](boolean; default.false.) — If true, results of the analyses on each step will be stored to the output file.[setOutputGroup](boolean; default.false.) — If true, set the primary output group for each step.[reportEvaluationTimes](boolean; default.false.) — If true, report the time taken to evaluate each model.[countCollaborativeGroups](integer; default1) — The number of groups into which MPI processes should be split for the purpose of model evaluation. Each group will be populated with MPI processes. Processes within a group will collaborate on running a single model evaluation. Setting this parameter to a negative value will result in using the maximum possible number of groups (equal to the number of MPI processes).[firstComeFirstServed](boolean; default.false.) — If true, collaborative groups will take the next available evaluation and process it. Otherwise, each group will be assigned a fixed set of evaluations in advance.[doPing](boolean; default.false.) — If true, the master MPI process will attach to thecalculationResetevent and ping the MPI counter. This can help to ensure that the counter updates regularly.[reportFileName](boolean; default.false.) — If true, report the base parameter file name being evaluated.[reportState](boolean; default.false.) — If true, log a summary of the model parameter state to standard output at the beginning of each likelihood evaluation.[evolveForestsVerbosity](string; one ofsilent,standard,working,warn,info,debug; defaultenumerationVerbosityLevelDecode(displayVerbosity(),includePrefix=.false.)) — The verbosity level to use while performing evolve forests tasks.[failedParametersFileName](string; default./failedParameters.xml) — The name of the file to which parameters of failed models should be written.[changeParametersFileNames](string) — The names of files containing parameter changes to be applied.
posteriorSampleLikelihoodGaussianRegression¶
The likelihood is computed either using another likelihood function (the “simulator”) or via Gaussian regression emulation of that simulator. The details of the emulation algorithm are specified by the following sub-parameters:
emulatorRebuildCountThe number of simulator evaluations from which the emulator is built;
polynomialOrderThe order of the polynomial fitted to the simulator likelihoods prior to Gaussian regression;
sigmaBufferSee below;
logLikelihoodBufferSee below;
logLikelihoodErrorToleranceSee below;
reportCountThe number of likelihood evaluations between successive reports on the status of the emulator;
emulateOutliersIf true, then outlier chains are always emulated post-convergence (this is safe if such chains are not used in constructing proposals for non-outlier chains);
simulatorLikelihoodContains another likelihood function definition which will be used to construct the simulator.
In detail, this likelihood function first collects emulatorRebuildCount likelihood evaluations from the simulator. It then fits a polynomial of order polynomialOrder and of dimension equal to the dimension of the state vector to the simulated likelihoods. Gaussian regression is performed on the residuals of the simulated likelihoods after this polynomial fit is removed. Once the emulator has been built in this way every second simulated state is discarded, and accumulation of new simulated states continues. Once emulatorRebuildCount simulated states have once again been accumulated a new simulator is built. This ensures that the emulator does not lose all information used in building the previous emulatorfootnoteThis would be unfortunate as the second emulator to be built would then contain information on only those regions of the state space that were poorly emulated before., instead information from older emulators decays exponentially.
Once an emulator has been built, on each successive likelihood evaluation the emulated log-likelihood \(\log\mathcal{L}_\mathrm{e}\) and its error estimate \(\sigma_{\log\mathcal{L}_\mathrm{e}}\) are computed. The emulated likelihood is then returned if:
where \(N=\)sigmaBuffer, \(\Delta\log\mathcal{L}=\)logLikelihoodBuffer, \(T\) is the temperature, \(\log\mathcal{L}\) is the current log-likelihood, \(\log\mathcal{P}\) is the current log-prior probability, and \(\log\mathcal{P}^\prime\) is the proposed log-prior probability, or if
where \(\sigma_{\log\mathcal{L}}=\)logLikelihoodErrorTolerance, otherwise the simulator is used to compute the exact likelihood. In this way, the emulated likelihood is used if it is sufficiently below the current likelihood that, even accounting for the emulation error, transition to the new state is highly unlikely, or if the error on the likelihood emulation is sufficiently small that it will not have a significant effect on the transition probability to the proposed state.
If verbosity is set to info or greater than a report will be issued every reportCount evaluations. The report will give the proportions of simulated vs. emulated evaluations. Additionally, during the evaluation where the report is issued, both the emulated and simulated log-likelihoods are evaluated and are tested to see if they lie within \(3 \sigma_{\log\mathcal{L}_\mathrm{e}}\) of each other. The rate of failures (i.e. where the two differ by more than this amount) is then reported.
Methods
separation— Determine the separation between two state vectors.emulate— Evaluate the model emulator.reset— Reset the iterator object to the start of its sequence.iterate— Move to the next iteration of polynomial coefficient indices. Returns true if successful. If no more iterations are available, returns false.index— Return the \(i^\mathrm{th}\) index of the polynomial coefficient.currentOrder— Return the current order of the polynomial coefficient.counter— Return an incremental counter (i.e. begins at \(0\) and increases by \(1\) on each iteration).
Parameters
[emulatorRebuildCount](integer; default100) — The number of steps between rebuilds of the emulator.[polynomialOrder](integer; default2) — The order of the polynomial to fit to the likelihood surface.[sigmaBuffer](real; default3.0d0) — The buffer size in units of the likelihood error to use when deciding whether to emulate the likelihood.[logLikelihoodBuffer](real; default10.0d0) — The buffer size in log-likelihood to use when deciding whether to emulate the likelihood.[logLikelihoodErrorTolerance](real; default0.0d0) — The tolerance on the likelihood error to accept when deciding whether to emulate the likelihood.[reportCount](integer; default10) — The number of steps between reports of emulator performance.[emulateOutliers](boolean; default.true.) — If true, then outlier chains are always emulated once the simulation is converged.[assumeZeroVarianceAtZeroLag](boolean; default.false.) — If true, the variogram model is forced to go to zero for states with zero separation (as expected if the likelihood model being emulated is fully deterministic). Otherwise, the variance at zero separation is treated as a free parameter.[dumpEmulatorFileRoot](string) — The name of a file to which emulator internal state will be dumped. (If empty, no dump occurs.)[dummyEmulator](boolean; default.false.) — If true, then the emulator is constructed, and performance measured, but likelihoods are always simulated directly.
posteriorSampleLikelihoodHaloMassFunction¶
A posterior sampling likelihood class which evaluates the likelihood of a modeled dark matter halo mass function against observed data, supporting Poisson or multivariate normal statistics. The target data file is set by [fileName], with the evaluation redshift, mass limits, and minimum halo count per bin set by [redshift], [massMinimum], and [countMinimum].
Methods
descriptorSpecial— Handle adding special parameters to the descriptor.
Parameters
[baseParametersFileName](string) — The path to the XML parameter file that provides the base configuration for each model evaluation, to which parameter changes from the posterior sampler are then applied.[pathSamples](string; defaultnone) — The path into which the sampled mass functions should be written. Ifnone, then samples are not written.[appendSamples](boolean; default.false.) — If true, samples will be appended to the file. Otherwise, the file is overwritten.[fileNames](string) — The names of the files containing the halo mass functions.[redshifts](real) — The redshifts at which to evaluate the halo mass functions.[massRangeMinimum](real) — The minimum halo mass to include in the likelihood evaluation.[massRangeMaximum](real; defaulthuge(0.0d0)) — The maximum halo mass to include in the likelihood evaluation.[binCountMinimum](integer) — The minimum number of halos per bin required to permit bin to be included in likelihood evaluation.[likelihoodPoisson](boolean; default.false.) — If true, likelihood is computed assuming a Poisson distribution for the number of halos in each bin (with no covariance between bins). Otherwise a multivariate normal is assumed when computing likelihood.[varianceFractionalModelDiscrepancy](real; default0.0d0) — The fractional variance due to model discrepancy.[allowEmptyMassFunction](boolean; default.false.) — If true, empty mass functions (i.e. those with no useable bins) are allowed, and return \(\log \mathcal{L}=0\).[report](boolean; default.false.) — If true, give detailed reporting on likelihood calculations.[includeCorrelations](boolean; default.false.) — If true, account for correlations between halo mass functions measured at different redshifts.[binAverage](boolean; default.true.) — If true, the mass function is averaged over each bin.[changeParametersFileNames](string) — The names of files containing parameter changes to be applied.
posteriorSampleLikelihoodHeated¶
The likelihood of the provided class is heated to a temperature \(T=\)[temperature], such that
Parameters
[temperature](real) — The (dimensionless) temperature to which to heat the provided likelihood.
posteriorSampleLikelihoodIndependentLikelihoods¶
A posterior sampling likelihood class which combines likelihoods from one or more other posteriorSampleLikelihoodClass classes that are assumed to be independent (i.e. the \(\log \mathcal{L}\) of the models are simply summed to find the final likelihood).
Since each posteriorSampleLikelihoodClass class may require a different set of parameters a [parameterMap] parameter may be specified. If present, the number of [parameterMap] parameters must equal the number of [posteriorSampleLikelihood] parameters. Each such parameter should give a (space-separated) list of the names of parameters (as defined in the modelParameterActive model parameter class) which should be passed to the corresponding [posteriorSampleLikelihood]. If no [parameterMap] parameters are given then all parameters are passed to each posteriorSampleLikelihoodClass class.
Similarly, a set of parameterInactiveMap parameters may be given, to specify which (if any, an empty value is permissible) of the inactive parameters specified by modelParameterInactive should be passed to the corresponding [posteriorSampleLikelihood]. If no [parameterInactiveMap] then no inactive parameters are passed to any of the [posteriorSampleLikelihood] classes.
Optionally, a parameter [logLikelihoodAccept] may be specified. Once the likelihood of a chain reaches this value, no further evaluations of the likelihood will be made - the chain is assumed to be sufficiently likely that it is “acceptable”.
Methods
tabulate— Tabulate the virial density contrast as a function of mass and time.restoreTable— Restore a tabulated solution from file.storeTable— Store a tabulated solution to file.
Parameters
[orderRotation](string; one ofnone,byRank,byRankOnNode; defaultnone) — The order in which evaluation of likelihoods should be rotated as a function of process number.[logLikelihoodAccept](defaulthuge(0.0d0)) — The log-likelihood which should be “accepted”—once the log-likelihood reaches this value (or larger) no further updates to the chain will be made.[report](boolean; default.false.) — If true, report on the log-likelihood obtained.
posteriorSampleLikelihoodIndpndntLklhdsSqntl¶
A posterior sampling likelihood class which sequentially combines other likelihoods assumed to be independent. This class begins by evaluating the first likelihood. If the likelihood is negative, then it is immediately returned, without evaluation of any further likelihoods. If it is positive, then the next likelihood is evaluated and the same conditions applied. This process repeats until either a negative likelihood is found, or all likelihoods are evaluated. Once a given likelihood has been evaluated it will be evaluated on all subsequent calls. Additionally, when a new likelihood is evaluated for the first time, acceptance of the proposed state will be forced. This class therefore allows a sequence of likelihoods to be specified which must be sequentially made sufficiently “good” before evaluating the next. The approach is intended to allow crude, but rapid constraints to be placed on parameters before progressing to more detailed, but slow to evaluate constraints.
Parameters
[finalLikelihoodFullEvaluation](boolean; default.true.) — If true the final likelihood is evaluated fully, and not treated as a “lock in” likelihood.[restoreLevels](boolean; default.true.) — If true the level reached by each chain is restored on restarts. Otherwise, the level is initialized to zero (which may be useful for stochastic likelihoods).[orderRotation](string; one ofnone,byRank,byRankOnNode; defaultnone) — The order in which evaluation of likelihoods should be rotated as a function of process number. (inherited fromposteriorSampleLikelihoodIndependentLikelihoods)[logLikelihoodAccept](defaulthuge(0.0d0)) — The log-likelihood which should be “accepted”—once the log-likelihood reaches this value (or larger) no further updates to the chain will be made. (inherited fromposteriorSampleLikelihoodIndependentLikelihoods)[report](boolean; default.false.) — If true, report on the log-likelihood obtained. (inherited fromposteriorSampleLikelihoodIndependentLikelihoods)
posteriorSampleLikelihoodMassFunction¶
The likelihood is computed as
where \(\mathcal{C}\) is the covariance matrix, and \(\Delta_i = w_i^\mathrm{model} - w_i^\mathrm{obs}\), \(w_i^\mathrm{model}\) is the computed mass function at the \(i^\mathrm{th}\) separation, and \(w_i^\mathrm{obs}\) is the observed mass function at the \(i^\mathrm{th}\) separation. The mass function is computed using the halo model and the parameterized conditional galaxy mass function of Behroozi et al. (2010). The details of the mass function calculation are specified by the following subparameters:
haloMass(Min|Max)imumThe minimum/maximum halo mass over which to integrate in the halo model;
redshift(Min|Max)imumThe minimum/maximum redshift over which to integrate in the halo model;
massFunctionFileNameThe name of an HDF5 file containing the observed mass function and its covariance matrix.
The HDF5 file specified by the massFunctionFileName element should contain a mass dataset, giving the masses at which the mass function is measured (in units of \(\mathrm{M}_\odot\)), a massFunctionObserved dataset giving the observed values of the mass function at those masses (in units of Mpc\(^{-3}\) per \(\log M\)), and a covariance dataset, giving the covariance of the mass function (in units of Mpc\(^{-6}\)).
Parameters
[massFunctionFileName](string) — The name of the file containing the mass function.[haloMassMinimum](real) — The minimum halo mass over which to integrate.[haloMassMaximum](real) — The maximum halo mass over which to integrate.[redshiftMinimum](real) — The minimum redshift over which to integrate.[redshiftMaximum](real) — The maximum redshift over which to integrate.[useSurveyLimits](boolean) — If true, limit redshift integration range based on survey geometry limits.[modelSurfaceBrightness](boolean) — If true, model the effects of surface brightness incompleteness on the mass function.[surfaceBrightnessLimit](real) — The limiting surface brightness to which to integrate.
posteriorSampleLikelihoodMltiVrtNormalStochastic¶
The likelihood is identical to that of the multivariateNormal class, except that the likelihood function is evaluated stochastically. In addition to the parameter of the multivariateNormal class, two additional parameters are required and are specified within the likelihood element using:
<realizationCount>4000</realizationCount>
<realizationCountMinimum>10</realizationCountMinimum>
When evaluating the likelihood, the state vector is set equal to
where \(N=\)realizationCount and \(U(x)\) is a uniform random deviate in the range \(0\) to \(x\). This results in a variance in \(S^\prime_i\) of \(S_i^2/3N\). This variance is added to the covariance used in evaluating the likelihood. When evaluating the likelihood at a higher temperature the number of realizations is reduced (which increases the covariance, which has the same effect as increasing the temperature) to speed computation, and the likelihood corrected for this fact. The number of realizations is reduced to \(N/T\), but never allowed to fall below realizationCountMinimum.
Parameters
[means](real) — The mean of the multivariate normal distribution.[covariance](real) — The covariance matrix for the of the multivariate normal distribution.[realizationCount](integer) — The number of realizations of the stochastic likelihood to compute at unit temperature.[realizationCountMinimum](integer) — The minimum number of realizations of the stochastic likelihood to compute at higher temperatures.
posteriorSampleLikelihoodMultivariateNormal¶
The likelihood is a simple multivariate Gaussian, intended primarily for testing purposes. The distribution parameters are specified within the likelihood element using:
<mean>0.45 0.50</mean>
<covariance>
<row>1.0e-4 -0.9e-4</row>
<row>-0.9e-4 1.0e-4</row>
</covariance>
where the mean element gives the mean vector of \(N\) elements, and the covariance element contains \(N\) row elements each containing a vector of \(N\) elements giving a single row of the covariance matrix. The likelihood is then:
where \(\Delta = \theta - \bar{\theta}\), \(\theta\) is the state, \(\bar{\theta}\) is the mean, and \(\mathcal{C}\) is the covariance matrix.
Methods
tabulate— Tabulate the virial density contrast as a function of mass and time.restoreTable— Restore a tabulated solution from file.storeTable— Store a tabulated solution to file.
Parameters
[means](real) — The mean of the multivariate normal distribution.[covariance](real) — The covariance matrix for the of the multivariate normal distribution.
posteriorSampleLikelihoodPosteriorAsPrior¶
The likelihood is computed either using another likelihood function (the “wrapped” likelihood), while including in the likelihood an estimate of the posterior probability of a previous simulation. This effectively allows the posterior of the previous simulation to be used as a prior on the current simulation. The details of the likelihood are specified by the follow subparameters:
chainBaseNameThe base name for the old set of MCMC chains to use as the new prior;
neighborCountThe number of neighbor points to use in kernel density estimation of the posterior probability;
toleranceTolerance used in finding nearest neighbors;
wrappedLikelihoodContains another likelihood function definition which will be used to provide the current likelihood.
This method uses the Approximate Nearest Neighbor library to locate neighborCount nearest neighbor points in the set of converged states found in the given chains. The tolerance element determines the accuracy of nearest neighbor finding (see the Approximate Nearest Neighbor documentation for details).When finding nearest neighbors in the MCMC chains, parameters are mapped using whatever mappings are currently active, and distances in each dimension (as used in the metric to determine nearest neighbors) are scaled by the root-variance in that parameter in the converged MCMC chains. The posterior likelihood of the MCMC chains is then estimated from the nearest neighbors using kernel density estimation with a Gaussian kernel with bandwidth equal to the distance to the furthest of the nearest neighbors.
Methods
initialize— Initialize the posterior-as-prior likelihood.
Parameters
[chainBaseName](string) — The base name of the MCMC chain files to read.[neighborCount](integer) — The number of nearest neighbors to use when estimating the posterior likelihood.[tolerance](real) — Tolerance to use when estimating the posterior likelihood.[exclusions](integer; default[integer :: ]) — List of parameter indices to exclude from the posterior likelihood calculation.
posteriorSampleLikelihoodPrjctdCorrelationFunction¶
The likelihood is computed as
where \(\mathcal{C}\) is the covariance matrix, and \(\Delta_i = w_i^\mathrm{model} - w_i^\mathrm{obs}\), \(w_i^\mathrm{model}\) is the computed projected correlation function at the \(i^\mathrm{th}\) separation, and \(w_i^\mathrm{obs}\) is the observed projected correlation function at the \(i^\mathrm{th}\) separation. The projected correlation function is computed using the halo model and the parameterized conditional galaxy mass function of Behroozi et al. (2010). The details of the projected correlation function calculation are specified by the following subparameters:
haloMass(Min|Max)imumThe minimum/maximum halo mass over which to integrate in the halo model;
redshift(Min|Max)imumThe minimum/maximum redshift over which to integrate in the halo model;
projectedCorrelationFunctionFileNameThe name of an HDF5 file containing the observed projected correlation function and its covariance matrix.
The HDF5 file specified by the projectedCorrelationFunctionFileName element should contain a separation dataset, giving the separations at which the projected correlation function is measured (in units of Mpc), a projectedCorrelationFunctionObserved dataset giving the observed values of the projected correlation function at those separations (in units of Mpc), and a covariance dataset, giving the covariance of the projected correlation function (in units of Mpc\(^2\)).
Parameters
[haloMassMinimum](real) — The minimum halo mass over which to integrate.[haloMassMaximum](real) — The maximum halo mass over which to integrate.[lineOfSightDepth](real) — The line of sight depth over which to integrate.[halfIntegral](boolean) — Set totrueif the projected correlation function is computed as \(w_\mathrm{p}(r_\mathrm{p})=\int_0^{+\pi_\mathrm{max}} \xi(r_\mathrm{p},\pi) \mathrm{d} \pi\), instead of the usual \(w_\mathrm{p}(r_\mathrm{p})=\int_{-\pi_\mathrm{max}}^{+\pi_\mathrm{max}} \xi(r_\mathrm{p},\pi) \mathrm{d} \pi\).[fileName](string) — The name of the file containing the target projected correlation function.
posteriorSampleLikelihoodSEDFit¶
A posterior sampling likelihood class which evaluates the likelihood of observed broadband spectral energy distributions (SEDs) given modeled stellar populations, including dust attenuation effects. Observed magnitudes and uncertainties are specified via [magnitudes] and [errors], with dust model and star formation history burst count controlled by [dustType] and [burstCount].
Parameters
[magnitude](real) — The magnitudes of the broad-band SED.[error](real) — The errors on the magnitudes of the broad-band SED.[filter](string) — The names of the filters in the broad-band SED.[system](string) — The photometric system (AB or Vega) of the broad-band SED.[burstCount](integer) — The number of bursts events to include in the star formation history.[dustType](integer; one ofnull,charlotFall2000,cardelli1989,gordon2003,calzetti2000,wittGordon2000) — The type of dust model to apply to the SED.[startTime](integer; one oftime,age) — The definition of start time (absolutetimeorage).
posteriorSampleLikelihoodSpinDistribution¶
A posterior sampling likelihood class which evaluates the likelihood of modeled dark matter halo spin parameter distributions against N-body simulation measurements, accounting for halo mass function weighting and N-body mass errors. The target spin distribution file is set by [fileName], with redshift, mass limits, and N-body particle count constraints specified by [redshift], [massMinimum], and [countMinimum].
Parameters
[fileName](string) — The name of the file containing the target spin distribution.[distributionType](string; one oflogNormal,bett2007) — The name of the spin distribution to use.[redshift](real) — The redshift at which to compute the spin distribution.[massParticle](real) — The mass of a particle in the N-body simulation.[massHaloMinimum](real) — The minimum halo mass over which to integrate.[particleCountMinimum](integer) — The minimum particle count used in N-body halos.[energyEstimateParticleCountMaximum](real) — The maximum number of N-body particles used in estimating halo energies.[logNormalRange](real; default100.0d0) — The multiplicative range of the log-normal distribution used to model the distribution of the mass and energy terms in the spin parameter. Specifically, the lognormal distribution is truncated outside the range \((\lambda_\mathrm{m}/R,\lambda_\mathrm{m} R\), where \(\lambda_\mathrm{m}\) is the measured spin, and \(R=\)[logNormalRange]