DendroDistance

class turbustat.statistics.DendroDistance(cube1, cube2, min_deltas=None, nbins='best', min_features=100, fiducial_model=None, dendro_params=None)

Bases: object

Calculate the distance between two cubes using dendrograms. Two distances are computed. For the first, the number of features vs. minimum delta is fit to a linear model, with an interaction term to gauge the difference between the cubes; the distance is the t-statistic of that interaction parameter. For the second, the Hellinger distance is computed between the histograms at each minimum delta value, and the distance is the average of those Hellinger distances.
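One compact way to write the two statistics is shown below. This is a reading of the prose description, not formulas copied from the TurbuStat source. Here N is the number of features, δ the minimum delta, g an indicator (0 for cube1, 1 for cube2), and p^(m), q^(m) the normalized histograms of the two cubes at the m-th of M minimum delta values, sharing the same bins. Whether the feature counts or deltas are transformed (for example, logged) before fitting is not stated here.

```latex
N = \beta_0 + \beta_1 \delta + \beta_2 g + \beta_3 \, \delta g + \varepsilon,
\qquad
d_{\mathrm{num}} = t_{\beta_3} = \frac{\hat\beta_3}{\mathrm{SE}(\hat\beta_3)}

H(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_k \left(\sqrt{p_k} - \sqrt{q_k}\right)^2},
\qquad
d_{\mathrm{hist}} = \frac{1}{M} \sum_{m=1}^{M} H\!\left(p^{(m)}, q^{(m)}\right)
```

With the 1/√2 factor, H lies between 0 (identical histograms) and 1 (histograms with no overlapping bins).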

Parameters:

cube1 : numpy.ndarray or str

Data cube. If a str, it should be the filename of a pickle file saved using Dendrogram_Stats.

cube2 : numpy.ndarray or str

Data cube. If a str, it should be the filename of a pickle file saved using Dendrogram_Stats.

min_deltas : numpy.ndarray or list

Minimum deltas of leaves in the dendrogram.

nbins : str or float, optional

Number of bins for the histograms. ‘best’ sets that number to the square root of the average number of features in the two histograms being compared.

min_features : int, optional

The minimum number of features necessary to compare the histograms.

fiducial_model : Dendrogram_Stats, optional

Previously computed dendrogram and statistic values. Pass this to avoid recomputing them.

dendro_params : dict or list of dicts, optional

Further parameters for the dendrogram algorithm (see www.dendrograms.org for more info). If a list of dictionaries is given, the first list entry should be the dictionary for cube1, and the second for cube2.
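The list-of-dicts convention for dendro_params can be sketched with a small helper. `split_dendro_params` is hypothetical and not part of TurbuStat. The keys `min_value` and `min_npix` are arguments of astrodendro's `Dendrogram.compute`; that TurbuStat forwards dendro_params to it unchanged, and that None means "use the defaults", are assumptions.

```python
from typing import Dict, List, Tuple, Union

Params = Dict[str, float]


def split_dendro_params(
    dendro_params: Union[None, Params, List[Params]]
) -> Tuple[Params, Params]:
    """Return (params_for_cube1, params_for_cube2).

    Hypothetical helper mirroring the documented rule: a single dict is
    shared by both cubes; a list gives cube1's dict first, cube2's second.
    """
    if dendro_params is None:
        # Assumption: no parameters means the algorithm's defaults for both cubes.
        return {}, {}
    if isinstance(dendro_params, dict):
        # Same settings for both cubes; copy so each can be edited independently.
        return dict(dendro_params), dict(dendro_params)
    if isinstance(dendro_params, list) and len(dendro_params) == 2:
        return dict(dendro_params[0]), dict(dendro_params[1])
    raise ValueError("dendro_params must be a dict or a list of two dicts")


# Keys assumed to follow astrodendro's Dendrogram.compute arguments.
params = [{"min_value": 0.1, "min_npix": 20},
          {"min_value": 0.2, "min_npix": 20}]
p1, p2 = split_dendro_params(params)
```

A single dict is the natural choice when both cubes share a noise level; a list lets each cube use its own threshold.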

Methods Documentation

distance_metric(verbose=False)

histogram_stat(verbose=False)

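Compute the distance using histograms: the average of the Hellinger distances between the two cubes' histograms at each minimum delta value.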

Parameters:

verbose : bool, optional

Enables plotting.
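The histogram distance can be sketched in plain NumPy as follows. This is an illustration, not the TurbuStat implementation. The inputs (one array of per-feature values per minimum delta, for each cube) are hypothetical, and which per-feature quantity TurbuStat actually histograms is not specified here. Skipping deltas where either cube has fewer than min_features features is an assumed reading of that parameter.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two normalized histograms on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def histogram_distance(features1, features2, nbins="best", min_features=100):
    """Average Hellinger distance over minimum delta values (a sketch).

    features1, features2: hypothetical lists with one 1D array per minimum
    delta, holding a per-feature quantity for cube1 and cube2.
    """
    distances = []
    for f1, f2 in zip(features1, features2):
        f1 = np.asarray(f1, dtype=float)
        f2 = np.asarray(f2, dtype=float)
        # Assumption: skip deltas where either cube has too few features.
        if min(f1.size, f2.size) < min_features:
            continue
        if nbins == "best":
            # Square root of the average number of features in the two histograms.
            n = int(np.round(np.sqrt((f1.size + f2.size) / 2.0)))
        else:
            n = int(nbins)
        # Shared bin edges so the two histograms are directly comparable.
        lo = min(f1.min(), f2.min())
        hi = max(f1.max(), f2.max())
        edges = np.linspace(lo, hi, n + 1)
        h1, _ = np.histogram(f1, bins=edges)
        h2, _ = np.histogram(f2, bins=edges)
        distances.append(hellinger(h1 / h1.sum(), h2 / h2.sum()))
    if not distances:
        raise ValueError("no minimum delta value has enough features to compare")
    return float(np.mean(distances))
```

Using common bin edges for both cubes matters: the Hellinger distance compares bin by bin, so histograms on different edges would not be comparable.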

numfeature_stat(verbose=False)

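Calculate the distance based on the number of features statistic: the t-statistic of the interaction term in the linear fit of the number of features vs. minimum delta.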

Parameters:

verbose : bool, optional

Enables plotting.
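The interaction-term t-statistic can be sketched with an ordinary least-squares fit in plain NumPy, used here in place of whatever fitting routine TurbuStat relies on. It fits the raw counts against the raw deltas; whether TurbuStat transforms either variable first is an assumption left open. `interaction_tstat` is a hypothetical name.

```python
import numpy as np


def interaction_tstat(deltas, nfeat1, nfeat2):
    """t-statistic of the delta x cube interaction in an OLS fit (a sketch)."""
    deltas = np.asarray(deltas, dtype=float)
    y = np.concatenate([np.asarray(nfeat1, dtype=float),
                        np.asarray(nfeat2, dtype=float)])
    x = np.concatenate([deltas, deltas])
    # Indicator: 0 for cube1 rows, 1 for cube2 rows.
    g = np.concatenate([np.zeros(deltas.size), np.ones(deltas.size)])
    # Design matrix: intercept, delta, group, delta*group interaction.
    X = np.column_stack([np.ones_like(x), x, g, x * g])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Residual variance with n - p degrees of freedom (needs >= 3 deltas).
    dof = y.size - X.shape[1]
    sigma2 = resid @ resid / dof
    # Standard OLS covariance of the coefficient estimates.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[3] / np.sqrt(cov[3, 3]))
```

A large |t| means the number of features falls off with minimum delta at clearly different rates in the two cubes. Swapping cube1 and cube2 only flips the sign of t.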