DensityOutliersTransform

class DensityOutliersTransform(in_column: str, window_size: int = 15, distance_coef: float = 3, n_neighbors: int = 3, distance_func: typing.Callable[[float, float], float] = <function absolute_difference_distance>)

Bases: etna.transforms.outliers.base.OutliersTransform

Transform that uses get_anomalies_density() to find anomalies in data.

Warning

This transform can suffer from look-ahead bias: to transform data at a given timestamp, it uses information from the whole training part, including later timestamps.

Create an instance of DensityOutliersTransform.

Parameters
  • in_column (str) – name of the column to process

  • window_size (int) – size of the windows to build

  • distance_coef (float) – multiplier applied to the standard deviation to form the distance threshold; two points closer than this threshold are considered close to each other

  • n_neighbors (int) – minimum number of close neighbors a point must have in order not to be treated as an outlier

  • distance_func (Callable[[float, float], float]) – distance function
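The way these parameters interact can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names `absolute_difference` and `density_outlier_indices`, the use of the population standard deviation, and the strict `<` comparison against the threshold are all assumptions made for illustration. A point is kept if at least one window of `window_size` consecutive values containing it holds at least `n_neighbors` other values within the distance threshold.

```python
from statistics import pstdev
from typing import Callable, List, Sequence


def absolute_difference(x: float, y: float) -> float:
    """Hypothetical stand-in for the default distance function."""
    return abs(x - y)


def density_outlier_indices(
    series: Sequence[float],
    window_size: int = 15,
    distance_coef: float = 3,
    n_neighbors: int = 3,
    distance_func: Callable[[float, float], float] = absolute_difference,
) -> List[int]:
    """Return indices of points without enough close neighbors in any window."""
    # Assumption: the threshold is distance_coef times the population std.
    threshold = distance_coef * pstdev(series)
    n = len(series)
    outliers = []
    for idx, value in enumerate(series):
        is_outlier = True
        # Start positions of every window that contains idx and fits the series.
        first_start = max(0, idx - window_size + 1)
        last_start = max(0, min(idx, n - window_size))
        for start in range(first_start, last_start + 1):
            stop = min(start + window_size, n)
            # Count other points in this window that are close to the current one.
            close = sum(
                1
                for j in range(start, stop)
                if j != idx and distance_func(value, series[j]) < threshold
            )
            if close >= n_neighbors:
                is_outlier = False
                break
        if is_outlier:
            outliers.append(idx)
    return outliers


values = [1.0, 1.1, 0.9, 1.0, 50.0, 1.05, 0.95, 1.0, 1.1, 0.9]
print(density_outlier_indices(values, window_size=5, distance_coef=1, n_neighbors=2))
# prints [4]
```

In this toy series only the spike at index 4 has no neighbor within one standard deviation in any window, so it is the only point reported. Note that the threshold is computed from the whole series, which is the source of the look-ahead bias mentioned in the warning.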


Methods

detect_outliers(ts)

Call get_anomalies_density() with the parameters of this transform.

fit(df)

Find outliers using detection method.

fit_transform(df)

May be reimplemented.

inverse_transform(df)

Inverse transformation.

transform(df)

Replace found outliers with NaNs.

detect_outliers(ts: etna.datasets.tsdataset.TSDataset) → Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]

Call get_anomalies_density() with the parameters of this transform.

Parameters

ts (etna.datasets.tsdataset.TSDataset) – dataset to process

Returns

dict of outliers in format {segment: [outliers_timestamps]}

Return type

Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]
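To make the `{segment: [outliers_timestamps]}` format concrete, here is a small stand-alone sketch of how such a result could be used to replace the flagged values with NaNs. It does not use the library: the nested-dict `data` layout, the `mask_outliers` helper, and the use of `datetime` in place of `pandas.Timestamp` are assumptions for illustration only.

```python
import math
from datetime import datetime
from typing import Dict, List

# Hypothetical toy data: segment -> {timestamp: value}.
data: Dict[str, Dict[datetime, float]] = {
    "segment_a": {
        datetime(2021, 1, 1): 1.0,
        datetime(2021, 1, 2): 1.1,
        datetime(2021, 1, 3): 50.0,
    },
    "segment_b": {
        datetime(2021, 1, 1): 7.0,
        datetime(2021, 1, 2): 7.2,
    },
}

# Same shape as the documented return value: {segment: [outliers_timestamps]}.
outliers: Dict[str, List[datetime]] = {
    "segment_a": [datetime(2021, 1, 3)],
    "segment_b": [],
}


def mask_outliers(
    data: Dict[str, Dict[datetime, float]],
    outliers: Dict[str, List[datetime]],
) -> Dict[str, Dict[datetime, float]]:
    """Return a copy of data with every flagged timestamp set to NaN."""
    masked = {segment: dict(values) for segment, values in data.items()}
    for segment, timestamps in outliers.items():
        for ts in timestamps:
            if ts in masked.get(segment, {}):
                masked[segment][ts] = float("nan")
    return masked


masked = mask_outliers(data, outliers)
print(math.isnan(masked["segment_a"][datetime(2021, 1, 3)]))  # prints True
```

Segments with an empty list are left untouched, and the input data is copied rather than modified in place.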