jsonvectorizer.vectorizers.TimestampVectorizer

class jsonvectorizer.vectorizers.TimestampVectorizer(bins, min_f=1)

Vectorizer for timestamps

Parses strings and converts them to Unix timestamps, then bins the results into the specified number of equiprobable bins, or using the provided bin edges, and uses one-hot encoding to create a binary feature matrix. After binning, the resulting bins are processed from left to right, and any bin with fewer than min_f items is merged into its right neighbor, until all bins contain at least min_f items. If necessary, the right-most bin is then merged into its left neighbor. Also, if at least min_f items are not valid timestamps, an additional bin (feature) is created for invalid timestamps.
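The parsing and merging steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not jsonvectorizer's implementation: `parse_timestamps` and `merge_small_bins` are hypothetical helpers, ISO 8601 parsing via `datetime.fromisoformat` stands in for whatever parser the library actually uses, and naive datetimes are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def parse_timestamps(raw: Iterable[Optional[str]]) -> Tuple[List[float], int]:
    """Convert ISO 8601 strings to Unix timestamps, counting failures."""
    timestamps: List[float] = []
    n_invalid = 0
    for s in raw:
        try:
            dt = datetime.fromisoformat(s)
        except (TypeError, ValueError):
            n_invalid += 1  # candidates for the extra "invalid" bin
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        timestamps.append(dt.timestamp())
    return timestamps, n_invalid


def merge_small_bins(
    edges: List[float], counts: List[int], min_count: int
) -> Tuple[List[float], List[int]]:
    """Merge bins holding fewer than ``min_count`` items.

    ``edges`` are the interior edges, so ``edges[i]`` separates bin ``i``
    from bin ``i + 1``; the outer bins extend to -inf and inf.
    """
    edges, counts = list(edges), list(counts)
    i = 0
    # Left-to-right pass: fold a small bin into its right neighbor by
    # deleting the edge between them, then re-check the enlarged bin.
    while i < len(counts) - 1:
        if counts[i] < min_count:
            counts[i + 1] += counts[i]
            del counts[i], edges[i]
        else:
            i += 1
    # The right-most bin has no right neighbor, so if it is still too
    # small it is folded into its left neighbor instead.
    if len(counts) > 1 and counts[-1] < min_count:
        counts[-2] += counts[-1]
        del counts[-1], edges[-1]
    return edges, counts
```

For example, with interior edges [10, 20, 30], bin counts [1, 5, 1, 6] and a minimum of 3 items, the first bin folds into the second and the third into the fourth, leaving edges [20] and counts [6, 7].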

Parameters:
bins : int or list

Number of bins to generate, or a list of timestamps to use as bin edges (excluding -inf and inf).

min_f : int or float, optional (default=1)

Minimum number of samples in each generated bin. An integer is taken as an absolute count, and a float as a proportion of n_total (the value passed to the fit() method).
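As a rough illustration of the bins parameter, the sketch below computes interior edges for an integer bins value and one-hot encodes timestamps against a list of interior edges. The helper names, the nearest-rank quantile rule, and the convention that a value equal to an edge falls in the bin to its right are all assumptions; the library returns a sparse matrix rather than the dense lists used here.

```python
import bisect
from typing import List


def equiprobable_edges(timestamps: List[float], n_bins: int) -> List[float]:
    """Interior edges at the k / n_bins quantiles (assumes non-empty input)."""
    s = sorted(timestamps)
    edges: List[float] = []
    for k in range(1, n_bins):
        e = s[(k * len(s)) // n_bins]
        if not edges or e > edges[-1]:  # drop duplicate edges from tied values
            edges.append(e)
    return edges


def one_hot_bins(timestamps: List[float], edges: List[float]) -> List[List[int]]:
    """One-hot encode timestamps; k interior edges define k + 1 bins."""
    edges = sorted(edges)  # -inf and inf are implicit outer edges
    rows = []
    for ts in timestamps:
        row = [0] * (len(edges) + 1)
        # bisect_right places a value equal to an edge in the bin to its right
        row[bisect.bisect_right(edges, ts)] = 1
        rows.append(row)
    return rows
```

With the eight timestamps 1 to 8 and bins=4, this yields edges [3, 5, 7] and two items per bin.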

Raises:
ValueError

If min_f is not a positive number.
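The interpretation of min_f, together with the ValueError condition, might look like the following. `resolve_min_count` is a hypothetical helper, and rounding a fractional threshold up with `math.ceil` is an assumption; the library may round differently.

```python
import math
from numbers import Integral, Real


def resolve_min_count(min_f, n_total: int) -> int:
    """Turn min_f into an absolute minimum item count per bin."""
    if not isinstance(min_f, Real) or min_f <= 0:
        raise ValueError("min_f must be a positive number")
    if isinstance(min_f, Integral):
        return int(min_f)  # integers are absolute counts
    return math.ceil(min_f * n_total)  # floats are proportions of n_total
```

For example, min_f=0.25 with n_total=1000 gives a minimum of 250 items per bin, while min_f=5 means 5 items regardless of n_total.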

Attributes:
feature_names_ : list of str

Methods

fit(self, values[, n_total]) Fit vectorizer to the provided data
fit_transform(self, values, **fit_params) Fit vectorizer to the provided data, then transform it
get_params(self[, deep]) Get parameters for this estimator.
set_params(self, **params) Set the parameters of this estimator.
transform(self, values) Transform values and return the resulting feature matrix

fit(self, values, n_total=None, **kwargs)

Fit vectorizer to the provided data

Parameters:
values : array-like, [n_samples]
n_total : int or None, optional (default=None)

Total number of documents that values are extracted from. If None, defaults to len(values).

**kwargs

Ignored keyword arguments.
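n_total matters when values are pulled from only some of the documents, for instance when some JSON documents lack the field being vectorized. The sketch below, using a hypothetical helper and field name, keeps the two numbers separate so that a float min_f is measured against all documents rather than only those that contain the field.

```python
from typing import Any, Dict, List, Tuple


def extract_field(docs: List[Dict[str, Any]], key: str) -> Tuple[List[Any], int]:
    """Collect one field's values and the number of documents scanned."""
    # Documents without the field add nothing to ``values`` but still
    # count towards ``n_total``.
    values = [doc[key] for doc in docs if key in doc]
    return values, len(docs)
```

The result can then be passed as `vectorizer.fit(values, n_total=n_total)`; with the default n_total=None, fit() would instead use len(values).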

Returns:
self or None

Returns self if at least two bins are generated, otherwise returns None.
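Because fit() can return None, calling code should check the return value instead of chaining on it. A minimal pattern, assuming a dictionary of per-field vectorizers (the `fit_fields` helper and its arguments are hypothetical):

```python
from typing import Any, Dict, List


def fit_fields(
    vectorizers: Dict[str, Any], data: Dict[str, List[Any]], n_total: int
) -> Dict[str, Any]:
    """Fit each vectorizer and keep only those that produced two or more bins."""
    fitted = {}
    for name, vec in vectorizers.items():
        # fit() returns None when fewer than two bins are generated,
        # so such fields are skipped rather than kept as features.
        if vec.fit(data[name], n_total=n_total) is not None:
            fitted[name] = vec
    return fitted
```

Writing `vec.fit(values).transform(values)` directly would raise an AttributeError in the None case.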

fit_transform(self, values, **fit_params)

Fit vectorizer to the provided data, then transform it

Parameters:
values : array-like, [n_samples]
**fit_params

Keyword arguments, passed to the fit() method.

Returns:
X : ndarray, [n_samples, n_features]

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.
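The get_params and set_params descriptions appear to follow scikit-learn's BaseEstimator conventions. For TimestampVectorizer on its own, a call such as set_params(bins=20, min_f=0.01) should presumably suffice; the double-underscore form only matters inside a nested object. The sketch below shows how such keys can be grouped by component using `str.partition`; `split_nested_params` is a hypothetical helper, not part of either library.

```python
from typing import Any, Dict


def split_nested_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group ``<component>__<parameter>`` keys by component.

    Keys without a double underscore belong to the outer estimator and are
    grouped under the empty string.
    """
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split on the first "__" only, so deeper nesting is kept in sub_key
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped
```

A key such as `a__b__c` is routed to component `a` as `b__c`, which that component can split again in turn.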

transform(self, values)

Transform values and return the resulting feature matrix

Parameters:
values : array-like, [n_samples]

Returns:
X : sparse matrix, [n_samples, n_features]
Raises:
NotFittedError

If the vectorizer has not yet been fitted.