jsonvectorizer.vectorizers.TimestampVectorizer
-
class jsonvectorizer.vectorizers.TimestampVectorizer(bins, min_f=1)
Vectorizer for timestamps
Parses and converts strings to Unix timestamps, bins the results into the specified number of equiprobable bins or according to the provided bin edges, and uses one-hot encoding to create a binary feature matrix. After binning, the resulting bins are processed from left to right and merged into their right neighbor until all bins contain at least the specified number of items. If necessary, the right-most bin is then merged into its left neighbor. In addition, if at least min_f items are not valid timestamps, an extra bin (feature) is created for invalid timestamps.
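The merging step described above can be illustrated with a small standalone sketch. This is not the library's implementation; the helper name merge_small_bins and its return format are hypothetical, and it only assumes that each initial bin is represented by its item count.

```python
def merge_small_bins(counts, min_count):
    """Merge adjacent bins so each merged bin holds at least min_count items.

    counts: item counts of the initial bins, ordered left to right.
    Returns (start, stop, count) triples; each merged bin covers the
    original bins counts[start:stop].
    """
    merged = []
    start, running = 0, 0
    for i, count in enumerate(counts):
        running += count
        if running >= min_count:
            # Enough items collected: close the current merged bin.
            merged.append([start, i + 1, running])
            start, running = i + 1, 0
    if start < len(counts):
        # Trailing bins are still too small: fold them into the left neighbor.
        if merged:
            merged[-1][1] = len(counts)
            merged[-1][2] += running
        else:
            merged.append([0, len(counts), running])
    return [tuple(m) for m in merged]


print(merge_small_bins([1, 1, 5, 2, 1], min_count=3))  # [(0, 3, 7), (3, 5, 3)]
print(merge_small_bins([4, 1, 1], min_count=3))        # [(0, 3, 6)]
```

In the second call the two trailing bins never reach the threshold, so they are absorbed by the bin on their left, which mirrors the "right-most bin is merged into its left neighbor" rule.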
Parameters: - bins : int or list
Number of bins to generate, or a list of timestamps to use as bin edges (excluding -inf and inf).
- min_f : int or float, optional (default=1)
Minimum number of samples in each generated bin. An integer is taken as an absolute count, and a float indicates the proportion of n_total passed to the fit() method.
Raises: - ValueError
If min_f is not a positive number.
Attributes: - feature_names_ : list of str
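To make the two interpretations of min_f concrete, here is a minimal sketch of how an integer or float value could be turned into an absolute per-bin count. The helper resolve_min_count is hypothetical and not part of jsonvectorizer; in particular, rounding the proportion up is an assumption, since the documentation does not say how fractional counts are handled.

```python
import math


def resolve_min_count(min_f, n_total):
    """Turn min_f into an absolute per-bin sample count (hypothetical helper)."""
    if not isinstance(min_f, (int, float)) or min_f <= 0:
        raise ValueError("min_f must be a positive number")
    if isinstance(min_f, int):
        return min_f  # integer: absolute count
    # float: proportion of n_total (rounding up is an assumption)
    return math.ceil(min_f * n_total)


print(resolve_min_count(5, n_total=200))     # 5
print(resolve_min_count(0.25, n_total=200))  # 50
```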
Methods
fit(self, values[, n_total]): Fit vectorizer to the provided data
fit_transform(self, values, **fit_params): Fit vectorizer to the provided data, then transform it
get_params(self[, deep]): Get parameters for this estimator.
set_params(self, **params): Set the parameters of this estimator.
transform(self, values): Transform values and return the resulting feature matrix
-
fit(self, values, n_total=None, **kwargs)
Fit vectorizer to the provided data
Parameters: - values : array-like, [n_samples]
- n_total : int or None, optional (default=None)
Total number of documents that values are extracted from. If None, defaults to len(values).
- **kwargs
Ignored keyword arguments.
Returns: - self or None
Returns self if at least two bins are generated, otherwise returns None.
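When bins is an integer, fitting has to derive bin edges that give each bin roughly the same number of samples. The following is a rough sketch of that idea using only the standard library, not the library's own code; the function name equiprobable_edges and the choice of the "inclusive" quantile method are assumptions.

```python
import statistics


def equiprobable_edges(timestamps, bins):
    """Return the bins - 1 interior edges that split the data into equal-mass bins."""
    if bins < 2:
        return []  # a single bin has no interior edges
    # quantiles() returns n - 1 cut points for n equal-probability groups
    return statistics.quantiles(timestamps, n=bins, method="inclusive")


# Unix timestamps (seconds); the values are illustrative only.
print(equiprobable_edges([0, 100, 200, 300, 400], bins=4))  # [100.0, 200.0, 300.0]
```

Bins produced this way can still end up below min_f when many values are tied, which is where the merging pass described in the class summary comes in.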
-
fit_transform(self, values, **fit_params)
Fit vectorizer to the provided data, then transform it
Parameters: - values : array-like, [n_samples]
- **fit_params
Keyword arguments passed to the fit() method.
Returns: - X : ndarray, [n_samples, n_features]
-
get_params(self, deep=True)
Get parameters for this estimator.
Parameters: - deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : mapping of string to any
Parameter names mapped to their values.
-
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters: - **params : dict
Estimator parameters.
Returns: - self : object
Estimator instance.
-
transform(self, values)
Transform values and return the resulting feature matrix
Parameters: - values : array-like, [n_samples]
Returns: - X : sparse matrix, [n_samples, n_features]
Raises: - NotFittedError
If the vectorizer has not yet been fitted.
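As an illustration of what the transformed output represents, the sketch below parses strings into Unix timestamps and one-hot encodes them against a set of interior bin edges, with an extra trailing column for invalid timestamps. It is a simplified stand-in, not the library's code: the helpers to_unix and one_hot are hypothetical, parsing is limited to ISO 8601 strings, naive datetimes are assumed to be UTC, a value equal to an edge is assumed to fall into the bin on its right, and the result is a plain nested list rather than a sparse matrix.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def to_unix(value):
    """Parse an ISO 8601 string into a Unix timestamp, or return None if invalid."""
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.timestamp()


def one_hot(values, edges, invalid_column=True):
    """One-hot encode values against sorted interior bin edges."""
    n_bins = len(edges) + 1
    width = n_bins + (1 if invalid_column else 0)
    rows = []
    for value in values:
        row = [0] * width
        ts = to_unix(value)
        if ts is not None:
            row[bisect_right(edges, ts)] = 1  # index of the bin containing ts
        elif invalid_column:
            row[-1] = 1  # last column flags invalid timestamps
        rows.append(row)
    return rows


edges = [to_unix("2020-01-01"), to_unix("2021-01-01")]
print(one_hot(["2019-06-01", "2020-06-01", "2021-06-01", "n/a"], edges))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```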