jsonvectorizer.vectorizers.OneHotVectorizer
class jsonvectorizer.vectorizers.OneHotVectorizer(min_f=1, min_categories=1, lowercase=True)
Vectorizer for categorical values.
One-hot encoding using scikit-learn’s OneHotEncoder.
Parameters:
- min_f : int or float, optional (default=1)
Ignores categories that occur less frequently than this threshold. An integer is taken as an absolute count, and a float as a proportion of the n_total passed to the fit() method.
- min_categories : int, optional (default=1)
Does not generate any features if the number of extracted categories is lower than this threshold.
- lowercase : bool, optional (default=True)
Whether to convert strings to lowercase.
Raises:
- ValueError
If min_f is not a positive number, or if min_categories is not a positive integer.
Attributes:
- feature_names_ : list of str
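The way min_f is interpreted, and the conditions that raise ValueError, can be sketched in plain Python. This is a minimal illustration rather than the library's own code: the helper name resolve_min_count is hypothetical, and it assumes that a float min_f is simply multiplied by n_total to obtain a minimum count.

```python
import numbers


def resolve_min_count(min_f, min_categories, n_total):
    """Hypothetical helper: validate the parameters and turn min_f into a count."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(min_f, bool) or not isinstance(min_f, numbers.Real) or min_f <= 0:
        raise ValueError("min_f must be a positive number")
    if (isinstance(min_categories, bool)
            or not isinstance(min_categories, numbers.Integral)
            or min_categories < 1):
        raise ValueError("min_categories must be a positive integer")
    if isinstance(min_f, numbers.Integral):
        return int(min_f)   # integer: absolute count
    return min_f * n_total  # float: proportion of n_total (assumed)


print(resolve_min_count(2, 1, n_total=200))     # 2
print(resolve_min_count(0.25, 1, n_total=200))  # 50.0
```

With 200 documents, min_f=2 keeps categories seen at least twice, while min_f=0.25 would require 50 occurrences under the assumption stated above.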
Methods
- fit(self, values[, n_total]): Fit vectorizer to the provided data.
- fit_transform(self, values, **fit_params): Fit vectorizer to the provided data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, values): Transform values and return the resulting feature matrix.
fit(self, values, n_total=None, **kwargs)
Fit vectorizer to the provided data.
Parameters:
- values : array-like, [n_samples]
- n_total : int or None, optional (default=None)
Total number of documents that values are extracted from. If None, defaults to len(values).
- **kwargs
Ignored keyword arguments.
Returns:
- self or None
Returns None if no features were generated, otherwise returns self.
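The effect of n_total and the None return value can be illustrated with a small stand-in written in plain Python. This is a sketch under stated assumptions, not the library's fitting logic: fit_categories is a hypothetical function, a float min_f is assumed to be a proportion of n_total, and a category is assumed to be kept when its count is at least the threshold.

```python
from collections import Counter


def fit_categories(values, min_f=1, min_categories=1, lowercase=True, n_total=None):
    """Illustrative stand-in for the fit step: returns kept categories or None."""
    if n_total is None:
        n_total = len(values)  # documented default
    if lowercase:
        values = [v.lower() if isinstance(v, str) else v for v in values]
    # Assumption: a float min_f is a proportion of n_total.
    min_count = min_f * n_total if isinstance(min_f, float) else min_f
    counts = Counter(values)
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    # Mirrors "returns None if no features were generated".
    if len(kept) < min_categories or not kept:
        return None
    return kept


colors = ["Red", "red", "blue", "green", "RED"]
print(fit_categories(colors, min_f=2))                    # ['red']
print(fit_categories(colors, min_f=0.5))                  # ['red']
print(fit_categories(colors, min_f=0.5, n_total=10))      # None
print(fit_categories(colors, min_f=2, min_categories=2))  # None
```

In the third call the five values are treated as coming from ten documents, so the 50% threshold becomes five occurrences and no category survives. Because fit() may return None, callers may prefer to check the result before chaining further calls on it.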
fit_transform(self, values, **fit_params)
Fit vectorizer to the provided data, then transform it.
Parameters:
- values : array-like, [n_samples]
- **fit_params
Keyword arguments, passed to the fit() method.
Returns:
- X : ndarray, [n_samples, n_features]
get_params(self, deep=True)
Get parameters for this estimator.
Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:
- **params : dict
Estimator parameters.
Returns:
- self : object
Estimator instance.
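The <component>__<parameter> naming can be illustrated by splitting a parameter name on its first double underscore. The sketch below is only an illustration of the convention: split_param is a hypothetical helper, and the component names onehot and pipe are made up for the example.

```python
def split_param(name):
    """Split 'component__parameter' into (component, parameter)."""
    component, delim, parameter = name.partition("__")
    if not delim:
        return None, name  # plain parameter of the estimator itself
    return component, parameter


print(split_param("min_f"))                    # (None, 'min_f')
print(split_param("onehot__min_f"))            # ('onehot', 'min_f')
print(split_param("pipe__onehot__lowercase"))  # ('pipe', 'onehot__lowercase')
```

Only the first double underscore is split off, so for deeper nesting the remainder is passed on to the named component, which resolves the next level itself.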
transform(self, values)
Transform values and return the resulting feature matrix.
Parameters:
- values : array-like, [n_samples]
Returns:
- X : sparse matrix, shape [n_samples, n_features]
Raises:
- NotFittedError
If the vectorizer has not yet been fitted.
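To make the [n_samples, n_features] shape concrete, the following plain-Python sketch lists the nonzero positions of a one-hot matrix in coordinate form. It is an illustration only: one_hot_coords is a hypothetical function, and the handling of values that match no fitted category (an all-zero row here) is an assumption, not documented behavior of transform().

```python
def one_hot_coords(values, categories, lowercase=True):
    """Return (row, column) positions of the ones, plus the matrix shape."""
    index = {c: j for j, c in enumerate(categories)}
    coords = []
    for i, v in enumerate(values):
        if lowercase and isinstance(v, str):
            v = v.lower()
        j = index.get(v)
        if j is not None:  # assumption: unknown values yield an all-zero row
            coords.append((i, j))
    return coords, (len(values), len(categories))


coords, shape = one_hot_coords(["Red", "blue", "purple"], ["blue", "red"])
print(coords)  # [(0, 1), (1, 0)]
print(shape)   # (3, 2)
```

Each sample contributes at most one nonzero entry, which is why a sparse matrix is a natural return type when there are many categories.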