jsonvectorizer.vectorizers.OneHotVectorizer

class jsonvectorizer.vectorizers.OneHotVectorizer(min_f=1, min_categories=1, lowercase=True)

Vectorizer for categorical values.

One-hot encoding using scikit-learn’s OneHotEncoder.

Parameters:
min_f : int or float, optional (default=1)

Ignores categories that occur less often than this threshold. An integer is taken as an absolute count; a float is taken as a proportion of the n_total value passed to the fit() method.

min_categories : int, optional (default=1)

Does not generate any features if the number of extracted categories is lower than this threshold.

lowercase : bool, optional (default=True)

Whether to convert strings to lowercase.

Raises:
ValueError

If min_f is not a positive number, or if min_categories is not a positive integer.

Attributes:
feature_names_ : list of str
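
The interaction of min_f, min_categories, and lowercase can be illustrated with a minimal pure-Python sketch. The helper name select_categories is hypothetical and is not part of jsonvectorizer; the sketch also assumes that a category is kept when its count is greater than or equal to the threshold, which the description above does not state explicitly.

```python
from collections import Counter
from numbers import Integral


def select_categories(values, min_f=1, min_categories=1, lowercase=True, n_total=None):
    """Hypothetical helper: return the categories that pass both thresholds."""
    if n_total is None:
        n_total = len(values)  # same default as documented for fit()
    if lowercase:
        values = [v.lower() if isinstance(v, str) else v for v in values]
    # An integer min_f is an absolute count; a float is a proportion of n_total.
    threshold = min_f if isinstance(min_f, Integral) else min_f * n_total
    counts = Counter(values)
    # Assumption: a category survives when count >= threshold.
    kept = sorted((c for c, n in counts.items() if n >= threshold), key=str)
    # Too few surviving categories means no features are generated at all.
    return kept if len(kept) >= min_categories else []


values = ["Red", "red", "blue", "green", "RED"]
print(select_categories(values, min_f=2))                    # absolute count
print(select_categories(values, min_f=0.2))                  # proportion of len(values)
print(select_categories(values, min_f=2, min_categories=2))  # too few categories
```

In this sketch the three spellings of "red" collapse into one category because lowercase defaults to True, so only that category reaches a count of 2.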

Methods

fit(self, values[, n_total])              Fit vectorizer to the provided data.
fit_transform(self, values, **fit_params) Fit vectorizer to the provided data, then transform it.
get_params(self[, deep])                  Get parameters for this estimator.
set_params(self, **params)                Set the parameters of this estimator.
transform(self, values)                   Transform values and return the resulting feature matrix.

fit(self, values, n_total=None, **kwargs)

Fit vectorizer to the provided data

Parameters:
values : array-like, [n_samples]
n_total : int or None, optional (default=None)

Total number of documents that the values are extracted from. If None, defaults to len(values).

**kwargs:

Ignored keyword arguments.

Returns:
self or None

Returns None if no features were generated, otherwise returns self.
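
Because fit() may return None, chaining a call such as fit(...).transform(...) could fail when no features are generated. A small guard along the following lines may help; fit_or_skip is a hypothetical helper, not part of jsonvectorizer, and it relies only on the return contract documented above.

```python
def fit_or_skip(vectorizer, values, n_total=None):
    """Hypothetical helper: fit `vectorizer` and return it, or None if it produced no features."""
    fitted = vectorizer.fit(values, n_total=n_total)
    # Documented contract: fit() returns None when no features were generated.
    return vectorizer if fitted is not None else None
```

With a real vectorizer this might be used as `vec = fit_or_skip(OneHotVectorizer(min_f=2), values)`, followed by a check for `vec is None` before calling transform().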

fit_transform(self, values, **fit_params)

Fit vectorizer to the provided data, then transform it

Parameters:
values : array-like, [n_samples]
**fit_params

Keyword arguments, passed to the fit() method.

Returns:
X : ndarray, [n_samples, n_features]

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.
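
The <component>__<parameter> naming convention can be illustrated with a short sketch that separates an estimator's own parameters from those addressed to nested components. The helper split_nested_params and the component name "encoder" are hypothetical and serve only to show how the double-underscore delimiter is read; this is not the library's implementation.

```python
def split_nested_params(params):
    """Hypothetical helper: group parameters by component using the `__` delimiter."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only; anything after it belongs to the component.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({"lowercase": False, "encoder__dtype": "float32"})
print(own)     # parameters for the estimator itself
print(nested)  # parameters grouped by (hypothetical) component name
```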

transform(self, values)

Transform values and return the resulting feature matrix

Parameters:
values : array-like, [n_samples]

Returns:
X : sparse matrix, shape [n_samples, n_features]

Raises:
NotFittedError

If the vectorizer has not yet been fitted.