jsonvectorizer.vectorizers.StringVectorizer
class jsonvectorizer.vectorizers.StringVectorizer(min_df=1, **kwargs)
Vectorizer for strings
Tokenization using scikit-learn’s CountVectorizer.
Parameters:
- min_df : int or float, optional (default=1)
  When using tokenization, ignore terms that have a document frequency strictly lower than this threshold. An integer is taken as an absolute count, and a float indicates the proportion of n_total passed to the fit() method.
- **kwargs
  Passed to scikit-learn’s CountVectorizer class for initialization.
Raises:
- ValueError
  If min_df is not a positive number.
Attributes:
- feature_names_ : list of str
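The two interpretations of min_df described above can be sketched in plain Python. This is only an illustration under stated assumptions: min_document_count and keep_term are hypothetical helpers, not part of jsonvectorizer, and the choice to leave a float threshold unrounded is an assumption rather than the library's confirmed behavior.

```python
from numbers import Integral, Real


def min_document_count(min_df, n_total):
    """Turn min_df into a document-count threshold (hypothetical helper)."""
    # bool is a subclass of int, so it is rejected explicitly
    if isinstance(min_df, bool) or not isinstance(min_df, Real) or min_df <= 0:
        raise ValueError("min_df must be a positive number")
    if isinstance(min_df, Integral):
        # An integer is taken as an absolute count
        return min_df
    # A float is taken as a proportion of n_total (assumption: no rounding)
    return min_df * n_total


def keep_term(doc_freq, min_df, n_total):
    """Return True unless doc_freq is strictly lower than the threshold."""
    return doc_freq >= min_document_count(min_df, n_total)
```

With min_df=0.5 and n_total=10, a term must appear in at least 5 documents to be kept; with min_df=2, it must appear in at least 2 documents regardless of n_total.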
Methods
- fit(self, values[, n_total]): Fit vectorizer to the provided data
- fit_transform(self, values, **fit_params): Fit vectorizer to the provided data, then transform it
- get_params(self[, deep]): Get parameters for this estimator.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, values): Transform values and return the resulting feature matrix
fit(self, values, n_total=None, **kwargs)
Fit vectorizer to the provided data
Parameters:
- values : array-like, [n_samples]
- n_total : int or None, optional (default=None)
  Total number of documents that values are extracted from. If None, defaults to len(values).
- **kwargs
  Ignored keyword arguments.
Returns:
- self or None
  Returns None if no features were generated, otherwise returns self.
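To make the role of n_total and the None return value concrete, the following is a minimal sketch that mimics the documented fit() contract. fit_vocabulary is a hypothetical stand-in, not the library's implementation, and it assumes simple lowercase whitespace tokenization rather than CountVectorizer's actual tokenizer.

```python
from collections import Counter


def fit_vocabulary(values, min_df=1, n_total=None):
    """Return a sorted vocabulary, or None if no term survives min_df.

    Hypothetical stand-in for illustration only.
    """
    # If n_total is None, fall back to the number of values provided
    if n_total is None:
        n_total = len(values)
    # Integer: absolute count; float: proportion of n_total
    threshold = min_df if isinstance(min_df, int) else min_df * n_total
    # Document frequency: the number of values each token occurs in
    doc_freq = Counter()
    for value in values:
        doc_freq.update(set(value.lower().split()))
    features = sorted(t for t, df in doc_freq.items() if df >= threshold)
    # Mirror the documented behavior: None when no features were generated
    return features or None


values = ["red apple", "green apple", "red car"]
print(fit_vocabulary(values, min_df=0.5))              # threshold is 1.5
print(fit_vocabulary(values, min_df=0.5, n_total=10))  # threshold is 5.0
```

The second call models the case where the three strings were extracted from ten documents in total: the same proportion then demands a higher absolute count, and no term reaches it.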
fit_transform(self, values, **fit_params)
Fit vectorizer to the provided data, then transform it
Parameters:
- values : array-like, [n_samples]
- **fit_params
  Keyword arguments, passed to the fit() method.
Returns:
- X : ndarray, [n_samples, n_features]
get_params(self, deep=True)
Get parameters for this estimator.
Parameters:
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:
- **params : dict
  Estimator parameters.
Returns:
- self : object
  Estimator instance.
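The <component>__<parameter> naming convention can be illustrated with a short sketch that separates an estimator's own parameters from those addressed to nested components. split_nested_params is a hypothetical helper written for this page, not scikit-learn's actual code, and the parameter names in the usage lines are made up.

```python
from collections import defaultdict


def split_nested_params(params):
    """Split a params dict into (own, nested) parts (hypothetical helper)."""
    own = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # Split on the first double underscore only, so deeper levels
        # stay intact for the component to resolve in turn
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            own[name] = value
    return own, dict(nested)


own, nested = split_nested_params({"min_df": 2, "step__alpha": 0.1})
print(own)     # parameters set directly on the estimator
print(nested)  # parameters forwarded to the component named "step"
```

Splitting only on the first delimiter lets each level of a nested object handle its own part of the name and pass the remainder down.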
transform(self, values)
Transform values and return the resulting feature matrix
Parameters:
- values : array-like, [n_samples]
Returns:
- X : sparse matrix, shape [n_samples, n_features]
Raises:
- NotFittedError
  If the vectorizer has not yet been fitted.