I would like not to have to specify one of the fit_params of a scikit-learn pipeline when calling .fit() on the pipeline but to have one of the p...

I want to join predictions from a model and the input data used by sklearn in Python. The code is
x_train, x_test, y_train, y_test = train_test_...
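One possible way to line predictions up with the inputs they came from is to keep the features in a pandas DataFrame, so the test rows keep their original index after the split. This is only a sketch: the column names, the toy data, and the choice of LinearRegression are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the question's real data is not shown.
df = pd.DataFrame({"a": range(10), "b": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]})
target = df["a"] + 2 * df["b"]

x_train, x_test, y_train, y_test = train_test_split(
    df, target, test_size=0.3, random_state=0
)

model = LinearRegression().fit(x_train, y_train)

# Copy the test inputs and attach the predictions as a new column.
# The original row index is preserved, so rows stay traceable.
result = x_test.copy()
result["prediction"] = model.predict(x_test)
print(result)
```

Because x_test is a DataFrame, the predictions can also be joined back to the full data on the index if needed.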

I have a data set that has 48000 rows and 24 columns; each column has been normalized so that its values lie between 0 and 1.
I tried to clust...

I have an LSI model that is stored as model.pkl and model.pkl.projection.
However, when I try to load the model the loadi...

If I want the classifier to be SVM (using scikit-learn), how can I modify the 'clf' variable such that the svm classifier used for feature rankin...

I'm trying to solve a linear regression problem and I'm using the LinearRegression() function from sklearn. Is it possible to display the weights...
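For reference, a fitted LinearRegression exposes its learned weights through the coef_ and intercept_ attributes. A minimal sketch, using made-up toy data rather than anything from the question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 2*x0 + 3*x1 + 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = LinearRegression().fit(X, y)

# One weight per input feature, plus the bias term.
print("weights:", model.coef_)
print("intercept:", model.intercept_)
```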

sklearn provides a transform() method to apply a one-hot encoder.
To use the transform() method, fit_transform() is needed before calling transform() me...
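As a point of reference, transform() only needs an encoder that has already been fitted; calling fit() on the training data is enough, and fit_transform() is just fit() followed by transform() on the same data. A small sketch with invented category values, where handle_unknown="ignore" is one optional way to cope with categories not seen during fitting:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categories; not taken from the question.
train = np.array([["red"], ["blue"], ["green"]])
new = np.array([["blue"], ["purple"]])  # "purple" was never seen in training

# Fit once on the training data to learn the categories.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# Reuse the fitted encoder on new data; unseen categories become all zeros.
encoded = encoder.transform(new).toarray()
print(encoder.categories_)
print(encoded)
```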

I'm really having trouble trying to get this project of mine up and running, but I'm remaining resilient and I think I'm close!
I'm trying to c...

I am looking to train either a random forest or gradient boosting algorithm using sklearn. The data I have is structured in a way that it has a v...

I am working on a text-classification problem and have found that SVM performs best on it. However, I did my ex...

My issue concerns the ValueError raised by the SMOTE class:
Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
# imbala...

I'm pretty new to data science and did some of the courses on Codecademy and SoloLearn. I'm having an issue with Graphviz and sklearn. It seems t...

I'm running a Random Forest model with extensive cross-validation and then comparing the grid.best_scorer which I believe is the mean_test_score...

I have a data frame with observations
data = [['red', 1, 0.2], ['blue', 1, 0.5], ['green', 2, 0.8], ['blue', 2, 0.55], ['blue', 2, 0.52], ['red...

I am doing a linear regression with scikit-learn in Python 3. I have an array of x and y data and want to implement a linear regression using a 3r...
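Assuming the truncated text refers to a third-degree polynomial (the excerpt is cut off, so this is a guess), one common approach is to expand x into polynomial features and fit an ordinary linear regression on them. The data below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data following an exact cubic, y = 0.5*x^3 - x + 2.
x = np.linspace(-2, 2, 20)
y = 0.5 * x**3 - x + 2

# PolynomialFeatures builds the columns [1, x, x^2, x^3];
# LinearRegression then fits one weight per column.
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(x.reshape(-1, 1), y)

y_fit = poly_model.predict(x.reshape(-1, 1))
print(y_fit[:5])
```

scikit-learn expects a 2-D feature array, which is why the 1-D x is reshaped to a single column before fitting.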

In all of my scripts where I use packages dependent on scipy (such as sklearn and statsmodels) I receive this ImportError.
I uninstalled Anaco...

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.2, shuffle=False)
return(x_train,...

I had this code:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidfconverter = TfidfVectorizer(max_features=900, min_...

I have created custom classes that I want to use with scikit-learn Pipelines and FeatureUnions.
Each class takes as input a data frame with 2 co...

I have a csv file with its data in a particular hierarchical structure. While I can load it into a pandas data frame, I would prefer to have a s...

So, first of all, I'm relatively new to Python so I'm not sure how to achieve my task. I was following an online tutorial on how to plot a decisi...

I want to predict purchases. I use multiple linear regression and, as you know, I need R-squared. But when I called OLS.fit, I got an error.
I used before...

I was using CountVectorizer like this:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_features=2...

I'm building a simple spam filter using scikit-learn, and I'm a bit unsure about my results. I have a dataset that has around 5000 rows of data, the las...

I have a dataframe pd with two columns, X and y.
In pd[y] I have integers from 1 to 10 inclusive. However they have different frequencies:
df[y...

