Questions tagged [scikit-learn]

0 votes
0 replies
Add fit_params in scikit-learn pipeline step
I would like not to have to specify one of the fit_params of a scikit-learn pipeline when calling .fit() on the pipeline but to have one of the p...
asked 4 months ago
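A minimal sketch of the standard way to pass a fit parameter to a single pipeline step: use the `<step name>__<parameter>` keyword form in `Pipeline.fit`. The step name `clf`, the use of `sample_weight` and the toy data are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
weights = np.array([1.0, 1.0, 2.0, 2.0])  # hypothetical per-row weights

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "clf__sample_weight" is routed only to the fit() of the step named "clf"
pipe.fit(X, y, clf__sample_weight=weights)
predictions = pipe.predict(X)
```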
0 votes
1 reply
How to join predictions with input data test in sklearn
I want to join predictions from a model and the input data used by sklearn in Python. The code is x_train, x_test, y_train, y_test = train_test_...
asked 4 months ago
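One way to join predictions with the test inputs, assuming `x_test` is a pandas DataFrame: `train_test_split` keeps the original row index, so the predictions can be attached column-wise. The column names and toy values are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data with hypothetical column names
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   "target": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9]})
x_train, x_test, y_train, y_test = train_test_split(
    df[["feature"]], df["target"], test_size=0.25, random_state=0)

model = LinearRegression().fit(x_train, y_train)

# y_test aligns on the shared index; predict() returns rows in x_test order
results = x_test.assign(actual=y_test, predicted=model.predict(x_test))
print(results)
```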
0 votes
0 replies
Could someone provide more details on sklearn's KMeans attributes and help verify what each method does?
I have a data set that has 48,000 rows and 24 columns; each column has been normalized so that it is a value between 0 and 1. I tried to clust...
asked 4 months ago
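A short sketch of the main fitted attributes of `sklearn.cluster.KMeans`; random data stands in for the normalized 48,000 x 24 set, and `n_clusters=3` is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 24))  # toy stand-in: values in [0, 1), 24 columns

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

centers = km.cluster_centers_  # shape (n_clusters, n_features): one centroid per cluster
labels = km.labels_            # cluster index assigned to each training row
inertia = km.inertia_          # sum of squared distances from rows to their closest centroid

# predict() assigns (new) rows to the nearest learned centroid
new_labels = km.predict(X[:5])
```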
0 votes
0 replies
LSI Model fails to load the model
I have an LSI model stored, and the model is getting stored as model.pkl and model.pkl.projection. However, when I try to load the model the loadi...
1 vote
0 replies
How can this feature ranking problem be implemented with Support Vector Classification?
If I want the classifier to be SVM (using scikit-learn), how can I modify the 'clf' variable such that the svm classifier used for feature rankin...
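A minimal sketch, assuming the ranking is done with recursive feature elimination (`sklearn.feature_selection.RFE`), which is one common approach but not necessarily the one in the question. RFE needs `coef_` or `feature_importances_`, so the SVC must use a linear kernel. The data and `n_features_to_select=3` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=8, n_informative=3, random_state=0)

# A linear kernel exposes coef_, which RFE uses to decide what to drop
clf = SVC(kernel="linear")
selector = RFE(clf, n_features_to_select=3).fit(X, y)

ranking = selector.ranking_  # 1 = selected; larger numbers were eliminated earlier
print(ranking)
```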
1 vote
1 reply
How can I display the weights and bias from LinearRegression()?
I'm trying to solve a linear regression problem and I'm using the LinearRegression() function from sklearn. Is it possible to display the weights...
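A fitted `LinearRegression` exposes the weights as `coef_` and the bias as `intercept_`. The toy target below is built as an exact linear function so the recovered values are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5  # exact linear target: weights (2, -1), bias 0.5

model = LinearRegression().fit(X, y)
print("weights:", model.coef_)    # one weight per feature column
print("bias:", model.intercept_)  # the constant term
```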
0 votes
1 reply
sklearn.impute SimpleImputer: why does transform() need fit_transform() first?
sklearn provides a transform() method to apply a one-hot encoder. To use transform(), fit_transform() is needed before calling transform() me...
asked 4 months ago
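`transform()` fills gaps with statistics that only exist after fitting; `fit_transform()` is simply `fit()` followed by `transform()`. A sketch with `SimpleImputer` and toy arrays:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])
X_test = np.array([[np.nan, np.nan]])

imputer = SimpleImputer(strategy="mean")
# fit() learns the per-column means from the training data (stored in statistics_)...
imputer.fit(X_train)
# ...and transform() fills gaps in any data using those stored means
X_test_filled = imputer.transform(X_test)
```

Calling `transform()` on an unfitted imputer raises `NotFittedError`, because there are no learned statistics yet.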
2 votes
0 replies
Python 3 - ValueError: Found array with 0 sample(s) (shape=(0, 11)) while a minimum of 1 is required by MinMaxScaler
I'm really having trouble trying to get this project of mine up and running, but I'm remaining resilient and I think I'm close! I'm trying to c...
3 votes
1 reply
How to weigh data points with sklearn training algorithms
I am looking to train either a random forest or gradient boosting algorithm using sklearn. The data I have is structured in a way that it has a v...
asked 4 months ago
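Both `RandomForestClassifier` and `GradientBoostingClassifier` accept a `sample_weight` argument in `fit()`. A sketch with hypothetical weights that make the last three rows count three times as much:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
# Hypothetical per-row weights (illustrative only)
weights = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y, sample_weight=weights)
gb = GradientBoostingClassifier(random_state=0).fit(X, y, sample_weight=weights)
```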
0 votes
1 reply
Finding The Most Relevant or Important Features for SVM using SGD (loss=hinge)
I am working on a text-classification problem and have found that SVM is performing best for my text-classification problem. However, I did my ex...
1 vote
0 replies
Python oversampling combine several samplers in a pipeline
My issue concerns the ValueError raised by the SMOTE class: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6 # imbala...
0 votes
0 replies
Graphviz seems to think each row in column one is an attribute, can't solve
I'm pretty new to data science and did some of the courses on Codecademy and SoloLearn. I'm having an issue with graphviz and sklearn. It seems t...
asked 4 months ago
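A frequent cause of this symptom (an assumption here, since the question text is cut off) is passing a column's *values* as `feature_names`; `export_graphviz` expects one name per feature. A sketch with hypothetical columns; `out_file=None` returns the DOT source as a string, so no Graphviz install is needed to generate it.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical feature table
X = pd.DataFrame({"height": [150, 160, 170, 180, 190, 200],
                  "weight": [50, 60, 65, 80, 90, 100]})
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# feature_names must be the column names (one per feature), not a column's values
dot_source = export_graphviz(tree, out_file=None,
                             feature_names=list(X.columns),
                             class_names=["no", "yes"], filled=True)
```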
0 votes
0 replies
Is my discrepancy between cross val and test scores problematic?
I'm running a Random Forest model with extensive cross-validation and then comparing the grid.best_scorer which I believe is the mean_test_score...
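For reference, the attribute on `GridSearchCV` is `best_score_`: the mean cross-validated score (`mean_test_score`) of the best parameter setting. A sketch comparing it with a held-out test score; the data and grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=0),
                    param_grid={"max_depth": [3, None]}, cv=5)
grid.fit(X_train, y_train)

cv_score = grid.best_score_  # mean_test_score of the best setting
same_score = grid.cv_results_["mean_test_score"][grid.best_index_]  # same value via cv_results_
test_score = grid.score(X_test, y_test)  # refit best model scored on held-out data
print(cv_score, test_score)
```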
0 votes
1 reply
Stratified sampling in python, with constraint
I have a data frame with observations data = [['red', 1, 0.2], ['blue', 1, 0.5], ['green', 2, 0.8], ['blue', 2, 0.55], ['blue', 2, 0.52], ['red...
asked 4 months ago
1 vote
1 reply
In sklearn, how can I get which coefficient corresponds to which parameter in a polynomial linear regression?
I am doing a linear regression with scikit-learn in Python3. I have an array of x and y data and want to implement a linear regression using a 3r...
3 votes
0 replies
Using packages dependent on scipy throws an ImportError (DLL load failed) even with fresh Anaconda install
In all of my scripts where I use packages dependent on scipy (such as sklearn and statsmodels) I receive this ImportError. I uninstalled Anaco...
-1 votes
2 replies
Maximum number of iterations must be positive ERROR when using Logistic Regression (python)
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.2, shuffle=False) return(x_train,...
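As far as we know, older scikit-learn versions raise this message when the `max_iter` passed to `LogisticRegression` is negative or not a number. A minimal sketch with a valid positive `max_iter` (the value 1000 is an arbitrary choice) and the same `shuffle=False` split:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# max_iter must be a positive integer; raise it above the default (100) if the solver does not converge
clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
```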
-1 votes
1 reply
Error: Unknown label type: 'unknown'… y_train values don't coincide with x_train values
I had this code: from sklearn.feature_extraction.text import TfidfVectorizer tfidfconverter = TfidfVectorizer(max_features=900, min_...
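This error usually means the labels are an object-dtype array of non-string values (for example, a column extracted from a mixed DataFrame). scikit-learn's `type_of_target` then reports `'unknown'`; casting to a concrete dtype fixes it. A sketch with toy labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Labels that arrive as an object-dtype array
y_object = np.array([0, 1, 1, 0], dtype=object)
y_int = y_object.astype(int)  # cast to a concrete integer dtype

X = np.array([[0.0], [1.0], [2.0], [3.0]])
# Fitting with y_object would raise "Unknown label type: 'unknown'"
clf = LogisticRegression().fit(X, y_int)
```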
0 votes
0 replies
Custom Class raises error when trying to call fit_transform
I have created custom classes that I want to use with scikit-learn Pipelines and FeatureUnions. Each class takes as input a data frame with 2 co...
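A sketch of a custom transformer that works with `fit_transform` and `FeatureUnion`: inherit from `BaseEstimator` and `TransformerMixin`, store constructor arguments unchanged, and have `fit` return `self`. The class `ColumnRatio` and its column names are hypothetical.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion


class ColumnRatio(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: divides column `num` by column `den`."""

    def __init__(self, num, den):
        # store arguments as-is so get_params()/clone() work
        self.num = num
        self.den = den

    def fit(self, X, y=None):
        # nothing to learn, but fit must return self so fit_transform can chain
        return self

    def transform(self, X):
        return (X[self.num] / X[self.den]).to_frame(name=f"{self.num}_per_{self.den}")


df = pd.DataFrame({"a": [2.0, 4.0], "b": [1.0, 2.0], "c": [4.0, 8.0]})
union = FeatureUnion([("a_b", ColumnRatio("a", "b")), ("c_b", ColumnRatio("c", "b"))])
out = union.fit_transform(df)  # TransformerMixin supplies fit_transform
```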
0 votes
0 replies
Parsing from a hierarchical csv file
I have a csv file with its data in a particular hierarchical structure. While I can load it into a pandas data frame, I would prefer to have a s...
asked 4 months ago
0 votes
1 reply
How to plot a regression tree in Python
So, first of all, I'm relatively new to Python so I'm not sure how to achieve my task. I was following an online tutorial on how to plot a decisi...
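A sketch for a `DecisionTreeRegressor` on toy data. It uses `export_text`, which needs no plotting library; `sklearn.tree.plot_tree` draws the same tree graphically with matplotlib.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 1.0, 3.0)  # toy step-shaped target

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
# export_text lists each split and the predicted value stored in each leaf
report = export_text(reg, feature_names=["x"])
print(report)
```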
-2 votes
0 replies
How can I fit OLS correctly? I used str before and couldn't use statsmodels. Error: unsupported operand type(s) for -: 'str' and 'str'
I want to predict purchases. I use multiple linear regression and, as you know, I need R squared. But when I wrote OLS.fit, I got an error. I used before...
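The error says two string values were subtracted, which suggests the columns hold text rather than numbers. A sketch, assuming pandas input with hypothetical column names: convert with `pd.to_numeric` before fitting; `errors="coerce"` turns entries that cannot be parsed into NaN.

```python
import pandas as pd

# Hypothetical columns holding numbers as text (e.g. read from a CSV)
df = pd.DataFrame({"ads": ["10", "20", "30"], "purchases": ["15", "25", "40"]})

# Convert every column to a numeric dtype; unparseable entries become NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# Arithmetic now works, which is what a regression fit needs internally
diff = numeric["purchases"] - numeric["ads"]
```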
1 vote
2 replies
How to use countVectorizer to test new data after doing some training
I was using countVectorizer like this: from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer(max_features=2...
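The usual pattern: call `fit_transform` on the training text only, then `transform` (not `fit_transform`) on new text, so the new data is encoded with the training vocabulary and has the same columns. The classifier (`MultinomialNB`) and toy documents are illustrative choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["good movie", "great film", "bad movie", "awful film"]
train_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the vocabulary
clf = MultinomialNB().fit(X_train, train_labels)

new_docs = ["great movie", "awful"]
# transform reuses the training vocabulary, so the columns match X_train
X_new = vectorizer.transform(new_docs)
predictions = clf.predict(X_new)
```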
1 vote
1 reply
Understanding how Bayesian filtering works in SciKit and improving accuracy
I'm building a simple spam filter using SciKit, and I'm a bit unsure about my results. I have a dataset that has around 5000 rows of data, the las...
asked 4 months ago
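A sketch of a typical scikit-learn spam filter, assuming a multinomial naive Bayes model: it learns per-class word probabilities with additive smoothing (`alpha`), and cross-validation gives a steadier accuracy estimate than one split. The messages and labels are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages: 1 = spam, 0 = not spam
messages = ["win cash now", "cheap loans win", "free cash offer", "claim your free prize",
            "lunch at noon?", "see you at the meeting", "notes from today's call", "call me later"]
is_spam = [1, 1, 1, 1, 0, 0, 0, 0]

# The vectorizer is refit inside each fold, so no vocabulary leaks from the test split
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
scores = cross_val_score(model, messages, is_spam, cv=4)

# After a full fit: smoothed log P(word | class), one row per class
model.fit(messages, is_spam)
log_probs = model.named_steps["multinomialnb"].feature_log_prob_
```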
2 votes
1 reply
How to randomly drop rows in Pandas dataframe until there are equal number of values in a column?
I have a dataframe pd with two columns, X and y. In pd[y] I have integers from 1 to 10 inclusive. However, they have different frequencies: df[y...
asked 4 months ago
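One way to do this, assuming pandas 1.1 or later: randomly downsample every class to the size of the rarest class with `GroupBy.sample`. The toy frame reuses the question's column names `X` and `y`.

```python
import pandas as pd

df = pd.DataFrame({"X": range(10), "y": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Size of the rarest class; every class is randomly downsampled to this count
n_min = df["y"].value_counts().min()
balanced = df.groupby("y").sample(n=n_min, random_state=0)
```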