
Predicting Stock Prices – Learn Python for Data Science #4

Hello world, it’s Suraj. In this episode we’re going to build a stock price prediction graph using scikit-learn in 40 lines of Python.

Have you ever wanted to get rich quick? “Sir, my models are very profitable. I always retrain them to prevent overfitting.” “How many times a week?” “Three times a week.” “Those are rookie numbers in this racket.” “Really? How often do you retrain?” “Right after a workout, and then once right, everyone, oh...”

The stock market allows you to buy and sell units of ownership in a company, which we call stocks. If the company’s profits go up, you own some of those profits. If they go down, you lose profits with them. It’s as simple as that. So if you were to buy stocks in the right company at the right time, you could become rich overnight.

Is there something we could do to predict future stock prices given a data set of past prices? Machine learning. This sounds like a data science problem. But according to the efficient market hypothesis, the stock market is random and unpredictable. Still, major financial firms like JP Morgan and Goldman Sachs have been hiring quantitative traders for years to build predictive models on past market data, and you can be sure that if these firms do have profitable models for trading, they are not going to share them with us. Come at me, Wall Street. They have no incentive to.

Think about all the features we could incorporate into a financial model: sentiment analysis on company opinions, past stock prices, sales growth, dividends, all the profits Warren B. missed for hating on Bitcoin. Changes in stock prices are not completely random, but very close to it. Good traders will use predictive models as a tool when deciding where to invest, and we’re going to build three different predictive models that predict the price of Apple stock, then plot them all on a graph to compare their results.

Our steps will be to install our dependencies, collect our data set, write our script, and analyze our graph.

These are our four dependencies. csv will allow us to read data from the CSV file of stock prices that we’ll download later. numpy will let us perform calculations on our data. scikit-learn will let us build a predictive model. And matplotlib will let us plot our data points with our models on a graph for us to analyze.

Let’s collect our data set. We want a list of stock prices from the past 30 days, and we can get this data easily from Google Finance. As you can see, it’d be much higher if they didn’t miss the boat on AI.

Next step: write our script. Our four dependencies are here at the top, and we’ll use the given names to reference them throughout our code. One thing to note about matplotlib is that since it’s a graphical library, it depends on a graphical backend, and there are several options. If it doesn’t want to plot a graph on your machine for some reason, just use the switch backend option and try out a few different possible backends.

All right, let’s start hacking on our script. First, let’s initialize two empty lists, dates and prices. We’ll then write a function that will fill them both with the relevant data. We’ll call it get_data, and its argument will be the name of our stock prices CSV file. We’ll use the with ... as block to open our file and assign it to the csvfile variable. The open statement opens our CSV file for reading, hence the 'r' parameter. Next, we’ll want to create a file reader variable, which the csv module will create for us using the reader method with our CSV file as the parameter. This will allow us to iterate over every row in our CSV file, and we can fetch one row at a time using the next method. We’ll call the next method first to skip the first row, since it’s just column names.

Now, for each row in our CSV file reader, we’ll add both the date and price values to our respective lists. The append function will allow us to add an item to the end of our list. We only want the day of the month, so we’ll get that first column in our row, which is at index 0, and use the split function to remove the dashes between each of those three values. Then we get the first value in the list, which is the day, and wrap that with the int function to convert the day to an integer. For prices, we’ll append to that list as well, using the opening price, which is in the next column of our row, and convert that to a float to be more precise in our later calculations. OK, not that precise. We’ll place the return statement at the end to finish our function.

Let’s move on to our second and last helper function, called predict_price, to build our predictive models and graph them. We’ll first use numpy to format our list into an n by 1 matrix. The three parameters will be the list we want to reshape; the new shape, which will be n rows by 1 column, where n is the size of our dates list; and finally the order of elements.

Let’s create three models. Each of them will be a type of support vector machine. A support vector machine is a linear separator. It takes data that’s already classified and tries to predict a set of unclassified data. So if we only had two data classes, it would look like this: it will be the line such that the distances from the closest points in each of the two groups are as large as possible. When we add a new data point to our graph, depending on which side of the line it is, we can classify it accordingly with a label.

But right now we’re not predicting a class label, so we don’t need to classify. Instead, we’re predicting the next value in a series, which means we want to use regression. SVMs can be used for regression as well. Support vector regression is a type of SVM that uses the space between data points as a margin of error and predicts the most likely next point in a data set.

Let’s create our first model, a linear support vector regression. We’ll use the SVR module we imported from scikit-learn to create it, and it’s going to take three parameters: the kernel, which is the type of SVM, then our penalty parameter C of the error term. We want two things when using an SVR: a line with the largest minimum margin, and a line that correctly separates as many instances as possible. But we can’t always have both. C determines how much we want the latter.

Our next SVR is polynomial. In math folklore, the no free lunch theorem states that there are no guarantees for one optimization to work better than the other, so we’ll try both. Also, if you work at Google, you actually do get free lunch, so take that.

Finally, we’ll create one more SVR using a radial basis function. RBF defines similarity based on the Euclidean distance between two inputs. If both are right on top of each other, the max similarity is 1. If too far, it’s 0. Gamma defines how far too far is. And let’s fit, or train, each of our models on our date and price data using the fit method.

It’s time to create our graph. We’ll plot the initial data points as black dots with the data label, and plot each of our models as well. We’ll use the predict method of the SVR object in scikit-learn, using the dates matrix as our parameter. Each will be a different color, and we’ll give them a distinct label. We can set the x axis and the y axis accordingly, and we’ll add a title and a legend. The show function will display it on the screen, and we’ll want to return the predictions from each of our models.

Now we can call our get_data method on our CSV and create a variable to store our predicted price, given our dates and prices, for this date. We’ll print out results for each of our models to the command line.

Let’s analyze our graph. We can see that each of our models shows up in our graph and that the RBF model seems to fit our data the best, so we’ll use its prediction in the command line to stack dead presidents.

So, to break it down: the efficient market hypothesis states that the data needed to set the prices for tomorrow’s stocks only comes from tomorrow. But well-tuned machine learning models can give us predictions that are slightly better than random if we use the right data. And support vector machines are a type of ML model that can be used for both classification and regression to predict novel data points in a graph.

The winner of the coding challenge from last week’s video is Victor C Rana. Victor created a system that recommends artists to users using the Last.fm music data set. Badass of the week. And the runner-up is Kevin Nelson. He demoed his own recommender algorithm.

The challenge for this video is to create a financial model to predict stock prices with a neural network, using both price history and sentiment analysis as features. Details are in the code readme. Post your GitHub link in the comments and I’ll announce the winner in the next video. Please subscribe for more programming videos, and for now I’ve got to predict Snapchat’s IPO price, so thanks for watching.
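
The CSV-reading helper described in the video can be sketched roughly as follows. This is a minimal reconstruction from the spoken description, not the video's actual code. It assumes the file has a header row, a date in the first column whose leading dash-separated field is the day of the month (for example 26-Feb-16), and the opening price in the second column. It also returns the two lists rather than filling module-level ones, which is a small change made here so the function is easier to test.

```python
import csv


def get_data(filename):
    """Read (day-of-month, opening price) pairs from a stock-price CSV.

    Assumed layout: header row, date in column 0 (e.g. "26-Feb-16"),
    opening price in column 1.
    """
    dates, prices = [], []
    with open(filename, "r", newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row of column names
        for row in reader:
            # "26-Feb-16" -> ["26", "Feb", "16"] -> 26
            dates.append(int(row[0].split("-")[0]))
            # the opening price is the next column
            prices.append(float(row[1]))
    return dates, prices
```

Calling get_data on a downloaded price file (the filename is whatever you saved it as) gives two parallel lists that the modelling step can consume.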
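
The three support vector regressions described in the video could look something like the sketch below. The hyperparameter values (C, degree, gamma) are assumptions chosen for illustration, since the transcript does not state them, and the plotting step is left out so the example runs without a graphical backend.

```python
import numpy as np
from sklearn.svm import SVR


def predict_price(dates, prices, x):
    """Fit linear, polynomial and RBF SVR models; predict the price on day x."""
    # scikit-learn expects a 2-D feature matrix: n rows, 1 column
    X = np.reshape(dates, (len(dates), 1))

    # Hyperparameter values below are illustrative assumptions
    models = {
        "linear": SVR(kernel="linear", C=100.0),
        "poly": SVR(kernel="poly", C=100.0, degree=2),
        "rbf": SVR(kernel="rbf", C=100.0, gamma=0.1),
    }

    query = np.array([[x]])  # a single sample with a single feature
    predictions = {}
    for name, model in models.items():
        model.fit(X, prices)  # train on (day, price) pairs
        predictions[name] = float(model.predict(query)[0])
    return predictions
```

With dates and prices from the CSV helper, predict_price(dates, prices, 29) would return one predicted price per kernel. A close fit to a few weeks of data says little about how any of these models would do on prices they have not seen.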
