Here's the original post: https://www.reddit.com/r/StockMarket/comments/7ypjga/easily_track_prevailing_sentiment_for_over_2500/
Here's a link to the site: https://quikfo.com
Here's the link to the Twitter bot: https://twitter.com/myQuikfo
Before anyone needs to ask: this tool was, and remains, entirely free to use!
Since the last time I posted I have:
1. Reworked the UI/optimized mobile to some degree
2. Reworked the entire classification system to use industry specific training data rather than one market-wide weighting of words
3. Added comment sections so that users can post their own thoughts/ideas/feedback/etc
4. Begun analyzing article content and summarizing it into key sentences for use in the classification, allowing the system to move past basic headline analysis
5. Fixed about 1000 bugs behind the scenes lol
6. Implemented a login/follow/newsfeed system so now you can create an account and follow all of the companies that you're most interested in... these companies will appear ranked by qScore in your newsfeed
7. Built out a significant portion of the API so that other developers can begin to play around with the data
Here's a glimpse at what's on the horizon:
1. My major goal over the next few weeks is to build and implement a huge spam filtering system. Basically, a big system that will learn from user feedback and reports of "bad data" and be able to remove bad data, duplicate data, etc. before it makes its way into the scoring algorithms or, worse yet, the training data.
2. I want to further optimize the frontend UI for mobile and desktop
3. I want to build out more charts and visuals like the bubble chart
4. The API could use more parameters to add power to the potential queries
Your feedback was super helpful last time I posted this, and lots has been developed since then, so I wanted to bring it back for another round of testing and feedback!
For those of you who didn't read the last post, Quikfo is a system which I'm making for my senior thesis. It uses a basic machine learning model which loops through data collection, data preparation, data modeling, and data analysis to train a Bayesian classifier and produce rankings of which companies have news that most closely matches the news of companies that are about to see upward or downward price movements. So, that was a mouthful, let me break it down for you:
-the system collects news data on companies every day
-at the end of every day Quikfo analyzes which companies did extraordinarily well or poorly and finds patterns between these out-performers' news and their price movement
-this news data and its patterns with next-day upward and downward price movement are then loaded into training sets for a classifying program which employs Bayes' theorem
-using this training data and the current news surrounding a company, this classifier can now create a prediction and score the company from 0 to 100; this score can effectively be considered "the probability that this company's news matches the news of the positive set of companies"
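To make the steps above a bit more concrete, here is a minimal sketch of a naive Bayes word classifier that turns a bag of news words into a 0 to 100 score. This is not Quikfo's actual code: the function names (`train`, `score`), the "pos"/"neg" labels, the add-one smoothing, and the toy training headlines are all assumptions made up for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (words, label) pairs, label is 'pos' or 'neg' (hypothetical labels).
    Both labels must appear at least once in the training data."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for words, label in docs:
        word_counts[label].update(words)
        doc_counts[label] += 1
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def score(words, model):
    """Return 100 * P(pos | words) under a naive Bayes model with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_post = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior, then add one log likelihood per known word.
        lp = math.log(doc_counts[label] / total_docs)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_post[label] = lp
    # Normalize in log space to avoid underflow on long articles.
    m = max(log_post.values())
    p_pos = math.exp(log_post["pos"] - m)
    p_neg = math.exp(log_post["neg"] - m)
    return 100 * p_pos / (p_pos + p_neg)

# Toy training set (invented): news words paired with the next-day move.
model = train([
    (["beats", "earnings", "upgrade"], "pos"),
    (["record", "growth", "upgrade"], "pos"),
    (["lawsuit", "downgrade", "miss"], "neg"),
    (["recall", "downgrade", "loss"], "neg"),
])
bullish = score(["upgrade", "growth"], model)
bearish = score(["downgrade", "loss"], model)
```

With this toy data, `bullish` lands above 50 and `bearish` below 50. A company with no recognizable words scores exactly 50 here because the two classes have equal priors, which also hints at why thin news coverage can produce misleading scores.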
This system is nothing more than a helpful research tool, as companies often get scored highly because they have, for example, a lack of significant data, so there is just one article with some well-placed words. While this isn't a great tool for blindly throwing your money into investments at the whim of a robot, it can be greatly useful in funneling down the Russell 3000 into 10 or 20 companies which may be worth researching for short-run investment opportunities.
The system works quite well so far. Looking back on the S&P 500, for example, the optimal trading strategy has been to buy at market open and sell at market close anything which is scored >=85. This strategy would have returned >13% between the beginning of December 2017 and the end of February 2018.
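For anyone who wants to check a rule like this themselves, here is a rough sketch of how the open-to-close strategy could be backtested. The data layout (a list of days, each holding `(score, open_price, close_price)` tuples), the equal weighting across picks, and the `backtest` name are my assumptions for illustration, and the numbers below are invented, not real Quikfo or market data. It also ignores commissions and slippage.

```python
def backtest(days, threshold=85):
    """days: list of trading days, each a list of (score, open_price, close_price).
    Buys every stock scored >= threshold at the open, equal-weighted, sells at the close.
    Returns the compounded return over all days as a fraction (0.10 means +10%)."""
    equity = 1.0
    for rows in days:
        picks = [(o, c) for s, o, c in rows if s >= threshold]
        if not picks:
            continue  # nothing scored high enough, stay in cash for the day
        day_return = sum(c / o - 1 for o, c in picks) / len(picks)
        equity *= 1 + day_return
    return equity - 1

# Invented sample: three days of (score, open, close) rows.
sample_days = [
    [(90, 100.0, 102.0), (50, 10.0, 20.0)],  # only the first stock qualifies
    [(86, 50.0, 49.0), (95, 20.0, 21.0)],    # both qualify, equal-weighted
    [(10, 5.0, 6.0)],                        # no picks
]
total_return = backtest(sample_days)
```

On the invented sample, day one returns 2%, day two averages 1.5%, and day three is skipped, so `total_return` compounds to roughly 3.5%.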
And yes, I'm aware that some of the positive/negative word associations seem crazy. How can a good word be bad or a bad word be good, one might ask. Well, all this indicates is that when a word appears, it more often than not precedes a specific market movement. One could notice that 'Fraud' is particularly green... perhaps this is because when everyone is talking about fraud, it's already priced in and the stock may have found a bottom. The question of why can't really be answered by the system; that's more up to the inferences of the user.
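As a rough illustration of how a "bad" word can end up green, here is a sketch that compares how often a word shows up in news preceding up moves versus down moves. The `word_lean` name, the smoothed log-ratio, and the tiny example articles are all hypothetical and are not how the site necessarily computes its colors.

```python
import math
from collections import Counter

def word_lean(up_docs, down_docs, word):
    """Smoothed log-ratio of a word's frequency in news before up moves vs. down moves.
    Positive means the word leans 'green', negative means it leans 'red'."""
    up = Counter(w for doc in up_docs for w in doc)
    down = Counter(w for doc in down_docs for w in doc)
    vocab_size = len(set(up) | set(down))
    # Add-one smoothing so a word missing from one side does not divide by zero.
    p_up = (up[word] + 1) / (sum(up.values()) + vocab_size)
    p_down = (down[word] + 1) / (sum(down.values()) + vocab_size)
    return math.log(p_up / p_down)

# Invented articles: 'fraud' happens to appear more often before up days.
up_docs = [["fraud", "probe", "settled"], ["fraud", "rebound"]]
down_docs = [["fraud", "guidance", "cut"], ["guidance", "miss"]]

fraud_lean = word_lean(up_docs, down_docs, "fraud")
guidance_lean = word_lean(up_docs, down_docs, "guidance")
```

Here `fraud_lean` comes out positive and `guidance_lean` negative, purely because of where the words happened to appear. The measure says nothing about why, which is the same limitation described above.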
Looking for feedback from the community, please let me know what you think. There is tons of data being collected that I haven't even gotten around to building a front-end interface for. The possibilities are endless and I would love to tailor the project based upon the direction that this, and other communities, would like to see.
Also, if you're interested in using any data that can't be pulled via the API yet, or you would just like to talk to me about a project that you would use my API in, then please feel free to reach out! The other day, for example, I copied the entire articles table that I've collected since November over to a friend of mine who likes to do static data analysis on sites like Kaggle. If you're interested in that at all then just drop me a message!
Thank you!
Submitted March 09, 2018 at 11:18AM by TheLoneDonut http://ift.tt/2Fpe6Zp