Several modules are controlled from “Main_Control_Script.py”. Once per hour the following are run:
RedditScraper.py
PriceApp.py
Analysis.py
/TradeMaker/Trade_Script.py
from github import Github
from time import sleep
import pandas as pd
import subprocess
import datetime
import glob
import os
import csv

count = 0
# How many times to run the loop (once per hour)
while count < 7:
    count = count + 1
    # Run the Reddit scraper
    exec(open("RedditScraper.py").read())
    sleep(2)

    # Get the latest output from the scraper, append it to the main file and delete the old one
    list_of_files = glob.glob('./*.csv')  # * means all; if a specific format is needed then *.csv
    latest_file = max(list_of_files, key=os.path.getctime)
    latestFile = pd.read_csv(latest_file, encoding='utf-8')
    latestFile = latestFile.iloc[0:1]
    latestFile['Hour'] = datetime.datetime.now()
    mainFile = pd.read_csv('./ScrappedData/ScrappedReddit.csv', encoding='utf-8')
    os.remove("./ScrappedData/ScrappedReddit.csv")
    sleep(2)
    latestFileNew = latestFile[mainFile.columns]
    out = pd.concat([mainFile, latestFileNew])
    out.to_csv('./ScrappedData/ScrappedReddit.csv', index=False)
    sleep(2)

    # Run price scraper
    exec(open("PriceApp.py").read())
    sleep(2)

    # Run analysis and make predictions
    exec(open("Analysis.py").read())
    sleep(2)

    # Upload data to GitHub
    # Set user info
    g = Github("f3e423110fbea4df9a0a9ada58caea3700641a14")
    repo = g.get_user().get_repo('tradr')

    # Build a list of every file path already in the repo
    all_files = []
    contents = repo.get_contents("")
    while contents:
        file_content = contents.pop(0)
        if file_content.type == "dir":
            contents.extend(repo.get_contents(file_content.path))
        else:
            file = file_content
            all_files.append(str(file).replace('ContentFile(path="', '').replace('")', ''))

    # Upload price data
    with open('./PriceData.csv', 'r') as file:
        content = file.read()
    git_prefix = 'Data/PriceGrabber/'
    git_file = git_prefix + 'PriceData.csv'
    if git_file in all_files:
        contents = repo.get_contents(git_file)
        repo.update_file(contents.path, "committing files", content, contents.sha, branch="main")
        print(git_file + ' UPDATED')
    else:
        repo.create_file(git_file, "committing files", content, branch="main")
        print(git_file + ' CREATED')

    # Upload scrapped Reddit data
    with open('./ScrappedData/ScrappedReddit.csv', 'r') as file:
        content = file.read()
    git_prefix = 'Data/ScrappedData/'
    git_file = git_prefix + 'ScrappedReddit.csv'
    if git_file in all_files:
        contents = repo.get_contents(git_file)
        repo.update_file(contents.path, "committing files", content, contents.sha, branch="main")
        print(git_file + ' UPDATED')
    else:
        repo.create_file(git_file, "committing files", content, branch="main")
        print(git_file + ' CREATED')
    sleep(2)

    # Make trades using SignalInput.csv
    main_path = '/home/iii/MEGA/NYC Data/tradr/Data/TradeMaker/'
    python_path = f"{main_path}venv/bin/python3"
    args = [python_path, f"{main_path}trade_script.py",
            "/home/iii/MEGA/NYC Data/tradr/Data/SignalInput.csv"]
    process_info = subprocess.run(args)
    print(process_info.returncode)

    # Wait one hour (or whatever)
    sleep(3600)
The Pyramid of Death

1. Scraping Reddit
Selenium is used for scraping because it gives more flexibility with no rate restrictions, though the Reddit API could be tried too (see the sketch after the asset list below).
The cryptocurrency community is neatly separated into forums by asset, which gives a more granular view. Because cryptocurrency is traded 24/7 and the forums are very active, a lot of data is available.
Data scraped hourly:
All comment text
Current number of users
Number of posts in the last hour
Number of comments in the last hour
Number of votes in the last hour
Hourly price and volume data from the Nomics.com API
For the assets/subreddits:
BTC, BCH, ETH, XMR, DASH
r/bitcoin, r/btc, r/ethereum, r/ethtrader, r/ethfinance, r/monero, r/xmrtrader
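As a rough illustration of the Selenium approach, here is a minimal sketch; the subreddit URL, CSS selectors and output columns are assumptions for the example, not the project's actual RedditScraper.py.

# Sketch: pull hourly activity numbers and comment text for one subreddit.
# The old.reddit.com selectors and column names are assumptions.
import datetime

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_subreddit(subreddit: str) -> pd.DataFrame:
    driver = webdriver.Firefox()
    try:
        driver.get(f"https://old.reddit.com/r/{subreddit}/new/")
        # Post titles visible on the /new page (stand-in for posts in the last hour)
        posts = driver.find_elements(By.CSS_SELECTOR, "p.title a.title")
        # "users here now" counter from the sidebar (selector is an assumption)
        users_online = driver.find_element(By.CSS_SELECTOR, "p.users-online span.number").text
        row = {
            "Hour": datetime.datetime.now(),
            "Subreddit": subreddit,
            "CurrentUsers": users_online,
            "PostsLastHour": len(posts),
            "AllText": " ".join(p.text for p in posts),
        }
    finally:
        driver.quit()
    return pd.DataFrame([row])

if __name__ == "__main__":
    frames = [scrape_subreddit(s) for s in ["bitcoin", "btc", "ethereum"]]
    pd.concat(frames).to_csv("ScrappedRedditSample.csv", index=False)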
Price data is grabbed from the Nomics.com free API.
Code for the Reddit scraper is found here and the price scraper here.
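For the price side, a minimal sketch of an hourly pull from the Nomics free API; the endpoint, parameters and response fields are assumptions based on the public v1 ticker documentation, not the project's actual PriceApp.py.

# Sketch: hourly price/volume pull for the tracked assets (details are assumptions).
import requests
import pandas as pd

API_KEY = "YOUR_NOMICS_KEY"  # placeholder
ASSETS = ["BTC", "BCH", "ETH", "XMR", "DASH"]

resp = requests.get(
    "https://api.nomics.com/v1/currencies/ticker",
    params={"key": API_KEY, "ids": ",".join(ASSETS), "interval": "1h", "convert": "USD"},
    timeout=30,
)
resp.raise_for_status()

rows = [
    {
        "Asset": item["id"],
        "Price": item["price"],
        "VolumeLastHour": item.get("1h", {}).get("volume"),
    }
    for item in resp.json()
]
pd.DataFrame(rows).to_csv("PriceData.csv", index=False)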
Features
Altogether, the following features are used:
Hour of day
Day of week
Number of users/hour
Number of posts/hour
Comments/hour
15 most significant words
NLP on comments
The target for the training data is the sign (+/-) of the percent price change over the next hour, as sketched below.
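To make that concrete, here is a minimal sketch of building the feature columns and the +/- target from an hourly table; the merged CSV and its column names are assumptions, not the real file schema.

# Sketch: derive the time features and the +/- target for one asset.
import pandas as pd

df = pd.read_csv("MergedHourly.csv", parse_dates=["Hour"])  # hypothetical merged Reddit + price file

# Time features
df["HourOfDay"] = df["Hour"].dt.hour
df["DayOfWeek"] = df["Hour"].dt.dayofweek

# Target: 1 if the price rose over the next hour, 0 otherwise
df["NextHourPctChange"] = df["Price"].pct_change().shift(-1)
df["Target"] = (df["NextHourPctChange"] > 0).astype(int)

features = ["HourOfDay", "DayOfWeek", "CurrentUsers", "PostsLastHour", "CommentsLastHour"]
X = df[features].iloc[:-1]   # drop the last row, which has no next-hour label
y = df["Target"].iloc[:-1]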
Analysis
Random forest models are used to identify the most significant words in the comments and to classify the +/- price signal from all the features above. Full code is found here. The output is an update to the file “SignalInput.csv”, where 1 = Buy and 0 = Sell for each subreddit and asset.
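A hedged sketch of this step with scikit-learn, continuing from the feature-building sketch above; the vectorizer settings and the SignalInput.csv layout are assumptions, not the actual Analysis.py.

# Sketch: pick the 15 most significant comment words with one forest,
# then classify the +/- signal from the numeric features plus those words.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# X, y and df come from the feature-building sketch above
vectorizer = CountVectorizer(stop_words="english", max_features=500)
word_counts = vectorizer.fit_transform(df["AllText"].fillna("")).toarray()[:-1]

# Rank words by importance using a forest fit on word counts alone
word_forest = RandomForestClassifier(n_estimators=200, random_state=0)
word_forest.fit(word_counts, y)
top15_idx = np.argsort(word_forest.feature_importances_)[-15:]
print("Most significant words:", list(vectorizer.get_feature_names_out()[top15_idx]))

# Final classifier on the numeric features plus the 15 word-count columns
X_full = np.hstack([X.to_numpy(), word_counts[:, top15_idx]])
signal_forest = RandomForestClassifier(n_estimators=200, random_state=0)
signal_forest.fit(X_full, y)

# 1 = Buy, 0 = Sell; predicted (in-sample here) for the most recent hour
latest_signal = int(signal_forest.predict(X_full[-1:])[0])
pd.DataFrame([{"Asset": "BTC", "Signal": latest_signal}]).to_csv("SignalInput.csv", index=False)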
Trade
The output of the analysis is an hourly updated array of buy/sell signals. After the user has put their API key in the config.py file, the signals are read in by “trade_script.py” and trades are made on Binance once per hour.
The basic sizing rule is: on a buy signal, 20% of the account balance is spent on that asset; on a sell signal, all of that asset is sold.
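A minimal sketch of that sizing rule; the order and balance helpers are hypothetical placeholders for the Binance calls in trade_script.py.

# Sketch: buy with 20% of the quote balance on a 1, sell the full position on a 0.
import pandas as pd

BUY_FRACTION = 0.20

def place_market_order(side: str, asset: str, amount: float) -> None:
    # Placeholder: trade_script.py would submit the order to Binance here.
    print(f"{side} {amount:.8f} {asset}")

def get_balance(asset: str) -> float:
    # Placeholder: trade_script.py would query the Binance account here.
    return 0.0

signals = pd.read_csv("SignalInput.csv")  # columns assumed: Asset, Signal
for _, row in signals.iterrows():
    if row["Signal"] == 1:
        usd = get_balance("USD")
        if usd > 0:
            place_market_order("BUY", row["Asset"], BUY_FRACTION * usd)
    else:
        held = get_balance(row["Asset"])
        if held > 0:
            place_market_order("SELL", row["Asset"], held)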
App
At https://TradR.fun a user can choose which features to include and rerun the model to try to get the best possible score and predictions.
Results
After about one month, performance is flat. The algorithm is very conservative and spends most of its time sitting in USD. Sell signals are much more common, and the algorithm makes a purchase only 10–20% of the time.
Future Work
Continue to test accuracy with new data.
Tweak features.
Optimize the number of words.
Try time series models with price data included.
Use a sliding window to test at what time interval signals are most accurate.
Add the ability for users to trade on TradR.fun.
Thanks for reading and please get in touch at
https://KyleBenzle.com or on Twitter.