AI Stock Trading
AI-Powered Stock Trading Strategy
- Developed a data-driven stock trading algorithm utilizing Python with Yahoo Finance API to retrieve historical stock data, implementing a range of technical indicators (SMA, RSI, MACD, Stochastic Oscillator, Bollinger Bands) to inform trading decisions.
- Applied K-Means clustering to group stock data patterns and leveraged a neural network model built with TensorFlow to predict stock price movements, enabling informed decision-making and optimized trade entries.
- Designed an automated hyperparameter optimization process using RandomizedSearchCV to fine-tune strategy parameters, enhancing model performance and portfolio management.
- Integrated data preprocessing with MinMax scaling for normalization and portfolio rebalancing based on dynamic predictions, demonstrating advanced data handling and machine learning techniques in financial analysis.
import yfinance as yf
import pandas as pd
import numpy as np
import ta
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from datetime import datetime, timedelta
from sklearn.model_selection import RandomizedSearchCV
# Define the list of symbols to trade
symbols = ['AAPL', 'MSFT', 'AMZN', 'GOOGL']
# Define the strategy parameters
lookback_period = 20
n_clusters = 3
# Download historical stock data
data = {}
for sym in symbols:
    stock_data = yf.download(sym, start=datetime.now() - timedelta(days=365), end=datetime.now())
    data[sym] = stock_data
# Calculate technical indicators
for sym in symbols:
    sym_data = data[sym]
    sym_data['SMA'] = ta.trend.sma_indicator(sym_data['Close'], window=lookback_period)
    sym_data['RSI'] = ta.momentum.rsi(sym_data['Close'], window=lookback_period)
    sym_data['MACD'] = ta.trend.macd_diff(sym_data['Close'])
    sym_data['stoch'] = ta.momentum.stoch(sym_data['High'], sym_data['Low'], sym_data['Close'])
    # Each Bollinger helper returns a single Series, so the three bands are computed separately
    sym_data['BB_upper'] = ta.volatility.bollinger_hband(sym_data['Close'])
    sym_data['BB_middle'] = ta.volatility.bollinger_mavg(sym_data['Close'])
    sym_data['BB_lower'] = ta.volatility.bollinger_lband(sym_data['Close'])
# Scale the technical indicators
feature_cols = ['SMA', 'RSI', 'MACD', 'stoch', 'BB_upper', 'BB_middle', 'BB_lower']
scaler = MinMaxScaler()
for sym in symbols:
    # Drop the warm-up rows (NaN indicators) and copy to avoid writing into a view
    sym_data = data[sym].dropna().copy()
    sym_data[feature_cols] = scaler.fit_transform(sym_data[feature_cols])
    data[sym] = sym_data
# Apply k-means clustering
kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(np.concatenate([data[sym][['SMA', 'RSI', 'MACD', 'stoch']] for sym in symbols], axis=0))
# Define the neural network model
model = Sequential()
model.add(Dense(32, input_dim=n_clusters))
model.add(Dropout(0.2))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001))
# Train the model
X_train = []
y_train = []
for sym in symbols:
    sym_data = data[sym]
    for i in range(n_clusters, len(sym_data)):
        # Input: cluster labels of the previous n_clusters days
        X_train.append(kmeans.predict(sym_data.iloc[i-n_clusters:i][['SMA', 'RSI', 'MACD', 'stoch']].values))
        # Target: whether day i closed above day i-1
        y_train.append(sym_data['Close'].iloc[i] > sym_data['Close'].iloc[i-1])
X_train = np.array(X_train)
y_train = np.array(y_train, dtype=float)
model.fit(X_train, y_train, epochs=100, batch_size=32)
def trade_stock(symbol, portfolio, rsi_threshold=50, stoch_threshold=50, bb_threshold=2, macd_threshold=0.002, budget=100000):
    feature_cols = ['SMA', 'RSI', 'MACD', 'stoch', 'BB_upper', 'BB_middle', 'BB_lower']
    # Download the historical data
    stock_data = yf.download(symbol, start=datetime.now() - timedelta(days=365), end=datetime.now())
    # Calculate technical indicators
    stock_data['SMA'] = ta.trend.sma_indicator(stock_data['Close'], window=lookback_period)
    stock_data['RSI'] = ta.momentum.rsi(stock_data['Close'], window=lookback_period)
    stock_data['MACD'] = ta.trend.macd_diff(stock_data['Close'])
    stock_data['stoch'] = ta.momentum.stoch(stock_data['High'], stock_data['Low'], stock_data['Close'])
    # Bollinger Bands (each helper returns a single Series)
    stock_data['BB_upper'] = ta.volatility.bollinger_hband(stock_data['Close'])
    stock_data['BB_middle'] = ta.volatility.bollinger_mavg(stock_data['Close'])
    stock_data['BB_lower'] = ta.volatility.bollinger_lband(stock_data['Close'])
    # Drop the warm-up rows, where the indicators are still NaN
    stock_data = stock_data.dropna().copy()
    # Scale a copy of the indicators for clustering; the raw values are kept for the
    # threshold checks below, which are expressed in raw units (e.g. RSI on 0-100)
    scaled = pd.DataFrame(scaler.transform(stock_data[feature_cols]), index=stock_data.index, columns=feature_cols)
    # Apply k-means clustering on the same four features the clusterer was fitted on
    stock_data['Cluster'] = kmeans.predict(scaled[['SMA', 'RSI', 'MACD', 'stoch']].values)
    # Make a prediction using the neural network
    X_test = stock_data['Cluster'].iloc[-n_clusters:].values
    y_pred = float(model.predict(np.array([X_test]))[0][0])
    # Calculate the position size based on the budget and predicted probability
    position_size = budget * y_pred
    # Enter only when the prediction, momentum, RSI, stoch, Bollinger Bands, and MACD all agree
    if (y_pred > 0.5 and
            stock_data['Close'].iloc[-1] > stock_data['Close'].iloc[-2] and
            stock_data['RSI'].iloc[-1] > rsi_threshold and
            stock_data['stoch'].iloc[-1] > stoch_threshold and
            stock_data['Close'].iloc[-1] < stock_data['BB_upper'].iloc[-1] - bb_threshold * stock_data['BB_middle'].iloc[-1] and
            stock_data['MACD'].iloc[-1] > macd_threshold * stock_data['MACD'].std()):
        # Calculate weights for the portfolio
        stock_price = stock_data['Close'].iloc[-1]
        position_size = min(position_size, budget)
        weight = position_size / stock_price
        weights = np.zeros(len(portfolio))
        weights[portfolio.index(symbol)] = weight
        return weights, symbol
    else:
        return np.zeros(len(portfolio)), symbol
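# Define the hyperparameter optimization function
from sklearn.base import BaseEstimator

class ThresholdStrategy(BaseEstimator):
    # Minimal scikit-learn wrapper around trade_stock so that RandomizedSearchCV can
    # clone it and set its thresholds. Scoring a candidate by its total allocated
    # size is an assumption: trade_stock itself defines no objective.
    def __init__(self, rsi_threshold=50, stoch_threshold=50, bb_threshold=2, macd_threshold=0.002, budget=100000):
        self.rsi_threshold = rsi_threshold
        self.stoch_threshold = stoch_threshold
        self.bb_threshold = bb_threshold
        self.macd_threshold = macd_threshold
        self.budget = budget

    def fit(self, X, y=None):
        # X holds positions into `symbols`; each selected symbol is evaluated in turn
        self.weights_ = np.zeros(len(symbols))
        for idx in np.ravel(X):
            w, _ = trade_stock(symbols[int(idx)], symbols, rsi_threshold=self.rsi_threshold,
                               stoch_threshold=self.stoch_threshold, bb_threshold=self.bb_threshold,
                               macd_threshold=self.macd_threshold, budget=self.budget)
            self.weights_ += w
        return self

    def score(self, X, y=None):
        return float(np.sum(self.weights_))

def optimize_hyperparameters(budget=100, n_iter=50):
    # Define the hyperparameters to optimize
    param_grid = {
        'rsi_threshold': [30, 40, 50, 60, 70],
        'stoch_threshold': [30, 40, 50, 60, 70],
        'bb_threshold': [1, 2, 3, 4, 5],
        'macd_threshold': [0.001, 0.002, 0.003, 0.004, 0.005]
    }
    # Create a dictionary to store the results
    results = {}
    # Initialize the portfolio
    portfolio = symbols
    weights = np.zeros(len(portfolio))
    best_params = None
    # Loop over all trading days
    for i in range(lookback_period, len(data[symbols[0]])):
        # Get the current date
        current_date = data[symbols[0]].index[i]
        # Rebalance the portfolio on the first trading day of each month
        if current_date.month != data[symbols[0]].index[i-1].month:
            # Sell all positions
            weights = np.zeros(len(portfolio))
            # Buy new positions using randomized search
            clf = RandomizedSearchCV(estimator=ThresholdStrategy(budget=budget), param_distributions=param_grid, n_iter=n_iter, cv=3)
            clf.fit(np.arange(len(portfolio)).reshape(-1, 1))
            weights = clf.best_estimator_.weights_.copy()
            best_params = clf.best_params_
            print("Trades on date ", current_date, ":")
            for sym, weight in zip(portfolio, weights):
                if weight > 0:
                    print("Buy", weight * 100, "% of", sym)
                else:
                    print("Sell", abs(weight) * 100, "% of", sym)
        # Normalize the weights (skipped while nothing is held, to avoid dividing by zero)
        if np.sum(weights) > 0:
            weights /= np.sum(weights)
        # Update the budget
        budget *= 1.0003
    # Calculate the final portfolio value
    portfolio_value = budget
    for sym in symbols:
        stock_data = data[sym]
        stock_price = stock_data['Close'].iloc[-1]
        weight = weights[portfolio.index(sym)]
        stock_value = weight * portfolio_value
        portfolio_value += stock_value
    # Dictionaries are not hashable, so the best parameters are stored as a sorted tuple
    if best_params is not None:
        results[tuple(sorted(best_params.items()))] = portfolio_value
    return results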
This Python-based script automates a stock trading strategy by using historical data, technical indicators, clustering, and machine learning to make predictions on stock movements. It combines data preprocessing, clustering with K-Means, and a neural network model for predictive analytics, with further hyperparameter optimization to improve trading decisions. The primary goal of this script is to identify trading signals based on technical indicators and machine learning predictions, optimizing for profitability.
Steps and Logic
- Define Symbols and Parameters:
  - The script first defines a list of stock symbols (Apple, Microsoft, Amazon, and Google) and initializes the strategy parameters: the lookback period for the technical indicators and the number of clusters for K-Means.
- Data Collection:
  - Using the yfinance package to query Yahoo Finance, the script downloads one year of historical data for each symbol, including Open, High, Low, and Close prices and Volume. This data is the basis for the technical indicators used to predict stock trends.
- Calculate Technical Indicators:
  - Several key technical indicators are calculated for each stock symbol:
    - Simple Moving Average (SMA): Tracks the average price over a specified window to smooth out short-term fluctuations.
    - Relative Strength Index (RSI): Measures momentum by comparing recent gains and losses, indicating potential overbought or oversold conditions.
    - Moving Average Convergence Divergence (MACD): Identifies trend changes from the difference between a fast and a slow exponential moving average. The script uses the MACD histogram, which is that difference minus its signal line.
    - Stochastic Oscillator: Compares a stock’s closing price to its price range over a lookback period, showing momentum.
    - Bollinger Bands: Upper, middle, and lower bands based on a moving average and standard deviation, helping to gauge volatility.
  - These indicators are stored as columns in each symbol's DataFrame, and the DataFrames are kept in a dictionary keyed by symbol.
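As a rough illustration of what these indicators compute, the sketch below reimplements them with plain pandas on a synthetic price series. It is not the `ta` library's code: the function name `add_indicators`, the window choices (14 for the stochastic oscillator, 12/26/9 for MACD, two standard deviations for the bands), and the Wilder-style smoothing used for RSI are assumptions, so the values may differ from what `ta` returns, especially during the warm-up period.

```python
import numpy as np
import pandas as pd

def add_indicators(df, window=20):
    """Return a copy of df (High, Low, Close columns) with indicator columns added."""
    out = df.copy()
    close = out["Close"]

    # Simple moving average of the close.
    out["SMA"] = close.rolling(window).mean()

    # RSI with Wilder-style exponential smoothing of gains and losses.
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / window, adjust=False).mean()
    avg_loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, adjust=False).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD histogram: (fast EMA - slow EMA) minus its 9-period signal line.
    macd_line = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    out["MACD"] = macd_line - macd_line.ewm(span=9, adjust=False).mean()

    # Stochastic %K: position of the close within the recent high-low range.
    low_min = out["Low"].rolling(14).min()
    high_max = out["High"].rolling(14).max()
    out["stoch"] = 100 * (close - low_min) / (high_max - low_min)

    # Bollinger Bands: moving average plus/minus two standard deviations.
    std = close.rolling(window).std(ddof=0)
    out["BB_middle"] = out["SMA"]
    out["BB_upper"] = out["SMA"] + 2 * std
    out["BB_lower"] = out["SMA"] - 2 * std
    return out

# Synthetic random-walk prices stand in for downloaded data.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 120).cumsum())
prices = pd.DataFrame({"High": close + 1, "Low": close - 1, "Close": close})
indicators = add_indicators(prices).dropna()
```

The first `window - 1` rows have no moving average yet, which is why the rows with missing values are dropped before any scaling or clustering.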
- Data Scaling:
  - To normalize the data, MinMaxScaler from sklearn scales each technical indicator to a range of 0 to 1, so that indicators measured in different units contribute on a comparable scale. This matters for clustering, which relies on distances between observations, and for neural networks, which tend to train more reliably on inputs of similar magnitude.
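A small sketch of the scaling step, using made-up numbers rather than real indicator values. It shows that MinMaxScaler applies (x - min) / (max - min) per column, and that a scaler fitted once can be reused on new rows, which then may fall outside the 0 to 1 range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up indicator columns on very different scales (RSI-like, MACD-like).
train = np.array([[30.0, -0.5],
                  [50.0,  0.0],
                  [70.0,  0.5]])
new = np.array([[60.0, 0.25],
                [80.0, 1.00]])   # second row lies outside the training range

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)   # learns per-column min and max
new_scaled = scaler.transform(new)           # reuses them without refitting

# Same result by hand: (x - min) / (max - min), column by column.
manual = (new - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))
```

Here the second new row maps to values above 1 because it exceeds the maximum seen during fitting.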
- Apply K-Means Clustering:
  - The script uses K-Means clustering to identify distinct patterns or “clusters” in the data based on technical indicators. Clustering helps categorize market conditions, identifying groups of similar data points (e.g., bearish, bullish, or neutral trends).
  - The clustering results are then assigned back to the dataset for each stock symbol, marking each observation with a “cluster label” that represents its pattern.
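The clustering step can be sketched on synthetic data as follows. The two groups of points are invented stand-ins for scaled indicator readings, and two clusters are used only to keep the example easy to check.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups of scaled (RSI, stochastic) readings: weak and strong momentum.
weak = rng.normal(loc=0.2, scale=0.03, size=(50, 2))
strong = rng.normal(loc=0.8, scale=0.03, size=(50, 2))
features = np.vstack([weak, strong])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_                       # cluster label for every observation
latest = kmeans.predict([[0.75, 0.82]])[0]    # label for a new observation
```

The label numbers themselves are arbitrary: K-Means does not know which cluster is "bullish", so any such interpretation has to come from inspecting the cluster centers afterwards.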
- Define and Train Neural Network Model:
  - A neural network is constructed using TensorFlow’s Keras library, consisting of several dense layers with dropout for regularization.
  - The model takes a short window of recent cluster labels as input and is trained to predict a binary outcome: whether the stock closes higher than on the previous trading day (1) or not (0).
  - Training data is compiled by iterating over each stock’s historical data, extracting the cluster labels, and labeling each observation based on price movement. The model is trained over multiple epochs to improve prediction accuracy.
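The way the training pairs are assembled can be sketched without TensorFlow. The helper name `make_windows` and the toy arrays are hypothetical; the windowing follows the description above, with a window of three labels.

```python
import numpy as np

def make_windows(cluster_labels, close, window=3):
    """Pair each run of `window` cluster labels with the following price direction."""
    X, y = [], []
    for i in range(window, len(close)):
        X.append(cluster_labels[i - window:i])        # labels for days i-window .. i-1
        y.append(float(close[i] > close[i - 1]))      # 1.0 if day i closed higher
    return np.array(X), np.array(y)

labels = np.array([0, 1, 1, 2, 0, 2])
close = np.array([10.0, 10.5, 10.2, 10.8, 10.8, 11.0])
X, y = make_windows(labels, close)
# X -> [[0, 1, 1], [1, 1, 2], [1, 2, 0]]
# y -> [1.0, 0.0, 1.0]
```

An unchanged close counts as 0 here, because the comparison is strict. Since cluster labels are arbitrary integers, one-hot encoding them before training is a common alternative to feeding them in as numbers.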
- Trade Execution Logic:
  - The function trade_stock analyzes the most recent data for a symbol based on:
    - Technical Indicators: RSI, Stochastic Oscillator, Bollinger Bands, and MACD are recalculated on the latest data to determine potential entry points.
    - Clustering and Model Prediction: The most recent cluster labels are used as inputs for the neural network, which outputs the predicted probability of a price increase. If the probability exceeds a threshold (e.g., 0.5) and the indicator conditions hold, the function signals a buy.
  - The position size is the budget multiplied by the predicted probability, capped at the budget. When any condition fails, the function returns zero weights for every symbol.
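A simplified sketch of this gating logic is shown below. The function names `decide_position` and `shares_for` are hypothetical, and only the probability, RSI, and stochastic checks are included; the Bollinger Band and MACD conditions are left out to keep the example short.

```python
def decide_position(prob_up, rsi, stoch, budget,
                    rsi_threshold=50, stoch_threshold=50, prob_threshold=0.5):
    """Return the cash to allocate, or 0.0 when any entry condition fails."""
    conditions = (
        prob_up > prob_threshold,   # model expects a rise
        rsi > rsi_threshold,        # momentum confirmed by RSI
        stoch > stoch_threshold,    # momentum confirmed by the stochastic oscillator
    )
    if not all(conditions):
        return 0.0
    # Size the position by confidence, never exceeding the budget.
    return min(budget * prob_up, budget)

def shares_for(cash, price):
    """Convert a cash allocation into a number of shares."""
    return cash / price

cash = decide_position(prob_up=0.8, rsi=62, stoch=71, budget=10_000)   # 8000.0
blocked = decide_position(prob_up=0.8, rsi=45, stoch=71, budget=10_000)  # 0.0
shares = shares_for(cash, price=200.0)                                  # 40.0
```

Requiring every condition to hold makes the rule conservative: a confident model prediction alone is not enough to open a position.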
- Portfolio Rebalancing with Hyperparameter Optimization:
  - To improve trading results, the script uses RandomizedSearchCV to search for the best values of the RSI, Stochastic, Bollinger Band, and MACD thresholds.
  - At each monthly interval, the portfolio is rebalanced, selling all positions and using the optimized parameters to select new positions. This allows the strategy to adapt to changes in the market environment, recalibrating entry and exit points based on the latest data.
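Two pieces of this step are easy to show in isolation: drawing random threshold combinations and finding the monthly rebalancing dates. The sketch uses scikit-learn's ParameterSampler, which draws random candidates from a parameter grid like the one above, and pandas business days as a stand-in for real trading days (an assumption, since exchange holidays are ignored).

```python
import pandas as pd
from sklearn.model_selection import ParameterSampler

param_grid = {
    "rsi_threshold": [30, 40, 50, 60, 70],
    "stoch_threshold": [30, 40, 50, 60, 70],
    "bb_threshold": [1, 2, 3, 4, 5],
    "macd_threshold": [0.001, 0.002, 0.003, 0.004, 0.005],
}
# Draw 5 random combinations out of the 625 possible ones.
candidates = list(ParameterSampler(param_grid, n_iter=5, random_state=0))

# Rebalance on the first business day of each new month in the sample period.
days = pd.bdate_range("2024-01-29", "2024-03-05")
month = pd.Series(days.month, index=days)
# The first day always differs from its (missing) predecessor, so it is skipped.
rebalance_days = days[(month != month.shift()).to_numpy()][1:]
# rebalance_days -> 2024-02-01 and 2024-03-01
```

Sampling a handful of combinations instead of trying all 625 is what keeps a randomized search cheap; the trade-off is that the best combination may not be among those drawn.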
- Result Calculation:
  - After a series of trades, the script calculates the portfolio’s final value, incorporating gains and losses from each position based on the stock’s closing price.
  - By looping through trading days, the script outputs final portfolio values, which indicate the performance of the strategy with the optimized parameters.
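One conventional way to compute such a final value is sketched below. It assumes the weights are fractions of the budget and that each position is held from an entry price to an exit price; the function name `portfolio_value` and the numbers are hypothetical, and this is not necessarily the calculation the script performs.

```python
import numpy as np

def portfolio_value(budget, weights, entry_prices, exit_prices):
    """Value of a budget split by `weights` after prices move from entry to exit."""
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(exit_prices, dtype=float) / np.asarray(entry_prices, dtype=float) - 1.0
    cash = budget * (1.0 - weights.sum())            # uninvested remainder
    invested = budget * weights * (1.0 + returns)    # each position marked to its exit price
    return cash + invested.sum()

value = portfolio_value(
    budget=10_000,
    weights=[0.5, 0.3],          # 20% stays in cash
    entry_prices=[100.0, 50.0],
    exit_prices=[110.0, 45.0],
)
# 2000 cash + 5500 + 2700 = 10200
```

Comparing this value across parameter sets gives a simple measure of which thresholds performed best over the tested period.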
Summary
This trading script provides an advanced approach to algorithmic trading by blending technical indicators, clustering, and machine learning predictions. Using historical data, it creates dynamic trading signals that adapt to the latest market conditions, making use of Python’s machine learning libraries and Yahoo Finance data. The hyperparameter optimization and rebalancing logic make it highly flexible, allowing it to adapt and recalibrate based on trading performance.