A professor testing ChatGPT's, DeepSeek's and Grok's stock-picking skills suggests stockbrokers should worry

Dow Jones

By Laila Maidan

Alejandro Lopez-Lira has been impressed with how current AI models can trade markets

Is artificial intelligence coming for the jobs of Wall Street traders? An assistant professor of finance at the University of Florida, Alejandro Lopez-Lira, has spent the past few years trying to answer that question.

Lopez-Lira has been experimenting with ChatGPT, DeepSeek and Grok to see if AI can be used to pick stocks. So far, he's impressed with what the currently available AI chatbots can do when it comes to trading equities.

In an interview, Lopez-Lira acknowledged that AI is prone to making mistakes, but he has not seen the three versions he's been using do anything "stupid." His work comes as more market participants are thinking about the implications of AI for investing and trading.

"I don't know what tasks out there analysts are doing with information that can't be done with large language models," Lopez-Lira said. "The only two exceptions are things that involve interacting in the physical world or having in-person conversations. But, other than that, I would imagine all of the tasks or most of the tasks can already be automated."

Shortly after OpenAI Inc. released ChatGPT in 2022, Lopez-Lira began testing the chatbot's skills. He wanted to know if ChatGPT, and AI in general, would show an ability to pick stocks. While there are numerous ways to approach that question, Lopez-Lira began with a simple exercise: Could the AI application accurately interpret whether a headline on a news story is good or bad for a stock? What he found surprised him.

The study ran a back test, a simulation against historical stock-market returns, using more than 134,000 headlines from press releases and news articles on more than 4,000 companies, pulled from third-party data providers. The headlines were fed into ChatGPT using a programming language called Python. ChatGPT would then decide whether a headline was positive for a company, negative or unknown. The results were saved to a data file and loaded into statistical software, where a headline judged positive triggered a simulated stock purchase. A negative headline triggered a short sale, a bet that a stock will fall in price. If ChatGPT was uncertain, no action was taken.

Because this was an academic simulation, no actual stocks were traded. But the software did compare the simulated performance against historical outcomes. The stock picks were made daily, with a median of 70 stocks bought and a median of 20 shorted.
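
The signal-to-trade step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the label strings, the `signal_from_label` and `long_short_return` helpers, the equal weighting across positions and the sample tickers and returns are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical mapping from a model's headline label to a position:
# +1 = buy, -1 = short, 0 = no action.
SIGNALS = {"positive": 1, "negative": -1, "unknown": 0}


def signal_from_label(label: str) -> int:
    """Translate a headline label into a trading signal (0 if unrecognized)."""
    return SIGNALS.get(label.strip().lower(), 0)


def long_short_return(labels: dict[str, str], next_day_returns: dict[str, float]) -> float:
    """Equal-weighted one-day return of a simulated long-short book.

    Longs earn the stock's return; shorts earn its negative.
    Stocks with a zero signal are skipped.
    """
    pnl = [
        signal_from_label(label) * next_day_returns[ticker]
        for ticker, label in labels.items()
        if signal_from_label(label) != 0
    ]
    return mean(pnl) if pnl else 0.0


# Made-up example data, for illustration only.
labels = {"AAA": "positive", "BBB": "negative", "CCC": "unknown"}
returns = {"AAA": 0.02, "BBB": -0.01, "CCC": 0.05}
day_return = long_short_return(labels, returns)
print(f"Simulated one-day return: {day_return:.4f}")
```

In this toy case the long position gains and the short position profits from a falling stock, while the "unknown" headline is ignored, mirroring the no-action rule in the study.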

For Lopez-Lira, the tricky part of back testing was that the AI might already know how events had turned out. OpenAI had trained the 2022 version of ChatGPT on data up to September 2021, so Lopez-Lira tested the chatbot only on headlines from October 2021 onward. That way, ChatGPT couldn't know what was going to happen and had to rely on reasoning to reach its conclusions.

His findings were released on the SSRN preprint platform in April 2023 in a paper titled "Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models." The study, currently being peer reviewed, found that ChatGPT had "significant predictive power for economic outcomes in asset markets." The GPT-4 version had an average daily return of 0.38% with a compounded cumulative return of over 650% from October 2021 to December 2023.
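
To see how a small daily return grows into a large cumulative figure, here is a rough compounding sketch. It assumes a constant 0.38% return every day and about 565 trading days between October 2021 and December 2023; both are simplifications introduced for this example, and `cumulative_return` is a hypothetical helper, not something from the paper.

```python
def cumulative_return(daily_return: float, trading_days: int) -> float:
    """Cumulative return from compounding a constant daily return."""
    return (1.0 + daily_return) ** trading_days - 1.0


# Assumption: roughly 565 trading days in the October 2021 to December 2023 window.
ASSUMED_TRADING_DAYS = 565

approx = cumulative_return(0.0038, ASSUMED_TRADING_DAYS)
print(f"Approximate cumulative return: {approx:.0%}")
```

The result lands in the several-hundred-percent range, the same order of magnitude as the reported figure. It will not match exactly, because actual daily returns vary from day to day and the day count here is only an estimate.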

Now, obviously, this academic study had limitations. In the real world, frictions exist that would strain returns, including brokerage transaction costs and fees; the availability of shares; taxes; and price impact, which is when relatively large trades move a stock's price. Additionally, about 76% of the gains came from shorts, a trading strategy that can be more fraught due to short-interest fees and the need to find the shares to borrow and sell short.
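
A back-of-the-envelope sketch shows how quickly trading costs can eat into a paper return. The turnover figure and the cost levels below are illustrative assumptions, not numbers from the study, and both helper functions are hypothetical.

```python
def net_daily_return(gross: float, turnover: float, cost_bps: float) -> float:
    """Gross daily return minus trading costs.

    turnover: fraction of the portfolio traded that day
              (2.0 = every position is both opened and closed).
    cost_bps: assumed all-in cost per unit traded, in basis points.
    """
    return gross - turnover * cost_bps / 10_000


def breakeven_cost_bps(gross: float, turnover: float) -> float:
    """Cost per unit traded, in basis points, at which the net return is zero."""
    return gross / turnover * 10_000


GROSS = 0.0038          # the 0.38% average daily return reported on paper
ASSUMED_TURNOVER = 2.0  # assumption: the whole book is replaced each day

for cost in (5, 10, 25):  # hypothetical cost levels
    net = net_daily_return(GROSS, ASSUMED_TURNOVER, cost)
    print(f"cost {cost:>2} bps -> net daily return {net:.4%}")

print(f"break-even cost: {breakeven_cost_bps(GROSS, ASSUMED_TURNOVER):.1f} bps")
```

Under these assumptions, the paper profit disappears once all-in costs reach about 19 basis points per trade, which is why fees, borrowing costs and price impact matter so much for a strategy that trades daily.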

"So, our results on paper are much more optimistic than what the performance in reality would be with a reasonable investment size," Lopez-Lira said. But the tilt toward positive returns was enough for him to conclude that ChatGPT had understood economic markets and shown an ability to forecast stock outcomes.

Putting AI to the test again

About a month after the preprint was published, Lopez-Lira got the chance to take his experiment outside of the academy after being contacted by Autopilot, an investment app that mimics the trades of notable public figures. He was asked to help create a portfolio that would be based on investment picks made by ChatGPT. It was an opportunity for him to see how his academic experiment would perform in the real world.

By September 2023, he'd begun providing the Autopilot app with the investment picks made by ChatGPT on a monthly basis. The Autopilot team would then upload the selections, and Autopilot users could link their brokerage accounts to the stock picks. This time, since real money was involved, Lopez-Lira had to do more than just feed ChatGPT a few news headlines. He had to provide it with a wide range of information to be sure it was making decisions based on the macroeconomic environment and company financials.

Available AI models are not yet at the point where you can simply ask them to pick investments, said Lopez-Lira. The process still requires a human in the loop to supply the information a model needs to consider before making a decision. This is mostly because AI models aren't trained on real-time data, so their knowledge is often outdated, even for basics such as the price of a stock's last trade. And although some AI models can conduct live web searches, they don't always know what information to search for in order to make the most informed decisions, he added.

"Large language models are tricky to handle, they can make stuff up and sometimes they don't have the right information," Lopez-Lira said. "So you have to know how to prompt the AI."

The process

The portfolio managed by ChatGPT would consist of 15 positions, 10 of which had to be stocks from the S&P 500 (SPX) and five of which had to be exchange-traded funds that have exposure to a sector or industry.

To get there, Lopez-Lira used Python to pull information from third-party data providers and news websites about the macroeconomic environment, geopolitical risks, company financials and the latest prices for stocks within the S&P 500. He then asked ChatGPT to consider the information and assign companies a score on a scale of 1 to 100, with a higher score representing a better investment. Once the AI had decided on its scoring, it was then asked to create a portfolio of stocks and exchange-traded funds based on that information.
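
The score-then-select idea can be illustrated with a short Python sketch. In the process described above, the AI model itself builds the portfolio from its scores; the mechanical top-N rule, the equal weights, the `Candidate` type, the `build_portfolio` helper and the placeholder tickers below are stand-ins assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ticker: str
    kind: str   # "stock" or "etf"
    score: int  # 1 to 100, higher is better


def build_portfolio(candidates: list[Candidate], n_stocks: int = 10, n_etfs: int = 5) -> dict[str, float]:
    """Pick the top-scored stocks and ETFs and weight them equally."""
    def top(kind: str, n: int) -> list[Candidate]:
        pool = [c for c in candidates if c.kind == kind]
        return sorted(pool, key=lambda c: c.score, reverse=True)[:n]

    picks = top("stock", n_stocks) + top("etf", n_etfs)
    weight = 1.0 / len(picks) if picks else 0.0
    return {c.ticker: weight for c in picks}


# Placeholder candidates with made-up scores, for illustration only.
candidates = [
    Candidate("STK1", "stock", 82),
    Candidate("STK2", "stock", 47),
    Candidate("STK3", "stock", 91),
    Candidate("ETF1", "etf", 66),
    Candidate("ETF2", "etf", 73),
]

# Smaller limits than the real 10 stocks and five ETFs, to keep the example short.
portfolio = build_portfolio(candidates, n_stocks=2, n_etfs=1)
print(portfolio)
```

The defaults of 10 stocks and five ETFs match the initial portfolio rules; everything else about how positions are chosen and sized is an assumption of this sketch.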

More recently, in February, Lopez-Lira added investing accounts on Autopilot that use Grok and DeepSeek.

Since then, the Florida professor has been gradually removing restrictions placed on the three AI models. For example, in March, the models were allowed to decide on the weightings of each holding. In April, the models were freed to balance up to 15 positions outside the initial parameters of 10 stocks and five ETFs, allowing them to pick a combination of their choosing. They could also pick ETFs that had exposure to additional asset classes, like bonds and commodities, excluding ones that use leverage, derivatives and short positions.

The AI models currently running the investment accounts are OpenAI's o3, xAI's Grok 3 and DeepSeek R1; they are updated periodically as new versions become available. Lopez-Lira also rotates which AI model he uses to summarize macroeconomic risks and score companies on the 1-to-100 scale, but all three models receive the same input regardless of which one did the screening.

Below is an example of the 15 positions each AI application picked for the period from April 1, the day before President Donald Trump's "liberation day" announcement on tariffs, to May 5, when the selections were rebalanced. The picks were based on data Lopez-Lira provided through March 31, which he used Grok 3 to organize. All three AI applications were fed the same information, but OpenAI's o1 pro and Grok 3 were also able to include information from web searches, according to Lopez-Lira. The tables below show each model's selections, including the weighting, the reason for the position (thesis), the advantage of the position (edge) and the associated risk.

Securities picked by OpenAI's o1 pro

   Asset                                Ticker symbol  Weight  Thesis                                                                                                                                           Edge                                                                                                         Risk 
   1   SPDR S&P 500 ETF Trust               SPY            15%     Core broad-market exposure to capture potential equity rebound if inflation/cooler data lead to Fed pivot or if tariffs don't derail sentiment.  Broad diversification; quick participation in any relief rally.                                              If inflation re-accelerates or tariffs escalate, S&P could tumble broadly. 
   2   iShares 20+ Year Treasury Bond ETF   TLT            8%      Potential capital-gains hedge if rates drop on recession fears or Fed rate cut signals.                                                          Good inverse correlation to equities if risk-off. May rally on rate-cut bets.                                If inflation remains sticky, yields could rise, hurting long-term Treasuries. 
   3   SPDR Bloomberg 1-3 Month T-Bill ETF  BIL            5%      Capital preservation and liquidity amid uncertainty of tariffs & data in next month.                                                             Very low volatility, stable yield.                                                                           Opportunity cost if equities rally strongly. 

(MORE TO FOLLOW) Dow Jones Newswires

June 07, 2025 08:30 ET (12:30 GMT)

Copyright (c) 2025 Dow Jones & Company, Inc.
