There is no substitute for domain expertise. I see a million part-time wannabe quants who have never worked a day in the business running “quant” substacks. They lack domain expertise, but there are plenty of retail idiots willing to pay for the substack so they can finance their “quant” trading. 🤣
Great post!
“The traders who do well will still be the ones who start somewhere different. Not with a backtest, but with that question: why would the market pay me to do this trade?”
But why couldn’t I just prompt my AI with this to come up with high-quality hypotheses via proper context injections, and then test them with proper overfitting guardrails?
What would you consider a proper overfitting guardrail?
Really appreciate the depth here — you're asking the right question and honestly it's the same one that drove what we've been building for 2 years.
You're right that most AI trading approaches skip the "who's on the other side" question entirely. We agree with that. Where we might see it differently is whether an LLM is fundamentally incapable of answering it, or whether the real problem is that nobody's bothering to track and penalize what the LLM already knows coming in.
Here's what I mean. When a human says "I think there's an edge in ag spread reversion during rebalancing periods" — that idea came from somewhere. Papers they read, conversations at conferences, patterns they noticed over 20 years. That's massive hidden K. Thousands of implicit hypothesis tests baked into one "original idea." And none of it ever gets penalized.
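To put rough numbers on that hidden K — assuming, for illustration only, independent tests at a significance level of 0.05 — the chance of at least one spurious "discovery" climbs fast:

```python
# Family-wise error rate: the chance of at least one false positive
# when k implicit hypothesis tests hide behind one "original idea".
# Simplifying assumption: independent tests at significance level alpha.

def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 10, 100, 1000):
    # k=10 already gives a ~40% chance of a spurious "edge"
    print(k, round(family_wise_error_rate(k), 3))
```

Real implicit tests are correlated rather than independent, so the true rate differs, but the direction of the problem is the same.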
A well crafted system generates hypotheses through context injections — specific literature, specific market structure data, cross-market relationships. The AI might look at wheat-corn dynamics and reason that large ag funds have quarterly rebalancing constraints, and when spreads hit extremes the big players can't react fast enough because of mandate restrictions. That's a structural theory of edge. It knows who's on the other side and why they'll keep showing up.
The difference is we know exactly what the AI saw. The context injection is the bibliography. We can penalize it.
Then it goes further — every hypothesis gets vectorized and fingerprinted. If the system ever approaches a similar idea from a different angle, it doesn't get a fresh start. It resumes with the accumulated K from before. No free looks, ever. Every subsequent test, every variation, every new market gets penalized — Bonferroni or similar methodologies — against everything that came before it.
A human who forgets they explored something similar 6 months ago gets unlimited free looks. The system never forgets.
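Concretely, a stripped-down sketch of the fingerprinting idea — a bag-of-words vector stands in here for a real embedding model, and the names and threshold are made up for illustration:

```python
# Toy sketch of hypothesis fingerprinting with accumulated K.
# A production system would use a proper embedding model and vector DB;
# a bag-of-words Counter and cosine similarity stand in for those here.
import math
from collections import Counter

def fingerprint(hypothesis: str) -> Counter:
    return Counter(hypothesis.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)   # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class HypothesisLedger:
    """Tracks the accumulated test count K per family of similar hypotheses."""
    def __init__(self, threshold: float = 0.6):
        self.entries = []          # list of (fingerprint, k) pairs
        self.threshold = threshold

    def register(self, hypothesis: str) -> int:
        fp = fingerprint(hypothesis)
        for i, (old_fp, k) in enumerate(self.entries):
            if cosine(fp, old_fp) >= self.threshold:
                self.entries[i] = (old_fp, k + 1)   # resume, no fresh start
                return k + 1
        self.entries.append((fp, 1))                # genuinely new family
        return 1

ledger = HypothesisLedger()
k1 = ledger.register("wheat corn spread reversion during quarterly rebalancing")
k2 = ledger.register("reversion in wheat corn spread around quarterly rebalancing")
# k2 resumes from the earlier family's count rather than starting at 1
```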
You laid out what actually matters really well — "why does this exist? Who's the counterparty? What are their constraints, and why will they keep being there?" We'd argue that's exactly what a properly constrained LLM can answer — you just have to set it up right.
Ask an LLM "what's a good oversold strategy for SPY" with zero context and you'll get RSI < 30 every single time. That's the vibe quant problem you're describing and it's real. But give that same LLM specific market microstructure data, cross-asset flow research, institutional rebalancing constraints — suddenly it's reasoning about who's on the other side and why they'll keep showing up. The answer changes completely based on what you inject, and you can see exactly what it saw.
The trick is two things: you never let it run calculations you can't see (every stat comes from tools, not the LLM's head), and you penalize everything. Every context injection is a traceable bibliography entry. Every hypothesis gets fingerprinted and vectorized. Come back to a similar idea six months later from a different angle — the system catches it and resumes with all the accumulated K. No free looks, ever.
You wrote "there's no learning loop because the LLM did the thinking." That's true when someone just prompts and prays. But when the research infrastructure itself compounds — when every hypothesis tested makes the next one more penalized, more constrained, more honest — the loop is there. It's just encoded in the fingerprints instead of in someone's intuition.
Really enjoy your writing — you're one of the few people asking the right questions instead of just cheerleading. Would love to compare notes sometime.
Mate. You’ve copied and pasted an answer straight out of an LLM. It has all the tells. You didn’t even fix the paragraph formatting!
I get it. You’re trying to put good things out in the world and you don’t trust your ability to convey them as well as you want to. But you should get over that and just give it a try. I won’t judge. And who cares what the internet thinks anyway?
Re-read your LLM’s response. It’s awful. In both content and character. Bland. Vanilla. Slightly sycophantic in a “the most important thing is that nobody’s offended” kind of way. The kind of drivel that makes one despair for the future of humanity.
(I’m criticising your LLM, not you).
More to the point, I don’t want to talk to your LLM. Especially about a topic that it’s so confused about.
But I’d love to talk to you. And I don’t care if you get stuff wrong or make mistakes or disagree with me. That’s all part of the beauty of human to human interaction.
We all get stuff wrong and mis-speak (or mis-write). I did it myself in my response to your first comment (you mentioned lookahead bias but my 11pm brain read “survivorship bias”). It happens.
Don’t let the fear of that take away the opportunity for real human interaction. I won’t judge you for your response. I’ll engage with it, and hopefully we both learn something and make a real connection.
Your LLM is steering you wrong - all that stuff about Bonferroni corrections and traceable injections is at best (and this is being way too generous) a poor substitute for developing a good mental model of the market, its participants, their goals and constraints… which is where edge comes from in the first place. At worst, it will obscure the thing you care about and waste the precious time you could be putting into proper research.
That’s what we’ve been working on for the past couple of years. It comes down to hypothesis fingerprinting in vectorized databases, proper k tracking, and handling p-hacking, lookahead bias, slippage and commissions, etc.
What’s hypothesis fingerprinting? k tracking?
Dealing with lookahead bias is just a research hygiene thing. Easy enough to deal with without an LLM - if you need to deal with it at all. Incorporating slippage and commissions is a technical detail of backtesting. Nothing to do with edge.
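For the record, the hygiene fix is usually one line - lag the signal so a decision made at bar t only earns bar t+1's return. A toy illustration with made-up prices:

```python
# Lookahead bug vs. the fix: a signal computed from bar t's close must not
# be applied to bar t's own return. Prices below are purely illustrative.
prices = [100.0, 101.0, 99.0, 102.0, 103.0]
returns = [b / a - 1 for a, b in zip(prices, prices[1:])]

# Signal: go long if the bar closed up (only knowable AFTER bar t's close).
signal = [1 if r > 0 else 0 for r in returns]

# Lookahead-biased: earns bar t's return using information from bar t's close.
biased = sum(s * r for s, r in zip(signal, returns))

# Hygienic: the signal from bar t only captures bar t+1's return.
lagged = sum(s * r for s, r in zip(signal[:-1], returns[1:]))
# The biased version looks profitable; the honest version doesn't have to.
```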
You can’t effectively deal with p-hacking, and neither can an LLM. (No one knows how many things the author of that paper tried before he published the thing he published).
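Easy to see why: run enough zero-edge "strategies" and a few will clear p < 0.05 by luck alone. A toy simulation (illustrative only, not a claim about any specific paper):

```python
# Toy demonstration of the file-drawer problem: 100 "strategies" whose P&L
# is pure Gaussian noise. A handful look significant purely by chance -
# and only those tend to get written up.
import math
import random
import statistics

random.seed(0)

def fake_strategy_pvalue(n_trades: int = 250) -> float:
    """Two-sided p-value for the mean P&L being nonzero, using the
    normal approximation p = erfc(|t| / sqrt(2))."""
    pnl = [random.gauss(0.0, 1.0) for _ in range(n_trades)]
    t = statistics.mean(pnl) / (statistics.stdev(pnl) / math.sqrt(n_trades))
    return math.erfc(abs(t) / math.sqrt(2))

pvals = [fake_strategy_pvalue() for _ in range(100)]
lucky = sum(p < 0.05 for p in pvals)
# roughly 5 of the 100 zero-edge strategies will look "significant"
```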
The point is, those “guardrails” don’t result in good research. They don’t even really relate to research at all - they’re more implementation/technical stuff.
Ask an LLM to design your trading strategy with those guardrails in place, and you’re literally doing the thing I described here: https://krislongmore.substack.com/p/more-of-the-disease-faster
The root of the problem is that an LLM is trained on the internet. There’s a smattering of useful trading information on the internet, but it’s buried in a sea of nonsense and grift. The LLM can’t tell the difference, and it spouts what it thinks it knows on the subject with the same conviction that it tells you how to write a web scraper in Python.
In theory, you could train an LLM on that “good” trading material specifically. But honestly, you’re better off just learning it. It ain’t that hard.
Yes! Couldn’t agree more!!
At least for a while, 10,000 retail-built trading models will make the market less efficient.
We might see a little more noise trading at the margins. I don’t think it moves the dial all that much in tradfi. Crypto could be a different story.
Read academic papers and test them across regimes. Just know that when you backtest with LLMs, they will try to “solve” trading.