So far, I think Google Gemini is the clear winner. I asked ChatGPT and Grok a somewhat complex annuity problem, and while they came close, only Gemini got the right answer.
What is the best kitchen tool for rowing?
I tried a spoon and it was ok.
But then I saw the person next to me using an oar! They went much faster than me and reached their destination sooner!
Choose the correct tool for the job.
An LLM can write all those slides no one reads, but don’t use it for the maths.
I suggest you watch one of the many videos of LLMs playing chess. They do quite well, but occasionally they just make up illegal moves, move the other person's pieces, etc.
LLMs make too many random errors to be relied on for precise mathematical problems.
What is the best LLM for finance?
I actually thought this was asking about law degrees
Just curious: what was the annuity problem?
You are building an investment portfolio (balance reducing) to ensure you can maintain purchasing power of $100k a year for 10 years, assuming 3% inflation. The expected return on the portfolio you choose is 2.2%. How much do you need on Day 0?
I found this problem on this forum. I solved it using Excel, but could not solve it using the TA Calc.
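For anyone who wants to check the number without a spreadsheet, here is a minimal Python sketch. It assumes one particular reading of the problem: the first withdrawal is $100k at the end of year 1, each later withdrawal is 3% larger, and the balance earns 2.2% a year. The function name `growing_annuity_pv` is just an illustrative label, not anything from the original post. Under this reading the result comes out at roughly $1.0137 million, in line with the Gemini figure quoted in this thread; a different timing convention (for example, first withdrawal on Day 0) would give a different amount.

```python
def growing_annuity_pv(first_payment, growth, rate, years):
    """Present value of withdrawals made at the end of years 1..years,
    where withdrawal t equals first_payment * (1 + growth) ** (t - 1)."""
    return sum(
        first_payment * (1 + growth) ** (t - 1) / (1 + rate) ** t
        for t in range(1, years + 1)
    )


# Assumed inputs: $100k first withdrawal, 3% inflation, 2.2% return, 10 years.
day0 = growing_annuity_pv(100_000, 0.03, 0.022, 10)
print(f"Needed on Day 0: ${day0:,.2f}")
```

Summing the discounted withdrawals one by one avoids any trouble with the closed-form formula when the growth rate is higher than the return, as it is here.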
I saw the movie about Google DeepMind, AlphaGo. It terrified me.
What did the AI give as a solution?
Gemini: $1,013,669.77
Grok was off
ChatGPT was off
I think some “very clever person” showed you how to use the calculator to solve it and also gave you a formula.
ChatGPT made me log in to get the answer. This was with ChatGPT4, not the new release ChatGPT5.
I did it twice with separate logins/accounts, and it gave 2 completely different answers:
$899,287.64 and $1,034,942. Both are wrong.
For the second answer, I tried to check the working, and it seemed to have the correct formula, but got elementary calculations wrong.
For example,
Going from the 1st column to the 2nd column, it gets the value of (1+0.022)^n wrong for n ≥ 4.
Going from the 2nd column to the 3rd column, all the calculations are wrong, for example 100,000/1.022 should be 97,847.36 but it gets 97,942.42.
And how could I get 2 completely different answers from different logins for the same problem?
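The arithmetic checks described above are easy to reproduce. This is a rough Python sketch that assumes the table's columns were the year n, the factor (1+0.022)^n, and the discounted withdrawal, with the first $100k withdrawal at the end of year 1 and 3% growth after that. The column layout is my guess at what ChatGPT produced, not a copy of its output.

```python
rate, growth, first_payment = 0.022, 0.03, 100_000

rows = []
for n in range(1, 11):
    discount = (1 + rate) ** n                          # 2nd column: (1+0.022)^n
    payment = first_payment * (1 + growth) ** (n - 1)   # assumed: indexed from year 1
    rows.append((n, discount, payment / discount))      # 3rd column: discounted value

for n, discount, pv in rows:
    print(f"{n:>2}  {discount:.6f}  {pv:>12,.2f}")
```

The first row reproduces the 100,000/1.022 = 97,847.36 value mentioned above, so any table that disagrees with it has an arithmetic slip rather than a formula problem.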
I’m hearing that some LLMs are better suited for finance than others.
Thanks for sharing.
Yeah! That guy is a L E G E N D!
I’m in credit risk, and the errors ChatGPT4 made were crazy. I feel the quality dropped at some point earlier this year. ChatGPT5 is a clear improvement, but hallucinations are still a big issue. It is very good on term sheets, covenant calculations, asset pledges, sensitivity calculations, etc.