
The challenge of distinguishing between skill and luck
By John Rekenthaler | 27/03/18

This week, Morningstar published a paper by Paul Kaplan and Maciej Kowara, entitled "How Long Can a Good Fund Underperform its Benchmark?"

About the Author
John Rekenthaler is vice president of research for Morningstar. He joined Morningstar in 1988 and has served in several capacities. He has overseen Morningstar’s research methodologies, led thought leadership initiatives such as the Global Fund Investor Experience report that assesses the experiences of mutual fund investors globally, and been involved in a variety of new development efforts. He currently writes regular columns for Morningstar magazine. He holds an MBA with high honours from the University of Chicago Booth School of Business.

A baseball analogy may help answer this question. (Sports analogies usually do not, but this time may be an exception.) Assume a player who is an excellent hitter. In a typical season, he will bat .300, while hitting 35 home runs. However, you do not know that player's true skill. All you know is what you observe, once you begin watching him. How many at bats would it take before his quality becomes apparent? Could you reliably judge it after 100? Might it require a full season?
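To put rough numbers on the analogy, the sketch below treats every at bat as an independent trial with a fixed .300 chance of a hit. That is a simplifying assumption (real at bats are neither independent nor identical), and the function name is illustrative only. It uses the normal approximation to the binomial to show the range in which the observed average would fall about 95% of the time.

```python
import math

def batting_average_range(true_avg, at_bats, z=1.96):
    """Approximate 95% range for the observed average of a hitter
    whose true average is `true_avg` (normal approximation)."""
    # Standard error of a proportion over `at_bats` independent trials.
    se = math.sqrt(true_avg * (1 - true_avg) / at_bats)
    return true_avg - z * se, true_avg + z * se

# 100 at bats, roughly one full season, and a long career.
for n in (100, 600, 5000):
    low, high = batting_average_range(0.300, n)
    print(f"{n:>5} at bats: {low:.3f} to {high:.3f}")
```

Under these assumptions, the range after 100 at bats runs from about .210 to .390, which cannot separate an excellent hitter from a mediocre one. After 600 at bats it narrows to about .263 to .337, and after 5,000 to about .287 to .313.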

For baseball, the problem is mostly theoretical, because good players rapidly accumulate thousands of observations. An All Star's slump can be safely ignored; almost certainly, his performance will soon revert to the mean. For fund investors, however (and the advisors and consultants who serve them), the answers to such questions are critical. There is much less information about fund managers' abilities. A manager's lapse might be just that, or it may reflect his or her actual talent.

Paul and Maciej examined mutual-fund slumps from two perspectives. One was to use existing fund histories, from the Canadian, U.S., European and developed Asian markets (excluding Japan and Australia, due to data availability). The other was to create hypothetical funds, run by hypothetically skilled managers, and run simulations. (Get two PhDs working on a project and simulations are inevitable.)

Real-world results

Their paper begins with the fund histories. The authors calculated gross returns over the 15 years from 2003 through 2017, by adding each fund's expense ratio back to its official results. Gross returns matter not to investors, who can't spend what they don't receive. But they are the correct measure for evaluating manager skill.
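As a minimal illustration of that adjustment, the snippet below adds an annual expense ratio back to an annual net return. The fund figures are hypothetical, and simple addition is only an approximation of whatever exact procedure the authors applied to their data.

```python
def gross_return(net_return_pct, expense_ratio_pct):
    """Approximate annual gross return, in percent, by adding the
    expense ratio back to the reported (net) return."""
    return net_return_pct + expense_ratio_pct

# Hypothetical fund: 6.0% net return with a 1.2% expense ratio.
print(f"{gross_return(6.0, 1.2):.1f}%")
```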

Of the 5,500 equity funds that qualified for the study, two thirds had higher returns over that 15-year period than did the costless benchmarks. That is an impressive showing, but it is affected by survivorship bias, as several thousand funds that existed in 2003 disappeared before 2017 concluded. The true winning percentage was probably close to 50%.

Still, that makes for almost 4,000 funds that beat their relevant indexes over a 15-year stretch. Naturally, they did not do so always. At times, every winner was a loser. Thus, the question: What was the lengthiest period in which these successful funds were unsuccessful? Specifically, what was the longest time that a fund's gross returns trailed those of the index?

Dry spells

Oh, boy! The median length of the Longest Underperformance Period (LUP), as the authors term their measure, was … a decade! (Technically, one group of funds had a median of 8.5 years and another of 11 years, the details of which are immaterial for the purposes of this column.) Consultants place funds on watch lists if they lag their indexes (or most of their competitors) for three years; and Morningstar assigns its initial fund-star rating after that same span. Yet the typical 15-year winner suffered a 10-year dry spell.

At this point, you may be wondering about the math. How can a fund trail for 10 years out of 15, yet finish ahead? The answer is that during its LUP, the fund barely lags the index. This occurs by construction; if one more month could be added to the LUP, then it would cease to be the LUP! Thus, the Longest Underperformance Period measures the time over which a fund's gross returns almost, but not quite, match those of its benchmark.
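The construction can be sketched in a few lines. The version below assumes that the LUP is measured on cumulative (compounded) returns over consecutive periods, which is my reading of the column rather than a confirmed detail of the paper. The function name and the sample returns are hypothetical.

```python
def longest_underperformance_period(fund_returns, index_returns):
    """Length, in periods, of the longest window over which the fund's
    cumulative return trails the index's cumulative return."""
    # Relative wealth: growth of 1 unit in the fund divided by
    # growth of 1 unit in the index, recorded after each period.
    relative = [1.0]
    for f, b in zip(fund_returns, index_returns):
        relative.append(relative[-1] * (1 + f) / (1 + b))

    longest = 0
    n = len(relative)
    for start in range(n):
        # Only windows longer than the current best matter, so scan
        # candidate end points from the latest one backwards.
        for end in range(n - 1, start + longest, -1):
            if relative[end] < relative[start]:
                longest = end - start
                break
    return longest

# Hypothetical fund that wins the first and last periods only.
fund = [0.05, -0.02, 0.00, 0.01, 0.04]
index = [0.01, 0.01, 0.01, 0.01, 0.01]
print(longest_underperformance_period(fund, index))  # prints 4
```

In this made-up example the fund finishes ahead of the index over all five periods, yet it trails over the first four taken together. That is the same pattern as the 15-year winner with a 10-year LUP.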

That most successful funds' LUPs are so prolonged suggests that most of them didn't beat the index by much during those 15 years. Neither did most of the unsuccessful funds trail by much. Just as the winners suffered through times when they looked like losers, so too did the losers often appear to be winners. In fact, on average, their Longest Outperformance Periods (LOPs) were even longer than were the slumps for the index-beating funds. On average, funds that trailed the indexes for the full 15 years had 11- to 12-year stretches of outperformance.

Those are daunting statistics. How can one distinguish between skill and luck when the stronger funds can slump for a decade, and the weaker ones can thrive for that long (or longer)? Perhaps the task is futile. Perhaps that 15-year measurement period is arbitrary, so that its list of good and bad funds is merely an accident of the time period. Perhaps that list would look very different if the authors had evaluated the funds over 20 years.

The simulations

Which leads us to the second part of Paul and Maciej's paper: the hypothetical funds, run by hypothetical managers. They might only be figments of a computer program's imagination, but they are skilled figments. On 75% of occasions, the gross returns for their funds beat the benchmark indexes over a 15-year trial. Their superiority is built into the simulation.

And … pfft. The cyber-managers also have prolonged LUPs. On average, in fact, their LUPs are even longer than those of actual fund managers. When evaluating their results, Paul and Maciej found that a logical investor who had no information about the hypothetical fund other than its Length of Underperformance would conclude on 45% of occasions that these incontestably skilled managers had skill. In 32% of simulations, the investor would perceive no skill, and 23% of the time the investor would decide that the manager had negative skill.
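A toy version of such a simulation is sketched below. It is not the authors' model. It assumes that monthly log excess returns (fund minus index) are independent normal draws, and the 0.87% annual alpha and 5% tracking error are values I picked so that roughly 75% of 15-year trials beat the index. All names and parameters are hypothetical.

```python
import random
import statistics

def simulate_log_excess(months, annual_alpha, annual_te, rng):
    """Monthly log excess returns, drawn as independent normals
    (an assumption of this sketch, not of the paper)."""
    mu = annual_alpha / 12
    sigma = annual_te / 12 ** 0.5
    return [rng.gauss(mu, sigma) for _ in range(months)]

def longest_underperformance(log_excess):
    """Longest window, in months, with negative cumulative excess return."""
    cumulative = [0.0]
    for x in log_excess:
        cumulative.append(cumulative[-1] + x)
    longest = 0
    n = len(cumulative)
    for start in range(n):
        # Scan end points from the latest backwards, skipping windows
        # that could not beat the current best.
        for end in range(n - 1, start + longest, -1):
            if cumulative[end] < cumulative[start]:
                longest = end - start
                break
    return longest

def run_trials(trials=500, months=180, annual_alpha=0.0087,
               annual_te=0.05, seed=0):
    """Return (share of trials beating the index,
    median LUP in years among the trials that beat it)."""
    rng = random.Random(seed)
    wins, lups = 0, []
    for _ in range(trials):
        path = simulate_log_excess(months, annual_alpha, annual_te, rng)
        if sum(path) > 0:  # fund finished ahead over the full period
            wins += 1
            lups.append(longest_underperformance(path))
    median_years = statistics.median(lups) / 12 if lups else float("nan")
    return wins / trials, median_years

win_rate, median_lup_years = run_trials()
print(f"Share of trials beating the index: {win_rate:.0%}")
print(f"Median LUP among winners: {median_lup_years:.1f} years")
```

The sketch reports the LUP only for simulated funds that finished ahead, to parallel the real-world results. Changing `months` or the alpha and tracking-error assumptions shows how sensitive the length of the dry spells is to those choices.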

In roughly one of four simulations, a cyber-manager that was programmed to be skilled would have a Length of Underperformance that matched that of the typical bad manager! On the bright side, Paul and Maciej did conclude that over a 100-year simulation, one could generally tell the difference between their managers who were programmed to be strong and those who were programmed to be weak.

I will leave the task to you, dear reader, to determine the disadvantages of investing with a 100-year time horizon. I believe that there are some.
