My Smart Beta ETF Premised on Cats Rang Up an 849,751% Return

I was rich. Right?

I mean, that’s what my Bloomberg said. I’d just entered in an index built from companies with “cat” in their names — yes, the furry felines — hit a button and watched it back-test to an 849,751 percent return. Forget the internet, I thought. Cats are about to take over smart beta.

This is the story of the time I designed my own factor fund as a way of learning about one of Wall Street’s hottest trends — and its pitfalls. There are already ETFs that focus on themes, such as “biblically responsible” companies or ones popular with millennials. Quants have hundreds of style tilts, and their exploding popularity has created a gold rush for creators. I wanted in.

I notified Andrew Ang, head of factor investing strategies at BlackRock Inc. Everything in my program was by the book, I assured him. It was rules-based, equal-weighted and premised on a simple story — that people love cats.

“I love cats, too, and obviously cats are superior, so this is a great investment strategy,” Ang said, as I began to plot my career as a quant. Then he said, “I’m joking, of course.”

Alas, though decades of research back up the idea that you can sort stocks by traits like volatility and momentum and beat the market, Ang saw a far less glorious future for my Abyssinian anomaly. Actually, it failed virtually every conceptual test he could think of, a lesson for anyone convinced she’s found the key to riches in statistical engineering.

“The No. 1 thing is that it lacks an economic foundation,” Ang said.

Pitfall 1: Economic Intuition

So how, exactly, did I go about investing in cats? Factor funds rely on formulas, preset criteria that tell you which stocks to include and which to chuck out. It’s the idea behind things like value ETFs, which gather groups of stocks that share the characteristic of cheapness. The idea is that put together, they’ll beat the wider market.

My model buys any U.S. company with “cat” in its name, whether that’s CATerpillar or any company with “communiCATion” in its name. It rebalances quarterly to keep trading costs low. That’s important for when Vanguard or BlackRock licenses it and charges a competitively low fee.
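For readers who want to see what such a rule looks like in code, here is a minimal sketch of a name screen followed by equal weighting. It is not the Bloomberg tool used for the article; the function names and every company other than Caterpillar are made up for illustration, and the quarterly rebalance is left out.

```python
def cat_screen(company_names, token="cat"):
    """Return the names that contain `token`, ignoring case."""
    return [name for name in company_names if token in name.lower()]


def equal_weights(selected):
    """Give every selected name the same portfolio weight."""
    if not selected:
        return {}
    weight = 1.0 / len(selected)
    return {name: weight for name in selected}


# Caterpillar comes from the article; the other names are hypothetical.
universe = [
    "Caterpillar Inc.",
    "Example Communications Corp.",  # hypothetical, matches via "communiCATions"
    "Sample Education Holdings",     # hypothetical, matches via "eduCATion"
    "Placeholder Energy Co.",        # hypothetical, no "cat" in the name
]

portfolio = equal_weights(cat_screen(universe))
for name, weight in portfolio.items():
    print(f"{name}: {weight:.2%}")
```

Note that the screen knows nothing about what a company does. Any name with the right three letters gets in, which is exactly the weakness the quants go on to point out.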

Full disclosure, I’m a dog person, and believe a company runs better when its spirit animal takes a labradoodle form. But building a dog factor portfolio leaves you with penny stocks like Junkiedog.com Inc., offered at $5 in 2013 and now trading at less than 2 cents.

It just so happens that when I ran the study with cats, it returned nearly 850,000 percent on a six-year backtest. That led me to assign, ex post facto, an economic rationale to the benefit of cat-containing names. And although keyboard cat is an internet star, I’m told by Goldman Sachs Asset Management this isn’t a real economic story that would lead to robust returns over time.

“It’s very curious, and I appreciate the effort,” said Nicholas Chan, portfolio manager in the firm’s Quantitative Investment Strategies group. “But you came up with an investment idea that doesn’t have economic intuition. When we come up with an investment hypothesis, we’re economists first and statisticians second.”

BlackRock and Goldman build strategies around factors like value and low volatility because there’s a clear explanation for why they might work: investors under-price boring stocks, for example. By coming up with a thesis only after the results were known, I’ve data-snooped my way into an unreliable factor. Unfortunately for me, there’s little evidence that investors are pulled toward catty stocks.

Pitfall 2: P-Hacking

Because of my stubborn desire to produce claw-some returns, I took my thesis and ran with it. Fine, so my first few trials didn’t spit out exactly what I wanted. No biggie, I’ve got the statistical resources of Bloomberg LP at my fingertips — so I tinkered with the data until it did.

At first, I invested only in companies whose names begin with C-A-T, to capture the essence of my investment thesis. That backtest was not great.

But expand the data set a little, to CAT anywhere in the name, and the returns look stellar, making my hypothesis look better. In the scientific community, this is called p-hacking, and it got me into trouble with Ang.

“We’re after broad and consistent sources of returns,” he said. “Since you’ve tweaked it so much, that gives me less confidence that there’s underlying economics in the source.”

If tweaking one minor parameter causes the model to fail, it likely isn’t robust enough to stand the test of time, Ang said. For example, the value factor works no matter if you use price-to-book or price-to-earnings. By overfitting my cat model, I probably picked up on a random past occurrence that’s unlikely to repeat itself.
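The danger of tinkering until something works can be shown with a small simulation. The sketch below is illustrative only: the returns are pure random noise, so no rule has a real edge, and the function name, basket size, volatility and number of rules tried are all assumptions picked for the example.

```python
import random


def best_of_many_backtests(n_rules=200, n_stocks=500, picks=20, seed=0):
    """Try many arbitrary stock-picking rules on noise; return (best, average)."""
    rng = random.Random(seed)
    # Pure-noise "returns": by construction, no rule has any genuine edge.
    returns = [rng.gauss(0.0, 0.30) for _ in range(n_stocks)]
    results = []
    for _ in range(n_rules):
        # Each "rule" is just a random basket, like an arbitrary naming screen.
        basket = rng.sample(range(n_stocks), picks)
        results.append(sum(returns[i] for i in basket) / picks)
    return max(results), sum(results) / len(results)


best, average = best_of_many_backtests()
print(f"best rule: {best:.1%}, average rule: {average:.1%}")
```

The best of the 200 rules looks far better than the typical one even though none of them means anything. Reporting only the winner is the p-hacking trap.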

Pitfall 3: Equal weighting

Smart beta has its roots in the idea that indexes like the S&P 500, weighted by market capitalization, are a dumb idea. To honor its forebears, my portfolio became equal weight. This, as it turns out, gave me a false signal.

A few penny stocks with scant liquidity but big returns dominated. Ang told me that the source of a factor’s returns should be diversified, but the cat factor’s returns were hijacked by the basically untradeable Catskill Litigation Trust, which gained 79,000 percent this year (to trade at one penny).
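A bit of arithmetic shows how one holding can hijack an equal-weighted portfolio. In this sketch the 79,000 percent gain is the figure quoted above; the 49 other holdings and their 5 percent gains are hypothetical, and the calculation covers a single period with no rebalancing.

```python
def equal_weight_return(returns):
    """Portfolio return when every holding has the same weight."""
    return sum(returns) / len(returns)


outlier = 790.0        # 79,000 percent as a decimal (figure from the article)
others = [0.05] * 49   # hypothetical: 49 other holdings, each up 5 percent

with_outlier = equal_weight_return([outlier] + others)
without_outlier = equal_weight_return(others)
# Fraction of the portfolio's return contributed by the single outlier.
share_from_outlier = (outlier / 50) / with_outlier

print(f"with outlier: {with_outlier:.1%}")
print(f"without outlier: {without_outlier:.1%}")
print(f"share of return from outlier: {share_from_outlier:.1%}")
```

Under these assumptions, more than 99 percent of the portfolio's return comes from a single stock, which is the opposite of a diversified source of returns.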

Similarly, researchers from Ohio State and the University of Cincinnati found that most anomalies were imaginary, because their discoverers had used too broad a universe of stocks. Trading edges work best when they’re used on large caps, and all but evaporate on microcaps when trading costs come into play, the academics wrote in a recent paper.

Pitfall 4: The Backtest

My backtest did not hold up to Goldman’s standards.

The real sustainability test comes from whether a factor looks good outside of the original time frame it was run on. Before pitching my factor to Chan, I hadn’t set the cats loose on different periods or other markets to confirm the validity of my anomaly.

“The more you can check off on the list of robustness, the more confidence you can give us. Like time periods, or does it work across large-cap and small-cap stocks, regions and countries,” he said.

Taking Chan’s advice to heart, I turned to Europe. Picking European stocks that contain “gat” (which I figured captured most European translations like the Spanish “gato” and Italian “gatto”), my model underperformed the Stoxx Europe 600 index by 10 percentage points in the five years through January. Hiss.

If I change my model to only capture American cat stocks with a market cap larger than $10 million, my edge disappears again. Over the past five years, that strategy would have returned 42 percent, compared to the S&P 500’s total return of 105 percent.
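Five-year totals are easier to compare once they are annualized. The sketch below converts the two figures above into compound annual rates; the helper name is an assumption, and the inputs are the 42 percent and 105 percent totals from the paragraph.

```python
def annualized(total_return, years):
    """Convert a cumulative return (as a decimal) into a compound annual rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


cat_large_caps = annualized(0.42, 5)   # 42 percent over five years
sp500 = annualized(1.05, 5)            # 105 percent over five years

print(f"cat stocks above the size cutoff: {cat_large_caps:.2%} a year")
print(f"S&P 500 total return: {sp500:.2%} a year")
```

That works out to roughly 7 percent a year for the filtered cat portfolio against roughly 15 percent for the index.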

I presented this evidence to BlackRock’s Ang. His final assessment? “We would pass on the cat factor.” Me-ouch.

Pitfall 5: Cats

Like any enterprising quant, I decided to get another opinion. For this, I conferred with Cliff Asness, founder of AQR Capital Management and a pioneer of factor investing.

“Everything you can sort on can be a factor, but not all factors are interesting. Factors need some economics, theory or intuition even, to be at all interesting to us. Thus the cat factor fails as we have no story for why it should matter at all,” Asness said. “Now, in contrast, we are active traders of the dog and parakeet factors, which are based on hard neo-classical economics married to behavioral finance and machine learning. But the cat factor is just silly.”

He’s got a point. Seems like the tail risks here might be a little high.
