I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?
And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…
teruakohatu 3 hours ago [-]
What is not generally understood is that these models don’t predict egg prices or inflation in Italy.
They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.
They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern.
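To make the decomposition idea concrete, here is a toy sketch of a classical trend/seasonality/residual split on made-up data (NumPy moving-average decomposition, not anything from the model under discussion):

```python
import numpy as np

# Synthetic series: linear trend + 12-step seasonality + noise.
rng = np.random.default_rng(0)
period = 12
t = np.arange(240)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, t.size)

# Trend: moving average over one full period (edges are rough, by design).
kernel = np.ones(period) / period
trend = np.convolve(series, kernel, mode="same")

# Seasonality: average the detrended series at each phase of the cycle.
detrended = series - trend
seasonal = np.array([detrended[p::period].mean() for p in range(period)])
seasonal_full = np.tile(seasonal, t.size // period)

# Residual: whatever the trend and seasonal components don't explain.
residual = series - trend - seasonal_full
```

The recovered seasonal profile peaks where the injected sine peaks, and the residual is far less variable than the raw series.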
lordgrenville 20 minutes ago [-]
That's what traditional time-series modelling does. This is a foundation model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")
graemep 18 minutes ago [-]
Do these models predict on just a single time series then?
It is far more useful for predictions to look for correlations between time series. This is far more complex than looking for correlations in general, because most time series trend up or down and therefore correlate spuriously.
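The trend-correlation pitfall is easy to demonstrate with two independent drifting random walks (a toy sketch, not a recipe): their levels correlate strongly because both trend upward, while their differenced versions do not.

```python
import numpy as np

# Two *independent* random walks with the same upward drift.
rng = np.random.default_rng(42)
n = 2000
a = np.cumsum(0.2 + rng.normal(0, 1, n))   # drifting walk A
b = np.cumsum(0.2 + rng.normal(0, 1, n))   # unrelated drifting walk B

# Levels correlate because of the shared trend, not any real relationship.
level_corr = np.corrcoef(a, b)[0, 1]

# Differencing removes the trend; the innovations are uncorrelated.
diff_corr = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
```

This is why differencing (the "I" in ARIMA) is the standard first step before looking for cross-series relationships.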
cybrox 3 hours ago [-]
Wars in the Middle East seem to have increasingly regular patterns tied to stock-market opening hours, unfortunately.
jofzar 2 hours ago [-]
I mean it's super obvious, it's directly tied to Scrubs' popularity.
New season of Scrubs = new war in the Middle East.
rubyn00bie 2 hours ago [-]
I totally agree with the sentiment, but from what I can tell, I'd say they tend to happen immediately before or after markets open and close. Essentially, and to the maximum, screwing absolutely everyone who isn't in the clique from participating in the trade.
FWIW, the only surefire way to win the trade is to buy time and assume both gross incompetence and negligence when it comes to action. The only caveat is that if the markets tank enough, this administration will signal capitulation beforehand, e.g. Trump mildly capitulating on tariffs last April after the markets proceeded to relentlessly defecate themselves.
0-DTE options are typically, and for good reason, stupid gambles. But, right now they can’t even be considered gambling, because there’s zero chance of winning. Not just bad odds, but no odds. Again just signaling how truly malicious this admin is and its disdain for anyone and everyone not close to them.
perks_12 2 hours ago [-]
I am not familiar with time series models, but judging from your answer, it would be necessary to feed long time series into this model for it to detect trends. What is a token here? Can it, for lack of a better example, take in all intraday movements of a stock for a day, a week, a month, etc?
teruakohatu 1 hour ago [-]
I tend to avoid time series forecasting when I can help it because I find it hard to communicate to stakeholders that a neural network (or another method) is not an oracle.
If you are talking about granularity of observations, it would depend on what you are trying to predict (the price in an hour or the price in 12 months?) and how quickly you need the prediction (100ms? Tomorrow morning?). If I had infinite data I would treat granularity as a hyperparameter and tune it to the level that produced the best test results.
I am for example currently using weekly averages for non-price data forecasting. I could use daily data but weekly is absolutely adequate for this purpose.
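The granularity knob described above — aggregating finer observations into coarser ones — can be sketched with hypothetical daily data rolled up into weekly means:

```python
import numpy as np

# Hypothetical daily observations: 10 full weeks of values.
rng = np.random.default_rng(1)
daily = rng.normal(100, 5, 10 * 7)

# Weekly granularity: one mean per 7-day block.
weekly = daily.reshape(-1, 7).mean(axis=1)
```

Averaging preserves the overall level while smoothing out day-to-day noise, which is often adequate (as the comment says) and cheaper to model.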
ReptileMan 1 hours ago [-]
It is the Middle East. Wars are always in season. And supply exceeds demand.
visarga 3 hours ago [-]
ARIMA and ARMA models
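For reference, the autoregressive core of the models named here can be fit with plain least squares. A minimal sketch on simulated data (a toy AR(2) fit, not a substitute for a proper ARIMA library; ARIMA adds differencing and moving-average terms on top of this idea):

```python
import numpy as np

# Simulate an AR(2) process: x[t] = 0.6*x[t-1] - 0.3*x[t-2] + noise.
rng = np.random.default_rng(7)
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

# Fit by OLS: regress x[t] on an intercept and its two lags.
X = np.column_stack([np.ones(n - 2), x[1:-1], x[:-2]])
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [intercept, a1, a2]

one_step = X @ coef                            # in-sample one-step forecasts
```

With enough data the estimated lag coefficients land close to the true (0.6, -0.3), which is all a one-step forecast from such a model amounts to.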
d--b 3 hours ago [-]
The main issue is that people do use them to predict bitcoin prices intraday and that sort of thing.
nico 2 hours ago [-]
Is it an issue because it works, or because it doesn’t? Or because it’s bitcoin?
I genuinely want to know. Thank you
d--b 55 minutes ago [-]
It is an issue because bitcoin is highly unpredictable.
These tools are good at predicting time series that are in fact quite predictable. Insurers, for example, will use this to estimate the number of people who will die from cancer next year, the year after that, and so on, up to 50 years into the future. The model will extrapolate the progress made in cancer treatment from the current trend, etc. It is still a prediction, because it's always possible that a breakthrough arrives and suddenly people stop dying from a certain form of cancer, but generally it should be roughly correct.
Bitcoin prices are a lot more chaotic, influenced by a ton of unrelated events that shape its path a certain way. There is absolutely no certainty that studying the shape of its past evolution will help in any way to understand its future evolution.
Of course, here I mean studying its price alone. If you add more information, like who's behind each trend and why, you have a much better sense of what could happen next.
annie511266728 45 minutes ago [-]
It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.
The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.
People ask how one model can understand everything, but that assumes there’s any understanding involved at all.
At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?
lovelearning 3 hours ago [-]
My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.
As they say in appendix 8:
> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple time-series patterns:
> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.
> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.
> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.
If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.
And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.
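The synthetic-data recipe quoted above can be sketched roughly like this (my own toy parameters and mixing, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 512

# (I) Piecewise-linear trend with a random number of segments (2 to 8).
n_seg = int(rng.integers(2, 9))
knots = np.sort(rng.choice(np.arange(1, n), size=n_seg - 1, replace=False))
segments = np.split(np.arange(n), knots)
slopes = rng.normal(0, 0.05, n_seg)
slope_per_t = np.concatenate(
    [np.full(seg.size, s) for seg, s in zip(segments, slopes)]
)
trend = np.cumsum(slope_per_t)

# (II) ARMA(p, q) noise; coefficients kept small so the process stays stable.
p, q = int(rng.integers(1, 4)), int(rng.integers(1, 4))
ar = rng.uniform(-0.3, 0.3, p)
ma = rng.uniform(-0.3, 0.3, q)
eps = rng.normal(0, 1, n + q)
arma = np.zeros(n)
for t in range(n):
    ar_part = sum(ar[i] * arma[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
    ma_part = sum(ma[j] * eps[q + t - 1 - j] for j in range(q))
    arma[t] = eps[q + t] + ar_part + ma_part

# (III/IV) Seasonal component: a sine wave of random period.
period = int(rng.integers(4, n // 2))
season = np.sin(2 * np.pi * np.arange(n) / period)

series = trend + season + 0.5 * arma
```

A model pre-trained on endless variations of such mixtures only ever needs to recognise these abstract shapes, never "eggs" or "inflation".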
eru 2 hours ago [-]
> How can the same model predict egg prices in Italy, and global inflation in a reliable way?
How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?
cenamus 2 hours ago [-]
It can't compress pictures of everything in a reliable way.
Text and anything with lots of high frequency components looks terrible
eru 39 minutes ago [-]
It still doesn't do too well on text. And we have newer formats and ideas that would also deal with that. (To be really dead simple: have a minimal container format that decides between PNG and JPEG, and use PNG for text.)
However: white noise is where it really struggles. But real pictures of the real world don't look like white noise. Even though in some sense white noise is the most common type of picture a priori.
Similar for real world time series: reality mostly doesn't look like white noise.
at_compile_time 2 hours ago [-]
Reliably terrible.
benob 3 hours ago [-]
I would say:
- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors
- memorization: some patterns recur across many domains, such as power laws
- multitask: exploit cross-domain connections, such as weather vs. electricity
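The decomposition point can be illustrated with a plain FFT pulling a dominant period out of a noisy series (toy example; a learned model would presumably do something less rigid):

```python
import numpy as np

# Noisy periodic signal with a hidden period of 64 samples.
rng = np.random.default_rng(5)
n = 1024
t = np.arange(n)
signal = np.sin(2 * np.pi * t / 64) + 0.3 * rng.normal(size=n)

# Spectrum of the signal; zero out the DC bin so the mean is ignored.
spectrum = np.abs(np.fft.rfft(signal))
spectrum[0] = 0.0
freqs = np.fft.rfftfreq(n)                 # cycles per sample

# The strongest bin reveals the underlying period.
dominant_period = 1 / freqs[np.argmax(spectrum)]
```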
pplonski86 18 minutes ago [-]
Can someone explain, ELI5, how this works? And how many data points can it read?
So the time series are provided with no context? It's just trained on lots of sets of numbers? Then you give it a new set of numbers and it guesses the rest, again with no context?
My guess as to how this would work: the machine will first guess, from the data alone, whether this is one of the categories it has already seen/inferred (share prices, Google Trends cat searches, etc.). Then it'll output a plausible completion for that category.
That doesn't seem as if it will work well for any categories outside the training data. I would rather just use either a simple model (ARIMA or whatever) or a theoretically-informed model. But what do I know.
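On the "what is a token here?" question: one common scheme (illustrative only — this particular model may instead feed patches of real values into the network) is to rescale the series and quantise it into a fixed vocabulary of bins, so ordinary next-token prediction applies:

```python
import numpy as np

# Stand-in for a price series: a random walk.
rng = np.random.default_rng(9)
series = np.cumsum(rng.normal(size=200))

# Standardise, then quantise into a 64-symbol vocabulary.
vocab_size = 64
scaled = (series - series.mean()) / series.std()
edges = np.linspace(-3, 3, vocab_size - 1)   # bin boundaries
tokens = np.digitize(scaled, edges)          # integer token ids in [0, 63]

# Detokenise by mapping each id back to its bin centre and un-scaling.
centres = np.concatenate([[-3.0], (edges[:-1] + edges[1:]) / 2, [3.0]])
reconstructed = centres[tokens] * series.std() + series.mean()
```

The round trip loses a little precision to the bin width, but the token stream carries essentially the whole shape of the series.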
Tarq0n 2 hours ago [-]
If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.
ra 3 hours ago [-]
This has been around a few months now, has anyone built anything on it?
Foobar8568 3 hours ago [-]
Somehow I missed that one.
Are there any competitions on this?
I always had difficulties with ML and time series, I'll need to try that out.
OliverGuy 2 hours ago [-]
Wish they gave some numbers for the total GPU hours to train this model; it seems comparatively tiny compared to LLMs, so I'm interested to know how close this is to something trainable by your average hobbyist/university/small lab.
Edit: it looks like the paper does.
TPUv5e with 16 tensor cores for 2 days for the 200M param model.
Claude reckons this is 60 hours on an 8xA100 rig, so very accessible compared to LLMs for smaller labs.
magimas 2 hours ago [-]
we did some internal tests.
The quality isn't bad; it works quite well. But it's essentially on the same level as an ARIMA model trained on the data, just much bigger and slower.
So in my opinion it currently falls into a kind of void. If your use case is worth predicting and you put a data scientist on it, you're better off just training cheaper ARIMA models.
clarionbell 2 minutes ago [-]
That is disappointing. One would think that, with all the budget and compute, Google would be able to create something that beats methods from the '70s. Maybe we are hitting some hard limits.
Maybe it would be better to train an LLM, with various tuning methodologies, into a dedicated ARIMA agent: you throw in data, some metadata, and the requested forecast window; out come parameters for an "optimal" conventional model.
emsign 2 hours ago [-]
Can this finally break the stock markets?
jdthedisciple 3 hours ago [-]
Let me be blunt: Shannon would tell us that time-series forecasting is bullshit.
There is infinitely more entropy in the real world out there than any model can even remotely capture.
The world is not Minecraft.
mikkom 2 hours ago [-]
Yeah all weather forecasts are just magic
kgwgk 2 hours ago [-]
Weather forecasting is simple: it either rains or it doesn't. 50/50 probability!
tgv 2 hours ago [-]
Weather forecasts are notoriously iffy, and accuracy drops with time, but we understand the physics behind it (to a large extent). There's also a lot of fine-grained data available. For some arbitrary time series, there's only one data sequence, and the model is unknown. Extrapolation then becomes a lot more magical.
eru 2 hours ago [-]
And JPG doesn't work either..
https://moment-timeseries-foundation-model.github.io/
https://arxiv.org/abs/2403.07815
A friend at work used one to predict when our CEO would post in Slack, which is very entertaining to see when it's correct.
[1] https://priorlabs.ai/tabpfn