- azurecoder

# Kusto Lightning Fact 6: series-fir() - Generating Moving Averages for Time Series data

A key initiative and a first step in introducing time series analytics into a data model is to generate moving averages. Indeed, the most methodology ARIMA, **A**uto**R**egressive **I**ntegrated**M**oving**A**verages.

In this post we'll focus on the moving average part and in subsequent posts we'll focus on regression.

Moving averages are incredibly useful in that they allow us to compress and smooth out the volume of data effectively averaging a group of close data points into a single one so the period that the moving average is calculated is all important.

It gives for a much smoother understanding of data over time without the peaks, troughs and anomalies.

We'll consider our sales dataset again which will pick up from the previous post which described the use of __make-series__ to create a series capability using KQL.

```
let series = sales
| where getyear(Order_Date) == 2010 and (getmonth(Order_Date) == 1
or getmonth(Order_Date) == 2 or getmonth(Order_Date) == 3) and (Country == "United Kingdom")
| summarize UnitsSold = sum(Units_Sold) by Country, Timestamp = Order_Date
| order by Timestamp asc;
let min_t = toscalar(series | summarize min(Timestamp));
let max_t = toscalar(series | summarize max(Timestamp));
series
| make-series NumSold=sum(UnitsSold) default=0 on Timestamp in range(min_t, max_t, 1d) by Country
| extend NumSoldMA=series_fir(NumSold, repeat(1, 8))
| render timechart
```

The query above looks at the the first 3 months of 2010 for United Kingdom "units sold" going through the motions of creating a series like we did in the previous post. However, you'll see the addition of the **series_fir()** (**f**inite **i**mpulse **r**esponse) function. This taks our newly created series **NumSold **a period to roll up data points **repeat(1, 8) **- or 8 data points and creates a a moving average. The original series is in blue and the new moving average series is in red** **as in the below graph. You should be able to see the different immediately with the overall up and down trends captured without the spikes. Lowering the repeat period will increase the size of the spikes and troughs in our moving average series.

One key aspect of generating this new moving average series is being able to capture the trend. If we then use that trend to generate the difference between the real numbers in the data we're left a bit of a random walk showing how the series varies from the moving average.

```
let series = sales
| where getyear(Order_Date) == 2010 and (Country == "United Kingdom")
| summarize UnitsSold = sum(Units_Sold) by Country, Timestamp = Order_Date
| order by Timestamp asc;
let min_t = toscalar(series | summarize min(Timestamp));
let max_t = toscalar(series | summarize max(Timestamp));
series
| make-series NumSold=sum(UnitsSold) default=0 on Timestamp in range(min_t, max_t, 1d) by Country
| extend NumSoldMA=series_fir(NumSold, repeat(1, 8))
| extend Residuals=series_subtract(NumSold, NumSoldMA)
| render timechart
```

We've now extended our dataset with a new column called **Residuals **which is a simple subtraction of our 2 time series. The original one and the one we created from the moving average. The data in the above query is extended to over a year which is looking at at a 10-day moving average so the peaks and troughs will be closer to the actual peaks and troughs as you can see below. The original series is in blue and moving average series in red. The new series which looks like a random walk is in green and shows how the actual changes in data vary from our calculated moving average.

In the next couple of posts we'll look at using **regression **in time series and continue on with the concept of **seasonality**.

Keep reading!