Luis Campos (with Luke Miratrix)

4/19/2016

**Electricity consumption over time**

- Electric companies supply **a large number** \( (N) \) of households with electricity
- Automated sensors provide **detailed measurements** of usage over time, \( f_i(t) \) (e.g. a 30-sec. grid)
- We would like to **estimate the average or overall** electricity use as a function of time, \( \mu(t) \) or \( N\mu(t) \)

**Storage and transmission** of this data starts becoming an issue

- Generally the company **does not have detailed usage** \( f_i(t) \); the sensors are attached to people's homes
- Transmitting and storing all this data could get out of hand
  - remember: the fine time grid
- But **total usage** \( (x_i) \) is gathered, either automatically or manually, for billing \( (!) \)
  - This will prove useful later on

**Some requirements**:

- Need good estimates of average
- Sample should be useful for more than just \( \mu(t) \)
- ($)
- e.g. mean estimate of high-volume users

**Summary**:

- Lots of data from a known **population**
- Can't **afford** to collect detailed data from all of them
- One strategy that may be effective: **sampling**

**Population of units**: households, with *sampling frame* \( i \in \{1, ..., N\} \)

- sensor measurements: \( f_i(t),\ t = t_1, ..., t_T \)
  - assume the same time grid for every household

**Population quantity of interest**, e.g. \( \mu(t) = \frac{1}{N}\sum_{i = 1}^N f_i(t) \)

One question:

**How do we estimate** \( \mu(t) \) from a sample?

- Some solutions: simple average, Horvitz-Thompson estimator, Hajek (ratio) estimator, model-assisted estimators, model-based estimators…
- It will depend on how we sample.

**Sampling indicator** \( \ S_i \) and **selection probability** \( \ \pi_i \)

**Bernoulli Sampling**: \( \pi_i = n/N \) and \( S_i \stackrel{ind}{\sim} Bern(\pi_i) \)

**Simple Random Sampling**: slightly more complicated, i.e. dependent, but possible

**Poisson Sampling**, using auxiliary information:

- \( \pi_i \propto x_i \)
- \( S_i \stackrel{ind}{\sim} Bern(\pi_i) \)

- There are many other options.
- We're going to use **Poisson sampling**; more on this later.
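As a rough sketch of the two mechanisms (population size, expected sample size, and seed are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000            # population size (illustrative)
n_target = 100       # expected sample size (illustrative)
x = rng.gamma(shape=2.0, size=N)   # auxiliary variable, e.g. last month's usage

# Bernoulli sampling: every unit shares the same inclusion probability n/N
pi_bern = np.full(N, n_target / N)
S_bern = rng.random(N) < pi_bern

# Poisson sampling: inclusion probability proportional to x_i,
# scaled so the expected sample size is n_target
pi_pois = np.clip(n_target * x / x.sum(), 0.0, 1.0)  # guard against pi_i > 1
S_pois = rng.random(N) < pi_pois
```

The clip is only a safeguard: very large \( x_i \) could otherwise push \( \pi_i \) above 1.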

*Sample average* \( (S) \), *Horvitz-Thompson* \( (HT) \), *Hajek* \( (H) \): for each \( t \)
\[
\begin{align}
&\widehat\mu_{S}(t) = \frac{1}{n}\sum_{i = 1}^N S_i\ f_i(t),& &\widehat\mu_{HT}(t)= \frac{1}{N}\sum_{i = 1}^N\frac{S_i}{\pi_i} f_i(t)\\
&\widehat\mu_{H}(t) = \frac{1}{\widehat N}\sum_{i = 1}^N\frac{S_i}{\pi_i} f_i(t),& & \widehat N = \sum_{i = 1}^N\frac{S_i}{\pi_i}
\end{align}
\]

\( \widehat N \), the sum of the **sampling weights** \( S_i/\pi_i \), estimates the population size

- \( \mathbb{E}[\widehat N] = N \), so \( H \) is a stabilized version of \( HT \)
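A minimal numerical sketch of the three estimators on a toy population (all sizes and the mean structure are illustrative; the noise here is i.i.d. rather than the GP noise used later):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1_000, 20                      # illustrative sizes
x = rng.gamma(2.0, size=N)            # auxiliary variable
# toy curves: mean depends on x_i, plus i.i.d. noise
f = 10 + (x * (1 + np.sin(x)))[:, None] + rng.normal(size=(N, T))
mu = f.mean(axis=0)                   # true population mean curve

# Poisson sample with pi_i proportional to x_i, E(n) = 100
pi = np.clip(100 * x / x.sum(), 0.0, 1.0)
S = rng.random(N) < pi

w = S / pi                            # sampling weights S_i / pi_i
mu_S = f[S].mean(axis=0)              # naive sample average
mu_HT = (w[:, None] * f).sum(axis=0) / N        # Horvitz-Thompson
N_hat = w.sum()                       # estimated population size
mu_H = (w[:, None] * f).sum(axis=0) / N_hat     # Hajek (ratio)
```

Because high-usage households are oversampled here, the naive average is biased; the weighted estimators correct for that.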

**Cardot, Goga, Lardin [2013a, b, 2014, 2015]** - Asymptotics, properties, cool data, and more estimators.

Previous month's average usage \( (x_i) \), current usage \( (f_i(t)) \), with \( i \in \{1, ..., N\} \), \( t \in \{t_1, ..., t_T\} \).

**Population**:
\[
\begin{align}
&x_i \sim \mathrm{Gamma}(2)\\
&f_i(t) = 10 + x_i(1 + \sin(x_i)) + \varepsilon_i(t) \\
&[\varepsilon_i(t_1), ..., \varepsilon_i(t_T)] \sim N_T({\bf{0}}, K) \\
&(K)_{kl} = \exp\left(-(t_k - t_l)^2/2\right)
\end{align}
\]
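This population is straightforward to simulate; a sketch (sizes and time grid are illustrative, and a small jitter is added to \( K \) purely for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 1_000, 51                      # illustrative sizes
t = np.linspace(0, 10, T)             # hypothetical time grid

# squared-exponential covariance: (K)_kl = exp(-(t_k - t_l)^2 / 2)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 2)

x = rng.gamma(2.0, size=N)                       # x_i ~ Gamma(2)
jitter = 1e-8 * np.eye(T)                        # numerical stability only
eps = rng.multivariate_normal(np.zeros(T), K + jitter, size=N)
f = 10 + (x * (1 + np.sin(x)))[:, None] + eps    # f_i(t)
mu = f.mean(axis=0)                              # mu(t)
```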

**Sampling Mechanism** - Poisson Sampling:

\[ \begin{align} \pi_i = 100\frac{x_i}{\sum_{j = 1}^N x_j},\ \ S_i \stackrel{ind}{\sim} Bern(\pi_i),\ \ n = \sum_{i = 1}^N S_i,\ \ \mathbb{E}(n) = 100 \end{align} \]

Root Mean Squared Error:

\[ Err(\widehat{\mu}_{*}, {\mu}) = \left(\frac{1}{T}\sum_{t = t_1}^{t_T}\left(\widehat{\mu}_{*}(t) - {\mu}(t)\right)^2 \right)^{1/2} \]
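This error measure is a one-liner; a sketch:

```python
import numpy as np

def rmse(mu_hat, mu):
    """Err(mu_hat, mu): root mean squared error over the time grid."""
    mu_hat, mu = np.asarray(mu_hat, float), np.asarray(mu, float)
    return float(np.sqrt(np.mean((mu_hat - mu) ** 2)))
```

For example, `rmse([0, 0], [3, 4])` is \( \sqrt{(9 + 16)/2} = \sqrt{12.5} \).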

How did the Estimators do?

For this example, the RMSEs are:

Estimate | Error |
---|---|
Simple Estimate | 1.759 |
H-T Estimate | 0.309 |
Hajek Estimate | 0.229 |

**Given the population, repeat 1,000 times**:

- Take a sample: \( \ S_i\sim Bern(\pi_i) \)
- Calculate estimators: \( \ \widehat{\mu}_{*}(t) \)
- Calculate errors: \( Err(\widehat{\mu}_{*}, {\mu}) \)
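The repetition loop might look like the following sketch (a smaller population and fewer repetitions than the talk's 1,000, to keep it quick; i.i.d. noise stands in for the GP noise):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, reps = 1_000, 25, 200     # illustrative sizes and repetition count
x = rng.gamma(2.0, size=N)
f = 10 + (x * (1 + np.sin(x)))[:, None] + rng.normal(size=(N, T))
mu = f.mean(axis=0)
pi = np.clip(100 * x / x.sum(), 0.0, 1.0)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

errs = {"simple": [], "HT": [], "hajek": []}
for _ in range(reps):
    S = rng.random(N) < pi                       # fresh Poisson sample
    w = S / pi
    errs["simple"].append(rmse(f[S].mean(axis=0), mu))
    errs["HT"].append(rmse((w[:, None] * f).sum(axis=0) / N, mu))
    errs["hajek"].append(rmse((w[:, None] * f).sum(axis=0) / w.sum(), mu))

summary = {k: (np.mean(v), np.std(v), np.min(v), np.max(v))
           for k, v in errs.items()}
```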

**Simulation Results**:

Error of Estimates:

Estimate | Mean | SD | min | max |
---|---|---|---|---|
Simple Average | 1.48 | 0.22 | 0.59 | 2.35 |
H-T Estimate | 1.25 | 0.97 | 0.07 | 9.16 |
Hajek Estimate | 0.26 | 0.15 | 0.06 | 1.33 |

Covariance between \( \hat\mu_{HT}(r) \) and \( \hat\mu_{HT}(t) \):

\[ \gamma_{HT}(r, t) = \frac{1}{N^2}\left[\sum_{i\in U}\frac{(1-\pi_i)}{\pi_i} f_i(r)\ f_i(t)\right] \]

An estimate (**H-T**-ification)

\[ \widehat\gamma_{HT}(r, t) = \frac{1}{N^2}\left[\sum_{i\in U} \color{red}{\frac{S_i}{\pi_i} }\frac{(1-\pi_i)}{\pi_i} f_i(r)\ f_i(t)\right] \]

- \( \widehat\gamma_{H}(r, t) \) is similar, but more complex
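The HT-ified covariance estimate reduces to a weighted outer product over the sampled curves; a sketch on a toy population (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 1_000, 10                      # illustrative sizes
x = rng.gamma(2.0, size=N)
f = 10 + (x * (1 + np.sin(x)))[:, None] + rng.normal(size=(N, T))
pi = np.clip(100 * x / x.sum(), 0.0, 1.0)
S = rng.random(N) < pi

# only sampled units contribute, each reweighted by S_i / pi_i
# on top of the (1 - pi_i) / pi_i variance term
w = (S / pi) * (1 - pi) / pi
gamma_hat = (f.T * w) @ f / N**2      # (T, T), entry [r, t] = gamma_hat(r, t)
```

The resulting matrix is symmetric with nonnegative diagonal, as a covariance estimate should be.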

**Simulation based inference**: Gelman and Hill (Ch. 7, 8), Cardot 2013a.

**Sample** \( J \) times from
\[ \mu_j^*(t) \sim GP(\hat\mu_{HT}(t), \Sigma),\ \ \ (\Sigma)_{ik} = \widehat\gamma_{HT}(t_i, t_k) \]
**Take quantiles** of \( \{\mu_j^*(t)\} \) for each \( t = t_1, ..., t_T \)

- We share information across \( t \)
- Simple to implement \( (GP := N_T) \)
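On a fixed grid the \( GP \) draw is just a \( N_T \) draw; a sketch with hypothetical stand-ins for \( \hat\mu_{HT} \) and \( \widehat\gamma_{HT} \):

```python
import numpy as np

rng = np.random.default_rng(5)
T, J = 10, 2_000
t = np.linspace(0, 10, T)

# hypothetical stand-ins for the HT estimate and its covariance estimate
mu_ht = 10 + np.sin(t)
Sigma = 0.25 * np.exp(-(t[:, None] - t[None, :]) ** 2 / 2) + 1e-9 * np.eye(T)

# GP = N_T on the grid: draw J curves, then take pointwise quantiles
draws = rng.multivariate_normal(mu_ht, Sigma, size=J)
lower, upper = np.quantile(draws, [0.025, 0.975], axis=0)   # 95% band
```

Because each draw is a full curve from the joint distribution, the resulting band shares information across \( t \).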

Consider the following:

- \( \pi_1 = 1 \), this household is always selected
- \( \pi_{153} = 1/2 \), this household selected half the time
- \( \pi_{42} = 1/100 \), this household is rarely selected

**One interpretation** for weights:

- \( 1/\pi_{1} = 1 = 1 + 0 \)
- \( 1/\pi_{153} = 2 = 1 + 1 \)
- \( 1/\pi_{42} = 100 = 1 + 99 \)

**We can write**: \( 1/\pi_i= 1 + (1/\pi_i - 1) \)

**Why this interpretation?** \( \ \mathbb{E}\left(\sum_{i = 1}^N\frac{S_i}{\pi_i}\right) = N \)

Continuing this logic

\[ \begin{align} &\mu(t) = \frac{1}{N} \sum_{i = 1}^N f_i(t) = \frac{1}{N} \left(\sum_{i \in {\bf{S}}} f_i(t) + \color{red}{ \sum_{i \notin {\bf{S}}} f_i(t) }\right)\\ \end{align} \]

\[ \begin{align} \hat\mu_{HT}(t) &= \frac{1}{N} \sum_{i = 1}^N \frac{S_i}{\pi_i} \ f_i(t)\\ &=\frac{1}{N} \left(\sum_{i \in {\bf{S}}} f_i(t) + \color{red}{ \sum_{i \in {\bf{S}}} \left(\frac{1}{\pi_i} - 1\right)\ f_i(t)}\right) \end{align} \]

- We **are modeling** the unobserved population quantities
- We **could model** better
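The decomposition above is an exact algebraic identity, which a quick numerical check confirms (one time point, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 500                                   # one time point, illustrative values
f = rng.normal(10, 2, size=N)
pi = rng.uniform(0.05, 1.0, size=N)
S = rng.random(N) < pi

ht = (S / pi * f).sum() / N               # weighted estimator
# observed part + reweighted stand-in for the unobserved part
decomposed = (f[S].sum() + ((1 / pi[S] - 1) * f[S]).sum()) / N
```

Both expressions agree to floating-point precision, since \( 1/\pi_i = 1 + (1/\pi_i - 1) \).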

Why model?

- Even simple modeling has been shown to outperform weighting
  - Little et al. (2013, 2015): penalized splines to model univariate outcomes
  - Si, Pillai, Gelman (2015): GP to model univariate outcomes

- Models are easily **extendable**; estimators maybe not
- Modern computation allows for **sophisticated modeling**
- \( \pi_i \) may not mean what we think it means
- We can incorporate **prior information**
- **Stability of estimates**

What Model? GP is **natural model** for functional data.

Model \( \ f_i(t) \) as a function of time and the **auxiliary variable** \( x_i \)

\[ \begin{align*} f_i(t) = f(t, x_i) &\sim GP(\nu(t, x_i), K)\\ K\left((t, x), (t', x')\right) &= \sigma_1^2 \exp\left(-\frac{(t - t')^2}{l_1^2} -\frac{(x - x')^2}{l_2^2}\right)\\ &+\ \sigma_2^2 \delta_{x = x'} \exp\left(-\frac{(t - t')^2}{l_3^2}\right) + \sigma_3^2 \delta_{x = x',\ t = t'} \end{align*} \]

\[ X_{obs} = \left( \begin{array}{cc} t_1 & x_1 \\ \vdots & \vdots\\ t_T & x_1 \\ \vdots & \vdots\\ t_1 & x_n \\ \vdots & \vdots\\ t_T & x_n \end{array} \right), Y_{obs} = \left( \begin{array}{c} f_1(t_1) \\ \vdots\\ f_1(t_T) \\ \vdots\\ f_n(t_1) \\ \vdots\\ f_n(t_T) \end{array} \right) \]

We can similarly define \( X_{*} \), the locations of **unobserved** function values and \( Y_{*} \) the unobserved function values.

We can model the **unobserved outcomes**.

Lifted straight out of Rasmussen & Williams (R&W 2006)

\[ \begin{align*} &Y_{*}|X_{obs}, Y_{obs}, X_{*} \sim N(\nu^*, \Sigma^*)\\ &\nu^* = K(X_{*}, X_{obs})\left[K(X_{obs}, X_{obs}) + \sigma^2 I\right]^{-1}Y_{obs}\\ &\Sigma^* = K(X_{*}, X_{*}) + \sigma^2 I \\ &\ \ \ \ \ - K(X_{*}, X_{obs})\left[K(X_{obs}, X_{obs}) + \sigma^2 I\right]^{-1}K(X_{obs}, X_{*}) \end{align*} \]
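These are the standard conditional-Gaussian predictive equations (note the covariance subtracts the data term); a self-contained sketch on a 1-D toy problem, with a simple squared-exponential kernel standing in for the full \( K \):

```python
import numpy as np

rng = np.random.default_rng(7)

def k(A, B, ell=1.0):
    """Squared-exponential kernel on 1-D inputs (stand-in for the full K)."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / ell**2)

X_obs = np.linspace(0, 5, 30)                      # observed locations
Y_obs = np.sin(X_obs) + 0.1 * rng.normal(size=30)  # observed values
X_star = np.linspace(0, 5, 50)                     # unobserved locations
sigma2 = 0.01                                      # noise variance

K_oo = k(X_obs, X_obs) + sigma2 * np.eye(30)
K_so = k(X_star, X_obs)

nu_star = K_so @ np.linalg.solve(K_oo, Y_obs)      # posterior mean
Sigma_star = (k(X_star, X_star) + sigma2 * np.eye(50)
              - K_so @ np.linalg.solve(K_oo, K_so.T))  # posterior covariance
```

Using `np.linalg.solve` rather than an explicit inverse is the numerically preferable way to apply \( [K + \sigma^2 I]^{-1} \).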

**Challenges**:

- For \( n = 100 \), \( X_{obs} \) has \( 5100 \) rows.
- The imputation dataset, with \( N-n = 900 \) households, is **very big**.

**Solutions**:

- Fit with a Sparse GP (Hensman et al. 2013; Ch. 8 of R&W 2006)
  - select \( m \) inducing points (anchor points)
  - replace \( K_{N'N'} \approx K_{N'm}K_{mm}^{-1}K_{mN'} \) (Nyström approx.)

- Impute in chunks (more on this later)
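The Nyström replacement above can be checked directly on a small example; a sketch (grid size, inducing-point count, and kernel are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n_big, m = 400, 40                      # full size and inducing-point count
X = np.sort(rng.uniform(0, 10, n_big))
Z = np.linspace(0, 10, m)               # inducing (anchor) points

def k(A, B):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / 2)

K_full = k(X, X)
K_nm = k(X, Z)
K_mm = k(Z, Z) + 1e-8 * np.eye(m)       # jitter for a stable solve

# Nystrom: K ~= K_nm K_mm^{-1} K_mn, a rank-m approximation
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

With well-spread inducing points the rank-\( m \) matrix captures most of the smooth kernel, so the relative error is small while all downstream solves shrink from \( O(N'^3) \) toward \( O(N'm^2) \).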