Predicting Y When the Dependent Variable is a Transformation of Y

May 18, 2011

I am going to try to start posting more frequently. This post covers topics I've been thinking about lately: model estimation with ordinary least squares (OLS), and forecasting when OLS is used to fit a statistical model whose dependent variable is a transformation of the variable we actually wish to forecast.

Suppose we run a regression with the following specification:

$$Y=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{n}X_{n}+\varepsilon$$

Let's assume that the error term is normally distributed, and let's use OLS to estimate the model's coefficients.
Using a superscript to index the $m$ observations in the dataset, our $m \times (n+1)$ design matrix $\mathbf{X}$ is

$$\left[\begin{array}{ccccc}
1 & X_{1}^{(1)} & X_{2}^{(1)} & \cdots & X_{n}^{(1)}\\
1 & X_{1}^{(2)} & X_{2}^{(2)} & \cdots & X_{n}^{(2)}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & X_{1}^{(m)} & X_{2}^{(m)} & \cdots & X_{n}^{(m)}
\end{array}\right]$$

If the coefficient vector is labeled $\vec{\beta}$ and the vector containing the Y variable's values is labeled $\vec{y}$, then the OLS estimate of the coefficients can be calculated by solving the normal equations $\mathbf{X}^{\top}\mathbf{X}\vec{\beta}=\mathbf{X}^{\top}\vec{y}$ for $\vec{\beta}$:

$$\vec{\beta}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\vec{y}$$

Now we have a set of coefficients that we can use to predict Y whenever we receive additional observations that have values for the independent variables $X_{1}, X_{2}, \ldots, X_{n}$ but no observed value of Y. Everything is fine.

Now suppose that our original specification was not a good fit for the data and the error terms were not normally distributed. After experimenting with the specification, we find that taking the natural logarithm of Y produces a much better fit, with normally distributed errors.
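Before moving to the transformed model, the closed-form OLS solution above can be sketched in a few lines of NumPy. The data here are invented purely for illustration; in practice `np.linalg.lstsq` (or `solve` on the normal equations) is preferred to forming the inverse explicitly.

```python
import numpy as np

# Illustrative data: m observations, n regressors, plus a column of
# ones so that beta_0 acts as the intercept.
rng = np.random.default_rng(0)
m, n = 100, 2
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=m)

# Solve the normal equations X'X beta = X'y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# lstsq solves the same least-squares problem more stably.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With well-behaved data both routes recover essentially the same coefficient vector.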
Our new specification is

$$\log\left(Y\right)=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{n}X_{n}+\varepsilon$$

If we repeat the steps described above, we can estimate a new set of coefficients by calculating $\vec{\beta}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\log\left(\vec{y}\right)$, where $\log\left(\vec{y}\right)$ denotes the vector obtained by taking the natural logarithm of each element of $\vec{y}$.

We now have a set of coefficients that we can use to predict log(Y) for additional observations that have values for the independent variables $X_{1}, X_{2}, \ldots, X_{n}$ but no observed value of Y. However, we want to predict Y, not log(Y).
It may seem natural to predict Y by simply exponentiating the predicted value of log(Y). This is where a problem occurs. Our regression returns coefficients that express the expected value of the dependent variable conditional on the observed values of the independent variables. More formally,

$$\mathbb{E}\left[\log\left(Y\right)|X_{1},X_{2},\ldots,X_{n}\right]=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{n}X_{n}$$

But [Jensen's inequality](http://en.wikipedia.org/wiki/Jensen's_inequality) implies that $\mathbb{E}\left[f\left(X\right)\right]$ is not necessarily equal to $f\left(\mathbb{E}\left[X\right]\right)$, where X is a random variable and $f$ is some function of that random variable. So we can't simply exponentiate the predicted value of log(Y). Fortunately, for the log-linear model described above, the adjustment is very simple.
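A tiny discrete example makes Jensen's inequality concrete for the convex function $f(x)=e^{x}$ (the numbers are invented for illustration):

```python
import math

# X takes the value 0 or 2, each with probability 1/2, so E[X] = 1.
# Because exp is convex, E[exp(X)] exceeds exp(E[X]).
f_of_mean = math.exp(0.5 * 0 + 0.5 * 2)            # exp(1)      ~ 2.718
mean_of_f = 0.5 * math.exp(0) + 0.5 * math.exp(2)  # (1 + e^2)/2 ~ 4.195
```

Exponentiating the mean understates the mean of the exponentials, which is exactly why the naive back-transformation is biased low.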
Since $\log\left(Y\right)|X_{1},X_{2},\ldots,X_{n}$ is normally distributed, we know that $Y|X_{1},X_{2},\ldots,X_{n}$ follows a [log-normal distribution](http://en.wikipedia.org/wiki/Log-normal_distribution) (see my [prior post](https://www.dannyadam.com/blog/2011/03/probability-distributions-reference-table/), which included a probability distributions reference table, for more details on the log-normal distribution). The expected value of a log-normal distribution is $e^{\mu+\tfrac{1}{2}\sigma^{2}}$, where $\mu$ is the mean of the underlying normal distribution and $\sigma^{2}$ is its variance.
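A quick Monte Carlo check of that expected-value formula (the parameter values here are arbitrary):

```python
import numpy as np

# If Z ~ Normal(mu, sigma^2), then exp(Z) is log-normal with mean
# exp(mu + sigma^2 / 2) -- not exp(mu).
mu, sigma = 1.0, 0.8
rng = np.random.default_rng(1)
z = rng.normal(mu, sigma, size=1_000_000)

empirical = np.exp(z).mean()
naive = np.exp(mu)                       # ignores the variance term
adjusted = np.exp(mu + 0.5 * sigma**2)   # correct log-normal mean
```

The simulated mean lands on the adjusted value, well away from the naive `exp(mu)`.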
Using this property, we can predict Y by exponentiating the predicted value of log(Y) plus one half of the variance of the model's error terms, which I will estimate with the mean squared error (MSE).

$$\mathbb{E}\left[Y|X_{1},X_{2},\ldots,X_{n}\right]=\exp\left(\mathbb{E}\left[\log\left(Y\right)|X_{1},X_{2},\ldots,X_{n}\right]+\frac{1}{2}MSE\right)$$

The MSE can be calculated by taking the model's sum of squared errors and dividing by the number of observations minus the number of independent variables minus one. Note that the residuals are taken on the log scale, since log(Y) is the model's dependent variable:

$$MSE=\frac{\left(\mathbf{X}\vec{\beta}-\log\left(\vec{y}\right)\right)^{\top}\left(\mathbf{X}\vec{\beta}-\log\left(\vec{y}\right)\right)}{m-n-1}$$

Using $\mathbb{E}\left[\log\left(Y\right)|X_{1},X_{2},\ldots,X_{n}\right]$ and the calculated MSE, we now have our formula for predicting Y:

$$\mathbb{E}\left[Y|X_{1},X_{2},\ldots,X_{n}\right]=\exp\left(\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{n}X_{n}+\frac{\left(\mathbf{X}\vec{\beta}-\log\left(\vec{y}\right)\right)^{\top}\left(\mathbf{X}\vec{\beta}-\log\left(\vec{y}\right)\right)}{2\left(m-n-1\right)}\right)$$

The formula above can be used to predict values of Y using the coefficients estimated by OLS for our log-linear model and observed values of the independent variables. However, the adjustment is not always so simple. For other transformations of Y, more work would be required to derive the corresponding correction. In such cases, it might be better to model the data with a different method rather than transforming the variables so that coefficients can be estimated with OLS.
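As a concrete check of the formula, here is a short simulation of the whole procedure: fit log(y) on X by OLS, compute the MSE on the log scale, and apply the adjustment when predicting. All numbers are made up for illustration.

```python
import numpy as np

# Simulate data whose log is linear in X with Normal(0, 0.4^2) errors.
rng = np.random.default_rng(2)
m, n = 500, 2
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
log_y = X @ np.array([0.5, 0.3, -0.2]) + rng.normal(scale=0.4, size=m)
y = np.exp(log_y)

# Fit log(y) on X by OLS.
beta = np.linalg.solve(X.T @ X, X.T @ np.log(y))

# MSE on the log scale: SSE / (m - n - 1).
resid = X @ beta - np.log(y)
mse = resid @ resid / (m - n - 1)

# Predict Y for a new observation (the leading 1 is the intercept term).
x_new = np.array([1.0, 1.0, 1.0])
naive = np.exp(x_new @ beta)                 # biased low
adjusted = np.exp(x_new @ beta + 0.5 * mse)  # log-normal mean adjustment
```

With the true error variance of 0.16, the adjusted prediction exceeds the naive one by a factor of roughly `exp(0.08)`, about 8 percent.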
While the OLS results often have an intuitive appeal, which can be useful for inference, other models might be more suited for forecasting.