{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/datascience/variance","result":{"data":{"markdownRemark":{"id":"b1afa624-865c-51f6-8680-681078df76eb","html":"<p><strong>Random Variable:</strong> A numerical function of a random sample, a statistic.\n<strong>Expectation:</strong> Weighted average of the values of X, where the weights are the probabilities of the values.\n<strong>Variance:</strong> Expected squared deviation from the expectation of X. The units of the variance are the units of X squared.</p>\n<h2 id=\"interpretation-of-variance\" style=\"position:relative;\"><a href=\"#interpretation-of-variance\" aria-label=\"interpretation of variance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Interpretation of Variance</h2>\n<ul>\n<li>Var(X) = E[X^2] - (E[X])^2</li>\n<li>The main use of variance is to quantify chance error</li>\n<li><strong>Chebyshev’s inequality:</strong> The vast majority of the probability lies within a few standard deviations of the expectation; at most 1/k^2 of the probability is k or more SDs away from E[X]</li>\n<li>If X is centered, i.e. E[X] = 0, then Var(X) = E[X^2]</li>\n</ul>\n<h2 id=\"linear-transformations\" style=\"position:relative;\"><a href=\"#linear-transformations\" aria-label=\"linear transformations permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 
1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Linear Transformations</h2>\n<ul>\n<li>E[aX + b] = aE[X] + b</li>\n<li>Var(aX+b)= a^2 * Var(X)</li>\n<li>SD(aX+b) = |a|SD(X)</li>\n<li>Var(aX+b)=Var(aX)</li>\n<li>A shift by b units does not affect spread</li>\n<li>The multiplication by a does affect spread</li>\n</ul>\n<h2 id=\"standardization-of-random-variables\" style=\"position:relative;\"><a href=\"#standardization-of-random-variables\" aria-label=\"standardization of random variables permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Standardization of random variables</h2>\n<ul>\n<li>X in standard units = (X-E[X]) / SD(X)</li>\n<li>X in standard units measures the number of SDs from expectation</li>\n<li>It is a linear transformation of X</li>\n<li>E[X_su] = 0, SD[X_su] = 1</li>\n<li>E[X_su^2] = Var[X_su] = 1</li>\n</ul>\n<h2 id=\"variance-of-a-sum---covariance\" style=\"position:relative;\"><a href=\"#variance-of-a-sum---covariance\" aria-label=\"variance of a sum   covariance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 
4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Variance of a sum -> Covariance</h2>\n<ul>\n<li>Var[X+Y] = Var[X] + Var[Y] + 2*Cov[X,Y]</li>\n<li>Cov[X,Y] = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]</li>\n<li>Covariance is 0 if X and Y are independent</li>\n<li>To get rid of the units of covariance, scale it by the standard deviations to get correlation</li>\n<li>Correlation = Cov[X,Y] / (SD[X] * SD[Y])</li>\n<li>Random variables are uncorrelated when their covariance is 0</li>\n</ul>\n<h2 id=\"iid-sample-sum\" style=\"position:relative;\"><a href=\"#iid-sample-sum\" aria-label=\"iid sample sum permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>I.I.D. 
sample sum</h2>\n<ul>\n<li>I.I.D. stands for independent and identically distributed</li>\n<li>Draws at random with replacement from a population are i.i.d.</li>\n<li>E[S_n] = n * mean, Var[S_n] = n * (std dev)^2, SD[S_n] = sqrt(n) * std dev, where mean and std dev are those of a single draw</li>\n</ul>\n<h2 id=\"model-risk\" style=\"position:relative;\"><a href=\"#model-risk\" aria-label=\"model risk permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Model Risk</h2>\n<ul>\n<li>Mean squared error of prediction</li>\n<li>Model risk = E[(Y-Ŷ(x))^2], where Ŷ(x) is the prediction of the model that we are using</li>\n<li><strong>Chance Error:</strong> Due to randomness alone in the new observations</li>\n<li><strong>Bias:</strong> Non-random error due to the model being different from the true underlying function</li>\n</ul>\n<h2 id=\"bias-and-overfitting\" style=\"position:relative;\"><a href=\"#bias-and-overfitting\" aria-label=\"bias and overfitting permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Bias and Overfitting</h2>\n<ul>\n<li>Overfitting: small differences in random samples lead to large 
differences in the fitted model</li>\n<li>Overfitting solution: Reduce model complexity</li>\n<li>Model risk = std dev^2 + (model bias)^2 + model variance</li>\n</ul>\n<h1 id=\"questions\" style=\"position:relative;\"><a href=\"#questions\" aria-label=\"questions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Questions</h1>","fields":{"slug":"/posts/datascience/variance","tagSlugs":["/tag/notes/","/tag/lecture/","/tag/data-science/"]},"frontmatter":{"date":"2021-10-21T23:46:37.121Z","description":"Notes on variance","tags":["Notes","Lecture","Data Science"],"title":"Data 100 Variance"}}},"pageContext":{"slug":"/posts/datascience/variance"}},"staticQueryHashes":["251939775","401334301","825871152"]}