{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/datascience/inference","result":{"data":{"markdownRemark":{"id":"03dfd9ea-e514-56a0-93bd-dde4296be440","html":"<h1 id=\"inference\" style=\"position:relative;\"><a href=\"#inference\" aria-label=\"inference permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Inference</h1>\n<ul>\n<li>Moving from premises to logical consequences</li>\n<li>Induction is inference from a particular premise to a universal conclusion</li>\n<li>Statistical inference: Using data analysis to deduce properties of an underlying distribution</li>\n</ul>\n<h2 id=\"prediction-vs-inference\" style=\"position:relative;\"><a href=\"#prediction-vs-inference\" aria-label=\"prediction vs inference permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Prediction vs Inference</h2>\n<ul>\n<li>Prediction: Using our model to make predictions for unseen data. (Given the attributes of a house, how much is it worth?) 
We don’t care how the model arrives at its prediction, only that the prediction is accurate</li>\n<li>Inference: Using our model to draw conclusions about the underlying true relationships between our features and response. (How much extra will a house be worth if it has a view of the river?) We care about model parameters that are interpretable and meaningful</li>\n</ul>\n<h2 id=\"statistical-inference\" style=\"position:relative;\"><a href=\"#statistical-inference\" aria-label=\"statistical inference permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Statistical Inference</h2>\n<ul>\n<li>Draw conclusions about a population parameter given only a random sample</li>\n<li>Parameter: some function of the population, e.g., the population mean</li>\n<li>\n<p>Estimator: some function of a sample whose goal is to estimate a population parameter, e.g., the sample mean. 
Estimators are random variables</p>\n<ul>\n<li>Bias of an estimator: the difference between the estimator’s expected value and the true value of the parameter being estimated</li>\n<li>Variance of an estimator: the expected squared deviation of the estimator from its mean</li>\n</ul>\n</li>\n</ul>\n<h2 id=\"bootstrapping\" style=\"position:relative;\"><a href=\"#bootstrapping\" aria-label=\"bootstrapping permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Bootstrapping</h2>\n<ul>\n<li>Idea: treat our random sample as a population and resample from it</li>\n<li>\n<p>Pseudocode</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">collect a random sample of size n (aka the bootstrap population)\ninitialize list of estimates\nrepeat 10,000 times:\n    resample with replacement from the bootstrap population\n    apply estimator f to the resample\n    store the result in the list\nthe list of estimates is the bootstrapped sampling distribution of f</code></pre></div>\n</li>\n<li>The median cannot always be estimated accurately by bootstrapping</li>\n<li>If the sample is too small, bootstrapping won’t work</li>\n</ul>\n<h2 id=\"confidence-interval\" style=\"position:relative;\"><a href=\"#confidence-interval\" aria-label=\"confidence interval permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 
1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Confidence Interval</h2>\n<ul>\n<li>\n<p>What does a P% confidence interval mean?</p>\n<ul>\n<li>If we take a sample from the population, compute a P% confidence interval for the true population parameter, and repeat this many times, the population parameter will be in our interval P% of the time.</li>\n</ul>\n</li>\n<li>To compute confidence_interval(s, f, P): approximate the sampling distribution of f using the sample s, then choose the middle P% of that approximate distribution</li>\n<li>A 95% confidence interval does not mean that there is a 95% chance that the population parameter is in the interval; the parameter is fixed, so it is either in the interval or it isn’t</li>\n</ul>\n<h2 id=\"bootstrapping-model-parameters\" style=\"position:relative;\"><a href=\"#bootstrapping-model-parameters\" aria-label=\"bootstrapping model parameters permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Bootstrapping Model Parameters</h2>\n<ul>\n<li>Our estimate for theta depends on which training data we happened to collect.</li>\n<li>We want to think about all of the different ways that our training data, and therefore our parameter estimate, could have come out</li>\n<li>\n<p>We want to test whether a feature has any effect on the outcome. 
This works for linear and logistic regression models with any number of features.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">for each bootstrap resample, estimate theta1\nmake a confidence interval for theta1 and check whether 0 is in the interval\nif yes, theta1 is not significantly different from 0\nif no, theta1 is significantly different from 0</code></pre></div>\n</li>\n</ul>\n<h2 id=\"multicollinearity\" style=\"position:relative;\"><a href=\"#multicollinearity\" aria-label=\"multicollinearity permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Multicollinearity</h2>\n<ul>\n<li>If features are related to one another, it might not be possible to change one while holding the others constant</li>\n<li>\n<p>Multicollinearity: where a feature can be predicted fairly accurately by a linear combination of the other features.</p>\n<ul>\n<li>Doesn’t impact the model’s predictions, only its interpretability</li>\n</ul>\n</li>\n<li>Perfect Multicollinearity: one feature can be written exactly as a linear combination of the other features</li>\n</ul>\n<h2 id=\"summary\" style=\"position:relative;\"><a href=\"#summary\" aria-label=\"summary permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 
1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Summary</h2>\n<ul>\n<li>Estimators are functions that provide estimates of true population parameters</li>\n<li>We can bootstrap to estimate the sampling distribution of an estimator</li>\n<li>\n<p>Using the bootstrapped sampling distribution, we can compute a confidence interval for our estimator</p>\n<ul>\n<li>This gives a rough idea of how uncertain we are about the true population parameter</li>\n<li>Only valid if the original random sample is representative</li>\n</ul>\n</li>\n<li>\n<p>The assumption when performing linear regression is that there is some true parameter theta that defines the linear relationship between features X and response Y</p>\n<ul>\n<li>We can use the bootstrap to determine whether or not an individual feature is significant</li>\n</ul>\n</li>\n<li>Multicollinearity arises when features are correlated with one another</li>\n<li>Supervised Learning = We have both X and Y</li>\n<li>\n<p>Unsupervised Learning = We only have X and want to learn about its structure</p>\n<h1 id=\"questions\" style=\"position:relative;\"><a href=\"#questions\" aria-label=\"questions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Questions</h1>\n<p>What is a confidence 
interval?</p>\n</li>\n</ul>","fields":{"slug":"/posts/datascience/inference","tagSlugs":["/tag/notes/","/tag/lecture/","/tag/data-science/"]},"frontmatter":{"date":"2021-10-21T23:46:37.121Z","description":"Notes on Inference","tags":["Notes","Lecture","Data Science"],"title":"Inference"}}},"pageContext":{"slug":"/posts/datascience/inference"}},"staticQueryHashes":["251939775","401334301","825871152"]}