{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/datascience/logistic-regression","result":{"data":{"markdownRemark":{"id":"1dc6f05c-9a9f-5c0c-8e16-5cab23ff9826","html":"<h2 id=\"linear-regression\" style=\"position:relative;\"><a href=\"#linear-regression\" aria-label=\"linear regression permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Linear Regression</h2>\n<p>Goal: Predict a number from a set of features\nOutput: A real number\nProcess: Determine the optimal parameters by minimizing the average loss, often together with a regularization penalty.</p>\n<h2 id=\"classification\" style=\"position:relative;\"><a href=\"#classification\" aria-label=\"classification permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Classification</h2>\n<p>Goal: Predict a categorical variable (e.g., win or lose, disease or no disease, spam or ham)\nTypes: Binary classification (two classes, e.g., spam or not spam), multiclass classification (e.g., type of animal)</p>\n<h2 id=\"logistic-regression-model\" 
style=\"position:relative;\"><a href=\"#logistic-regression-model\" aria-label=\"logistic regression model permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Logistic Regression Model</h2>\n<p><code class=\"language-text\">P(Y=1|x) = 1 / (1 + e^(-x^T theta)) = sigma(x^T theta)</code>\nThe probability of an observation belonging to class 1, given the feature vector x, is the sigmoid (logistic) function applied to a linear combination of the features.</p>\n<p>Thus, logistic regression is sometimes known as a generalized linear model, since it is a non-linear transformation of a linear model.</p>\n<h2 id=\"pitfalls-of-squared-loss-functions-for-logitisic-regression\" style=\"position:relative;\"><a href=\"#pitfalls-of-squared-loss-functions-for-logitisic-regression\" aria-label=\"pitfalls of squared loss functions for logitisic regression permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Pitfalls of Squared Loss Functions for Logistic Regression</h2>\n<p>Optimization can get trapped in a flat region, since the squared-loss surface is not convex for logistic regression.\nSolution: Use 
Cross-entropy Loss!\nEmpirical risk for logistic regression when using cross-entropy loss:<br>\n<code class=\"language-text\">R(theta) = -(1/n) * sum_{i=1}^{n} [y_i * log(sigma(X_i^T theta)) + (1 - y_i) * log(1 - sigma(X_i^T theta))]</code></p>\n<h3 id=\"advantages-of-cross-entropy-loss\" style=\"position:relative;\"><a href=\"#advantages-of-cross-entropy-loss\" aria-label=\"advantages of cross entropy loss permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Advantages of Cross-entropy Loss</h3>\n<ul>\n<li>Loss surface is guaranteed to be convex</li>\n<li>More strongly penalizes bad predictions</li>\n<li>Has roots in probability and information theory</li>\n</ul>\n<h2 id=\"logistic-regression\" style=\"position:relative;\"><a href=\"#logistic-regression\" aria-label=\"logistic regression permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Logistic Regression</h2>\n<ul>\n<li>Robust to outliers, since predictions are passed through the logistic function</li>\n<li>L2 and L1 loss motivation: We want to 
measure the size of the error (the difference between prediction and data) in a way that is indifferent to sign.</li>\n<li>To measure the difference between two probability distributions, the cross-entropy loss is useful</li>\n<li>Applying a threshold to a point's position on the logistic curve turns our decision rule into a classifier</li>\n</ul>\n<h2 id=\"evaluating-classifiers\" style=\"position:relative;\"><a href=\"#evaluating-classifiers\" aria-label=\"evaluating classifiers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Evaluating Classifiers</h2>\n<ul>\n<li>The most basic evaluation for a classifier is accuracy. 
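As a quick sketch of this point (with made-up labels, not data from the notes), a classifier that always predicts the majority class already scores high accuracy under class imbalance:

```python
# Hypothetical dataset: 95 dogs (label 1) and 5 cats (label 0)
y_true = [1] * 95 + [0] * 5
y_pred = [1] * 100  # classify every picture as "dog"

# Accuracy = fraction of predictions that match the true labels
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.95, even though the classifier learned nothing
```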
This is widely used, but in the presence of class imbalance it doesn’t mean much (if we classify all the pictures as dog and dogs make up 95% of the dataset, our accuracy is 95%)</li>\n<li>Types of classification errors: True positives and true negatives (observations correctly classified as positive or negative), false positives (false alarms), false negatives (failures to detect)</li>\n<li>Confusion matrix: Tabulates the true and false positives and negatives for a given classifier</li>\n</ul>\n<h2 id=\"precision-and-recall\" style=\"position:relative;\"><a href=\"#precision-and-recall\" aria-label=\"precision and recall permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Precision and Recall</h2>\n<ul>\n<li>\n<p><code class=\"language-text\">Accuracy = (True positives + True negatives) / number of data points</code></p>\n<ul>\n<li>This tells us what proportion of points our classifier classified correctly</li>\n</ul>\n</li>\n<li>\n<p><code class=\"language-text\">Precision = True positives / (True positives + False positives)</code></p>\n<ul>\n<li>Of all observations that were predicted to be 1, what proportion is actually 1? Penalizes false positives</li>\n</ul>\n</li>\n<li>\n<p><code class=\"language-text\">Recall = True positives / (True positives + False negatives)</code></p>\n<ul>\n<li>Of all observations that were actually 1, what proportion did we predict to be 1? 
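The definitions above can be sketched from raw counts; this is a minimal example with made-up binary labels:

```python
y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # actual labels
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]  # classifier predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)  # false negatives

precision = tp / (tp + fp)  # 3 / (3 + 1) = 0.75
recall = tp / (tp + fn)     # 3 / (3 + 2) = 0.6
```

Here two actual 1s were missed (false negatives), so recall is lower than precision.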
Penalizes false negatives</li>\n<li>Can achieve 100% recall by making the classifier output 1 regardless of input, but this makes precision low</li>\n</ul>\n</li>\n<li>Generally, a higher classification threshold for positives yields fewer false positives, which increases precision.</li>\n<li>Conversely, a lower classification threshold yields fewer false negatives, which increases recall.</li>\n</ul>\n<h2 id=\"visual-metrics\" style=\"position:relative;\"><a href=\"#visual-metrics\" aria-label=\"visual metrics permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Visual Metrics</h2>\n<ul>\n<li>The only thing we can change after training our model is the classification threshold</li>\n<li>Accuracy vs threshold: <code class=\"language-text\">Threshold too high = false negatives</code>, <code class=\"language-text\">Threshold too low = false positives</code></li>\n<li>Precision vs threshold: <code class=\"language-text\">Threshold increases = fewer false positives</code>, precision tends to increase (not always)</li>\n<li>Recall vs threshold: <code class=\"language-text\">Threshold increases = more false negatives</code>, recall tends to decrease</li>\n</ul>\n<h2 id=\"other-metrics\" style=\"position:relative;\"><a href=\"#other-metrics\" aria-label=\"other metrics permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 
1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Other Metrics</h2>\n<ul>\n<li>\n<p><code class=\"language-text\">False Positive Rate = False Positives / (False Positives + True Negatives)</code></p>\n<ul>\n<li>What proportion of innocent people did I convict?</li>\n</ul>\n</li>\n<li>\n<p><code class=\"language-text\">True Positive Rate = True Positives / (True Positives + False Negatives)</code></p>\n<ul>\n<li>What proportion of guilty people did I convict? Also known as recall</li>\n</ul>\n</li>\n<li>ROC Curve (Receiver Operating Characteristic): plots the true positive rate against the false positive rate as the threshold varies</li>\n<li>AUC (Area Under the Curve): Area under the ROC curve; best possible = 1, and 0.5 corresponds to random guessing</li>\n</ul>\n<h1 id=\"questions\" style=\"position:relative;\"><a href=\"#questions\" aria-label=\"questions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Questions</h1>\n<ul>\n<li>Can a threshold be a sloped line, or is this useless?</li>\n</ul>","fields":{"slug":"/posts/datascience/logistic-regression","tagSlugs":["/tag/notes/","/tag/lecture/","/tag/data-science/"]},"frontmatter":{"date":"2021-10-21T23:46:37.121Z","description":"Notes on Logistic Regression","tags":["Notes","Lecture","Data Science"],"title":"Logistic 
Regression"}}},"pageContext":{"slug":"/posts/datascience/logistic-regression"}},"staticQueryHashes":["251939775","401334301","825871152"]}