{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/datascience/cross-validation-and-regularization","result":{"data":{"markdownRemark":{"id":"a7d5b84c-6f27-5f34-b836-929e8711c637","html":"<h1 id=\"the-train-test-split\" style=\"position:relative;\"><a href=\"#the-train-test-split\" aria-label=\"the train test split permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>The train-test Split</h1>\n<ul>\n<li><strong>Training Data:</strong> Used to fit model</li>\n<li><strong>Test Data:</strong> Used to check generalization error</li>\n<li>How to split? Randomly, Temporally, Geographically (Usually Random)</li>\n<li>What size? 
a larger training set supports more complex models, while a larger test set gives a better estimate of the generalization error (usually 90%-10% or 75%-25%)</li>\n</ul>\n<h1 id=\"recipe-for-successful-generalization\" style=\"position:relative;\"><a href=\"#recipe-for-successful-generalization\" aria-label=\"recipe for successful generalization permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Recipe for Successful Generalization</h1>\n<ol>\n<li>Split the data into a training set (90%) and a testing set (10%)</li>\n<li>Use only the training data when training, and use cross-validation to test generalization (do not look at the testing data!)</li>\n<li>Commit to the model and train once more using only the training data</li>\n<li>Test the model using the testing data. 
If accuracy is not acceptable, return to step 2</li>\n<li>Train on all available data and ship it</li>\n</ol>\n<h1 id=\"regularization\" style=\"position:relative;\"><a href=\"#regularization\" aria-label=\"regularization permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Regularization</h1>\n<p>Parametrically controlling the model complexity.\n<strong>Naive Solution:</strong> Find the best value of theta that uses fewer than B features. Unfortunately, this is an NP-hard combinatorial search problem.</p>\n<ul>\n<li><strong>L0 Norm Ball</strong> Ideal for feature selection but combinatorially difficult to optimize</li>\n<li><strong>L1 Norm Ball</strong> Encourages sparse solutions and is convex!</li>\n<li><strong>L2 Norm Ball</strong> Spreads weight over features (robust) but does not encourage sparsity</li>\n<li><strong>L1 + L2 Norm (Elastic Net)</strong> A compromise; requires tuning two regularization parameters</li>\n</ul>\n<p><strong>Standardization:</strong> Ensure each dimension has the same scale and is centered around 0.\n<strong>Intercept Terms:</strong> The intercept term is typically not regularized.</p>\n<h1 id=\"ridge-regression\" style=\"position:relative;\"><a href=\"#ridge-regression\" aria-label=\"ridge regression permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 
3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Ridge Regression</h1>\n<p><strong>Model:</strong> Y = X * theta\n<strong>Loss:</strong> Squared Loss\n<strong>Regularization:</strong> L2 regularization\n<strong>Objective Function:</strong> Squared Loss + added penalty\n<span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/6fa38369db8c6fcd6f756f1ac081badf/0c20b/RidgeRegression.jpg\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 22.916666666666668%; position: relative; bottom: 0; left: 0; background-image: url('data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAFABQDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAMF/8QAFQEBAQAAAAAAAAAAAAAAAAAAAQL/2gAMAwEAAhADEAAAAd6RLUUf/8QAGBAAAgMAAAAAAAAAAAAAAAAAAAECETH/2gAIAQEAAQUCJuhZ/8QAFhEAAwAAAAAAAAAAAAAAAAAAAhAx/9oACAEDAQE/ARq//8QAFhEAAwAAAAAAAAAAAAAAAAAAARAx/9oACAECAQE/ATF//8QAFBABAAAAAAAAAAAAAAAAAAAAEP/aAAgBAQAGPwJ//8QAGBAAAwEBAAAAAAAAAAAAAAAAAAERMWH/2gAIAQEAAT8hnWMmMwP/2gAMAwEAAgADAAAAEH/f/8QAFhEBAQEAAAAAAAAAAAAAAAAAAQAR/9oACAEDAQE/EBo2b//EABcRAAMBAAAAAAAAAAAAAAAAAAABESH/2gAIAQIBAT8QeYEf/8QAGxAAAgEFAAAAAAAAAAAAAAAAAREAITFBUXH/2gAIAQEAAT8QtRMOzZYFtysxyHP/2Q=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/6fa38369db8c6fcd6f756f1ac081badf/8ac56/RidgeRegression.webp 
240w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/d3be9/RidgeRegression.webp 480w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/e46b2/RidgeRegression.webp 960w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/9012d/RidgeRegression.webp 992w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/6fa38369db8c6fcd6f756f1ac081badf/09b79/RidgeRegression.jpg 240w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/7cc5e/RidgeRegression.jpg 480w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/6a068/RidgeRegression.jpg 960w,\n/static/6fa38369db8c6fcd6f756f1ac081badf/0c20b/RidgeRegression.jpg 992w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/jpeg\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/6fa38369db8c6fcd6f756f1ac081badf/6a068/RidgeRegression.jpg\"\n            alt=\"Ridge Regression\"\n            title=\"Ridge Regression\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>Note that there is always a unique optimal parameter vector for Ridge Regression!</p>\n<h1 id=\"lasso-regression\" style=\"position:relative;\"><a href=\"#lasso-regression\" aria-label=\"lasso regression permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Lasso Regression</h1>\n<p><strong>Model:</strong> Y = 
X * theta\n<strong>Loss:</strong> Squared Loss\n<strong>Regularization:</strong> L1 regularization\n<strong>Objective Function:</strong> Average Squared Loss + added penalty\n<span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 854px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/b1459002d02fc9ee0e047aa5e319ef26/f537d/LassoRegression.jpg\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 34.583333333333336%; position: relative; bottom: 0; left: 0; background-image: url('data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAHABQDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAEF/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/aAAwDAQACEAMQAAAB3oFB/8QAFhAAAwAAAAAAAAAAAAAAAAAAAAEQ/9oACAEBAAEFAoj/xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/AT//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/AT//xAAUEAEAAAAAAAAAAAAAAAAAAAAQ/9oACAEBAAY/An//xAAZEAADAAMAAAAAAAAAAAAAAAAAARFRofD/2gAIAQEAAT8ha6keNiRH/9oADAMBAAIAAwAAABBzz//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8QP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8QP//EABsQAAICAwEAAAAAAAAAAAAAAAABESFRYaHB/9oACAEBAAE/EGNuukKBbZFB7J//2Q=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/b1459002d02fc9ee0e047aa5e319ef26/8ac56/LassoRegression.webp 240w,\n/static/b1459002d02fc9ee0e047aa5e319ef26/d3be9/LassoRegression.webp 480w,\n/static/b1459002d02fc9ee0e047aa5e319ef26/a7c35/LassoRegression.webp 854w\"\n              sizes=\"(max-width: 854px) 100vw, 854px\"\n              type=\"image/webp\"\n            />\n          <source\n            
srcset=\"/static/b1459002d02fc9ee0e047aa5e319ef26/09b79/LassoRegression.jpg 240w,\n/static/b1459002d02fc9ee0e047aa5e319ef26/7cc5e/LassoRegression.jpg 480w,\n/static/b1459002d02fc9ee0e047aa5e319ef26/f537d/LassoRegression.jpg 854w\"\n            sizes=\"(max-width: 854px) 100vw, 854px\"\n            type=\"image/jpeg\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/b1459002d02fc9ee0e047aa5e319ef26/f537d/LassoRegression.jpg\"\n            alt=\"Ridge Regression\"\n            title=\"Ridge Regression\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>Note that there is NO closed form solution for the optimal parameter vector for LASSO so we must use numerical methods like gradient descent</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/84f58b5f886fdbf5f4f61b15a2d75765/8351c/SummaryRegression.jpg\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 35%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAHABQDASIAAhEBAxEB/8QAFQABAQAAAAAAAAAAAAAAAAAAAAX/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAAG8AD//xAAUEAEAAAAAAAAAAAAAAAAAAAAQ/9oACAEBAAEFAn//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/AT//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/AT//xAAUEAEAAAAAAAAAAAAAAAAAAAAQ/9oACAEBAAY/An//xAAUEAEAAAAAAAAAAAAAAAAAAAAQ/9oACAEBAAE/IX//2gAMAwEAAgADAAAAEAPP/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPxA//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAGRAAAwADAAAAAAAAAAAAAAAAAAEhEUFR/9oACAEBAAE/ENpDdwXp/9k='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/84f58b5f886fdbf5f4f61b15a2d75765/8ac56/SummaryRegression.webp 240w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/d3be9/SummaryRegression.webp 480w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/e46b2/SummaryRegression.webp 960w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/a1214/SummaryRegression.webp 1254w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/84f58b5f886fdbf5f4f61b15a2d75765/09b79/SummaryRegression.jpg 240w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/7cc5e/SummaryRegression.jpg 480w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/6a068/SummaryRegression.jpg 960w,\n/static/84f58b5f886fdbf5f4f61b15a2d75765/8351c/SummaryRegression.jpg 1254w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/jpeg\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/84f58b5f886fdbf5f4f61b15a2d75765/6a068/SummaryRegression.jpg\"\n            alt=\"Summary\"\n            title=\"Summary\"\n            loading=\"lazy\"\n            
style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<h1 id=\"questions\" style=\"position:relative;\"><a href=\"#questions\" aria-label=\"questions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Questions</h1>","fields":{"slug":"/posts/datascience/cross-validation-and-regularization","tagSlugs":["/tag/notes/","/tag/lecture/","/tag/data-science/"]},"frontmatter":{"date":"2021-10-21T23:46:37.121Z","description":"Notes on Cross Validation and Regularization","tags":["Notes","Lecture","Data Science"],"title":"Cross Validation and Regularization"}}},"pageContext":{"slug":"/posts/datascience/cross-validation-and-regularization"}},"staticQueryHashes":["251939775","401334301","825871152"]}