{"id":803,"date":"2023-08-26T18:30:25","date_gmt":"2023-08-26T18:30:25","guid":{"rendered":"https:\/\/savvaspanagi.com\/?p=803"},"modified":"2023-08-29T10:58:18","modified_gmt":"2023-08-29T10:58:18","slug":"handling-cyclical-features-for-use-in-method-such-us-machine-learning","status":"publish","type":"post","link":"https:\/\/savvaspanagi.com\/?p=803","title":{"rendered":"How to handle cyclical features, for distance methodologies like k-means machine learning algorithms"},"content":{"rendered":"\n<p class=\"has-ast-global-color-2-color has-text-color\">When applying machine learning methods, we often have to consider how to handle cyclical features. For example, the K-Means algorithm uses Euclidean distance to group the available data into clusters. With raw hour values, the distance between hour 0 (00:00) and hour 23 (23:00) is 23, although the two are really only one hour apart. According to the literature, this problem is commonly overcome by using a sine and cosine transform to represent each hour in cyclical form. With the help of <a href=\"https:\/\/www.sefidian.com\/\"><span style=\"text-decoration: underline;\">Sefidian Academy<\/span><\/a> and its corresponding article [1] about <a href=\"https:\/\/www.sefidian.com\/2021\/03\/26\/handling-cyclical-features-such-as-hours-in-a-day-for-machine-learning-pipelines-with-python-example\/\"><span style=\"text-decoration: underline;\">handling cyclical features<\/span><\/a>, I will try to explain the method analytically and, with a simple example, show how it helps with distance measures. 
<\/p>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color has-medium-font-size\"><strong>Equations<\/strong><\/p>\n\n\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 92px;\"><span class=\"ql-right-eqno\"> (1) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" src=\"https:\/\/savvaspanagi.com\/wp-content\/ql-cache\/quicklatex.com-a9b92d84f66b49331edfcbe82285299b_l3.png\" height=\"92\" width=\"174\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#120;&#32;&#38;&#61;&#32;&#92;&#115;&#105;&#110;&#92;&#108;&#101;&#102;&#116;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#97;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#50;&#92;&#112;&#105;&#125;&#123;&#92;&#109;&#97;&#120;&#40;&#97;&#41;&#32;&#43;&#32;&#49;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#41;&#32;&#92;&#92; &#121;&#32;&#38;&#61;&#32;&#92;&#99;&#111;&#115;&#92;&#108;&#101;&#102;&#116;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#97;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#50;&#92;&#112;&#105;&#125;&#123;&#92;&#109;&#97;&#120;&#40;&#97;&#41;&#32;&#43;&#32;&#49;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#41; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">Based on the above equations, where a is the original feature value and max(a) is its largest value, we transform the real values into a new cyclical form. As a proof of concept, let&#8217;s take a simple cyclical problem: the seasons. If we assign numeric values to the seasons Winter, Spring, Summer and Autumn, they become 0, 1, 2 and 3 respectively. We know that Winter follows Autumn, yet with these numeric values the distance between them is 3 (3 - 0), the maximum that can occur in our problem. 
After applying the transform, the data take the following form:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"421\" height=\"98\" src=\"https:\/\/savvaspanagi.com\/wp-content\/uploads\/2023\/08\/image.png\" alt=\"\" class=\"wp-image-815\"\/><\/figure>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">Now let&#8217;s plot the above data on a scatter plot (x-axis = Sin, y-axis = Cos):<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/savvaspanagi.com\/wp-content\/uploads\/2023\/08\/image-1.png\" alt=\"\" class=\"wp-image-816\" style=\"width:579px;height:364px\" width=\"579\" height=\"364\"\/><\/figure>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">Finally, how does this help the K-Means algorithm and the other distance-based methods that I will use? Euclidean distance is calculated with the formulas below.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/savvaspanagi.com\/wp-content\/uploads\/2023\/08\/image-2-1024x576.png\" alt=\"\" class=\"wp-image-819\" style=\"width:505px;height:284px\" width=\"505\" height=\"284\"\/><\/figure>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">In our simple problem the points are 2D, and applying the formulas gives the following distances:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/savvaspanagi.com\/wp-content\/uploads\/2023\/08\/image-3.png\" alt=\"\" class=\"wp-image-820\" style=\"width:407px;height:169px\" width=\"407\" height=\"169\"\/><\/figure>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">So, does this Euclidean distance make sense? 
If we think about reality, this represents exactly how far each season is from the others: neighbouring seasons, including Autumn and Winter, are all the same distance apart (&#8730;2 &#8776; 1.41), while opposite seasons such as Winter and Summer are the farthest apart (2).<\/p>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\">Your thoughts and questions are important to me. Feel free to share your insights or ask about anything in the comments section below. Let&#8217;s keep the conversation going!<\/p>\n\n\n\n<p class=\"has-ast-global-color-2-color has-text-color\"><strong>References:<\/strong><\/p>\n\n\n\n<p>[1] <a href=\"https:\/\/www.sefidian.com\/2021\/03\/26\/handling-cyclical-features-such-as-hours-in-a-day-for-machine-learning-pipelines-with-python-example\/\">https:\/\/www.sefidian.com\/2021\/03\/26\/handling-cyclical-features-such-as-hours-in-a-day-for-machine-learning-pipelines-with-python-example\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When applying machine learning methods, we often have to consider how to handle cyclical features. For example, the K-Means algorithm uses Euclidean distance to group the available data into clusters. With raw hour values, the distance between hour 0 (00:00) and hour 23 (23:00) is 23, although the two are really only one hour apart. 
&hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/savvaspanagi.com\/?p=803\"> <span class=\"screen-reader-text\">How to handle cyclical features, for distance methodologies like k-means machine learning algorithms<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"footnotes":""},"categories":[7],"tags":[],"jetpack_featured_media_url":"","rttpg_featured_image_url":null,"rttpg_author":{"display_name":"Savvas Panagi","author_link":"https:\/\/savvaspanagi.com\/?author=1"},"rttpg_comment":1,"rttpg_category":"<a href=\"https:\/\/savvaspanagi.com\/?cat=7\" rel=\"category\">Machine Learning<\/a>","rttpg_excerpt":"When applying machine learning methods, we often have to consider how to handle cyclical features. For example, the K-Means algorithm uses Euclidean distance to group the available data into clusters. 
With raw hour values, the distance between hour 0 (00:00) and hour 23 (23:00) is 23, although the two are really only one hour apart.&hellip;","_links":{"self":[{"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/posts\/803"}],"collection":[{"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=803"}],"version-history":[{"count":28,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/posts\/803\/revisions"}],"predecessor-version":[{"id":845,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=\/wp\/v2\/posts\/803\/revisions\/845"}],"wp:attachment":[{"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/savvaspanagi.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}