K-means

 


Original Implementation

 The blog wjmn.github.io (link) proposes a naive version (according to the author) of k-means :

init =. ([ ? #@:]) { ]
alloc =. {.@:I.@:(= <./)@:(+/@:*:@:-"1)
step =. (+/ % #)&>@:(alloc"1 2 </. [)
kmc =. ] step^:_ init

As far as I am concerned, I am not competent enough to decide on the naivety of this implementation. However, several things bother me in this version:

  • @: is used everywhere, even when it is not necessary

  • ~ is not used when it is useful

  • even if it works, it would be good to simplify the way of addressing the rank of the verb alloc

  • The verb alloc is oddly implemented.

As a result, it's slow and this version is difficult to read and understand.

Improved Version 

We start by removing the parentheses in the verb init:

init2 =: ]{~[?#@]

This is easily done by using ~.

Next, the calculation of the Euclidean distance: The author proposes (+/@:*:@:-"1) to calculate the Euclidean distance. A robust rank variant that I also use is +/&.:*:@:-"1. There is an issue with selecting the index of the smallest element: it is perfectly fine to do as the author suggests, but there is a dedicated form in J. So we replace {.@:I.@:(= <./)@: with (i. <./)"1.

Finally, we arrange the step verb as it is written backwards (in my opinion). I suspect this way of writing it is one of the main causes of the slowdown in the author's version.

Once cleaned and rearranged, it looks like this:

init2 =: ]{~[?#@]
alloc2=: [: (i. <./)"1 +/&.:*:@:-"1 / ~
mean =: +/ % #
kmc2 =: init2 (alloc2 mean/. ])^:_ ]

And as a bonus, a one-liner version:

kmc2OneLiner =: (]{~[?#@]) (([: (i. <./)"1 +/&.:*:@:-"1 / ~) (+/ % #)/. ])^:_ ]

Speed Test 

We will compare the execution speed of the different versions (the original, my version, and the one-liner version) on a set of 5000 points located in a three-dimensional space:

dt =: 5000 3 ?@$ 0
(,[:<"0[:(,:(%~{.))timex&>)'40 kmc dt';'40 kmc2 dt';'40 kmc2OneLiner dt
 
Output:

The time gain is notable.

All the code

NB. ---------------------------------------------------------
NB. implementation from https://wjmn.github.io/posts/j-can-look-like-apl.html
NB. ---------------------------------------------------------
init =. ([ ? #@:]) { ]
alloc =. {.@:I.@:(= <./)@:(+/@:*:@:-"1)
step =. (+/ % #)&>@:(alloc"1 2 </. [)
kmc =. ] step^:_ init
NB. ---------------------------------------------------------
NB. faster version
NB. ---------------------------------------------------------
init2 =: ]{~[?#@]
alloc2=: [: (i. <./)"1 +/&.:*:@:-"1 / ~
mean =: +/ % #
kmc2 =: init2 (alloc2 mean/. ])^:_ ]

kmc2OneLiner =: (]{~[?#@]) (([: (i. <./)"1 +/&.:*:@:-"1 / ~) (+/ % #)/. ])^:_ ]
NB. ---------------------------------------------------------
NB. speed test
NB. ---------------------------------------------------------
dt =: 500 3 ?@$ 0
(,[:<"0[:(,:(%~{.))timex&>)'40 kmc dt';'40 kmc2 dt';'40 kmc2OneLiner dt'

NB. Line1 : what is executed, line2 : execution speed, line3 : xtimes faster than first column
NB. ┌─────────┬──────────┬──────────────────┐
NB. │40 kmc dt│40 kmc2 dt│40 kmc2OneLiner dt│
NB. ├─────────┼──────────┼──────────────────┤
NB. │32.2017  │0.803435  │0.701243          │
NB. ├─────────┼──────────┼──────────────────┤
NB. │1        │40.08     │45.9208           │
NB. └─────────┴──────────┴──────────────────┘



Commentaires