Original Implementation
The blog wjmn.github.io (link) proposes a naive version (according to the author) of k-means :
init =. ([ ? #@:]) { ]
alloc =. {.@:I.@:(= <./)@:(+/@:*:@:-"1)
step =. (+/ % #)&>@:(alloc"1 2 </. [)
kmc =. ] step^:_ init
As far as I am concerned, I am not competent enough to decide on the naivety of this implementation. However, several things bother me in this version:
@:
is used everywhere, even when it is not necessary
~
is not used when it is useful
even if it works, it would be good to simplify the way of addressing the rank of the verb alloc
The verb alloc is oddly implemented.
As a result, it's slow and this version is difficult to read and understand.
Improved Version
We start by removing the parentheses in the verb init:
This is easily done by using ~
.
Next, the calculation of the Euclidean distance:
The author proposes (+/@:*:@:-"1)
to calculate the Euclidean distance. A robust rank variant that I also use is +/&.:*:@:-"1
.
There is an issue with selecting the index of the smallest element: it is perfectly fine to do as the author suggests, but there is a dedicated form in J. So we replace {.@:I.@:(= <./)@:
with (i. <./)"1
.
Finally, we arrange the step verb as it is written backwards (in my opinion). I suspect this way of writing it is one of the main causes of the slowdown in the author's version.
Once cleaned and rearranged, it looks like this:
init2 =: ]{~[?#@]
alloc2=: [: (i. <./)"1 +/&.:*:@:-"1 / ~
mean =: +/ % #
kmc2 =: init2 (alloc2 mean/. ])^:_ ]
And as a bonus, a one-liner version:
kmc2OneLiner =: (]{~[?#@]) (([: (i. <./)"1 +/&.:*:@:-"1 / ~) (+/ % #)/. ])^:_ ]
Speed Test
We will compare the execution speed of the different versions (the original, my version, and the one-liner version) on a set of 5000 points located in a three-dimensional space:
dt =: 5000 3 ?@$ 0
(,[:<"0[:(,:(%~{.))timex&>)'40 kmc dt';'40 kmc2 dt';'40 kmc2OneLiner dt
Output:
The time gain is notable.
All the code
NB. ---------------------------------------------------------
NB. implementation from https://wjmn.github.io/posts/j-can-look-like-apl.html
NB. ---------------------------------------------------------
init =. ([ ? #@:]) { ]
alloc =. {.@:I.@:(= <./)@:(+/@:*:@:-"1)
step =. (+/ % #)&>@:(alloc"1 2 </. [)
kmc =. ] step^:_ init
NB. ---------------------------------------------------------
NB. faster version
NB. ---------------------------------------------------------
init2 =: ]{~[?#@]
alloc2=: [: (i. <./)"1 +/&.:*:@:-"1 / ~
mean =: +/ % #
kmc2 =: init2 (alloc2 mean/. ])^:_ ]
kmc2OneLiner =: (]{~[?#@]) (([: (i. <./)"1 +/&.:*:@:-"1 / ~) (+/ % #)/. ])^:_ ]
NB. ---------------------------------------------------------
NB. speed test
NB. ---------------------------------------------------------
dt =: 500 3 ?@$ 0
(,[:<"0[:(,:(%~{.))timex&>)'40 kmc dt';'40 kmc2 dt';'40 kmc2OneLiner dt'
NB. Line1 : what is executed, line2 : execution speed, line3 : xtimes faster than first column
NB. ┌─────────┬──────────┬──────────────────┐
NB. │40 kmc dt│40 kmc2 dt│40 kmc2OneLiner dt│
NB. ├─────────┼──────────┼──────────────────┤
NB. │32.2017 │0.803435 │0.701243 │
NB. ├─────────┼──────────┼──────────────────┤
NB. │1 │40.08 │45.9208 │
NB. └─────────┴──────────┴──────────────────┘
Commentaires
Enregistrer un commentaire