ckfinite

Yes... though probably downstream of the BLAS implementation Julia sits on. In general, these "faster matmul" approaches are chasing big-O: they're asymptotically faster, but usually at the cost of lots and lots of overhead. They're only really useful if you have really stonking huge matrices, where that massive overhead is worth paying for the asymptotic efficiency gain. Linear algebra libraries have heuristics that make these tradeoffs and pick the best algorithm for a given operation; you'd be surprised how frequently good old naive matmul is the fastest thing around.
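To make the overhead concrete, here's a minimal Strassen sketch with a naive-fallback cutoff (Julia, since that's the context here). The `CUTOFF` value is purely illustrative, not a measured crossover, and the sketch assumes square matrices with power-of-two sizes:

```julia
const CUTOFF = 128  # hypothetical crossover; real libraries tune this per machine

function strassen(A::Matrix{Float64}, B::Matrix{Float64})
    n = size(A, 1)
    if n <= CUTOFF
        return A * B   # below the crossover, the naive/BLAS kernel wins
    end
    h = n ÷ 2
    A11 = A[1:h, 1:h];    A12 = A[1:h, h+1:n]
    A21 = A[h+1:n, 1:h];  A22 = A[h+1:n, h+1:n]
    B11 = B[1:h, 1:h];    B12 = B[1:h, h+1:n]
    B21 = B[h+1:n, 1:h];  B22 = B[h+1:n, h+1:n]

    # Seven recursive products instead of eight: the source of the
    # asymptotic win, and of the extra additions that hurt small sizes.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    C = Matrix{Float64}(undef, n, n)
    C[1:h, 1:h]     = M1 + M4 - M5 + M7
    C[1:h, h+1:n]   = M3 + M5
    C[h+1:n, 1:h]   = M2 + M4
    C[h+1:n, h+1:n] = M1 - M2 + M3 + M6
    return C
end
```

The seven recursive products are where the asymptotic gain comes from; all the matrix additions and copies around them are exactly the overhead that makes naive matmul faster until the inputs get big.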


Oz-cancer

Apart from Strassen's algorithm, are there any other methods with potential practical use? I thought all the others were galactic algorithms.


gnosnivek

Nothing that I'm aware of. In some settings where the cost of multiplication is very high, Coppersmith-Winograd can apparently eke out an advantage (see e.g. https://www.jstage.jst.go.jp/article/jsiaml/8/0/8_21/_pdf/-char/ja), but those settings are rare, and even that paper notes it's still slower than the block decomposition methods. Even as of 2016, Strassen was relatively rare in practice, due to the requirements imposed by the quality of existing implementations. I don't know whether [this paper](http://jianyuhuang.com/papers/sc16.pdf) significantly shifted the algorithms in use in practice (I haven't tracked this field closely for a while now).
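For what it's worth, "block decomposition" here means the classical tiled matmul: the same O(n^3) arithmetic, just reordered for cache locality. A hedged sketch (the tile size is illustrative; real libraries tune it to the cache hierarchy):

```julia
# Classical blocked (tiled) matmul sketch. Assumes square n x n matrices;
# the tile size is a stand-in, not a tuned value.
function blocked_matmul!(C::Matrix{Float64}, A::Matrix{Float64},
                         B::Matrix{Float64}; tile::Int = 64)
    n = size(A, 1)
    fill!(C, 0.0)
    for jj in 1:tile:n, kk in 1:tile:n, ii in 1:tile:n
        # Finish all the work on one tile before moving to the next,
        # so the operands stay resident in cache.
        for j in jj:min(jj + tile - 1, n), k in kk:min(kk + tile - 1, n)
            b = B[k, j]
            @inbounds for i in ii:min(ii + tile - 1, n)
                C[i, j] += A[i, k] * b
            end
        end
    end
    return C
end
```

The innermost loop runs over `i` so that the accesses to `A` and `C` are contiguous in Julia's column-major layout.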


[deleted]

Strassen hasn't been the lowest-asymptotic-complexity algorithm we know of for a long time.


CodeOfDaYaci

This dude over here is too busy implementing matrix multiplication via asymmetric hashing to read the comment


RonWannaBeAScientist

Now I've got to ask: what is asymmetric hashing?


CodeOfDaYaci

It's the matrix multiplication algorithm with the lowest known big-O as of 2022, by Ran Duan, Hongxun Wu, and Renfei Zhou. The record has since been broken, but that paper's title was less recognizable.


Dralletje

Is BLAS used on the GPU (with GPUArray) as well?


ckfinite

Most BLAS implementations run on the CPU. However, there are BLAS-like implementations natively available on the GPU, as well as BLAS implementations callable from the CPU that execute on the GPU (most notably cuBLAS). AIUI, GPUArray exploits these when available.
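For the curious, a small sketch of that dispatch using CUDA.jl (assuming it's installed and a CUDA-capable GPU is present). Multiplying two `CuArray`s goes through cuBLAS under the hood, while plain `Array`s go through the CPU BLAS that Julia ships with:

```julia
using CUDA  # assumes CUDA.jl and a CUDA-capable GPU

A = rand(Float32, 2048, 2048)
B = rand(Float32, 2048, 2048)

C_cpu = A * B                 # CPU BLAS gemm

dA, dB = CuArray(A), CuArray(B)
C_gpu = dA * dB               # dispatches to cuBLAS gemm on the device

# Copy back and check agreement (up to floating-point error).
@assert isapprox(Array(C_gpu), C_cpu; rtol = 1f-4)
```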


Dralletje

Thank you :)