Show HN: GPULlama3.java Llama Compiled to PTX/OpenCL Now Integrated in Quarkus
lostmsu 8 hours ago [-]
Does it support flash attention? Does it use tensor cores? Can I write custom kernels?

Update: I found no evidence that it supports tensor cores, so it's going to be many times slower than implementations that do.

mikepapadim 2 hours ago [-]
Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here, and at the sketch below: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
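For context, "write your own kernels" in TornadoVM means writing a plain Java method whose @Parallel-annotated loop the TornadoVM JIT compiles to OpenCL or PTX at runtime. Here is a minimal sketch against TornadoVM's TaskGraph API; the class name KernelSketch and the scale task are illustrative, not taken from GPULlama3.java:

    import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
    import uk.ac.manchester.tornado.api.TaskGraph;
    import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
    import uk.ac.manchester.tornado.api.annotations.Parallel;
    import uk.ac.manchester.tornado.api.enums.DataTransferMode;
    import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

    public class KernelSketch {

        // A "kernel" is just a Java method; the @Parallel loop is what
        // TornadoVM JIT-compiles to an OpenCL or PTX kernel at runtime.
        static void scale(FloatArray in, FloatArray out, float alpha) {
            for (@Parallel int i = 0; i < in.getSize(); i++) {
                out.set(i, alpha * in.get(i));
            }
        }

        public static void main(String[] args) {
            FloatArray in = new FloatArray(1024);
            FloatArray out = new FloatArray(1024);
            in.init(1.0f);

            // Declare what to copy to the device, which task to run,
            // and what to copy back to the host after execution.
            TaskGraph graph = new TaskGraph("s0")
                    .transferToDevice(DataTransferMode.FIRST_EXECUTION, in)
                    .task("scale", KernelSketch::scale, in, out, 2.0f)
                    .transferToHost(DataTransferMode.EVERY_EXECUTION, out);

            ImmutableTaskGraph itg = graph.snapshot();
            new TornadoExecutionPlan(itg).execute();

            System.out.println(out.get(0)); // prints 2.0
        }
    }

Whether this lowers to OpenCL or PTX depends on which TornadoVM backend is installed; the PTX backend targets NVIDIA GPUs directly.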
lostmsu 13 minutes ago [-]
The TornadoVM GitHub has no mention of tensor cores or WMMA instructions. The only mention of tensor cores is from 2024 and states they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393