Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs (github.com)

28 points by darkrishabh 5 hours ago | 6 comments

ssgodderidge 1 hour ago [-]

The example model in the documentation is 4o-mini; you might want to update that to a more recent model.

As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it does at choosing to load skills in the first place?

stingraycharles 1 hour ago [-]

It’s an artifact of the documentation being AI-generated; LLMs usually pick GPT-4-era models without giving it further thought.

For Gemini they seem to always pick 2.5 despite 3.1 being the latest; for Claude, the 3.5-era models.

Not sure what’s preventing AI labs from ensuring this stuff is refreshed during training.

block_dagger 1 hour ago [-]

The skill is deterministically added to the prompt by the harness before the target model is invoked. There is no “choosing” to load a skill. You might be confusing skills with tools (MCP, etc.).
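
To make the point above concrete, here is a minimal sketch of what "deterministically added to the prompt by the harness" could look like. This is not the actual code of agent-skills-eval; `build_prompt`, `run_pair`, and `call_model` are hypothetical names made up for illustration, and the separator format is an assumption.

```python
from typing import Callable, Optional, Tuple


def build_prompt(task: str, skill_text: Optional[str] = None) -> str:
    """Return the prompt a harness might send to the target model.

    Assumption: the skill is plain text prepended to the task.
    The model never decides whether to load it.
    """
    if skill_text is None:
        return task
    return f"{skill_text.strip()}\n\n---\n\n{task}"


def run_pair(
    task: str,
    skill_text: str,
    call_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Run the same task without and with the skill.

    `call_model` is a hypothetical stand-in for whatever client
    actually invokes the target model.
    """
    baseline = call_model(build_prompt(task))
    with_skill = call_model(build_prompt(task, skill_text))
    return baseline, with_skill
```

With a setup like this, the only difference between the two runs is the prepended skill text, so any change in output quality can be attributed to the skill rather than to the model's decision to load it.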

egeozcan 2 hours ago [-]

Are there any published results gathered using this?

jarym 3 minutes ago [-]

Not sure, but I'm interested in trying it because I've sensed for a while that adding SKILLS.md degraded my overall experience; most probably I wrote them wrong. But I guess this sort of tooling can help me figure it out?

ianhxu 1 hour ago [-]

How do you iterate on the judge prompt? Is there an auto rater?
