Hacker News
Anthropic's original take home assignment open sourced (github.com/anthropics)
142 points by myahio 3 hours ago | 48 comments




I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into Gemini CLI in a cloned repo and it's been running non-stop for 20 minutes :)

The writing has been on the wall for about half a year (publicly) now. OpenAI's 2nd place at the AtCoder world championship was the first sign, and I remember it being dismissed at the time. Sakana also took 1st place in another AtCoder competition a few weeks ago. And Google published a blog post a few months back on Gemini 2.5 netting them a 1% reduction in training time on real-world tasks by optimising kernels.

If the models get a good feedback loop + easy (cheap) verification, they get to bang their tokens against the wall until they find a better solution.
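A minimal sketch of that loop, with made-up stand-ins for the model call and the verifier (neither is Anthropic's or the repo's actual API):

    import random

    MAX_ATTEMPTS = 100

    def propose_kernel(best_so_far):
        # Stand-in for an LLM call that proposes a candidate kernel,
        # conditioned on the best attempt found so far.
        return random.randrange(10_000)

    def simulate_cycles(candidate):
        # Stand-in for the frozen simulator: cheap, deterministic scoring.
        return 1_000 + candidate % 500

    best_cycles, best_kernel = float("inf"), None
    for _ in range(MAX_ATTEMPTS):
        candidate = propose_kernel(best_kernel)
        cycles = simulate_cycles(candidate)
        if cycles < best_cycles:
            best_cycles, best_kernel = cycles, candidate

As long as the verification step is cheap and exact, the only limit on the loop is how many tokens you're willing to burn.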


It's pretty interesting how close this assignment looks to a cross between the demoscene [1] and code golf [2].

[1] https://en.wikipedia.org/wiki/Demoscene [2] https://en.wikipedia.org/wiki/Code_golf

It even uses Chrome tracing tools for profiling, which is pretty cool: https://github.com/anthropics/original_performance_takehome/...
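For anyone who hasn't seen it: the Chrome Trace Event Format is just JSON, so any tool can emit it. A minimal example (the event names here are invented; the take-home's own tracing may use different fields):

    import json

    # "X" = complete event: a named span starting at `ts` and lasting `dur`,
    # both in microseconds; `pid`/`tid` pick which timeline row it renders on.
    events = [
        {"name": "load_vector",  "ph": "X", "ts": 0,  "dur": 4, "pid": 0, "tid": 0},
        {"name": "multiply_add", "ph": "X", "ts": 4,  "dur": 9, "pid": 0, "tid": 0},
        {"name": "store_result", "ph": "X", "ts": 13, "dur": 3, "pid": 0, "tid": 0},
    ]

    with open("trace.json", "w") as f:
        json.dump(events, f)

Load trace.json in chrome://tracing (or ui.perfetto.dev) and you get a zoomable timeline of the spans.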


It's designed to select for people who can be trusted to manually write PTX :-)

Having recently learned more about SIMD, PTX and optimization techniques, this is a nice little challenge to learn even more.

As a take-home assignment, though, I would have failed: I'd probably have spent 2 hours just sketching out ideas and notes on my tablet while reading the code, before even changing it.


> This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.

Was the screening format here that this problem was sent out, and candidates had to reply with a solution within 2 hours?

Or, are they just saying that the latest frontier coding models do better in 2 hours than human candidates have done in the past in multiple days?


I wonder if OpenAI follows suit.

They should.

“If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at performance-recruiting@anthropic.com with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.”

Going through the assignment now. Man, it's really hard to pack the vectors right.
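For anyone who hasn't hit it yet, here's the generic version of the problem (nothing to do with the take-home's actual ISA): a vector lane only pays off if you can fill it, and ragged data means padded, wasted lanes.

    LANES = 8  # hypothetical vector width

    def pack(values, pad=0):
        # Pad to a multiple of LANES, then split into full vectors.
        padded = values + [pad] * (-len(values) % LANES)
        return [padded[i:i + LANES] for i in range(0, len(padded), LANES)]

    # 13 elements -> 2 vectors, with 3 of 16 lanes wasted on padding:
    # [[0,1,2,3,4,5,6,7], [8,9,10,11,12,0,0,0]]
    print(pack(list(range(13))))

The hard part is arranging the data so the lanes you fill are the ones the next instruction actually wants.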

This is a knowledge test of GPU architecture?

Kind of, but not any particular GPU.

The machine is fake and simulated: https://github.com/anthropics/original_performance_takehome/...

But presumably similar principles apply.
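To make "fake and simulated" concrete: a cycle-counting simulator can be as small as this toy (the repo's is far richer, and the per-op costs here are invented):

    COSTS = {"load": 3, "mul": 2, "add": 1}  # hypothetical per-op cycle costs

    def run(program):
        regs, cycles = {}, 0
        for op, dst, *args in program:
            cycles += COSTS[op]
            if op == "load":
                regs[dst] = args[0]
            elif op == "add":
                regs[dst] = regs[args[0]] + regs[args[1]]
            elif op == "mul":
                regs[dst] = regs[args[0]] * regs[args[1]]
        return regs, cycles

    # 3*4 + 5: three loads, one mul, one add -> 3+3+3+2+1 = 12 cycles.
    program = [
        ("load", "r0", 3), ("load", "r1", 4), ("load", "r2", 5),
        ("mul", "r3", "r0", "r1"), ("add", "r4", "r3", "r2"),
    ]
    print(run(program))

Whatever shaves cycles on a machine like this (fewer instructions, cheaper ones, better scheduling) is the same kind of thinking that shaves them on real hardware.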


The snarky writing of "if you beat our best solution, send us an email and MAYBE we think about interviewing you" is really something, innit?

They wrote:

> If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at performance-recruiting@anthropic.com with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.

That doesn’t seem snarky to me. They said if you beat Opus, not their best solution. Removing “perhaps” (i.e. MAYBE) would be worse since that assumes everyone wants to interview at Anthropic. I guess they could have been friendlier: “if you beat X, we’d love to chat!”


I suppose you could interpret it either way, but having dealt with their interview pipeline I'd choose the snark.

I think the split between assuming they're well-meaning and assuming they're wasting everyone's time depends entirely on whether you've interviewed with them.

That paraphrases to

"do better than we have publicly admitted most of humanity can do, and we may deign to interview you"

It sounds incredibly condescending, if not snarky, but I would classify those adjectives as mostly synonymous.


I took the "perhaps" as a decision to be considered by the applicant, given they'd be competent enough to get in at a place of their choice, not just Anthropic.

I feel that came out wrong but the "maybe" was intended to be a way of saying "no guarantees", to avoid giving people the idea "solve this, get hired".

If you're an asshole who wants millions of dollars... I mean, there are still places to say no.

Pride comes before a fall, thankfully.

It's Anthropic. Their entire marketing is just being a pompous ass and AI fear-mongering.

It shocks me that anyone supposedly good enough for Anthropic would subject themselves to such a one-sided waste of time.

I generally have a policy of "over 4 hours and I charge for my time." I did this in the 4-hour window, and it was a lot of fun. Much better than many other take-home assignments.

I don't do take home assignments, but when I did, I would offer to do it at my hourly rate, even if it was just an hour. It's time I would otherwise spend making money.

Anyone worth working with respected that and I landed several clients who forwent the assignment altogether. It's chump change in the grand scheme of things, and often a formality.

Does help that I have a very public web presence and portfolio, though.


For many reasons, you’re not gonna get into Anthropic with that attitude.

And Anthropic will never land heavyset_go with their attitude. I guess we’re at an impasse.

4 hours continuous or no? I can't imagine finding 4 hours of straight focus.

I’ve been sent the Anthropic interview assignments a few times. I’m not a developer, so I don’t bother. At least at the time, they didn’t seem to have screenings for technical-but-not-dev roles. Maybe they do now.

Why is writing code to execute a program using the fewest instructions possible on a virtual machine a waste of time?

The expected time you spend on it is much more than the expected time they'll spend on it.

It’s kind of an interesting problem.

Seems like they’re trying to hire nerds who know a lot about hardware or compiler optimizations. That will only get you so far. I guess hiring for creativity is a lot harder.

And before some smart aleck says you can be creative on these types of optimization problems: not in two hours, it’s far too risky vs regurgitating some standard set of tried and true algos.


Your comment history suggests you’re rather bitter about “nerds” who are likely a few standard deviations smarter than you (the Anthropic OG team, Jeff Dean, proof nerds, Linus, …).

And they’re all dumber than John von Neumann, who cares?

Transitively, you haven't thought the most thoughts or cared the most about anything, therefore we should disregard what you think and care about?

The person replying was trying to turn the conversation into some sort of IQ pissing contest. Not sure why, that seems like their own problem. I was reminding them that there is always someone smarter.

If they're hiring performance engineers then they're hiring for exactly these sets of skills.

It's a take-home test, which means some people will spend more than a couple of hours on it to get the answer really good. They would have gone after those people in particular.


> And before some smart aleck says you can be creative on these types of optimization problems: not in two hours, it’s far too risky vs regurgitating some standard set of tried and true algos.

You're both right and wrong. You're right in the sense that the sort of creativity the task is looking for isn't really possible in two hours. That's something that takes a lot of time and effort over years to be able to do. You're wrong because that's exactly the point. Being able to solve the problem takes experience. Literally. It's having tackled these sorts of problems over and over in the past until you can draw on that understanding and knowledge reasonably quickly. The test is meant to filter out people who can't do it.

I also think it's possible to interpret the README as saying humans can't do better than the optimizations that Claude does when Claude spends two hours of compute time, regardless of how long the human takes. It's not clear though. Maybe Claude didn't write the README.


This would be an inappropriate assignment for a web dev position, but I'm willing to bet that a 1% improvement in cycles per byte in inference (or whatever) saves Anthropic many millions of dollars. This is one case where the whiteboard assignment is clearly related to the actual job duties.
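The arithmetic behind that bet, with loudly hypothetical numbers (Anthropic's real spend isn't public):

    annual_inference_spend = 2_000_000_000  # hypothetical: $2B/year on inference compute
    speedup = 0.01                          # 1% fewer cycles per byte

    print(f"${annual_inference_spend * speedup:,.0f} saved per year")  # $20,000,000

Even at a tenth of that, one good performance engineer pays for themselves many times over.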

> Seems like they’re trying to hire nerds who know a lot about hardware or compiler optimizations. That will only get you so far. I guess hiring for creativity is a lot harder.

Good. That should be the minimum requirement.

Not another Next.js web app take home project.


What is the actual assignment here?

The README only gives numbers without any information on what you’re supposed to do or how you are rated.


"Optimize the kernel (in KernelBuilder.build_kernel) as much as possible in the available time, as measured by test_kernel_cycles on a frozen separate copy of the simulator." from perf_takehome.py

Think that means you failed :(

+1

being cryptic and poorly specified is part of the assignment

just like real code

in fact, it's _still_ better documented and self-contained than most of the problems you'd usually encounter in the wild. pulling on a thread to end up with a clear picture of what needs to be accomplished is like 90% of the job very often.


I didn't see much that was cryptic, except having to click on "perf_takehome.py" without being told to. But 2 hours didn't seem like much time to bring the sample code into some kind of test environment, debug it enough to work out the details of its behaviour, read through the reference kernel to get some idea of what the algorithm is doing, read through the simulator to understand the VM instruction set, understand the test harness enough to see how the parallelism works, re-code the algorithm in the VM's machine language while iterating performance tweaks and running simulations, etc.

Basically it's a long enough problem that I'd be annoyed at being asked to do it at home for free, if what I wanted from that was a shot at an interview. If I had time on my hands though, it's something I could see trying for fun.


it's "cryptic" for an interview problem. e.g. the fact that you have to actually look at the vm implementation instead of having the full documentation of the instruction set from the get go.

It's definitely cleaner than what you will see in the real world. Research-quality repositories written in partial Chinese with key dependencies missing are common.

IMO the assignment('s purpose) could be improved by making the code significantly worse. Then you'd be testing the important stuff (dealing with ambiguity) that the AI can't do so well. Probably they didn't do that because it would make evaluation harder and more costly.




