I'm interested in what you've learned using libuv while tuning this library. I've read in the past that libuv makes a lot of unnecessary memory allocations. Is this true, and have you considered writing directly against select/epoll/kqueue? Is there a lot of overhead in using libuv compared to the OS-provided eventing syscalls?
Libuv is definitely not as "screamingly fast" as its marketing claims. The biggest flaw I found in libuv is uv_tcp_t, which forces you to supply a user-space buffer to receive the data into, something that is not needed when using uv_poll_t.
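To make the distinction concrete, here is a minimal sketch of the readiness model that uv_poll_t exposes. It is written against plain POSIX poll() rather than libuv so that it stays self-contained, and it is only an illustration under that assumption, not code from this library. The helper names read_when_ready and demo_roundtrip are hypothetical. The point is that the event layer only reports "this fd is readable", and the caller then calls recv() into whatever buffer it already owns.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block until fd is readable (or the timeout expires),
 * then read straight into a buffer the caller already owns.
 * Returns bytes read, 0 on timeout or orderly shutdown, -1 on error. */
static ssize_t read_when_ready(int fd, char *buf, size_t cap, int timeout_ms)
{
    assert(buf != NULL && cap > 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;               /* 0 = timed out, -1 = poll() failed */

    /* The event layer never touched a buffer; we choose one only now. */
    return recv(fd, buf, cap, 0);
}

/* Hypothetical demo: send "ping" over a local socket pair and read it back. */
static ssize_t demo_roundtrip(char *out, size_t cap)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    static const char msg[] = "ping";
    ssize_t got = -1;
    if (send(sv[0], msg, sizeof msg - 1, 0) == (ssize_t)(sizeof msg - 1))
        got = read_when_ready(sv[1], out, cap, 1000);

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Calling demo_roundtrip with a 16-byte buffer should copy the four bytes of "ping" into it. With uv_tcp_t, by contrast, libuv performs the read itself and asks for the destination buffer through a callback before delivering the data.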
I do not know what the overhead of uv_poll_t is compared to raw epoll/kqueue, but I think depending on libuv is a good trade-off in this case, and since we need to integrate with Node.js it is more or less required.
I would much rather use mTCP to improve performance further, but then this project would not be as relevant to most developers. Performance and relevance are both key. It could be optimized further with mTCP and the like.
But without all that you would lose performance, which seems to be the main goal of this project.