I want to thank everyone involved in making this open source project possible. Nothing is set in stone yet; this library has only been in development for about two months.
We are working with SocketCluster to make µWS default in version 5, and I have gotten a lot of help from a lot of people during these months.
Thanks for the support. I will be accepting PRs and triaging issues that we need to fix before making any kind of official stable release.
Also, try to ignore the hateful comments; these commenters build their cases on thin air. If you actually do find anything you want to change, I will accept PRs that can be shown to improve the library.
Hey there. Just wrote down a few things that came to mind after seeing this. Don't see it as criticism, but as a few hints/remarks on what has to be covered by a WebSocket (or any other protocol) implementation. IMHO the high-performance part isn't too hard to achieve and shouldn't be the highest priority. The important thing is that such a server should be rock solid in implementation; otherwise it's worthless.
Some things that you should answer if you are offering this as a C++ websocket library, and which are currently not covered in the header file:
- What's the threading model of the library?
- Will server.run() start a single-threaded event loop (I guess so from taking a super-short peek into the code and seeing libuv) with everything running inside it, or will it start multiple worker threads?
- Based on the last question, from which thread(s) are the other callbacks called?
- If multiple threads are used, is the library thread-safe for sending messages from other threads?
- Can it be integrated into other event loops? Most applications already have a main loop or something like it; libraries that only work with their own main loop are not very useful. Normally applications also have to deal with other application logic besides responding to WebSocket messages.
Besides that, a few general questions that you should be able to answer for a WebSocket implementation:
- What's the sending behavior of socket.send?
- Will it block until all data has been sent? This can cause problems with slow receivers in single-threaded environments.
- Will it copy all data and buffer it internally until it can be sent? This provides no means of backpressure, and slow receivers (or non-receivers) can exhaust the server's memory.
- Does it handle connection close properly? This is unfortunately not too easy in WebSockets.
- And are there timers in place for force-closing the connection if the shutdown sequence is not completed properly, or if the initial handshake is not completed within a given timeframe?
- Does it handle control frames? Will it merge control frames (PONGs) if multiple are queued before they are sent? And will it stop sending them after a connection close is initiated?
The server is async, so there are no blocking functions exposed.
It passes all Autobahn tests, meaning it properly handles close frames, pings, etc.
Timers are used to force close connections. The C++ HTTP server does not currently time out, but the Node.js HTTP server does, so this is one issue that needs to be fixed, yes.
Just looked at the code. You seem to queue everything and therefore never block. That can be OK for some use cases, but it means the send command provides no kind of backpressure, which gets problematic with slow receivers. In Node it will also break stream semantics: if someone pipes 1 GB of messages into your socket (or sends them directly if writable streams are not supported), they will think the data has been sent immediately and won't know it is buffered at a lower layer. If you pipe a fast source into a slow receiver, it will break over time, which is exactly the thing that Node streams try to avoid.
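A hedged sketch of what surfacing backpressure could look like (names such as `SocketSketch`, `HIGH_WATER_MARK`, and `bufferedAmount` are illustrative, not from µWS): send() still never blocks, but reports whether the internal buffer has grown past a high-water mark so the producer can pause, mirroring what Node's writable streams do with the `write()` return value.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Illustrative only: a send() that never blocks but still exposes
// backpressure by reporting how much data is buffered internally.
class SocketSketch {
    std::deque<std::string> queue; // bytes waiting for the kernel
    size_t queued = 0;

public:
    static constexpr size_t HIGH_WATER_MARK = 1 << 20; // 1 MiB, arbitrary

    // Returns false once the buffer passes the high-water mark; the
    // caller should then pause its producer until drain() runs.
    bool send(const char *data, size_t length) {
        queue.emplace_back(data, length);
        queued += length;
        return queued < HIGH_WATER_MARK;
    }

    size_t bufferedAmount() const { return queued; }

    // Called when the socket becomes writable; here we simply pretend
    // the kernel accepted everything at once.
    void drain() {
        queue.clear();
        queued = 0;
    }
};
```

Whether a design like this fits µWS is a separate question; the point is only that a non-blocking send can still give the caller enough information to throttle a fast producer.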
And another thing I saw there: your SHORT_SEND optimization looks broken, as it does not seem to track whether the buffer is already in use by another message still queued for sending. So short messages can corrupt each other.
For your first set of questions: it's single-threaded. If you want to use multiple processors/threads, you start multiple servers and either distribute load with a load balancer or give each one a different TCP port.
Not sure if that's wrong, but it seems the program would crash if you call pop() on an empty Queue? I guess that never happens in this particular program, which I assume is why the check was omitted, to squeeze out even more performance.
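A minimal sketch of the hazard being described, assuming an intrusive linked-list queue (`UnsafeQueue` and the method names are hypothetical, not the actual µWS code): pop() without an empty check dereferences a null head, while the defensive variant costs one extra branch per call.

```cpp
#include <cassert>

// Hypothetical names, not the actual uws code: an intrusive
// singly-linked queue whose fast pop() assumes a non-empty list.
struct Node {
    Node *next;
    int value;
};

struct UnsafeQueue {
    Node *head = nullptr;
    Node *tail = nullptr;

    void push(Node *n) {
        n->next = nullptr;
        if (tail) tail->next = n; else head = n;
        tail = n;
    }

    // Fast path: dereferences head unconditionally, so calling this on
    // an empty queue is undefined behavior (typically a crash).
    Node *pop_unchecked() {
        Node *n = head;
        head = n->next;
        if (!head) tail = nullptr;
        return n;
    }

    // Defensive variant: one extra branch per pop.
    Node *pop_checked() {
        return head ? pop_unchecked() : nullptr;
    }
};
```

If every call site can prove the queue is non-empty, the unchecked version is safe; the trade-off is that the invariant lives in the callers' heads rather than in the code.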
yeah - the C++ bits were clearly written by someone who doesn't know the language well. I'd be careful about using this code in production.
EDIT: Specifically I'm referring to the usage of raw pointers, unchecked pointer arithmetic, goto for flow control and raw new/delete calls. The author says they have run tests under valgrind, but that doesn't say anything unless the inputs were malicious. Ideally it should be compiled with ASAN and run under something like afl-fuzz.
Also, why do you take a libuv dependency and then use uv_poll_t directly with raw send/recv calls instead of uv's provided TCP primitives?
Also, this server passes the entire Autobahn test suite with no failures, passes the Engine.IO and Primus tests, and exits gracefully with no Valgrind issues. Also, it builds with no warnings and has been running stably for weeks.
This implementation is way, way more lightweight. It assumes that the buffer being queued has a struct Message at its head, so it doesn't have to allocate a separate list node; one memory allocation is thereby skipped.
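The single-allocation trick described above could look roughly like this (an illustrative sketch; the actual fields and layout of Message in µWS may differ): the queue-node header and the payload share one malloc, so enqueueing a message never allocates a separate node.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Illustrative sketch of an intrusive message header: the payload
// bytes live immediately after the struct in the same allocation.
struct Message {
    Message *next;   // intrusive queue link, no separate node needed
    size_t length;   // payload size in bytes
    char *data() { return reinterpret_cast<char *>(this + 1); }
};

// One malloc covers header + payload.
Message *makeMessage(const char *src, size_t length) {
    Message *m = static_cast<Message *>(std::malloc(sizeof(Message) + length));
    m->next = nullptr;
    m->length = length;
    std::memcpy(m->data(), src, length);
    return m;
}
```

The usual caveat with this layout is ownership: whoever dequeues the message must free the whole block, and nothing else may assume the buffer starts at the payload.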
I'm interested in what you've learned using libuv while tuning this library. I've read in the past that libuv makes a lot of unnecessary memory allocations. Is this true, and/or have you considered writing directly to select/epoll/kqueue? Is there a lot of overhead in using libuv vs the OS-provided eventing syscalls?
Libuv is definitely not as "screamingly fast" as its marketing claims. I found the biggest flaw with libuv to be uv_tcp_t, which forces you to have a user-space buffer to receive the data, something not needed when using uv_poll_t.
I do not know what the overhead of uv_poll_t is compared to epoll/kqueue, but I think it's a good balance to depend on libuv in this case, and since we need to integrate with Node.js it is more or less required.
I would much rather use mTCP to further improve performance, but then this project would not be as relevant to most developers. Performance and relevance are key; it can be optimized further using mTCP and the like.
1. It has been compiled with ASAN.
2. What you suggest would blow the memory footprint 16x.
Thanks for sharing your infinite enlightenment after looking at my code for 10 minutes. I guess your 10 minutes of reading the code are infinitely more valuable than my 3 months of work on it?
If you want people to take you and your projects seriously, you are going to have to stop being so antagonistic. I've seen you act this way here, on Reddit, and on GitHub...
The sarcasm, childish comments, and resorting to insults the second someone criticizes your code isn't giving me any hope that you can competently maintain a project like this.
Take a look over the HN Guidelines [1] and the "Approach to comments"[2] sections of the site.
There's no need to resort to insults in-reply to questions or criticisms.
It's a valid question (that I think you did answer, but not very well). Dropping to very unsafe "lower levels" should only be done when absolutely necessary, as a single mistake here could cause massive security issues.
Humility and a collaborative attitude are really important to get the best results.
When people attack your code, it isn't an attack on you. It makes your code better. It's one of the downsides to releasing any opinionated project (and many good projects are opinionated).
For my part, I won't use software written by someone who doesn't either refute criticism or use it to improve code, and I'm not satisfied you're doing either of those.
According to logic, this quote, "yeah - the C++ bits were clearly written by someone who doesn't know the language well. I'd be careful about using this code in production.", is a personal attack. I get offended by this, personally. When I get offended, personally, I answer however I see fit.
Thank you, you will be missed. I don't know what to do without you.
The comment that provoked you was rude and dismissive and the sort of thing we ask people not to post. That said, the guidelines here ask you to remain civil even when someone else is uncivil and/or wrong. That's an important rule that we all have to abide by—though it's a challenge, especially when one's own work is being discussed—because otherwise the discussion quality will rapidly deteriorate.
So please either make substantive neutral replies if you can, or don't post anything until you can.
It has been said that posting code on HN is a literal "trial by fire",
but reading the comments in this thread, this has gone overboard... to the point that I, as a casual observer, had to say something.
I agree: "clearly written by someone who doesn't know the language well" _IS_ a direct attack on the developer and not the code, and I can't comprehend the mental gymnastics it would take to explain otherwise.
There is a major difference between:
"Why did you implement your own Queue here?"
and
"This guy doesn't understand C++. Don't use his code."
Don't let them win by giving in to replying with a "Redditor" form of rebuttal filled with snark.
When reading your comments and code on the web, it seems like you're really an awesome dev, but you give in to hate comments by such people too easily.
You obviously have strong opinions about this. That said, let me offer you a bit of advice.
I don't care how good your code/framework is if you blow up over a minor bit of criticism. This is a great example of where some comments in the code could help teach people who don't know better.
Long gone are the days where the solo programmer could make important software without interacting with the rest of the outside world. Knowing when to check your ego at the door is just as important as getting the technical bits right.
There are many improvements that could be made to this code. However the developer is extremely antagonistic and unwilling to accept criticism so I doubt that the issues will ever be fixed.
It's code like this that gives C++ a bad reputation. It's not modern in any sense. Compiling it with my default warning level in clang gives 482 warnings! Here's a summary:
warning: cast from '...' to '...' increases required alignment from 1 to X [-Wcast-align]
warning: declaration shadows a field of '...' [-Wshadow]
warning: declaration shadows a local variable [-Wshadow]
warning: implicit conversion changes signedness: '...' to '...' [-Wsign-conversion]
warning: implicit conversion loses integer precision: '...' to '...' [-Wconversion]
warning: implicit conversion loses integer precision: '...' to '...' [-Wshorten-64-to-32]
warning: macro name is a reserved identifier [-Wreserved-id-macro]
warning: no previous prototype for function '...' [-Wmissing-prototypes]
warning: operand of ? changes signedness: 'int' to 'char' [-Wsign-conversion]
warning: unused parameter '...' [-Wunused-parameter]
warning: use of old-style cast [-Wold-style-cast]
So you enabled pedantic warning level, and you got a bunch of pedantic nonsense warnings.
There is a reason these are not enabled by default.
* Unused parameter -> rly? Who gives a damn?
* Use of old style cast -> Well I'm old style, get over it.
* No previous prototype declaration -> Again, I do this if I want to.
* Shadows field -> Who cares? Not me.
* Cast increases required alignment -> Well, obviously the perf cost is not an issue here.
* Etc, etc, etc
These are pedantic warnings. However, this is an open source project and you are free to send me PRs whenever you want.
Maybe using a `reinterpret_cast` would actually have better performance. A C-style cast tries a sequence of casts in order, starting with static_cast and ending with reinterpret_cast (see http://anteru.net/blog/2007/12/18/200/).
There's no excuse for using C-style casts in C++, considering the range of purpose-specific casts we as developers have at our disposal.
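For illustration, a small sketch of why the named casts are preferred: each one permits only a narrow set of conversions, so the dangerous conversions stand out in review and are greppable (function names here are made up for the example).

```cpp
#include <cassert>
#include <cstdint>

// static_cast only performs well-defined value conversions; it will
// refuse to punt between unrelated pointer types the way (int *)&d
// silently would with a C-style cast.
int narrow(double d) {
    return static_cast<int>(d); // explicit, searchable narrowing
}

// reinterpret_cast documents deliberate bit-level reinterpretation;
// a C-style cast would hide that intent among harmless conversions.
std::uintptr_t as_address(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

Note that named casts are a compile-time readability and safety tool; the generated code is the same as the equivalent C-style cast, so the "better performance" question is really about clarity, not speed.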
Additionally, I just looked at the code: is there any reason it's all stuffed into a single header and source file? I don't know if I'm just being naive, but isn't this a slightly bad design? There seem to be a lot of different data structures that could easily be broken out, which would make the flow of the project a bit easier to follow.
Yep, I seem to recall there was one in MySQL a while back that involved a simple int-to-char conversion, where the overflow would trigger a password match due to a random seed involved in the authentication path.
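A sketch of that class of bug (along the lines of CVE-2012-2122 in MySQL; the function name is hypothetical): memcmp() may return any int, and truncating the result to a char maps values like 256 back to 0, which wrongly reads as a match. This is exactly what -Wconversion-style warnings exist to catch.

```cpp
#include <cassert>

// Hypothetical reconstruction of the bug class: memcmp() may return
// any int, but storing the result in a char truncates it, so values
// like 256 wrap to 0 and incorrectly read as "equal".
bool reports_match(int memcmp_result) {
    char truncated = static_cast<char>(memcmp_result); // lossy narrowing
    return truncated == 0; // true for 0, but also for 256, 512, ...
}
```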
I'm a big fan of -Wall -Werror, with pragmas for the specific sections where you must work around a warning. It does a great job of catching issues with contributions.
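That pattern could look like this (a sketch using the GCC/Clang diagnostic pragmas; on_event is a hypothetical callback): build with -Wall -Wextra -Werror, then carve out one documented region where a specific warning is deliberately suppressed.

```cpp
#include <cassert>

// Sketch: compile with -Wall -Wextra -Werror, then suppress one
// specific warning only around code that must violate it.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
// The callback must match a fixed signature; 'context' is unused here.
int on_event(int code, void *context) {
    return code * 2;
}
#pragma GCC diagnostic pop
```

The push/pop pair keeps the suppression local, so the rest of the translation unit still treats the warning as an error.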
"Use of old style cast -> Well I'm old style, get over it."
"By using Linux I haven't been limited by the lacking Microsoft C++ compilers only supporting a fraction of the language, but instead been able to use the very latest features and tools."
Why do I care if it's using an "old-style cast" (I presume something like `(float)var` instead of `static_cast<float>(var)`), defining a macro whose name is a reserved identifier, or performing implicit conversions (like `1.0 + 1` instead of `1.0 + float(1)`)?
Don't C++ compilers give lots of unimportant warnings?
Yes, my friend. This guy is trying to roast the library based on bullshit warnings that normal projects (Node.js, as one example) explicitly disable. So if you want to make a case out of this, you will have to report this "issue" to the Node.js developers as well, because they also turn off a lot of warnings, simply because they are pedantic.
I'm currently using the `ws` library for a custom live-streaming solution (broadcasting binary data) and the memory/CPU usage is indeed not small for a handful of clients. What does `uws` bring new (technically) to the table?
You say you support Windows, but when I try to use it, it gives me an error:
Error: Compilation of µWebSockets has failed and there is no pre-compiled binary available for your system. Please install a supported C++ compiler and reinstall the module 'uws'.
uWebSockets is bundled as part of SocketCluster http://socketcluster.io/ and we plan to make it the default in SC v5. We got a massive speedup! Highly recommended.
No offence, but I don't think you fully grasp the situation here. This is an extremely optimized fully native server written with low-level CPU awareness. You can absolutely not, in any possible way, write this in Python.
I can understand that you might be a bit frustrated right now, because that level of performance in pure Python may be difficult or impossible in this case. But the question the person was asking was a very reasonable one, something along the lines of: "I'm using this solution in Python right now, and I'd like more performance; what are my options?"
It seems like you not only misunderstood the question, but felt the need to question their intelligence and give a rude, vague, and overall unhelpful answer. As a piece of communication, it is useless to everyone involved. Please be mindful of the way you come across; there's no need to insult, dismiss, and disrespect others. It only takes a moment, and saves time and energy for both you and them. You could rephrase like: "No. You can approach this to a level of <percentage_of_perf>, but it will be hard to pass that point, due to the way the library is written." If you did that, you'd add some very valuable information to the conversation with little effort. It would be a win-win for everyone.
Beyond that, assuming your benchmarks are accurate, this seems like a prime library for someone to write a python wrapper for! There's autobahn-twisted right now, but I'm not sure how well it performs in comparison.
Toss a coin and it will land on someone's holy ground...
My intentions were not to harm; that was why I said "no offence, but". I can do no more than explain myself. Sorry if I offended anyone (despite explicitly saying "no offence"). Someone should probably censor me, like, a lot.
You've written what potentially appears to be a promising library. Great! In fact, it seems so promising that people are trying to find the equivalent in the language of their choice. Even better! Why not encourage them to write a wrapper for your fine library in their language of choice? Not everyone uses Node, after all. Maybe that person asking for equivalents in Python would've written a binding if you told them "Hey, you could try to do this in pure python but because of the relative performance of my library, you might want to consider writing a binding to µws."
I doubt you are trying to harm anyone. But you're not being very helpful. You say that you've "landed on someone's holy ground" but there is a very low chance that is going on. They probably just want to get a job done, and they want to figure out if your tool's a good fit. All it takes is a little bit more thought before you type out a response.
I'm not telling you to censor yourself. I'm telling you to stop worrying about explaining yourself, and start thinking about being more helpful. I'm telling you to do it, because it will make things easier for you. You might have written the library, but other people are going to be the ones who use it. They're going to ask you questions, and you're going to think some of those questions are stupid. It's okay. But if you try to be helpful to them even if you think their questions are stupid, you'll spend far less time writing defensive comments on HN, and far more time watching adoption for your library grow, which I assume is something you may want.
Thanks, I can certainly help people with questions if they need help writing a Python wrapper. I think that would be a good solution, but it would need to integrate seamlessly with the rest of their app. Posting on GitHub would be a good start for this, or Gitter.
"asyncio is an asynchronous I/O framework shipping with the Python Standard Library. In this blog post, we introduce uvloop: a full, drop-in replacement for the asyncio event loop. uvloop is written in Cython and built on top of libuv.
uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs."
The question asked was if there is "anything [similar in] python", so there's an obvious requirement of being able to use it from Python. That leaves either something with a friendly Python wrapper, or calling out directly to pure C/C++ code. The latter might be even faster, but at that point it's questionable whether calling it from Python is really worth the trouble at all.
So, I think it's fair to say that "uvloop may be 'something [similar in]' python".
Although, thinking a bit more on this (and your more helpful reply to another poster), it may well be that for WebSockets, a better-performing solution for Python would be to write a Python wrapper for uWebSockets (the C++ implementation).
I'm guessing that the scaffolding code for the websocket-part in python when working with uvloop might indeed give a meaningful performance and/or memory hit (I'm leaning towards memory probably being the most significant difference here).
It'd be interesting to compare a Python+uvloop WebSocket implementation against µWS (both the Node.js and C++ versions), and at some point see whether wrapping µWS for Python would make a meaningful difference.
You need to realize that libuv itself was simply too heavyweight for this project. Think about that statement for a while.
This is why I use UNIX syscalls directly, and only use uv_poll_t, not the full uv_tcp_t. This is the level of optimization we are talking about: libuv itself is considered too heavyweight...
When libuv becomes too heavyweight to keep up, discussing how a Python async network library could achieve similar performance is just purely ridiculous.
My answer was directed at the comment you made about performance for Python, and yet the lib you provide has bindings for Node.js, which is not a very fast runtime.
You can create a C binding and it does integrate with libuv.
libwebsockets targets the embedded world with a smaller code footprint (libc vs. libstdc++). libwebsockets performs very well in memory usage and CPU time.
I don't think so, but Meatier (https://github.com/mattkrick/meatier) should receive uws as the default engine since it builds on SC. You could look into that, but honestly I don't understand this project at all :P