Because, that's why.
Okay, okay. The author wanted something like Erlang in Python. Erlang's runtime is awesome for writing high-concurrency networking code, but he wasn't so crazy about the language itself. So diesel is a long-term effort to capture some of Erlang's goodness and get it into Python. There are a few pieces remaining related to clustering, named processes and name services, and brokerless message dispatching, but development on all of those is in progress.
Diesel was started by Jamie Turner (jamwt) in the Fall of 2009 as a way to explore using Python 2.5's generators (with .send()) to implement a coroutine-like non-blocking I/O framework for an online real-time chat application.
In January 2010, Jamie joined a startup called Bump Technologies, and he and Will Moss (wmoss) started using diesel as the basis for the "Bump 2.0" server platform, which would run a custom protocol buffer-based wire protocol to iPhone and Android handsets.
In June 2010, Will and Jamie realized that the generator-based approach, while appealing in its novelty and dependency-free purity, simply had far too many caveats and far too much complexity to justify it vs. a greenlet-library based approach with "true" coroutines. They rewrote diesel in a weekend, and that became Diesel 2.
Bump 2.0 launched in July of 2010, powered by Diesel 2. Over the next 18 months, Diesel 2 was optimized, hardened, and battle-tested as Bump's userbase grew from 12 million to 80 million users, and daily traffic increased tenfold.
In early 2012, Bump decided to release formalized tests, clean up some APIs, document the project, and launch a new website for Diesel 3.0.
Basically, (almost) everyone agrees that non-blocking I/O is a great thing
from a scalability perspective. Most of the super-scalable daemons we run
every day are written in an asynchronous fashion on top of system
calls like kqueue(). Resource usage per socket is very low,
scheduling is very efficient even among 10s or 100s of thousands of sockets,
and so on.
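To make the style concrete, here is a minimal readiness-based loop using Python's stdlib selectors module, which wraps epoll()/kqueue() where available. This is an illustration of the general technique, not diesel's actual event loop; the socketpair stands in for a real network connection.

```python
import selectors
import socket

# A tiny readiness-based loop: register a socket, wait until the kernel
# says it is readable, then read without ever blocking on the socket itself.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel.register(a, selectors.EVENT_READ)
b.sendall(b"ping")

received = b""
while received != b"ping":
    # Blocks only in select(), not on any individual socket.
    for key, events in sel.select(timeout=1):
        received += key.fileobj.recv(1024)

sel.unregister(a)
a.close()
b.close()
print(received)  # b'ping'
```

A real daemon registers thousands of sockets this way; the per-socket cost is just a file descriptor and a small bookkeeping entry, which is where the scalability comes from.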
What's less good is callback-based programming vs. simple call-and-block
programming. Call-and-block style is just easier to understand and
easier to work with. How do I know? Because when we have the liberty
to choose either style, we choose it every time. When we can simply
call and block, do we ever reach for callbacks instead? Of course not.
Where callback-based programming gets really hairy is when you want to do additional secondary I/O to satisfy some primary handler called by the event loop. For example, you want to issue one (or, god forbid, many) requests to a database server to get some information to satisfy a nonblocking http request. You'll need to put together a callback chain that strings together several database requests, each invoking another callback (so the event loop can do the I/O required), and each meticulously attaching error handlers at each stage since traditional exceptions will not work in this model. This is "callback spaghetti", and if you're nodding along right now, sorry for your pain, brother.
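Here is a toy sketch of that contrast. The database API below is hypothetical (async_query and blocking_query are made up for illustration); the point is the shape of the two styles, not any real driver.

```python
# Hypothetical async API: invokes callback with a result, or errback on failure.
def async_query(sql, callback, errback):
    try:
        callback({"sql": sql, "rows": [1, 2, 3]})
    except Exception as e:
        errback(e)

results = []

def on_error(e):
    results.append(("error", e))

# Callback style: each secondary query nests another level, and each
# level must wire up its own error handler by hand.
def handle_request():
    def on_user(user):
        def on_orders(orders):
            results.append((user, orders))
        async_query("SELECT * FROM orders", on_orders, on_error)
    async_query("SELECT * FROM users", on_user, on_error)

handle_request()

# Call-and-block style: the same logic reads top to bottom, and a plain
# try/except around it would handle errors at every step.
def blocking_query(sql):
    return {"sql": sql, "rows": [1, 2, 3]}

def handle_request_blocking():
    user = blocking_query("SELECT * FROM users")
    orders = blocking_query("SELECT * FROM orders")
    return (user, orders)
```

Two sequential queries already cost two levels of nesting in the callback version; real handlers with branching and loops get much worse.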
The fundamental issue here is that the event loop needs to get control
back to do the I/O for each of these database requests. This is awkward
in the traditional model because we would need to go back up the call
stack to reach the event loop's frames (the code that called us), and
unwinding the stack that way destroys our handler's state.
So what we need is some way to resume the event loop without unwinding the stack, so that we can keep the state in our handler just as it is until the database request is done.
The coroutine approach taken by the greenlet library lets us do that: when we need to do more I/O, the state of the stack is "frozen" into the heap, and then the event loop's stack is restored and is run. When each database query is finished, our stack is "unfrozen" and restored, and then our frame can resume running with the result of the database query.
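The control flow can be sketched with a plain generator. Diesel actually uses greenlet, which freezes the entire call stack rather than a single frame, but the shape is the same: the handler pauses mid-function with its locals intact, the event loop does the I/O, and the handler resumes with the result. The "query" protocol below is invented for the sketch.

```python
def handler():
    # Local state ("user") survives across the suspension.
    user = "alice"
    rows = yield ("query", "SELECT * FROM orders")  # suspend; loop takes over
    return (user, rows)                             # resume with the result

def event_loop(coro):
    # Start the handler; it runs until it needs I/O.
    op, sql = coro.send(None)
    assert op == "query"
    # The "I/O" happens here, while the handler's frame sits frozen.
    result = [1, 2, 3]
    try:
        coro.send(result)   # un-freeze the handler, handing it the result
    except StopIteration as done:
        return done.value

outcome = event_loop(handler())
print(outcome)  # ('alice', [1, 2, 3])
```

Note that `user` is still in scope when the handler resumes; that preserved state is exactly what the callback version had to thread through closures by hand.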
TL;DR: They're magical.
Yes, yes it does. And it does a pretty fantastic job.
Diesel's goals are a bit different from gevent's, though. Gevent seeks to magically make your existing libraries non-blocking (and largely succeeds) by monkey-patching the socket module, the threading library, and other blocking calls. Diesel seeks to provide a whole new integrated platform of libraries and tools that are designed together and tailored specifically for writing highly concurrent network code and clustered services.
Each approach has its strengths. With gevent, you have an amazing array of existing networking modules available that will (likely) Just Work. With diesel, you'll have far fewer; but the ones that are there may be more consistent in terms of API and semantics, quality and performance, and will doubtlessly be more tightly integrated with the framework ecosystem.
Greenlet is the name of the library that implements coroutines for Python. These coroutines, when they cooperatively multitask in user space, are often called "green threads" (hence, "greenlet") or "lightweight threads". Diesel calls its thread-like abstraction on top of the greenlet library a Loop, so diesel documentation generally refers to these as "Loops". But you can treat them all as essentially the same thing. Sorry for the confusion!
Why is the @call decorator necessary on Client methods?
Diesel's protocol functions are really easy to work with in part
because there is exactly one contextual "associated socket" for each
function call. I.e., if you're in a service handler, one particular
client socket is implied for send() and friends. The @call decorator
acts as a signal to diesel that the associated socket is changing:
once inside this method, send() and friends target the socket for this
particular instance of the Client class, and not any other socket
(like the Service handler's, perhaps) that called the client method.
How does receive() work? Isn't it slow to do small reads like that?
In diesel's low-level event loop, the socket is always technically readable, and diesel will eagerly buffer whatever is given to it by the connected party. Your loop will only be paused if the current buffer cannot satisfy the sentinel. Diesel is fairly efficient about buffering and scanning, so you don't usually need to worry about how it all works under the covers. Just say what your protocol needs and it will be given to you while maintaining throughput.
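A toy model of that buffering, assuming a sentinel-based read like "give me everything up to CRLF". This is an illustration of the idea, not diesel's internals; Buffer and its methods are invented for the sketch.

```python
class Buffer:
    def __init__(self):
        self.data = b""

    def feed(self, chunk):
        # Eagerly buffer whatever the connected party sends.
        self.data += chunk

    def until(self, sentinel):
        # Return one "message" if the sentinel is present, else None
        # (in diesel, None would mean "pause this Loop until more data").
        idx = self.data.find(sentinel)
        if idx == -1:
            return None
        idx += len(sentinel)
        msg, self.data = self.data[:idx], self.data[idx:]
        return msg

buf = Buffer()
buf.feed(b"HELLO wor")        # partial line: request can't be satisfied yet
print(buf.until(b"\r\n"))     # None -> the Loop would pause here
buf.feed(b"ld\r\nNEXT")
print(buf.until(b"\r\n"))     # b'HELLO world\r\n'
```

Notice the reads from the kernel can be any size; the small, protocol-shaped reads your code asks for are served out of the buffer, so they cost a scan, not a syscall.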
For now, yes. If you ask for very large amounts of data, you will pile
up a lot of memory in buffers. Don't do that. Likewise, if your
application is so busy (or blocks incorrectly) that your handler is not
invoked often enough to consume and clear the buffer (unlikely, but
possible), you could pile up a lot of memory. It's not crazy to think a
future version of diesel will allow you to specify the maximum amount
of data you'll allow to be read off the socket and kept in buffers, but
in the meantime, try not to make these things happen.
The send() function can take either bytes or a file-like
object. If you give it a file-like object, it will read from it incrementally
and stream it to the remote socket. In fact, diesel.web supports Flask's
"X-Sendfile" capability using this method.
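The incremental-streaming pattern looks roughly like this. sock_send and send_filelike here are stand-ins invented for the sketch, not diesel's functions; the point is that the file is read in chunks, so a large file never sits in memory all at once.

```python
import io

sent = []

def sock_send(data):
    # Stand-in for a real socket write.
    sent.append(data)

def send_filelike(f, chunksize=4096):
    # Read incrementally and stream each chunk to the remote socket.
    while True:
        chunk = f.read(chunksize)
        if not chunk:
            break
        sock_send(chunk)

send_filelike(io.BytesIO(b"x" * 10000), chunksize=4096)
print([len(c) for c in sent])  # [4096, 4096, 1808]
```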
Use the thread() function to run the job seamlessly in a (real)
background thread, and then resume your loop with the result when that
thread is done.
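The pattern can be sketched with the stdlib: a blocking job runs in a real background thread, and the caller picks up the result when it finishes. Diesel's thread() does this without blocking other Loops; here future.result() simply blocks, which is enough to show the shape.

```python
from concurrent.futures import ThreadPoolExecutor

def blocking_job(n):
    # Pretend this is slow, blocking work (a C library call, disk I/O, etc.).
    total = 0
    for i in range(n):
        total += i
    return total

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(blocking_job, 1000)
    result = future.result()   # "resume" with the thread's result

print(result)  # 499500
```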