Thursday, January 24, 2013

Is the gevent scheduler not ideal for all cases?

There was a time when everyone could write code, no matter how good or bad they were. Free world, free will. Everyone could do whatever they liked.
At first, everyone was happy because there were no rules and no boundaries. People acted according to their free will. But then, at some point, things changed...



"You bastard! What are you doing here!?", the farmer shouted.
"Building my own house, of course~ I love this beautiful lake, so I'm going to build my dream house near it~", the builder said happily.
"You see my house, don't you!? I love this lake too, that's why I built my house here! But your house is going to block my view, you bastard!"

Because everyone just does whatever they like, they don't care how others feel.
Of course, chaos happens.

======================================================================
Okay, the irrelevant part ends here. Now for the main part.

gevent: a good library for handling a large number of connections.
You can put your code in a number of co-routines, or even use one co-routine per connection, and gevent will schedule them for you. Unlike multi-threading, multiple co-routines run on the same thread, so fewer resources are used. Normally people can use it happily.

But here is the problem: the scheduler doesn't perform well in some situations. There seems to be one rule that looks so obviously correct that no one notices it isn't suitable for all cases. The rule is that when one co-routine occupies the thread for longer than the other co-routines do, it takes more time before it is scheduled to run again. The good part, of course, is that no co-routine can dominate.
Fairness? No doubt. But then another problem arises: what if that co-routine actually needs more time to do its job? If it is scheduled less frequently, the job will span a longer period of time.
The troubling part: there seems to be no obvious function call, flag, or anything else to deal with this issue...

You may prefer reading code to reading paragraphs full of jargon:
import time
import gevent.socket as socket
import gevent

class Test:
    def __init__(self):
        self.counta = 0
        self.countb = 0

    def coa(self):
        ipc_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                socket.IPPROTO_UDP)
        ipc_sock.bind(('127.0.0.1', 33302))
        while True:
            ipc_sock.recvfrom(100)
            # simulate work for this co-routine; time.sleep() (unlike
            # gevent.sleep()) blocks the whole thread without yielding
            time.sleep(0.0001)
            self.counta += 1

    def cob(self):
        # the larger this extra load is,
        #   the less often this co-routine gets scheduled to run
        LOAD = 0.1
        while True:
            gevent.sleep(0.0001)
            # simulate work for this co-routine (blocking, no yield)
            time.sleep(0.0001+LOAD)
            self.countb += 1


if __name__ == "__main__":
    h = Test()
    gevent.spawn(h.coa)
    gevent.spawn(h.cob)

    for i in range(1000000):
        print(h.counta, h.countb)
        gevent.sleep(0.001)

Assume there is another program that fills up the buffer of that socket endlessly. The result: self.counta is much larger than self.countb once the program has run for at least a few seconds, meaning that coa gets more chances to run than cob. The difference is much more obvious if the extra load on co-routine cob() is larger.
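
The "other program" isn't shown above, so here is a minimal sketch of what it could look like. It only assumes the address used in coa() (127.0.0.1, port 33302); the function name flood, the payload size and the datagram count are made up for illustration, and it uses the plain standard-library socket module rather than gevent.

```python
import socket

def flood(host="127.0.0.1", port=33302, count=100000, payload=b"x" * 64):
    """Send `count` small UDP datagrams to (host, port).

    Returns the number of datagrams handed to the kernel. The defaults
    match the address that coa() binds to; everything else is arbitrary.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    try:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
    finally:
        sock.close()
    return sent
```

Calling flood() repeatedly from a second terminal while the test program runs should keep the socket buffer busy enough to see the two counters drift apart.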

Workarounds?
1. Rewrite your code in another way (well...)
2. Write your own scheduler that gives a little more time to a specific co-routine, such as cob() in this case.
3. Split the work of the time-consuming co-routine across a number of co-routines, hoping that all co-routines end up with similar workloads, no more, no less.
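
To make workaround 2 a bit more concrete, here is a toy sketch of a weighted round-robin scheduler. It is built on plain Python generators, not on gevent, and it is not how gevent schedules anything; the names worker and run_weighted are hypothetical. It only shows the idea of letting one co-routine take several steps per turn.

```python
from collections import deque

def worker(name, steps, log):
    """Toy co-routine: does `steps` units of work, yielding after each."""
    for _ in range(steps):
        log.append(name)  # one unit of "work"
        yield             # hand control back to the scheduler

def run_weighted(tasks):
    """Round-robin over (generator, weight) pairs.

    On each turn a task may advance up to `weight` steps before the
    next task runs, so a heavier task can be given a larger share.
    """
    queue = deque(tasks)
    while queue:
        gen, weight = queue.popleft()
        try:
            for _ in range(weight):
                next(gen)
        except StopIteration:
            continue  # task finished; do not re-queue it
        queue.append((gen, weight))

log = []
# "b" plays the role of cob(): more work to do, so it gets weight 3.
run_weighted([(worker("a", 2, log), 1), (worker("b", 6, log), 3)])
print("".join(log))  # prints abbbabbb
```

With equal weights the heavy task would need six rounds to finish; with weight 3 it finishes in two, at the cost of making "a" wait longer between its own turns.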