Friday, November 26, 2010

Testing Interactive Code

The preceding work on the Render and Project Settings dialogs in PiTiVi has led to some fairly complicated UI logic, and we haven't really got a good way to test it.

Imagine that we want to test the following trivial Python UI:
import gtk
import gobject
import random

def b_clicked_cb(button):
    clicked = True
    button.props.label = "Ouch!"

w = gtk.Window()
v = gtk.VBox()
b = gtk.Button("Click me")
b.connect("clicked", b_clicked_cb)
v.pack_start(b)
w.add(v)
w.show_all()

gtk.main()
We want to check that the button label is correctly changed after being clicked. Because of control inversion, we obviously can't code in a direct, sequential style. At the very least we must install a timeout or idle function into the main loop.

This is a naive approach:
def test_case():
    b.activate()
    assert b.props.label == "Ouch!"
    return False

gobject.timeout_add(1000, test_case)
gtk.main()

It doesn't work because of timing issues. The button's label won't change immediately after activate() is called, so you get the following error -- despite the fact that the label clearly does change.

Traceback (most recent call last):
  File "figure_1.py", line 19, in test_case
    assert b.props.label == "Ouch!"
AssertionError

The obvious solution is to split the callback in twain:

def test_case():
    b.activate()
    gobject.timeout_add(1000, finish_test)
    return False

def finish_test():
    assert b.props.label == "Ouch!"
    gtk.main_quit()
    return False

gobject.timeout_add(1000, test_case)
gtk.main()

It's good enough for this trivial example, but it should be clear that as the complexity of the UI increases you'll end up with yet another maze of callbacks.

Last night I experimented with applying python generator functions to this problem. The idea is to (ab)use the yield keyword as a way of passing control between the test and the mainloop. It's similar to concepts presented in this paper, which describes how continuations can be used to solve the problem of control inversion in web programming.

Quick summary of Python generator functions: the yield keyword in Python is a limited form of either lazy evaluation or continuations, depending on your point of view. Usually we think of generators as sequences. For example, this generator can be thought of as the sequence of positive integers from 1 to 10:
def ints():
   i = 1
   while i <= 10:
      yield i
      i += 1
I can treat it as a sequence and take its sum, or iterate over it:
print sum(ints())
squares = [x ** 2 for x in ints()]
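To watch a generator pause and resume, the following minimal sketch may help. It is not part of the original example: steps() and log are hypothetical names, and it uses Python 3 syntax (next(it) rather than it.next()) so that it runs on a current interpreter.

```python
# Python 3 syntax; the surrounding post uses Python 2.
log = []

def steps():
    log.append("first half")
    yield                      # pause here; local state is kept
    log.append("second half")

it = steps()                   # creates the generator, runs none of the body
assert log == []

next(it)                       # runs the body up to the first yield
assert log == ["first half"]

try:
    next(it)                   # resumes after the yield and runs to the end
except StopIteration:
    log.append("exhausted")

print(log)
```

Each call to next() runs exactly one slice of the function, which is the property the rest of this post relies on.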
What actually happens when ints() is called is that an iterator is created and returned to the caller; none of the function body runs until the iterator is first advanced. The yield keyword tells the interpreter that ints() is a generator, and that its state should be saved at each yield so it can be re-entered later. Because the state of the function is saved inside the iterator, we can think of the iterator as a continuation (although generators are not as powerful as true continuations). With this in mind, we re-write our test function:
def test_case():
    b.activate()
    yield
    assert b.props.label == "Ouch!"
Here the yield keyword is going to provide the same control flow boundary that splitting our code into separate functions did earlier. Now all we need is a bit of code to consume values from this iterator until it is exhausted. We do this using timeouts -- as we did in earlier examples -- which allows the mainloop to continue running in between the two halves of our test case.
def run_test_case(iterator):
    print "Tick"
    try:
        iterator.next()
    except StopIteration:
        print "Test Case Finished Successfully"
        gtk.main_quit()
        return False
    except Exception, e:
        print "An error occurred"
        gtk.main_quit()
        return False
    return True

gobject.timeout_add(1000, run_test_case, test_case())
gtk.main()
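The same control flow can be tried without GTK installed. The sketch below is only an approximation under stated assumptions: FakeLoop is a hypothetical stand-in for the main loop (a plain queue of callbacks, with no real timing), the label dictionary stands in for the button, and the syntax is Python 3.

```python
from collections import deque

class FakeLoop(object):
    """Hypothetical stand-in for the GTK main loop: a queue of callbacks."""
    def __init__(self):
        self.pending = deque()
        self.running = False

    def timeout_add(self, callback, *args):
        # The real gobject.timeout_add also takes an interval; omitted here.
        self.pending.append((callback, args))

    def quit(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running and self.pending:
            callback, args = self.pending.popleft()
            if callback(*args):          # True means "call me again"
                self.pending.append((callback, args))

loop = FakeLoop()
label = {"text": "Click me"}
results = []

def activate():
    # As with the real button, the label changes later, not immediately.
    loop.timeout_add(lambda: label.update(text="Ouch!"))

def test_case():
    activate()
    yield
    assert label["text"] == "Ouch!"

def run_test_case(iterator):
    try:
        next(iterator)
    except StopIteration:
        results.append("passed")
        loop.quit()
        return False
    except Exception as e:
        results.append("failed: %r" % (e,))
        loop.quit()
        return False
    return True

loop.timeout_add(run_test_case, test_case())
loop.run()
print(results)
```

The yield hands control back to the loop, the queued label change runs, and only then does the assertion execute, just as in the GTK version above.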
This is already an improvement, but not yet flexible enough. Suppose we add a new widget:
def c_clicked_cb(button):
    def set_label_async():
        c.props.label = "Ouch!"

    gobject.timeout_add(random.randint(500, 5000),
        set_label_async)
c = gtk.Button("Async Operation")
c.connect("clicked", c_clicked_cb)
c.show()
v.pack_start(c)
Now we have a problem: we have no idea when the action triggered by clicking the second button will complete. Simply waiting for one second will not always work. At the very least we should be able to override the default sleep value. But it would be better still if we could wait until the label value itself is changed. That way, if the action takes only a short time, we don't have to wait, while if the action takes longer than expected, the test can still finish successfully. In other words, we shouldn't just assume that after each step in the test we go straight on to the next one. Let's factor out the portion of the loop that does the scheduling:
class Sleep(object):

    def __init__(self, timeout=1000):
        self.timeout = timeout

    def schedule(self, iterator):
        gobject.timeout_add(self.timeout, run_test_case, iterator)

def run_test_case(iterator):
    print "Tick"
    try:
        scheduler = iterator.next()

    except StopIteration:
        print "Test Case Finished Successfully"
        gtk.main_quit()
        return False

    except Exception, e:
        print "An error occurred"
        gtk.main_quit()
        return False

    scheduler.schedule(iterator)
    return False
Now we can easily customize the timeout for the second button by specifying a Sleep scheduler with a different timeout:
def test_case():
    b.activate()
    yield Sleep()
    assert b.props.label == "Ouch!"

    c.activate()
    yield Sleep(6000)
    assert c.props.label == "Ouch!"
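To see how the yielded scheduler objects drive the timing, here is a GTK-free sketch. It rests on several assumptions: FakeLoop is a hypothetical main loop with a virtual clock in milliseconds, the events list replaces the real widgets, error handling is left out, and the syntax is Python 3.

```python
import heapq
import itertools

class FakeLoop(object):
    """Hypothetical main loop with a virtual clock (milliseconds)."""
    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()   # tie-breaker for equal deadlines

    def timeout_add(self, delay, callback, *args):
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._order), callback, args))

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)

loop = FakeLoop()
events = []

class Sleep(object):
    def __init__(self, timeout=1000):
        self.timeout = timeout

    def schedule(self, iterator):
        loop.timeout_add(self.timeout, run_test_case, iterator)

def run_test_case(iterator):
    try:
        scheduler = next(iterator)
    except StopIteration:
        events.append((loop.now, "finished"))
        return
    # The yielded object, not the driver, decides when to resume.
    scheduler.schedule(iterator)

def test_case():
    events.append((loop.now, "step 1"))
    yield Sleep()
    events.append((loop.now, "step 2"))
    yield Sleep(6000)
    events.append((loop.now, "step 3"))

loop.timeout_add(0, run_test_case, test_case())
loop.run()
print(events)
```

Because the clock is virtual, the run finishes instantly while still recording that the steps happened at 0, 1000 and 7000 milliseconds.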
Actually we can go one better. We don't have to rely on timeouts for scheduling at all. For example, we can easily define a scheduler that will wait for a signal to fire:
class WaitForSignal(object):

    def __init__(self, obj, signame):
        self.obj = obj
        self.signame = signame
        self.iterator = None
        self.sigid = None

    def schedule(self, iterator):
        self.sigid = self.obj.connect(self.signame, self._handler)
        self.iterator = iterator

    def _handler(self, *args):
        # Disconnect first so the handler cannot fire again while the
        # test case is being resumed.
        self.obj.disconnect(self.sigid)
        run_test_case(self.iterator)
Adopting this is just a one-line change to test_case():
def test_case():
    b.activate()
    yield Sleep()
    assert b.props.label == "Ouch!"

    c.activate()
    yield WaitForSignal(c, "notify::label")
    assert c.props.label == "Ouch!"
And we don't have to touch run_test_case() at all. I think this idea could be expanded into a framework for testing event-driven code. True, I would want much better error reporting. But with just that, it would be pretty straightforward to cover every part of PiTiVi's interface except the timeline (for the timeline, all I need is the ability to synthesize raw input). If necessary, I can include other types of scheduling scenarios, such as waiting for file or socket access. And, because I can work with the widgets directly, it's possible to verify conditions that would be impossible to check for under Dogtail or LDTP (which are both limited to what AT-SPI exposes, and run the test from a separate process).

Full Source
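As one possible direction for the better error reporting mentioned above, the driver could capture the full traceback instead of printing a fixed message. This is only a sketch: step() and report are hypothetical names, the scheduling itself is left out, and the syntax is Python 3.

```python
import traceback

def step(iterator, report):
    """Advance a test generator once and record the outcome in `report`.

    Returns the yielded scheduler, or None when the test is over.
    Assumes tests always yield a scheduler object, never a bare yield.
    """
    try:
        return next(iterator)
    except StopIteration:
        report["status"] = "passed"
    except AssertionError:
        report["status"] = "failed"
        report["details"] = traceback.format_exc()
    except Exception:
        report["status"] = "error"
        report["details"] = traceback.format_exc()
    return None

def failing_test():
    label = "Click me"
    yield "scheduler placeholder"
    assert label == "Ouch!", "label was %r" % label

report = {}
it = failing_test()
while step(it, report) is not None:
    pass

print(report["status"])
print(report["details"])
```

The recorded traceback points at the failing assert inside the test generator, which distinguishes a failed check from an unexpected error.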

7 comments:

brandon lewis said...

What I wrote about generators was not quite correct. The body of a generator function does not run when the function is called; it only starts running, up to the first yield statement, when the iterator is first advanced. The following should be sufficient to prove this:

In [1]: def foo():
   ...:     print "bar"
   ...:     yield
   ...:

In [2]: foo()
Out[2]: <generator object foo at 0x...>

In [3]: foo().next()
bar

In [4]:

brandon lewis said...

Corrections published.

SEJeff said...

Red Hat uses dogtail[1] for testing all of the GUI stuff in RHEL and Fedora.

[1] https://fedorahosted.org/dogtail/

brandon lewis said...

I'm aware of both LDTP and Dogtail. I was disappointed with both.

Moreover, since PiTiVi is already a Python program, it's not like it's really necessary to have an external testing framework.

Brett Alton said...

I'm 3 semesters away from graduating with a CS degree and you're not making me look forward to work in the real world!

Seriously though, is this a limitation of Python? Would it be better to program this in another language, such as one that is strongly typed and has better callback features?

Just curious/learning.

brandon lewis said...

Is what a limitation of python? Python has many arbitrary limitations, but I'm not sure which you're referring to...

alsuren said...

Twisted provides this kind of functionality in the form of the @inlineCallbacks decorator for generators. It also has some integration with the gtk mainloop and other things. The key innovation that twisted has over your prototype is that it uses iterator.send() to allow constructions like webpage = yield http_get(url). This works by making http_get() return a Deferred object (which can be thought of like a Promise or Future if you've seen the concept in other languages).
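The send() idea can be illustrated in plain Python, without Twisted. The sketch below is not how @inlineCallbacks is implemented: Request, fake_fetch() and drive() are hypothetical names, the "fetch" completes synchronously instead of through a Deferred, and the syntax is Python 3.

```python
class Request(object):
    """Hypothetical placeholder for an asynchronous operation."""
    def __init__(self, url):
        self.url = url

def fake_fetch(request):
    # Stand-in for real I/O: pretend the page body is known at once.
    return "<html>%s</html>" % request.url

def drive(generator):
    """Run `generator`, answering each yielded Request via send()."""
    results = []
    try:
        request = next(generator)            # run to the first yield
        while True:
            page = fake_fetch(request)
            results.append(page)
            # Resume the generator; its `yield` expression evaluates to page.
            request = generator.send(page)
    except StopIteration:
        pass
    return results

seen = []

def test_case():
    page = yield Request("http://example.com/a")
    seen.append(page)
    page = yield Request("http://example.com/b")
    seen.append(page)

drive(test_case())
print(seen)
```

In an event-driven setting the call to send() would happen from a callback once the result is ready, rather than in a loop as here.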

I am currently using it for the Fargo gateway project, and I have proxy dbus methods that return deferreds (for use with @inlineCallbacks) but I have had to use some slightly more ugly code for turning dbus signals into deferreds that can be waited on.

We also use twisted for testing most of our connection-manager code. In this case, we base everything around an event queue, and have a function q.expect(event_pattern) that spins the mainloop until the event is received, or a timeout occurs. In theory, you could make q.expect return a deferred (and I have patches to do so) but if you have a lot of functions that are all using @inlineCallbacks, it adds loads of cruft to your tracebacks (so in the end I dropped those patches on the floor).