Get it here: ktasyncio.
I love multiprocessing since you can easily move the processes to different servers. But if you want efficient I/O without using any I/O threads, you have to use asyncio.
My current pet project is CPU intensive: it does calculations and from time to time stores results in a Kyoto Tycoon key-value store. The calculations can be done in multiple lightweight threads: coroutines. So when the program stores a value, another coroutine is resumed and the CPU stays busy.
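To make the idea concrete, here is a minimal sketch of two coroutines overlapping computation with storage. It does not use ktasyncio: fake_store is a hypothetical stand-in that only sleeps to simulate a network round trip, and the worker names and the log list exist purely for illustration. It uses the modern async/await syntax; on Python 3.4 the same thing would be written with @asyncio.coroutine and yield from.

```python
import asyncio


async def fake_store(key, value, log):
    # Hypothetical stand-in for a round trip to the key-value store.
    log.append(("store-start", key))
    await asyncio.sleep(0.01)  # control goes back to the event loop here
    log.append(("store-done", key))


async def worker(name, log):
    # CPU-bound part: runs without interruption until the next await.
    total = sum(i * i for i in range(10000))
    log.append(("computed", name))
    # While this coroutine waits for the store, other coroutines can run.
    await fake_store(name, total, log)
    return total


async def main():
    log = []
    results = await asyncio.gather(worker("a", log), worker("b", log))
    return log, results


log, results = asyncio.run(main())
for entry in log:
    print(entry)
```

In the printed log, worker b finishes its calculation before worker a's store has completed, which is the effect described above: the wait for the store is filled with useful work.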
Here is a little, probably meaningless, benchmark:
orig get_bulk qps: 34811
orig set_bulk qps: 26580
get_bulk qps: 40689
set_bulk qps: 24900
batch get_bulk qps: 63306
Connections used: 20
dbm get qps: 31883
dbm set qps: 8803
In a nutshell, it is nowhere near as fast as using Kyoto Cabinet directly, but it is as fast as using dbm. And since the CPU continues calculating, I don't care much.
The batch get_bulk figure is a test of what I call io batching, which in this case means using 20 connections and sending requests in parallel.
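The batching idea could be sketched roughly as follows. This is not the ktasyncio implementation: FakeConnection and batched_get are hypothetical names, and the connection only sleeps to simulate a round trip. The point is that the keys are split across the connections and all requests are in flight at the same time.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for one client connection to the server."""

    def __init__(self, store):
        self.store = store
        self.requests = 0

    async def get_bulk(self, keys):
        self.requests += 1
        await asyncio.sleep(0.01)  # simulated network round trip
        return {k: self.store[k] for k in keys if k in self.store}


async def batched_get(connections, keys):
    # Deal the keys round-robin, one slice per connection.
    n = len(connections)
    chunks = [keys[i::n] for i in range(n)]
    # Send every non-empty chunk at once and wait for all replies together.
    parts = await asyncio.gather(
        *(conn.get_bulk(chunk)
          for conn, chunk in zip(connections, chunks) if chunk)
    )
    merged = {}
    for part in parts:
        merged.update(part)
    return merged


store = {"key%d" % i: i for i in range(100)}
connections = [FakeConnection(store) for _ in range(20)]
result = asyncio.run(batched_get(connections, list(store)))
print(len(result), "values fetched over", len(connections), "connections")
```

Because the 20 requests wait concurrently, the total waiting time in this sketch is about one round trip instead of twenty.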
It is based on Python 3.4 asyncio but also supports trollius, although performance with Python 2.7 is much worse. It can also run an embedded Kyoto Tycoon process, which is very handy during development:
client = ktasync.KyotoTycoon.embedded(["test.kch"])
client.set_bulk_kv(req)