"If I'm reading this right, only one thread can be enqueueing (anything) at a time, and only one thread can be dequeueing (anything) at a time, yet two threads can be enqueueing and dequeueing the same queue at the same time, and step all over each other? That can't be right."
I wondered about this as well a few months ago, but looking at ac.scm, atomic-invoke basically ensures that there's only ever one thread inside any atomic block at a time. This is perhaps the equivalent of Python's GIL.
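For anyone curious, the shape of the idea is roughly this (a minimal sketch, not the actual ac.scm source): one global semaphore guards every atomic body, and a per-thread flag keeps nested atomic calls from deadlocking on it.

    ;; Minimal sketch of the global-lock idea behind atomic-invoke
    ;; (illustrative only, not the ac.scm code).
    (define the-sema (make-semaphore 1))        ; one lock shared by all atomic blocks
    (define in-atomic (make-thread-cell #f))    ; per-thread "already inside atomic?" flag

    (define (atomic-invoke thunk)
      (if (thread-cell-ref in-atomic)
          (thunk)                               ; nested call: we already hold the lock
          (call-with-semaphore
           the-sema
           (lambda ()
             (thread-cell-set! in-atomic #t)
             (dynamic-wind
              void
              thunk
              (lambda () (thread-cell-set! in-atomic #f)))))))

So anything wrapped in atomic is serialized against all other atomic code, but threads running outside atomic blocks aren't affected, which is why two threads can still step on the same queue if its operations aren't wrapped.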
Fo' shizzle. The timing of this item is great, because I've been struggling with performance in my Arc-based webapp. I have one thread bringing new items into the index, and it's constantly thrashing with the UI threads, causing unacceptable latencies.
I don't use atomic myself, but I find that enabling the back-end thread slows down the UI threads. I've been struggling to come up with a simple program that illustrates this.
In fact, wrapping parts of my UI threads in atomic makes them more responsive[1]. I think it's because instead of incurring 200 thread switches with their associated overhead, a single thread runs to 'completion'.
[1] You're right that throughput may go down, though I try to eliminate I/O in atomic blocks. But latency is more of a concern to me than wasted CPU. It's not the end of the world if new data takes a few seconds longer to make it into the index, but a few seconds in the front end can make or break the user experience.
"vector-set-performance-stats!" in http://docs.plt-scheme.org/reference/runtime.html returns the number of thread context switches that have been performed, so you might try calling that at the beginning and at the end of an UI operation so you can see how many thread switches happened in between.