The jogl3-backend uses depth peeling to render transparent objects. This method is rather resource-intensive, because it has to render the transparent objects once per layer. The first layer consists of all faces that the camera would see directly if the objects were opaque. If you remove the first layer, the camera sees the second layer, and so on. For example, a sphere has two layers, whereas a torus has two or four layers depending on the viewpoint. Objects with about 5 layers render quickly, whereas more complicated structures with 20 or more layers make the frame rate drop noticeably. Offscreen rendering is still very quick, though. Note that performance is bound by the fragment shader, which means your application runs faster if the window is smaller and/or anti-aliasing (AA) is turned off.
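To make the notion of layers concrete, the following is a minimal CPU-side sketch of the peeling idea for a single pixel. It is not the jogl3-backend's actual code: the class `DepthPeelingSketch`, the method `peel`, and the depth values are hypothetical and chosen only for illustration. Each loop iteration stands for one render pass that keeps the nearest fragment lying strictly behind the previously peeled layer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a real backend does this on the GPU
// with depth textures, one full render pass per layer.
final class DepthPeelingSketch {

    /**
     * Peels the fragments covering one pixel from front to back.
     * Returns the depths of the layers in the order they are peeled.
     */
    public static List<Double> peel(double[] fragmentDepths) {
        List<Double> layers = new ArrayList<>();
        double lastPeeled = Double.NEGATIVE_INFINITY;
        while (true) {
            double nearest = Double.POSITIVE_INFINITY;
            // One "render pass": every fragment is processed again.
            for (double d : fragmentDepths) {
                // Keep only fragments behind the previous layer,
                // and among those the one closest to the camera.
                if (d > lastPeeled && d < nearest) {
                    nearest = d;
                }
            }
            if (nearest == Double.POSITIVE_INFINITY) {
                break; // nothing left behind the last layer
            }
            layers.add(nearest);
            lastPeeled = nearest;
        }
        return layers;
    }

    public static void main(String[] args) {
        // Assumed depths along one view ray (smaller = closer to the camera).
        double[] sphere = {0.7, 0.3};           // front and back face
        double[] torus = {0.8, 0.2, 0.6, 0.4};  // ray crossing the ring twice
        System.out.println(peel(sphere)); // prints [0.3, 0.7]
        System.out.println(peel(torus));  // prints [0.2, 0.4, 0.6, 0.8]
    }
}
```

The sketch also shows why the cost grows with the layer count: every pass touches all fragments again, so an object with 20 layers is processed roughly 20 times.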
See below for a comparison of the two OpenGL backends: