Re: performance of CAS impacted by use of ThreadLocal
On Aug 28, 2019, at 2:17 PM, 'Timur Evdokimov' via CAS Developer <[hidden email]> wrote:
Just want to share some recent observations.
We did some profiling of CAS (mostly around OIDC) and it turned out that ClientInfoThreadLocalFilter alone is responsible for substantial latency.
(Profiling was done using OpenJDK 11.0.2 on macOS.)
ClientInfoThreadLocalFilter, as one may guess by its name, uses ThreadLocal to store user data which is used down the line in some places.
Thanks for the analysis. Very interesting. Is there data you can possibly share, perhaps in the form of a blog post at https://apereo.github.io ?
This filter was removed (in a rather hacky way, by removing cas-server-core-audit-*.jar and **/cas-server-core-events-*.jar from the WAR).
It worked wonderfully; the overall CAS endpoint latency dropped by more than a factor of two.
You shouldn’t have to remove the JARs (the event stuff is removed by default anyway). IIRC, there are specific “.enabled=false” properties (assuming you’re running 6.1 RC5) that let you disable eventing and auditing altogether without having to mess around with exclusion rules.
There are a few places where ClientInfo is needed, namely in audit logging and to inject the client IP into the TGC cookie.
We can continue using CAS with that hack, without the fix, as CAS audit doesn't really fit our model and we don't need IP-bound cookies either.
But one could just pass ClientInfo down the call stack, explicitly, without resorting to ThreadLocal (inherently unsafe and slow technique that shouldn't be used at all).
I wonder 1) if this is a known issue, and 2) if there are some plans to deal with that.
This will likely require not only API changes, but there will also need to be a way for components to locate the ClientInfo when they are not part of the call stack or do not have access to the objects required to build ClientInfo on their own. It would be a very tricky change to execute, but nothing a WIP PR wouldn’t be able to explore. I can’t remember if this was ever discussed at all, but nonetheless, most performance improvements/problems are rather unique and load-specific, and just because it hasn’t happened yet doesn’t mean it’s not there.
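To make the two approaches discussed above concrete, here is a minimal sketch in plain Java. It is not the CAS or Inspektr code: the ClientInfo class, the ClientInfoHolder class, and both audit methods are hypothetical, simplified stand-ins invented for illustration. It only shows the shape of the change (ambient ThreadLocal lookup versus an explicit parameter) and says nothing about the relative cost of either.

```java
public class ClientInfoSketch {

    // Hypothetical, simplified stand-in for a ClientInfo type (assumption, not the real class).
    static final class ClientInfo {
        private final String clientIp;
        private final String serverIp;

        ClientInfo(String clientIp, String serverIp) {
            this.clientIp = clientIp;
            this.serverIp = serverIp;
        }

        String getClientIp() { return clientIp; }
        String getServerIp() { return serverIp; }
    }

    // Approach 1: ambient state. A filter would call set() at the start of a
    // request and clear() at the end; callees read the value with get().
    static final class ClientInfoHolder {
        private static final ThreadLocal<ClientInfo> CURRENT = new ThreadLocal<>();

        static void set(ClientInfo info) { CURRENT.set(info); }
        static ClientInfo get() { return CURRENT.get(); }
        static void clear() { CURRENT.remove(); }
    }

    // Hypothetical callee that depends on the ThreadLocal being populated.
    static String auditWithThreadLocal(String action) {
        ClientInfo info = ClientInfoHolder.get();
        String ip = (info != null) ? info.getClientIp() : "unknown";
        return action + " from " + ip;
    }

    // Approach 2: the caller passes ClientInfo explicitly down the call stack.
    static String auditExplicit(String action, ClientInfo info) {
        return action + " from " + info.getClientIp();
    }

    public static void main(String[] args) {
        ClientInfo info = new ClientInfo("203.0.113.7", "192.0.2.1");

        // ThreadLocal variant: must be cleared, since pooled threads are reused.
        ClientInfoHolder.set(info);
        try {
            System.out.println(auditWithThreadLocal("LOGIN"));
        } finally {
            ClientInfoHolder.clear();
        }

        // Explicit variant: no hidden state, but every signature on the path changes.
        System.out.println(auditExplicit("LOGIN", info));
    }
}
```

The explicit variant is easier to reason about, but it illustrates the API cost mentioned above: every method between the filter and the consumer has to accept and forward the extra argument, and a component that is invoked outside that call path has no way to obtain it.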