Clojure compiler class cache and JVM soft references

A follow-up to the last post, written when I still thought it would be easy to clear JVM soft references. Spoiler alert: it's not.

A quick background on JVM references:

The reason the JVM has more than one reference type is garbage collection. Each type is handled differently when the GC runs, so if you don't care about that, there is no reason to create any of the non-default types.

Strong references

These are the default references if you don't do anything special. As long as there is a strong reference somewhere, the referenced object will not be garbage collected.

Weak references

These exist so you can hold on to some object without owning it. This is useful for a publish/subscribe or caching scenario where you want the objects to be collected if no other code uses them anymore.

Soft references

These work like strong references until the JVM needs the memory they occupy, which is the case when an OutOfMemoryError would otherwise be thrown. Right before that happens, the GC runs and the softly referenced objects get collected, hopefully making enough space for the execution to continue.

Phantom references

These work like weak references, except that they don't allow you to actually get the object they reference. The use-case is a custom finalize workflow.
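As a rough sketch of how the four types look from Clojure, here is one object wrapped in each of them. The names payload, weak, soft, phantom and queue are made up for illustration.

```clojure
(import '(java.lang.ref WeakReference SoftReference PhantomReference ReferenceQueue))

;; A var holding an object is an ordinary strong reference.
(def payload (Object.))

;; The other three types wrap the object explicitly.
(def weak (WeakReference. payload))
(def soft (SoftReference. payload))

;; Phantom references need a queue; the GC enqueues the reference
;; there once the object is ready to be reclaimed.
(def queue (ReferenceQueue.))
(def phantom (PhantomReference. payload queue))

;; While payload is strongly reachable, weak and soft references resolve to it.
(identical? payload (.get weak))
(identical? payload (.get soft))

;; A phantom reference never hands the object back.
(nil? (.get phantom))
```

Once the strong reference is gone, (.get weak) can return nil after any GC run, while (.get soft) usually keeps returning the object until memory gets tight.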

Clojure class cache

The Clojure DynamicClassLoader has a map of all the classes it has currently loaded. It holds those classes in soft references so that they do not get collected unless there is no more memory to hold on to them.

This makes total sense since recompiling those classes is expensive but possible.

This does pose a problem, though, if you want to build a tool that determines which classes are actually reachable. As I mentioned above, the only way soft references get cleaned up is when the memory fills up. This means any anonymous function that's not in use anymore still stays around more or less forever.
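To see the cache in action, here is a minimal sketch that peeks at it through reflection. It assumes the cache lives in a static field named classCache, which is an implementation detail and may differ between Clojure versions. The names class-cache, cached? and throwaway are mine.

```clojure
(import 'clojure.lang.DynamicClassLoader)

;; Assumption: the cache is a static map field called "classCache",
;; keyed by class name. This is not public API.
(def ^java.util.Map class-cache
  (let [field (doto (.getDeclaredField DynamicClassLoader "classCache")
                (.setAccessible true))]
    (.get field nil)))

(defn cached?
  "True if the class loader cache has an entry for class c."
  [^Class c]
  (contains? class-cache (.getName c)))

;; Evaluating a fresh anonymous function compiles a new class ...
(def throwaway (eval '(fn [x] (inc x))))

;; ... and that class shows up in the cache under its name.
(cached? (class throwaway))
```

Even after throwaway is rebound to something else, the entry should stay in the cache until the GC decides to clear the soft reference.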

Solutions

Blow up your memory

If the only way to get rid of objects in soft references is to fill up the memory, why not just do that?

(defn oom []
    (try (let [memKiller (java.util.ArrayList.)]
            (loop [free 10000000]
                (.add memKiller (object-array free))
                (.get memKiller 0)
                (recur (if (< (Math/abs (.. Runtime (getRuntime) (freeMemory))) Integer/MAX_VALUE)
                                (Math/abs (.. Runtime (getRuntime) (freeMemory)))
                                Integer/MAX_VALUE))))
            (catch OutOfMemoryError _
                (println "freed"))))

This function will produce empty arrays until there is no more space left. Sadly the JVM garbage collector is pretty smart and will sometimes optimize this away when the array is never accessed, so the .get is needed.

But even this does not always work since the GC is being smart again. It will not clean up the soft references if they are too small to help with the OOM exception. This can be fixed by creating even smaller arrays, but this will get pretty slow at some point.

The bigger problem is that your JVM instance will now expand its memory usage until your REPL hits the default 4 GB. This is not a very good user experience.
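If you want to watch this happen, a small helper along these lines reports how much heap the JVM has claimed compared to its ceiling. The name heap-stats is made up; the Runtime methods are standard Java.

```clojure
(defn heap-stats
  "Current heap numbers in megabytes, as reported by the JVM."
  []
  (let [rt (Runtime/getRuntime)
        mb #(quot % (* 1024 1024))]
    {:max-mb   (mb (.maxMemory rt))     ; ceiling the heap may grow to
     :total-mb (mb (.totalMemory rt))   ; heap currently claimed from the OS
     :free-mb  (mb (.freeMemory rt))})) ; unused part of the claimed heap

(heap-stats)
```

Calling it before and after (oom) should show :total-mb jumping up towards :max-mb.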

JVM Tools Interface

The JVM TI allows you to get all strong references to an object. So this would be an easy solution, right?

Sadly not every JVM has support for the Tools Interface and even worse, it's hard to use since you have to write native code and connect that to the running JVM.

Conclusion

At this point, there is no good way to find out which functions in the running VM are still in use.

I'd love to be wrong about this, so if anyone has any more ideas I'm very open to suggestions.


Published: 2022-01-07

Clojure analysis and introspection

edit: updated the static analysis part to be more balanced

While writing omni-trace I ran into a common tooling problem:

Which references to my function exist in the codebase?

Or turned on its head:

What functions does a function call?

An IDE would use this information to help you with refactoring, and a linter would warn you when a function is missing.

Static analysis

Traditionally this information is generated by static code analysis, although this is harder the more dynamic a language is. Imagine code like:

(-> (str "i" "nc")
    symbol
    resolve
    (apply [5]))
;; 6

There is no general way to know that the inc function is referenced without running the code.

Problems like these aside, static analyzers like clj-kondo can still give us the needed information most of the time:

(require '[clj-kondo.core :as clj-kondo]
         '[clojure.string :as string])
(-> (clj-kondo/run! {:lint (string/split (System/getProperty "java.class.path") #":")
                    :config {:output {:analysis true}}})
    :analysis
    :var-usages)
;; =>
[{:alias ana,
  :arity 1,
  :col 59,
  :end-col 89,
  :end-row 36,
  :filename "src/cyrik/omni_trace/instrument/cljs.cljc",
  :fixed-arities #{1 2},
  :from cyrik.omni-trace.instrument.cljs,
  :from-var ->ns,
  :lang :clj,
  :name find-ns,
  :name-col 60,
  :name-end-col 71,
  :name-end-row 36,
  :name-row 36,
  :row 36,
  :to cljs.analyzer.api}
;;...
  ]

This works great but has two new problems:

it's kind of slow (40 seconds for that project, including jars)
it does not work for running processes where you don't have the code (remote REPLs)

The speed complaint is somewhat misleading, though: clj-kondo does a lot more than just find dependencies, and the analysis is a one-time cost if you have some smart caching.

Another benefit is that you don't need a running env to get help from clj-kondo or other static analyzers.

The second problem cannot really be fixed. And if you have multiple tools that need the analysis, like LSP plus something else, there is no easy way to share it, since they usually don't run inside your REPL.
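To get from the :var-usages data to an answer for "what functions does a function call?", the entries can be folded into a map from caller to callees. This is only a sketch: call-graph and sample-usages are names I made up, and the sample is trimmed down to the keys that matter here.

```clojure
(defn call-graph
  "Folds clj-kondo :var-usages entries into {caller #{callee ...}}.
  Usages without a :from-var (top-level forms) are skipped."
  [var-usages]
  (reduce (fn [graph {:keys [from from-var to name]}]
            (if from-var
              (update graph
                      (symbol (str from) (str from-var))
                      (fnil conj #{})
                      (symbol (str to) (str name)))
              graph))
          {}
          var-usages))

;; Trimmed-down sample in the shape of the analysis output above.
(def sample-usages
  '[{:from cyrik.omni-trace.instrument.cljs
     :from-var ->ns
     :to cljs.analyzer.api
     :name find-ns}
    {:from cyrik.omni-trace.instrument.cljs
     :to clojure.core
     :name defn}])

(call-graph sample-usages)
;; => {cyrik.omni-trace.instrument.cljs/->ns #{cljs.analyzer.api/find-ns}}
```

Swapping the caller and callee in the update call gives the reverse index, which answers the "which references to my function exist" question.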

Runtime introspection

Lisps usually come at the same tooling problem from the other side, where the idea is:

I already have a repl with my code, so that should be able to give me all the runtime information I want.

This idea can be seen in orchard, which uses the running JVM to answer questions about the code. Its implementation of fn-deps showcases that beautifully:

(defn fn-deps [val]
    (set (some->> val class .getDeclaredFields
                  (keep (fn [^java.lang.reflect.Field f]
                          (or (and (identical? clojure.lang.Var (.getType f))
                                   (java.lang.reflect.Modifier/isPublic (.getModifiers f))
                                   (java.lang.reflect.Modifier/isStatic (.getModifiers f))
                                   (-> f .getName (.startsWith "const__"))
                                   (.get f val))
                              nil))))))

This code was written by Rich Hickey for REBL and generously shared with the community.

The code relies on the fact that the Clojure compiler generates a class for every function and that class has fields with vars pointing to the functions that it's going to call. This is done so that when you redefine a function its call sites don't need to be recompiled, since the var will now point to the new function.
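The var indirection is easy to see at the REPL. In this small sketch (greet and caller are made-up names, and it assumes direct linking is off, which is the default for your own code), caller is compiled once and still picks up the new definition:

```clojure
(defn greet [] "hello")

;; caller's class holds a static field with the var #'greet,
;; not the function object itself.
(defn caller [] (greet))

(def before (caller))   ; "hello"

;; Redefining greet only swaps the root value of the var.
(defn greet [] "goodbye")

(def after (caller))    ; "goodbye", without recompiling caller
```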

If you are interested in seeing the exact bytecode, or a Java class version of a compiled function, there is a great library and blog post by Alexander Yakushev.

This is very fast, since it only has to do field access, but has a major problem. It does not handle lambdas or inline function calls.

(defn dummy []
    (map #(inc %) (range 10)))

The dummy function will have a reference to map and to range, but there is no reference to the anonymous function inside the dummy class. But of course, the dummy class does have to know about the lambda somewhere, so if you check the generated bytecode you will see a reference to it inside the invokeStatic method call.
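Here is a stripped-down check of that claim. var-fields is a hypothetical helper that does the same field scan as fn-deps, minus the name and visibility checks, and it again assumes direct linking is off.

```clojure
(defn var-fields
  "All vars stored in static fields of class c."
  [^Class c]
  (set (for [^java.lang.reflect.Field f (.getDeclaredFields c)
             :when (and (identical? clojure.lang.Var (.getType f))
                        (java.lang.reflect.Modifier/isStatic (.getModifiers f)))]
         (.get f nil))))

(defn dummy []
  (map #(inc %) (range 10)))

(def deps (var-fields (class dummy)))

;; The outer class knows about map and range ...
(contains? deps #'clojure.core/map)
(contains? deps #'clojure.core/range)
;; ... but inc is only used inside the lambda, so it is missing here.
(contains? deps #'clojure.core/inc)
```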

A reasonable question would then be, how do I get the bytecode? Sadly there is no direct way to get at the running bytecode, since the default JVM classLoader throws it away after loading the class.

So a more roundabout way to get at all the references inside a function is to use the Clojure compiler and a custom classLoader. This way it's possible to remember the bytecode.

This was my first attempt to "fix" orchard's fn-deps (modified from gist):

(def classbytes (atom {}))

(defn recompile [ns-sym form]
  (push-thread-bindings
   {clojure.lang.Compiler/LOADER
    (proxy [clojure.lang.DynamicClassLoader] [@clojure.lang.Compiler/LOADER]
      (defineClass
        ([name bytes src]
         (swap! classbytes assoc name bytes)
         (proxy-super defineClass name bytes src))))})
  (try
    (let [line (or (:line (meta form)) @clojure.lang.Compiler/LINE)
          column (or (:column (meta form)) @clojure.lang.Compiler/COLUMN)]
      (push-thread-bindings {clojure.lang.Compiler/LINE line
                             clojure.lang.Compiler/COLUMN column})
      (try
        (let [form (macroexpand form)]
          (when (and (coll? form) (= 'clojure.core/fn (first (nth form 2 nil))))
            (binding [*ns* (create-ns ns-sym)]
              (clojure.lang.Compiler/analyze
               clojure.lang.Compiler$C/EVAL
               (nth form 2)))))
        (finally
          (pop-thread-bindings))))
    (finally
      (pop-thread-bindings))))

(recompile 'playground.decompile '(defn dummy [a] (map #(println a) (range 10))))

This solution works very well and produces two classes that can be searched for references.

While playing with this solution I realized that the actual bytecode is not needed, since the Clojure compiler also returns both class names, which can be used inside fn-deps to get all references!

This solution is faster than clj-kondo, even when recompiling my whole code, but still has the problem that you have to have all the source code.

Clojure dynamic classloader

After spending way too much time with JVM class loaders, it hit me that the Clojure class loader has an internal cache of all the classes it loaded. Since that cache is not public, some reflection is needed to get at it, but the solution is pretty straightforward:

(require '[clojure.repl :as repl]
         '[clojure.string])

(defn- as-val
  "Convert thing to a function value."
  [thing]
  (cond
    (var? thing) (var-get thing)
    (symbol? thing) (var-get (find-var thing))
    (fn? thing) thing))

(defn- fn-name [^java.lang.Class f]
  (-> f .getName repl/demunge symbol))

(defn fn-deps-class
  [v]
  (let [^java.lang.Class v (if (class? v)
                             v
                             (eval v))]
    (set (some->> v .getDeclaredFields
                  (keep (fn [^java.lang.reflect.Field f]
                          (or (and (identical? clojure.lang.Var (.getType f))
                                   (java.lang.reflect.Modifier/isPublic (.getModifiers f))
                                   (java.lang.reflect.Modifier/isStatic (.getModifiers f))
                                   (-> f .getName (.startsWith "const__"))
                                   (.get f (fn-name v)))
                              nil)))))))

(defn fn-deps [s]
  (when-let [v (as-val s)]
    (let [f-class-name (-> v .getClass .getName)
          ;; look the non-public static cache up by name rather than by field position
          field (doto (.getDeclaredField clojure.lang.DynamicClassLoader "classCache")
                  (.setAccessible true))
          classes (into {} (.get field clojure.lang.DynamicClassLoader))
          ;; keep the function's own class plus every class whose name contains it
          filtered-classes (->> classes
                                (filter (fn [[k _v]] (clojure.string/includes? k f-class-name)))
                                (map (fn [[_k v]] (.get v))))]
      (set (mapcat fn-deps-class filtered-classes)))))

This solution relies on another implementation detail of the Clojure compiler: the class for the dummy function will be named my_ns$dummy, and the classes of the anonymous functions inside it get names that start with that same name.
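The naming scheme is easy to check at the REPL. A small sketch, where make-lambda is a made-up function that returns its inner lambda so we can look at its class:

```clojure
(require '[clojure.string :as str])

(defn make-lambda []
  (fn [x] (inc x)))

;; Class name of the outer function, something like "my_ns$make_lambda".
(def outer-name (.getName (class make-lambda)))

;; Class name of the lambda: the outer name plus "$fn__" and a generated number.
(def inner-name (.getName (class (make-lambda))))

(str/starts-with? inner-name (str outer-name "$fn__"))
```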

It all works great and is very fast. There is a new problem though: when you recompile a function, its lambdas stay in the cache. This means you might see references to functions that are not called anymore.

The solution to that is still in the works, but will probably just be a manual cache clear or a second cache.
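The stale entries are easy to reproduce. This sketch redefines a function and counts the lambda classes the cache still holds for it. As before, the classCache field name is an implementation detail I am assuming, and class-cache, lambda-classes and stale-demo are made-up names.

```clojure
(require '[clojure.string :as str])
(import 'clojure.lang.DynamicClassLoader)

;; Assumption: static field "classCache", a map from class name to soft reference.
(def ^java.util.Map class-cache
  (let [field (doto (.getDeclaredField DynamicClassLoader "classCache")
                (.setAccessible true))]
    (.get field nil)))

(defn lambda-classes
  "Names of cached lambda classes that belong to the given outer class name."
  [outer-name]
  (filter #(str/starts-with? % (str outer-name "$fn__")) (keys class-cache)))

(defn stale-demo [] (map #(inc %) [1 2 3]))
(def outer-name (.getName (class stale-demo)))

;; Redefine the function with a different lambda.
(defn stale-demo [] (map #(dec %) [1 2 3]))

;; Both the old and the new lambda class are still in the cache,
;; so this should be at least 2.
(count (lambda-classes outer-name))
```

A manual cache clear would then be a matter of removing the entries returned by lambda-classes before recompiling, with the caveat that this pokes at Clojure internals.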


Published: 2022-01-05

Cljs to jxa for automation

Portal is a great dev helper for displaying REPL values.

Usually, I have more than one open, and it's a pain to find the one that belongs to the focused VSCode instance.

Instead of using the VSCode extension like a sane person, I decided to automate focusing the right portal window with a hotkey.

It turns out AppleScript can do what I want, but it's an interesting language with great documentation. It also turns out that everything AppleScript can do can be done from JavaScript as well, so after a few hours of frustration we get the following:

function run(_input, _parameters) {
    var se = Application('System Events');
    var win = se.processes.whose({ frontmost: { '=': true } })[0].windows[0];
    var vsCodeName = win.name().split("—")[1]?.trim();

    const width = 800;
    const bounds = {
        "x": win.properties().position[0] + win.properties().size[0] - width,
        "y": 0,
        "width": width,
        "height": win.properties().size[1]
    };

    var browser = Application("Google Chrome");
    browser.activate();

    if (browser.running()) {
        browser.windows().some((window, _index) => {
            if (window.title().includes(vsCodeName)) {
                window.index = 1;
                window.bounds = bounds;
                return true;
            }
        });
    }

    return bounds;
}

This script looks at the title of the focused window, splits it on the dash separator and tries to find a Chrome window whose title contains the second part. The portal window it finds is then placed and resized over the initial window.

You want to start your portal with your folder as a title: (p/open {:portal.launcher/window-title (System/getProperty "user.dir")})

Usually I display it on a second monitor right next to VSCode, but that's hard to screen-capture. You can easily play with the position/size numbers to fit your needs, or just remove the position/size assignment if you want to position the window manually.

Here it is in action:


Once I got that running, something drove me to say: well, that's just JS, I'd rather write ClojureScript.

So there is this old abandoned project cljs-jxa-starter that would do the trick, right? It turns out it does not run anyway and uses lein-cljsbuild while I use deps.edn + shadow-cljs. A few too many hours later, I present to you: cljs-jxa-starter-shadow.

Here's the above js as cljs:

(ns cljs-jxa-starter.focus-resize-window
  (:require [clojure.string :as str]))

(def desired-width 800)
(def se (js/Application. "System Events"))

(defn main []
  (let [browser (js/Application. "Google Chrome")
        ^js win (aget ^js (.-windows (aget (.whose (.-processes se) #js{:frontmost #js{:= true}}) 0)) 0)
        vsCodeName (some-> (.name win) (str/split #"—") second str/trim)
        bounds #js{:x (- (+ (aget (.. win properties -position) 0) (aget (.. win properties -size) 0)) desired-width) 
                   :y 0 
                   :width desired-width 
                   :height (aget (.. win properties -size) 1)}]
    (when (.running browser)
      (when-let [^js found (some #(when (str/includes? (.name %) vsCodeName) %) (.windows browser))]
        (.activate browser)
        (set! (.-index found) 1)
        (set! (.-bounds found) bounds)
        true))))

Yes, in JXA half of the property accesses are function calls and the other half aren't, which is great.

Getting shadow-cljs to output a usable js file is pretty easy once you find the trick. Just set :target as :browser and compile for release, otherwise jxa incompatible code will be generated.

You can use it as a template to create your own cljs-jxa-shadow project and hopefully we can use it as a springboard to more automation done the clojure way.

For anyone interested in the space, there's a great new project that makes it possible to do the same thing directly on the command line! It's called obb.


Published: 2021-12-09

Lukas Domagala
