Greetings,
On a related matter, I’ve explored NVIDIA Warp recently, which provides:
- A Python interface for CUDA (kernels are JIT-compiled to CUDA under the hood when called)
- Automatic differentiation
- Higher-level, use-case-specific libraries (e.g., for FEA)
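For context, here is a minimal sketch of what that looks like in practice (a kernel launch plus tape-based autodiff); the details are worth double-checking against the current Warp docs:

```python
# Minimal Warp sketch: launch a kernel, then back-propagate through it with wp.Tape.
import numpy as np
import warp as wp

wp.init()

@wp.kernel
def square_kernel(x: wp.array(dtype=float), y: wp.array(dtype=float)):
    tid = wp.tid()
    y[tid] = x[tid] * x[tid]

x = wp.array(np.arange(4, dtype=np.float32), dtype=float, requires_grad=True)
y = wp.zeros(4, dtype=float, requires_grad=True)

tape = wp.Tape()
with tape:
    wp.launch(square_kernel, dim=4, inputs=[x, y])

# Seed the adjoint of y with ones and back-propagate through the recorded launch.
tape.backward(grads={y: wp.array(np.ones(4, dtype=np.float32), dtype=float)})
print(x.grad.numpy())  # dy/dx = 2x -> [0, 2, 4, 6]
```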
Question of Curiosity
Considering that Tesseract is written in Python, and given its purpose, has the Tesseract team considered where and how Tesseract and Warp sit relative to each other?
Either as dependencies, siblings, or something else.
In particular:
Are the auto-differentiation features and behavior broadly similar to, or different from, the differentiable programming used by Tesseract?
Please note, I’m a noob to Tesseract.
Update, partial answer:
Primary Relationship
Wrap Warp kernels in Tesseract-Core for distribution in orchestration workflows.
Options?
- Feed Warp’s auto-diff forward through Tesseract’s autodiff interface (to maintain consistency).
I believe this is possible, but I’m uncertain what extra steps are needed to set the auto-diff priority order.
Thank you for the interesting question!
Tesseracts can be used in conjunction with Warp, and the approach you wrote in your second message is spot-on. Basically, one could write a differentiable component with Warp and wrap it with a Tesseract. If using Warp + JAX or Warp + PyTorch, the `jax` and `pytorch` recipes allow you to just let the AD framework internally compute derivatives, and tesseract-core exposes them. Practically, this would only involve you writing the primal evaluation (i.e., filling out `apply`) and using the auto-generated AD endpoints.
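For illustration, a stripped-down (hypothetical) `tesseract_api.py` with a Warp kernel behind `apply` could look roughly like this; the schema types are simplified, and to get the auto-generated AD endpoints from the `jax`/`pytorch` recipes you would instead write `apply` in that framework (e.g., via Warp’s interop) so it can trace through it:

```python
# tesseract_api.py -- hypothetical sketch; schema types are simplified,
# see the tesseract-core docs for the real array/differentiability annotations.
import numpy as np
import warp as wp
from pydantic import BaseModel


@wp.kernel
def square_kernel(x: wp.array(dtype=float), y: wp.array(dtype=float)):
    tid = wp.tid()
    y[tid] = x[tid] * x[tid]


class InputSchema(BaseModel):
    x: list[float]


class OutputSchema(BaseModel):
    y: list[float]


def apply(inputs: InputSchema) -> OutputSchema:
    # Primal evaluation only: copy inputs to the device, launch the kernel, copy back.
    x = wp.array(np.asarray(inputs.x, dtype=np.float32), dtype=float)
    y = wp.zeros(len(inputs.x), dtype=float)
    wp.launch(square_kernel, dim=len(inputs.x), inputs=[x, y])
    return OutputSchema(y=y.numpy().tolist())
```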
An alternative would be to mix-and-match Warp components with Tesseract ones via Tesseract-JAX, which makes Tesseracts look like plain Python function calls. This could be useful, for instance, when your pipeline is split between local operations you are doing via CUDA kernels written with Warp and remotely hosted Tesseracts.
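As a rough sketch of that mix-and-match pattern (the image name `warp_component` and the `x`/`y` keys here are made up, and the `tesseract-jax` calls should be checked against its docs):

```python
# Hypothetical pipeline: a local JAX op composed with a Tesseract-wrapped
# component, differentiated end-to-end via tesseract-jax.
import jax
import jax.numpy as jnp
from tesseract_core import Tesseract
from tesseract_jax import apply_tesseract

with Tesseract.from_image("warp_component") as tx:  # image name is illustrative

    def pipeline(x):
        x = jnp.sin(x)                       # local operation (could be Warp-backed)
        out = apply_tesseract(tx, {"x": x})  # Tesseract-wrapped component
        return jnp.sum(out["y"])

    # Differentiation works only if the Tesseract implements the AD endpoints.
    grads = jax.grad(pipeline)(jnp.ones(8))
```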
For Warp + PyTorch, we have been thinking of something similar to Tesseract-JAX, but one which registers Tesseracts as torch primitives.
Also, I am not 100% sure what you mean by AD priority order, but basically: inside a Tesseract you can use any AD framework you’d like (of course, with different degrees of difficulty in integrating it into the framework: PyTorch and JAX are currently the only ones we have pre-built recipes for), and then externally Tesseract-JAX (or similar tools) allows you to compute derivatives as you compose Tesseracts.
Thanks @Alessandro,
Regarding “AD priority order”: what I mean is, if the auto-diff differs qualitatively or quantitatively between how Tesseract and the underlying kernel calculate it, how do I choose which one to use, or set up the exposed auto-diff to prioritize one and fall back to the other?
Not sure if this question makes sense; I guess it depends on how robust the auto-diff is and whether there are instability limitations.
Got it; there should not be any ambiguity there: inside a Tesseract you specify how that component is to be differentiated when you implement the `jacobian`, `jacobian_vector_product`, and `vector_jacobian_product` endpoints.
You could do that without any AD at all, for example by implementing some finite differencing on your own; you could have custom code for it that uses your AD framework of choice; or (for PyTorch and JAX) you can just use our “recipes”, which create the endpoints mentioned above automatically from your implementation of `apply` (see for instance this JAX example, where the user only needed to implement `apply` and the implementation of vjp/jvp and so on was pre-filled via a template by tesseract-core).
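For instance, a hand-rolled finite-difference JVP for the hypothetical square Tesseract sketched above could look roughly like this; the `(inputs, jvp_inputs, jvp_outputs, tangent_vector)` signature is an assumption modeled on the tesseract-core examples, so check the docs for the exact contract:

```python
# Hypothetical finite-difference jacobian_vector_product, reusing apply() from
# the sketch above. A real implementation would also honor jvp_inputs/jvp_outputs.
import numpy as np

EPS = 1e-4


def jacobian_vector_product(inputs, jvp_inputs, jvp_outputs, tangent_vector):
    # Central differences: J @ v ~ (f(x + eps*v) - f(x - eps*v)) / (2*eps)
    x = np.asarray(inputs.x, dtype=np.float64)
    v = np.asarray(tangent_vector["x"], dtype=np.float64)

    def f(x_np):
        shifted = inputs.model_copy(update={"x": x_np.tolist()})
        return np.asarray(apply(shifted).y, dtype=np.float64)

    return {"y": (f(x + EPS * v) - f(x - EPS * v)) / (2.0 * EPS)}
```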
If you don’t implement any differentiation endpoints, `tesseract-core` will not do anything for you as a fallback, and that Tesseract will simply be non-differentiable.