Frame Transformations in Computer Integrated Surgery
With intuitive notation, vocabulary, and diagrams.
This tutorial is an abbreviated version of this PDF. The notation aligns with the Computer Integrated Surgery course at Johns Hopkins University. Minimal linear algebra is assumed.
Frame transformations are often the cause of confusing, frustratingly simple errors in computer integrated surgery. For beginners, developing the intuition to answer the question, “is this the A to B transform or the B to A transform?” can take some time, especially when the meaning of “A to B” may vary based on convention. In this tutorial, we will establish a consistent notation and vocabulary for talking about points, frames, and the transformations among them, which reflect the graph-like structure of any tracking setup. We also show how to translate from a problem statement, which may use informal language to describe these transformations, to a rigorous description of the problem, suitable for writing a computer program.
First, let’s define precisely what we mean by “frame,” “point,” and “transform.” For this tutorial, we will constrain ourselves to points and frames in 3D.
A frame is a basis for numerical measurements of object locations, orientations, or poses. It can be thought of as a virtual object floating in physical space.
A point u is a single location in space. It can be measured relative to a frame A with a 3-vector.
An orientation is measured by a rotation R, which we will formulate as a 3x3 matrix.
Together, a location and an orientation fully describe the degrees of freedom for a rigid body in 3D Euclidean space. The combination is often referred to as a “pose,” and it is the same information needed to describe a second frame B. Thus, a “pose” can be thought of as a measurement of a frame, or a frame transformation.
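The “A from B” frame transformation is a measurement of frame B’s pose with respect to frame A. It consists of a rotation and a translation,
$$F_{AB} = (R_{AB}, p_{AB}),$$
such that for a given point u, with measurements $u_A$ in frame A and $u_B$ in frame B,
$$u_A = F_{AB} u_B = R_{AB} u_B + p_{AB},$$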
as in the following diagram:
It is important to ensure consistency between notation like Equation 3, frame transform diagrams like the figure above, and the vocabulary we use to talk about them. We call
$F_{AB}$ the “A from B” transformation because, as an operator, it takes in measurements of points in frame B and returns measurements in frame A. So why not call this the “B to A” transform? Or, as in quite a lot of source code, the B2A transform? The reason is that “A from B” reads from left to right in the same order as the subscripts, which removes a common source of confusion. Saying “A from B” makes clear how the transformation operates: it takes a measurement from frame B to a measurement in frame A, while maintaining consistency in our left-to-right ordering. Using this convention when naming variables, e.g. A_from_B, makes code easier to read, since the adjacency rule carries over (in a chain of transforms, neighboring frame names match), and more likely to be correct on the first try.
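As a minimal sketch of this naming convention, the NumPy snippet below stores the pair (R, p) as a 4x4 homogeneous matrix named A_from_B and applies it to a point measured in B. The homogeneous-matrix representation and all numeric values are assumptions made for illustration; they are not taken from the tutorial.

```python
import numpy as np

# Hypothetical pose of frame B measured in frame A (illustrative values only).
R_AB = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])  # 90 degree rotation about the Z-axis
p_AB = np.array([1.0, 2.0, 3.0])

# Pack (R, p) into a 4x4 homogeneous matrix, named "A from B".
A_from_B = np.eye(4)
A_from_B[:3, :3] = R_AB
A_from_B[:3, 3] = p_AB

# A point u measured in frame B, and the same point measured in frame A.
u_B = np.array([1.0, 0.0, 0.0])
u_A = (A_from_B @ np.append(u_B, 1.0))[:3]

print(u_A)
```

Reading the last assignment left to right, the frame names line up: u_A comes from A_from_B applied to u_B.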
Frame Transformation Diagrams
Frame transformation diagrams are an informal, helpful tool for understanding the relationships in a given tracking setup. The figure above, for example, is a simple diagram, where the point u is measured in frame B, and B is measured in frame A. The diagram uses arrows to signify these measurements, following the same left-to-right convention as the notation thus far. The arrow for the “A from B” frame transform starts at frame A and ends at frame B. This is often confusing, because the arrow in the diagram starts at A and ends at B, but
$F_{AB}$ maps measurements to frame A from frame B. For this reason, when talking about arrows, we will say the arrow “starts” and “ends,” reserving “to” and “from” for the frame transformation as a mathematical operator. With this vocabulary, we eliminate a single point of confusion while maintaining the many advantages of drawing frame transform diagrams with arrow directions matching the left-to-right ordering in the F notation and “A from B.”
One of these advantages has to do with constructing the frame transformation. If o is the point at the origin of B, and
$o_A$ is its measurement in frame A, the translational component of the “A from B” transform is simply
$$p_{AB} = o_A.$$
Thus the arrow representing this vector starts at the same frame and ends in the same location as the arrow representing the “A from B” transform.
A similar rule governs the rotational component, with R being the rotation between axes as measured in the starting frame.
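One way to see both rules together is to build the transform directly from measurements taken in the starting frame. The sketch below assumes a hypothetical helper, frame_from_axes, that takes the three axes of B and the origin of B, all measured in A, and stacks them into a 4x4 matrix. The example values are illustrative only.

```python
import numpy as np

def frame_from_axes(x_A, y_A, z_A, o_A):
    """Hypothetical helper: build "A from B" from B's axes and origin, measured in A."""
    A_from_B = np.eye(4)
    A_from_B[:3, 0] = x_A  # B's X-axis, measured in A
    A_from_B[:3, 1] = y_A  # B's Y-axis, measured in A
    A_from_B[:3, 2] = z_A  # B's Z-axis, measured in A
    A_from_B[:3, 3] = o_A  # B's origin, measured in A
    return A_from_B

# B is rotated 90 degrees about A's Z-axis, with its origin at (1, 2, 3) in A.
A_from_B = frame_from_axes(
    x_A=[0.0, 1.0, 0.0],
    y_A=[-1.0, 0.0, 0.0],
    z_A=[0.0, 0.0, 1.0],
    o_A=[1.0, 2.0, 3.0],
)
R_AB = A_from_B[:3, :3]
p_AB = A_from_B[:3, 3]

# The origin of B is the zero vector in B; transforming it recovers o_A.
origin_in_A = R_AB @ np.zeros(3) + p_AB
print(origin_in_A)
```

The axes passed in must be orthonormal and right-handed for the upper-left block to be a valid rotation; the helper does not check this.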
Further advantages of drawing frame transformation diagrams in this manner become clear as tracking setups become more complicated. With a little formalization, these diagrams can help us understand measurements across a chain of frame transformations. In the following section, we will use the powerful tools of graph theory to outline the rules for dealing with frame transformations. We will leave the proofs for these statements to the PDF version of this tutorial, stating a simplified version of the rules here.
We will formalize frame transformation diagrams as “reference graphs” so that we can define the rules by which these objects operate, and how we can use them to recover the desired information. Before giving the full definition, we will show an example of a translation between a diagram, with a single point and two frames, and its corresponding graph. You will see the diagram is already very close to the graph.
A reference graph is a connected, directed graph G = (V, E), representing measurements of points and frames in physical space, that satisfies the following:
- The nodes V are partitioned into frames and points. That is, V = F ⊔ U, where ⊔ is the disjoint union.
- Every point node u ∈ U has no outgoing edges.
- There is a measurement of a point u in a frame A if and only if there is an edge (A, u) ∈ E connecting the corresponding nodes.
- There is a measurement of the “A from B” frame transformation if and only if the edge (A, B) is in E.
- There is a one-to-one measurement map that associates edges with frame transforms and point measurements.
Don’t confuse the frames and points themselves, which are nodes in the graph, with the measurements of those frames and points, which correspond to edges. Remember, a “frame” is just an abstract object floating in physical space, with 6 degrees of freedom, and points are similarly abstract locations in physical space with 3 degrees of freedom. We can create sets of these objects, but we cannot (in good faith) associate them with numbers until we have measured them in a separate frame of reference. Hence, the frame or point in physical space is the node, and its measurement is really just a relationship (edge) with another node.
The reference graph has some useful properties, which help us determine whether a given object can be measured in a given frame, and how to obtain that measurement from the given quantities (assuming no error). In the following section, we will provide two simple theorems that define how one should deal with (1) inverse transformations and (2) kinematic chains, which are really just paths on the reference graph.
If A, B ∈ F, (A, B) ∈ E, and
$$F_{AB} = (R_{AB}, p_{AB}),$$
then the reverse edge (B, A) is also in E, and
$$F_{BA} = F_{AB}^{-1} = (R_{AB}^{-1}, -R_{AB}^{-1} p_{AB}).$$
This rule provides a useful method for obtaining the desired transform, when only its inverse is known. Let’s make this a little more concrete with a simple example, which can be done by hand. Remember, rotation matrices are orthonormal, so their inverse is their transpose.
Suppose a tracked patient frame B is measured relative to the tracker frame A to be at
If the tip of a surgical instrument is measured at $[4, 5, 6]^T$ in the tracker frame, what is the same point’s measurement in the patient frame B?
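Looking at the reference graph below, we know the edge (B,A) is also in the graph, and we know its corresponding frame transform, $F_{BA} = F_{AB}^{-1}$, so that $u_B = F_{BA} u_A$.
To evaluate this expression, use the fact that the inverse of a rotation matrix is its transpose: $u_B = R_{AB}^T (u_A - p_{AB})$, with $u_A = [4, 5, 6]^T$.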
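The sketch below works an example of the same form numerically. The pose of B in A used here is a hypothetical stand-in chosen for illustration, not the value from the example; only the tip measurement [4, 5, 6]^T comes from the text. The function name invert is likewise an assumption.

```python
import numpy as np

def invert(R, p):
    """Return the inverse of the frame transform (R, p), i.e. (R^T, -R^T p)."""
    R_inv = R.T  # rotation matrices are orthonormal, so the inverse is the transpose
    return R_inv, -R_inv @ p

# Hypothetical "A from B" measurement: 90 degrees about Z, translated by (1, 2, 3).
R_AB = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
p_AB = np.array([1.0, 2.0, 3.0])

# Instrument tip measured in the tracker frame A (from the example).
u_A = np.array([4.0, 5.0, 6.0])

# "B from A" is the inverse edge; apply it to get the tip in the patient frame B.
R_BA, p_BA = invert(R_AB, p_AB)
u_B = R_BA @ u_A + p_BA

print(u_B)
```

A quick sanity check is to map u_B back with the original transform and confirm that it reproduces u_A.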
In Example 1, we used the concept of reference graphs in a very minor way, namely to have a vocabulary for how the inverse frame transformation operates on points. However, you might notice that the end result corresponds to a series of connected edges on the reference graph. That is, to find the measurement of u in B, we have found a directed path starting at B and ending at u, then multiplied the corresponding mathematical objects in the same order. This is no coincidence, and it leads us to the primary advantage of using reference graphs to think about coordinate frame transformations.
Given a reference graph G = (F ⊔ U, E), the measurement of a point u ∈ U in a frame A ∈ F can be determined if G contains a directed path
$$(A, B_1, B_2, \dots, B_n, u)$$
starting at A and ending at u. Moreover,
$$u_A = F_{AB_1} F_{B_1 B_2} \cdots F_{B_{n-1} B_n} u_{B_n}.$$
Likewise, the same holds for paths ending at frames. We can compose frame transformations in the following manner.
The composition of two frame transformations is
$$F_{AC} = F_{AB} F_{BC} = (R_{AB} R_{BC}, R_{AB} p_{BC} + p_{AB}).$$
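To illustrate composition, the following sketch represents each frame transformation as an (R, p) pair and checks that applying the composed transform gives the same result as applying the two transforms one after the other. The helper names compose, apply, and rot_z, and the numeric values, are assumptions for illustration.

```python
import numpy as np

def compose(F_AB, F_BC):
    """Compose "A from B" with "B from C" to get "A from C"."""
    R_AB, p_AB = F_AB
    R_BC, p_BC = F_BC
    return R_AB @ R_BC, R_AB @ p_BC + p_AB

def apply(F, u):
    """Apply the frame transform F = (R, p) to the point measurement u."""
    R, p = F
    return R @ u + p

def rot_z(degrees):
    """Rotation matrix for a rotation about the Z-axis."""
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical measurements.
F_AB = (rot_z(90.0), np.array([1.0, 0.0, 0.0]))
F_BC = (rot_z(-90.0), np.array([0.0, 2.0, 0.0]))
u_C = np.array([1.0, 1.0, 1.0])

F_AC = compose(F_AB, F_BC)
u_A_direct = apply(F_AC, u_C)                # one composed transform
u_A_stepwise = apply(F_AB, apply(F_BC, u_C))  # two transforms in sequence

print(u_A_direct, u_A_stepwise)
```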
To make this more concrete, we will consider a more complicated example in the context of trauma surgery. We encourage the reader to attempt the example on their own, understanding that the hardest part may be translating the problem statement, which requires understanding some medical jargon, to frames and transforms. This is an essential skill, and the best way to acquire it is practicing problems much like this one. A good initial approach is the following:
- Identify the points and frames in the problem as nodes on a reference graph.
- Identify the known measurements as edges on the graph.
- Remember that inverse edges of known transformations are also on the graph.
- Identify the desired measurement as a path that starts at the frame in which the measurement is expressed and ends at the object being measured.
- Compose the desired transformations and evaluate.
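The steps above can be sketched as a small data structure. The ReferenceGraph class below is a hypothetical, simplified illustration and not the implementation of any particular library: it stores 4x4 homogeneous matrices on edges between frames, adds the inverse edge automatically, and finds a path with breadth-first search. Points are handled as homogeneous vectors rather than as graph nodes, and the measurements are assumed to be error-free.

```python
from collections import deque

import numpy as np

class ReferenceGraph:
    """Hypothetical minimal reference graph over frames."""

    def __init__(self):
        self.edges = {}  # (start, end) -> "start from end" 4x4 transform

    def add_transform(self, start, end, start_from_end):
        # A known transform also puts its inverse edge on the graph.
        self.edges[(start, end)] = np.asarray(start_from_end, dtype=float)
        self.edges[(end, start)] = np.linalg.inv(start_from_end)

    def get_transform(self, start, end):
        """Compose edge transforms along a directed path from start to end."""
        queue = deque([(start, np.eye(4))])
        visited = {start}
        while queue:
            node, start_from_node = queue.popleft()
            if node == end:
                return start_from_node
            for (a, b), a_from_b in self.edges.items():
                if a == node and b not in visited:
                    visited.add(b)
                    queue.append((b, start_from_node @ a_from_b))
        raise KeyError(f"no path from {start} to {end}")

def translation(x, y, z):
    """Pure translation as a 4x4 homogeneous matrix."""
    F = np.eye(4)
    F[:3, 3] = [x, y, z]
    return F

# Hypothetical measurements: a tracker T sees two marker frames, M and R.
graph = ReferenceGraph()
graph.add_transform("T", "M", translation(1.0, 0.0, 0.0))
graph.add_transform("T", "R", translation(0.0, 2.0, 0.0))

# Desired: "R from M", found via the path R -> T -> M.
R_from_M = graph.get_transform("R", "M")
u_M = np.array([0.0, 0.0, 0.0, 1.0])  # origin of M, homogeneous coordinates
u_R = R_from_M @ u_M

print(u_R)
```

Breadth-first search returns a path with the fewest edges; with noisy measurements, different paths generally give slightly different answers, which this sketch ignores.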
Internal pelvic trauma fixation involves the insertion of a rigid metal rod, called a K-wire, into the pelvic bone along a specific trajectory. It is essential to identify the correct trajectory so as to avoid “cortical breach,” where the K-wire punches through the outer cortical bone and enters soft tissue, leading to further complications. For this example, assume the fracture is along the superior pubic ramus and is already shortened (aligned).
The operating room is equipped with a calibrated X-ray imaging device, called a C-arm, with infrared reflective markers on its gantry. The calibration provides the pose of the C-arm’s camera frame — a frame with its origin at the center of the X-ray tube. There is a Polaris optical tracker positioned such that it measures the pose of both the C-arm and a reference marker fixed to the patient table.
The surgeon is wearing a HoloLens 2, an “augmented” or “mixed reality” headset, which is capable of displaying holograms to the surgeon. The HoloLens tracks the surgeon’s head movements using onboard sensors, relative to an estimated “world” frame. The HoloLens is also capable of tracking infrared reflective markers, relative to the headset frame, but the markers must be in the field of view. Since the surgeon is generally looking down, only the marker fixed to the patient table can be reliably tracked by the HoloLens.
There is a machine intelligent system which is capable of automatically determining the safe corridor for K-wire insertion based on X-ray images. The input to this system is the X-ray image, and the output is a start- and end-point for the safe trajectory in the camera frame.
Finally, suppose we want to support the safe insertion of the K-wire by displaying a hologram for the surgeon to align the K-wire with. The hologram is a virtual object in the HoloLens “world,” with its own coordinate frame such that the +Z-axis should align with the safe trajectory (going into the body), and the origin should be placed at the trajectory start point.
Virtual objects are controlled by setting their rotation and translation relative to the HoloLens “world.” What should these be set to, in terms of the measurements provided, so as to guide the surgical alignment?
First, let’s talk about the hardware in this setup. In the figure below, the X-ray imaging device has infrared markers attached to the detector side of the gantry. The locations of these markers have already been encoded relative to some frame, call it M. When the Polaris tracker measures the pose of the C-arm, it is measuring the “T from M” frame transform, where T is the tracker frame. It doesn’t matter precisely how M is defined, because the calibration of the C-arm provides the “C from M” transform, where C is the “camera” frame centered at the X-ray source. Let u and v be the start- and endpoints of a suitable trajectory, which are measured by the machine learning system in the camera frame.
The tracker also measures the pose of a reference marker frame, call it R, which is simultaneously tracked by the HoloLens. The reason to include this marker is to provide a kinematic chain between the HoloLens “world” frame and the camera-centered coordinates measured by the C-arm. Denote the frame of the headset as H and the HoloLens world frame as W. Finally, denote the frame containing the hologram as A (for arrow), and assume that the hologram is aligned with the Z-axis.
The next step is to translate the above diagram into a more formal frame transformation diagram, with the frames discussed.
Often, this figure is sufficient to see the transformations that need to be composed, especially once the translation to reference graphs is clear. Indeed, the path on the full graph which relates the anatomical points to the HoloLens world frame is easily identified in red:
Now that we understand all the frames in the problem and the measurements between them, the hard part is over. All that remains is to define the “W from A” transform such that the origin of A is at u, and the −Z axis points in the same direction as the u → v vector.
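The first step is straightforward enough. We have to find the measurement of u in W, which is immediately the desired translation component. Following the path from W through H, R, T, and M to C, and inverting the two measured transforms that point the other way (“T from R” and “C from M”),
$$p_{WA} = u_W = F_{WH} F_{HR} F_{TR}^{-1} F_{TM} F_{CM}^{-1} u_C.$$
Similarly, note that
$$v_W = F_{WH} F_{HR} F_{TR}^{-1} F_{TM} F_{CM}^{-1} v_C.$$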
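The chain from W through H, R, T, and M to C can be checked numerically. The sketch below is an illustration under stated assumptions: it invents ground-truth poses for every frame (all values hypothetical), simulates the five measurements named in the problem from them, and confirms that composing those measurements recovers the trajectory points in the world frame. The helper pose is also hypothetical.

```python
import numpy as np

def pose(rz_deg, t):
    """Hypothetical helper: 4x4 pose with a rotation about Z and a translation t."""
    c, s = np.cos(np.radians(rz_deg)), np.sin(np.radians(rz_deg))
    F = np.eye(4)
    F[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    F[:3, 3] = t
    return F

inv = np.linalg.inv

# Invented ground-truth poses in the HoloLens world W, used only for simulation.
W_from_H = pose(10.0, [0.1, 0.2, 1.6])    # headset
W_from_R = pose(-30.0, [0.5, 0.0, 0.9])   # reference marker on the table
W_from_T = pose(90.0, [2.0, 1.0, 2.0])    # Polaris tracker
W_from_M = pose(45.0, [0.6, 0.4, 1.2])    # C-arm marker frame
W_from_C = pose(120.0, [0.6, 0.4, 0.3])   # C-arm camera frame

# Simulated measurements, matching those available in the problem statement.
H_from_R = inv(W_from_H) @ W_from_R  # HoloLens tracking the reference marker
T_from_R = inv(W_from_T) @ W_from_R  # Polaris tracking the reference marker
T_from_M = inv(W_from_T) @ W_from_M  # Polaris tracking the C-arm markers
C_from_M = inv(W_from_C) @ W_from_M  # C-arm calibration
u_C = np.array([0.00, 0.05, 0.70, 1.0])  # trajectory start, homogeneous
v_C = np.array([0.02, 0.08, 0.85, 1.0])  # trajectory end, homogeneous

# Path W -> H -> R -> T -> M -> C, inverting "T from R" and "C from M".
W_from_C_chain = W_from_H @ H_from_R @ inv(T_from_R) @ T_from_M @ inv(C_from_M)
u_W = W_from_C_chain @ u_C
v_W = W_from_C_chain @ v_C

print(u_W[:3], v_W[:3])
```

In a real system only the five measurements and the two points would be available; the ground-truth poses exist here solely to make the check possible.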
Next, all that remains is to ensure the hologram’s rotation aligns with u → v. This can be done by answering the question, “what is the rotation that aligns $[0, 0, 1]^T$ with $u - v$?” This question can be answered using Rodrigues’s formula and the vector-angle formulation of rotations, which we will state but not explore in much detail. Briefly, Rodrigues’s formula Rot(w, θ) takes the vector w about which the rotation occurs and the angle θ, and returns the corresponding rotation matrix. Here, w must be perpendicular to both vectors, which we can obtain with the cross product:
$$w = [0, 0, 1]^T \times (u_W - v_W).$$
Note that Rodrigues’s formula does not take into account the magnitude of this vector, only its direction. The angle θ is given by
$$\theta = \arccos\left(\frac{[0, 0, 1] \, (u_W - v_W)}{\lVert u_W - v_W \rVert}\right),$$
and the rotation is
$$R_{WA} = \mathrm{Rot}(w, \theta) = I + \sin\theta \, \mathrm{sk}(\hat{w}) + (1 - \cos\theta) \, \mathrm{sk}(\hat{w})^2, \quad \hat{w} = w / \lVert w \rVert,$$
where sk(w) is the skew matrix of w.
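A minimal NumPy sketch of this construction follows. It assumes the standard form of Rodrigues’s formula with a normalized axis, and it follows the text in aligning [0, 0, 1] with u - v. The function names sk, rot, and align_z_with, the handling of the degenerate parallel case, and the point values are assumptions for illustration.

```python
import numpy as np

def sk(w):
    """Skew-symmetric matrix of w, so that sk(w) @ x equals np.cross(w, x)."""
    x, y, z = w
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def rot(w, theta):
    """Rodrigues's formula: rotation by theta about the direction of w."""
    w = np.asarray(w, dtype=float)
    K = sk(w / np.linalg.norm(w))  # only the direction of w matters
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def align_z_with(d):
    """Rotation that takes [0, 0, 1] to the direction of d."""
    z = np.array([0.0, 0.0, 1.0])
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    w = np.cross(z, d)
    if np.linalg.norm(w) < 1e-9:
        # d is parallel or antiparallel to Z, so the axis is not unique.
        return np.eye(3) if d[2] > 0 else rot([1.0, 0.0, 0.0], np.pi)
    theta = np.arccos(np.clip(z @ d, -1.0, 1.0))
    return rot(w, theta)

# Hypothetical trajectory points, already measured in the world frame W.
u_W = np.array([0.0, 0.3, 0.4])
v_W = np.array([0.0, 0.0, 0.0])

R_WA = align_z_with(u_W - v_W)  # rotational component of "W from A"
p_WA = u_W                      # translational component of "W from A"

print(R_WA @ np.array([0.0, 0.0, 1.0]))
```

The rotation about the aligned axis is left unconstrained by this construction, which is acceptable for a hologram that is symmetric about its Z-axis.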
In this tutorial, you learned how to take a description of a tracking system, such as those used in surgical navigation, and extract desired information. As you may have figured out, this is a pretty simple process once you have a consistent notation and vocabulary for discussing transformations, although the algebra can become tedious if done by hand. Fortunately, software packages like tf2 in ROS or pytransform3d in Python/NumPy allow users to manage the frame transform graph and request transforms between frames automatically.
The next step, which we will discuss in future tutorials, is understanding how the potential error in each measurement, from the point itself to every transformation in between, affects the end result.