
...

Overview

In computer vision, it is a common task to map pixels in an image to a coordinate system in the real world and vice versa. The geolocation module in the airside system repository identifies the relative positioning of ground targets from the given sensory data.

Note: Throughout this document bolded lowercase variables represent vector quantities and bolded uppercase variables represent matrices.

Measurements Needed

Intrinsics

...

An image is a grid of pixels, the smallest components of a digital image. The resolution of an image is the number of pixels it contains. The top-left pixel in the image is the origin. The positive x-direction points to the right and the positive y-direction points downward.

The camera coordinate system follows the North-East-Down (NED) convention. In the NED system, the x-axis points forward, the y-axis points to the right, and the z-axis points downward. In the camera coordinate system, c corresponds to the x-direction, u corresponds to the y-direction, and v corresponds to the z-direction. The optical center is the origin of the camera coordinate system. See more information about the NED system here: https://en.wikipedia.org/wiki/Local_tangent_plane_coordinates.

Image to Camera Space

If you are not familiar with the Field of View of Cameras, I suggest reading this document to understand the necessary background for this section https://uwarg-docs.atlassian.net/wiki/spaces/CV/pages/2237530135/Cameras#Field-of-View.

...

To calculate the vectors c, u, and v, we assume that the magnitude of the c vector is 1. Knowing the magnitude of the c vector and the Field of View, we can calculate the magnitudes of the u and v vectors (the magnitudes of u and v correspond to a and b in the diagram above, respectively). This is done with basic trigonometry: since the magnitude of the c vector is 1, we can take the tangent of half of the FOV angle.
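
As a sketch of this trigonometry (the function and parameter names below are illustrative, and the FOV angles are assumed to be in radians):

    import math

    # With |c| = 1, the half-extents of the image plane follow from the
    # horizontal and vertical field of view using basic trigonometry.
    def image_plane_half_extents(fov_x_rad, fov_y_rad):
        a = math.tan(fov_x_rad / 2)  # magnitude of u
        b = math.tan(fov_y_rad / 2)  # magnitude of v
        return a, b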

If the ratio a:b is not the same as the ratio rx:ry, then the pixels are not square. This does not affect the algorithm, but it is useful to note; usually, the pixels will be square.

To map a pixel in the image space to its corresponding vector in the camera space, we can apply the scaling function f(p) = 2(p/R) - 1, where p is the pixel location (along the x or y axis) and R is the resolution (along the same axis). This function is chosen because it maps the domain of the image space, [0, R], to the codomain [-1, 1]: the pixel location is normalized to [0, 1], scaled by 2, and translated down by 1.
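
A minimal sketch of the scaling function, with a quick numeric check (the names are illustrative):

    def scale(p, resolution):
        # Maps a pixel coordinate in [0, resolution] to [-1, 1].
        return 2 * (p / resolution) - 1

    # For a 1920-pixel-wide image: scale(0, 1920) == -1.0,
    # scale(960, 1920) == 0.0, scale(1920, 1920) == 1.0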

...

We can multiply the value of the scaling function with the vectors u and v to get the horizontal and vertical components of the pixel vector (upixel = f(px) * u and vpixel = f(py) * v). Thus, the pixel vector p is equal to p = c + upixel + vpixel.
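
Putting the pieces together, a hedged sketch of the image-to-camera mapping, reusing the scale function sketched above (the NumPy layout and names are assumptions, not the module's actual API):

    import numpy as np

    def pixel_to_camera_vector(px, py, res_x, res_y, c, u, v):
        # c, u, and v are 3-element arrays expressed in the camera coordinate system.
        u_pixel = scale(px, res_x) * u
        v_pixel = scale(py, res_y) * v
        return c + u_pixel + v_pixel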

Extrinsics

Three-Dimensional Rotation Matrices

It is useful to think about a matrix as a transformation in space. A matrix can warp the coordinate system into a different shape. When we multiply a vector with a matrix, we can visually see this as transforming a vector from one coordinate space to another. To better understand matrices as transformations, you can look at the resources below.

A rotation matrix is a special type of matrix, as it transforms the coordinate space by revolving it around the origin. A visualization of a rotation matrix can be seen below.

...

Rotation matrices are useful since they allow us to model the orientation of an object in 3D space. Multiplying a vector with a rotation matrix rotates the vector and changes its orientation. The rotation matrices for 3D space are shown below. For more information on rotation matrices, see here: https://en.wikipedia.org/wiki/Rotation_matrix.

...

It is important to note that matrix multiplication is not commutative. This means that the product of two matrices A and B generally changes if the order of the matrices is switched (AB != BA).
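
A quick numeric illustration (the matrices here are arbitrary examples):

    import numpy as np

    A = np.array([[0, -1], [1, 0]])  # 90 degree rotation in 2D
    B = np.array([[1, 1], [0, 1]])   # shear
    print(np.array_equal(A @ B, B @ A))  # False: AB != BA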

The orientation of a rigid body in three-dimensional space can be described with three angles (Tait-Bryan angles). In aviation, these angles are called yaw, pitch, and roll. Yaw describes the angle around the z-axis, pitch describes the angle around the y-axis, and roll describes the angle around the x-axis.

...

Intrinsic rotations are elemental rotations that occur about the axes of a coordinate system attached to a moving body. In contrast, extrinsic rotations are elemental rotations that occur about the axes of the fixed coordinate system. The rotations in geolocation are intrinsic. There are 6 different intrinsic rotation orders that can be performed (since there are 6 permutations of rotations about the x, y, and z axes). The order matters because, as mentioned above, matrix multiplication is not commutative. The intrinsic rotation performed in the geolocation module is z-y-x, or 3-2-1. If we have X as the rotation matrix for roll, Y as the rotation matrix for pitch, and Z as the rotation matrix for yaw, the overall transformation matrix T would be T = ZYX. The diagram below showcases the intrinsic rotation for z-y-x. For more information, you can check out the link here: https://www.wikiwand.com/en/Euler_angles#Tait%E2%80%93Bryan_angles.
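
A sketch of the elemental rotation matrices and the intrinsic z-y-x (3-2-1) composition (angles in radians; function names are illustrative):

    import numpy as np

    def rot_x(roll):
        return np.array([[1, 0, 0],
                         [0, np.cos(roll), -np.sin(roll)],
                         [0, np.sin(roll),  np.cos(roll)]])

    def rot_y(pitch):
        return np.array([[ np.cos(pitch), 0, np.sin(pitch)],
                         [0, 1, 0],
                         [-np.sin(pitch), 0, np.cos(pitch)]])

    def rot_z(yaw):
        return np.array([[np.cos(yaw), -np.sin(yaw), 0],
                         [np.sin(yaw),  np.cos(yaw), 0],
                         [0, 0, 1]])

    def intrinsic_zyx(yaw, pitch, roll):
        # Intrinsic z-y-x: yaw is applied first, then pitch, then roll.
        return rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)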

...

Camera to Drone Space

Once we are able to describe a pixel in the camera space, we need to convert it to a vector in the drone space. This vector will point to the object in the image from the perspective of the drone. To convert the vector from the camera space to the drone space, we need to know how the camera is positioned and oriented with respect to the drone.

When calculating the position of the camera in relation to the drone, the measurements must be taken from the center of the drone to the center of the camera. Any unit of measurement can be used as long as the units are consistent. Once we have the measurements of the camera along the x, y, and z axes, we can generate a vector to model the position of the camera in the drone space.

When calculating the orientation of the camera in relation to the drone, the angles must be reported in radians and measured while the drone is on a flat surface. The camera’s yaw, pitch, and roll with respect to the drone are used to generate a rotation matrix that models how the camera is oriented with respect to the drone.

Let’s say R is the rotation matrix describing the camera’s orientation in relation to the drone, t is the vector representing the position of the camera with respect to the drone, and p is a vector in the camera space. If we want to convert the vector p into a vector in the drone space (let’s call this p'), we can apply the following equation: p' = Rp + t.
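
As a sketch of this step (R, t, and p follow the notation above; the NumPy representation is an assumption):

    import numpy as np

    def camera_to_drone(p, R, t):
        # R: 3x3 rotation matrix (camera orientation with respect to the drone)
        # t: 3-element position of the camera with respect to the drone's center
        # p: 3-element vector in the camera space
        return R @ p + t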

TODO: Redraw diagrams. Not sure what was the issue with the old ones.

TODO: Write a document describing how to calculate measurements needed for Geolocation.

World

Two-Dimensional Rotation and Translation

TODO: Not sure what to put in this section or why it is needed.

Drone to World Space

Once we know where the object is in the drone space, we need to convert it into a vector in the world space. To do this, we can get the drone’s yaw, pitch, and roll in the world space and use them to build a rotation matrix that transforms the vector from the drone space to the world space, similar to the camera-to-drone conversion.
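
A hedged sketch of this step, mirroring the camera-to-drone conversion and reusing the intrinsic_zyx function sketched earlier (the names, layout, and inclusion of the drone's position are assumptions):

    import numpy as np

    def drone_to_world(p_drone, yaw, pitch, roll, drone_position_ned):
        # Rotation describing the drone's orientation in the world space,
        # composed in the intrinsic z-y-x order described above.
        R_drone = intrinsic_zyx(yaw, pitch, roll)
        return R_drone @ p_drone + drone_position_ned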

Projective Perspective Transform Matrix

World to Ground Space

Break ---------------------------------------------------------------

Overview

In computer vision, it is a common task to map pixels in an image to a coordinate system in the real world and vice versa. The geolocation module in the airside system repository maps pixels in an image to geographical coordinates using the drone's position and orientation.

...

The list below outlines what each vector represents.

  • o: The location of the camera in the world space (latitude and longitude of the camera).
  • c: Orientation of the camera in the world space (yaw, pitch, and roll of the camera).
  • u: Horizontal axis of the image in the camera space (right is positive).
  • v: Vertical axis of the image in the camera space (down is positive).

To compute the geographical coordinates (latitude and longitude) of a pixel in an image, we need to convert a pixel in the image (p1, p2) → vector in the camera space (p) → vector in the world space (a).

...

The Geolocation class in the airside system repository is responsible for converting a pixel in the image space to coordinates in the NED system in the world space. Geolocation works by creating a perspective transform matrix that maps pixels to coordinates in the real world. Below is a list of drop-down menus explaining the helper functions used in the geolocation algorithm.

drone position local from global

The geolocation algorithm assumes a planar coordinate system. We can make this assumption because the earth is approximately flat over small distances (the radius of the earth is very large). From the FlightControllerWorker we get the latitude and longitude of the drone in the WGS84 coordinate system. In the WGS84 coordinate system, the point of zero latitude and longitude lies off the coast of Africa, so treating those coordinates directly as a planar coordinate system would be completely wrong. Instead, we can take the home location and make that the origin. This allows us to assume that the world coordinate system is planar. We can convert latitude, longitude, and altitude to the NED coordinate system using the pymap3d library. To do this, we need the drone's current location and the home location.
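
A minimal sketch of this conversion with pymap3d (the variable names for the home location are illustrative):

    import pymap3d

    def drone_position_local_from_global(lat, lon, alt, home_lat, home_lon, home_alt):
        # Convert the drone's WGS84 position to NED coordinates relative to the
        # home location, which acts as the local origin.
        north, east, down = pymap3d.geodetic2ned(lat, lon, alt, home_lat, home_lon, home_alt)
        return north, east, down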

...