Introduction

In computer vision, it is a common task to map pixels in an image to a coordinate system in the real world and vice versa. The geolocation module in the airside system repository maps pixels in an image to geographical coordinates using the drone's position and orientation.

Theory

The diagram above displays the vectors used in the geolocation algorithm. The world space is the coordinate system used to describe the position of an object in the world (e.g. latitude and longitude). The world space is shown in the diagram with the black coordinate system. The camera space is the coordinate system used to describe the position of an object in relation to the camera. The camera space is showing in the diagram with vectors c, u, v. Note that bolded variables are vector quantities.

The table below outlines what each variable represents.

Vector	What it represents
o	The location of the camera in the world space (latitude and longitude of camera).
c	Orientation of the camera in the world space (yaw, pitch, and roll of camera).
u	Horizontal axis of the image in the camera space (right is positive).
v	Vertical axis of the image in the camera space (down is positive).

To compute the geographical coordinates (latitude and longitude) of a pixel in an image, we need to convert a pixel in the image (p₁, p₂) → vector in the camera space (p) → vector in the world space (a).

We can compute the scaling factor (let’s call the scaling factor t) with the equation below.

Rotation Matrix

It is useful to think about a matrix as a transformation in space. A matrix can warp the coordinate system into a different shape. When we multiply a vector with a matrix, we can visually see this as transforming a vector from one coordinate space to another. To better understand matrices as transformations, you can look at the resources below.

A rotation matrix is a special type of matrix, as it transforms the coordinate space by revolving it around the origin. A visualization of a rotation matrix can be seen below.

Rotation matrices are useful since it allows us to model the orientation of an object in 3D space. Multiplying a vector with a rotation matrix allows us to rotate a vector and change its orientation. The rotation matrices for 3D space are shown below. For more information on rotation matrices, see here: https://en.wikipedia.org/wiki/Rotation_matrix

Intrinsic Camera Properties

Pixels and the Camera Space

You can think of an image as a grid of pixels. Pixels are the smallest component in a digital image. The resolution of an image is the number of pixels in an image. The top-left pixel in the image is the origin (the point (0, 0)). The positive x-direction points in the right direction and the positive y-direction points downwards.

The camera coordinate system follows the North-East-Down (NED) coordinate system. In the NED system, the x-axis is the forward direction, the y-axis is the rightward direction, and the z-axis the downward direction. In the camera coordinate system, c corresponds to the x-direction, u corresponds to the y-direction, and v corresponds to the z-direction. See more information about the NED system here: https://en.wikipedia.org/wiki/Local_tangent_plane_coordinates.

Calculating Camera Space Vectors

Pixel to Vector in Camera Space

To map a pixel to its corresponding vector in the camera space we use the scaling function shown below.

From the following calculations we can see that the codomain of the scaling function is [-1, 1] (the maximum value of the function is 1 and the minimum value of the function is -1). We can multiply the value of the scaling function with the vectors u and v to get the horizontal and vertical axis of the pixel vector (u_pixel = f(p, r) * u and v_pixel = f(p, r) * v). Thus the pixel vector (p) is equal to c + u_pixel + v_pixel.

Extrinsic Camera Properties

Before we can convert vectors in the camera space to vectors in the world space, we need to know how the camera is positioned in relation to the drone. We need to know the position of the camera in relation to the drone and the orientation of the camera in relation to the drone. The position and orientation of the camera is reported in the NED coordinate system.

When calculating the position of the camera in relation to the drone, the measurements must be reported in meters from the center of the drone. When calculating the orientation of the camera in relation to the drone, the measurements must be reported in radians when the drone is on a flat surface.

Geolocation

The Geolocation class in the airside system repository is responsible for converting a pixel in the image space to coordinates in the NED system in the world space. Geolocation works by creating a perspective transform matrix that maps pixels to coordinates in the real world. Below is a list of drop down menus explaining the helper functions used in the geolocation algorithm.

drone position local from global

The geolocation algorithm assumes a planar coordinate system. We can make the assumption that the earth is a planar coordinate system at small distances (because the radius of the earth is so large). From the FlightControllerWorker we get the latitude and longitude of the drone in the WGS84 coordinate system. With the WGS84 coordinate system, the origin lies in the coast of Africa so assuming a planar coordinate system would be completely wrong. Instead, we can take the home location and make that the origin. This allows us to make the assumption that the world coordinate system in planar. We can convert latitude, longitude, and altitude to the NED coordinate system using the pymap3d library. To do this we need to use the drone's current location and the home location.

ground intersection from vector

Given the camera’s location in the world space and a vector pointing down in the world space, this function can calculate where the vector pointing down will intersect the ground. The calculation for doing so is shown below.

In this diagram, o is camera’s location in the world space and a is the vector pointing down. We can calculate the scaling factor (t in the diagram) and use it to calculate the x and y coordinate in the world space.

get perspective transform matrix

A perspective transform matrix is a 3x3 that maps a 2D point in one plane to another 2D point in a different plane. We use the perspective transform matrix to map a point in the image plane to the a point in the world space. We use the OpenCV library to calculate the transform matrix. To get the perspective transform matrix, 4 points on the image are needed. Among these 4 points, 3 of the points should not be collinear (3 of the points should not be on the same line). Below is a diagram of the 4 points used to calculate the perspective transform matrix.

Let’s call these 4 points the image sources. The 4 image sources can be converted into vectors in the camera space using the scaling function. The sources will then be multiplied by the rotation matrix of the camera in relation to the drone. This will result in the source vectors being oriented in the world space in relation to the drone. Finally, the source vectors are multiplied with the drone orientation matrix to get the source vectors in the world space. We can then send the source vectors to the __ground_intersection_from_vector function to get the corresponding the ground points in the world space.

We now have four pairs of tuples in the image space and the world space. We can pass that to the getPerspectiveTransform function that OpenCV provides, and we now have our matrix!

convert detection to world from image

Given the perspective transform matrix and a Detection object, the detected entity in an image can be mapped to coordinates in the world. The detections are a tuple that represent the pixel coordinates in the image space (p₁, p₂). A value of 1 is appended to the pixel coordinates to calculate the value of the homogenous component (p₁, p₂, 1). See here for more info: https://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection.

The 3D pixel vectors are then multiplied by the perspective transform matrix and the resulting vector is divided by the homogenous component to get the corresponding coordinates on the ground.

Now that we have described how the functions in the Geolocation class works, we can go through the run function and see how the entire algorithm works.

Get the MergedOdometeryDetection object from the queue and create the rotation matrix that models the drone’s orientation with the drone’s yaw, roll, and pitch
Get the home location from the home_location_queue and the drone’s location from the MergedOdometryDetection object and pass it into the drone_position_local_from_global function to get the drone’s location in the NED coordinate system.
Pass in the drone’s rotation matrix and the drone’s position in the NED system to the __get_perspective_transform_matrix function to get the perspective transform matrix.
Pass in the perspective transform matrix and Detection object into the __convert_detection_to_world_from_image to convert the detected object into coordinates in the world frame.
Create a DetectionInWorld object and output it in the output queue.

Geolocation (WIP)