...

Note: Throughout this document, bolded lowercase variables represent vector quantities and bolded uppercase variables represent matrices.

...

Intrinsics

...

An image is a grid of pixels. Pixels are the smallest components of a digital image. The resolution of an image is the number of pixels it contains. The top-left pixel in the image is the origin. The positive x-direction points to the right and the positive y-direction points downwards.

...

To calculate the vectors c, u and v, we assume that the magnitude of the c vector is 1. (The magnitude of c can be arbitrary since u and v will scale with it; we assume a magnitude of 1 for convenience.) Knowing that the magnitude of the c vector is 1 and knowing the field of view, we can calculate the magnitudes of the u and v vectors (the magnitudes of u and v correspond to a and b in the diagram above, respectively). This is done with basic trigonometry: since the magnitude of the c vector is 1, the magnitude of u is the tangent of half of the horizontal FOV angle, and likewise for v with the vertical FOV angle.

If the ratio a:b is not the same as the ratio rx:ry, then the pixels are not square. This does not really matter, but it is useful to note; usually, the pixels will be square.
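As a concrete sketch, the magnitudes of u and v can be computed from the horizontal and vertical FOV angles as |u| = tan(FOV_x / 2) and |v| = tan(FOV_y / 2), assuming |c| = 1. The function name and example FOV values below are illustrative:

    import math

    def axis_magnitudes(fov_x_rad: float, fov_y_rad: float) -> tuple[float, float]:
        """Return (|u|, |v|) given the horizontal and vertical FOV, assuming |c| = 1."""
        u_mag = math.tan(fov_x_rad / 2)  # corresponds to a in the diagram
        v_mag = math.tan(fov_y_rad / 2)  # corresponds to b in the diagram
        return u_mag, v_mag

    # Example: a camera with a 90 degree horizontal and 65 degree vertical FOV (illustrative values).
    u_mag, v_mag = axis_magnitudes(math.radians(90), math.radians(65))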

...

We can multiply the value of the scaling function with the vectors u and v to get the horizontal and vertical components of the pixel vector (upixel = f(p) * u and vpixel = f(p) * v). Thus, the pixel vector, p, is equal to p = c + upixel + vpixel.
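A minimal sketch of this step, assuming the scaling function f maps a pixel index in [0, r) to the range [-1, 1] (the exact form of f used by the module may differ, e.g. with half-pixel offsets):

    import numpy as np

    def scale(pixel: int, resolution: int) -> float:
        """One possible scaling function f: map a pixel index in [0, resolution) to [-1, 1]."""
        return 2 * pixel / resolution - 1

    def pixel_vector(px, py, rx, ry, c, u, v):
        """Compute p = c + u_pixel + v_pixel in the camera space."""
        u_pixel = scale(px, rx) * u  # horizontal component
        v_pixel = scale(py, ry) * v  # vertical component
        return c + u_pixel + v_pixel

    # Example: centre pixel of a 1920x1080 image (illustrative camera axes).
    c = np.array([1.0, 0.0, 0.0])
    u = np.array([0.0, 1.0, 0.0])
    v = np.array([0.0, 0.0, 1.0])
    p = pixel_vector(960, 540, 1920, 1080, c, u, v)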

...

It is useful to think about matrices as transformations in space. A matrix can warp the coordinate system into a different shape. When we multiply a vector with a matrix, we can visually see this as transforming the vector from one coordinate space to another. To better understand matrices as transformations, you can look at the resources below.

...

When calculating the position of the camera in relation to the drone, the measurements must be reported from the center of the drone to the center of the camera. Any unit of measurement can be used as long as the units used are consistent. Once we have the measurements of the camera in the x, y, and z axis, we can generate a vector to model the position of the camera in the drone space.
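For example, a camera mounted 10 cm forward of and 5 cm above the centre of the drone could be modelled as below; the numbers and the axis convention (x forward, y right, z down) are illustrative assumptions, not measurements from the actual airframe:

    import numpy as np

    # Offsets measured from the centre of the drone to the centre of the camera,
    # in metres, using an assumed x-forward, y-right, z-down body frame.
    camera_position_drone = np.array([0.10, 0.00, -0.05])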

...

TODO: Redraw diagrams. Not sure what the issue was with the old ones.

World

Two-Dimensional Rotation and Translation

TODO: Write a document describing how to calculate measurements needed for Geolocation.

TODO: Not sure what to put in this section or why it is needed.

...

Once we know where the object is in the drone space, we need to convert it into a vector in the world space. To do this, we can get the drone’s yaw, pitch, and roll in the world space and create a rotation matrix. Multiplying this rotation matrix with the drone-space vector provides us with a vector that points from the drone to the detected object in the world coordinate system.
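A sketch of building and applying such a rotation matrix with NumPy, assuming a yaw-pitch-roll (Z-Y-X) rotation order; the actual rotation order and axis conventions must match whatever the autopilot reports:

    import numpy as np

    def drone_to_world_rotation(yaw: float, pitch: float, roll: float) -> np.ndarray:
        """Rotation matrix from the drone (body) frame to the world frame (angles in radians)."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # yaw about z
        ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch about y
        rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about x
        return rz @ ry @ rx

    # Rotate the vector pointing from the drone to the detected object into the world space.
    rotation = drone_to_world_rotation(yaw=0.5, pitch=0.1, roll=0.0)  # illustrative angles
    vector_drone = np.array([2.0, 1.0, -5.0])                         # illustrative drone-space vector
    vector_world = rotation @ vector_drone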

Perspective Transform Matrix

...

The goal of the geolocation module is to map where a particular pixel in an image maps to the ground. The following sections dive into the theory of how geolocation works.

...

The diagram above displays the vectors used in the geolocation algorithm. The world space is the coordinate system used to describe the position of an object in the world (e.g. latitude and longitude). The world space is shown in the diagram with the black coordinate system. The camera space is the coordinate system used to describe the position of an object in relation to the camera. The camera space is shown in the diagram with the vectors c, u, and v. Note that bolded variables are vector quantities.

The table below outlines what each variable represents.

Vector | What it represents
o      | The location of the camera in the world space (latitude and longitude of camera).
c      | Orientation of the camera in the world space (yaw, pitch, and roll of camera).
u      | Horizontal axis of the image in the camera space (right is positive).
v      | Vertical axis of the image in the camera space (down is positive).
a      | Camera location to an individual pixel.

The geolocation module works under the following assumptions:

  • Geolocation assumes the ground is flat. To be clear, I am not saying the Earth is flat, but we can assume the ground is flat at small distances because the radius of the Earth is so large. The image below displays this assumption. This assumption allows us to assume a planar coordinate system for our calculations. We make this assumption because incorporating GIS data is difficult.

...

World to Ground Space

Given a vector pointing to the object in the world space, we can compute where the object is on the ground by finding the intersection of the vector with the ground. Since we are assuming a planar coordinate system, the ground is the plane z = 0. We can calculate the scalar multiple, t, that extends the vector to the ground. Knowing the value of t, we can then compute the x and y values and determine the ground location of the object.
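A minimal sketch of this intersection, assuming a planar world space with the ground at z = 0 and the camera above the ground:

    import numpy as np

    def ground_intersection(o: np.ndarray, a: np.ndarray) -> np.ndarray:
        """Intersect the ray o + t * a with the ground plane z = 0.

        o is the camera location in the world space and a is the vector pointing
        from the camera towards the detected object.
        """
        if a[2] == 0:
            raise ValueError("Vector is parallel to the ground plane")
        t = -o[2] / a[2]
        if t < 0:
            raise ValueError("Vector points away from the ground")
        return o + t * a  # the z component of the result is 0

    # Illustrative values: camera 50 m above the ground, vector angled down and forward.
    ground_point = ground_intersection(np.array([10.0, 20.0, 50.0]), np.array([3.0, 1.0, -5.0]))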

...

Using this calculation, we can now get the ground locations of targets. However, these calculations become costly if we need a large number of pixels translated into ground locations.

To resolve this issue, we can compute a perspective transform matrix. A perspective transform matrix is a 3x3 matrix that maps a 2D point in one plane to a 2D point in a different plane. We use the perspective transform matrix to map a point in the image plane to a point in the world space. We use the OpenCV library to calculate the transform matrix. To get the matrix, the library needs 4 points in the image plane and the corresponding 4 points on the ground plane. No 3 of these 4 points should be collinear (i.e. no 3 of the points should lie on the same line). Below is a diagram of the 4 points used to calculate the perspective transform matrix in the geolocation module.

...

Using the algorithm from above, we can find the corresponding ground locations for the 4 pixels above. We can then send the image points and ground points to the getPerspectiveTransform function in OpenCV and compute the ground locations for any pixels of interest in the image.
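A sketch of this step with OpenCV and NumPy; the pixel and ground coordinates below are illustrative, not values used by the module:

    import cv2
    import numpy as np

    # Four pixel coordinates (e.g. the image corners) and their ground locations
    # computed with the intersection calculation above.
    image_points = np.float32([[0, 0], [1920, 0], [1920, 1080], [0, 1080]])
    ground_points = np.float32([[-10.0, 25.0], [12.0, 24.0], [8.0, 3.0], [-7.0, 4.0]])

    # 3x3 perspective transform matrix mapping the image plane to the ground plane.
    matrix = cv2.getPerspectiveTransform(image_points, ground_points)

    # Map any pixels of interest (e.g. detected object centres) to ground locations.
    pixels = np.float32([[[960, 540]], [[450, 800]]])               # shape (N, 1, 2)
    ground = cv2.perspectiveTransform(pixels, matrix).reshape(-1, 2)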

Break ---------------------------------------------------------------

...