Building PyTorch from Source
Overview
There is no PyTorch package for the NVIDIA Jetson with Python 3.8. These are the instructions to build the wheel from source for WARG use.
PyTorch 1.13.1 by WARG: https://uofwaterloo-my.sharepoint.com/:f:/r/personal/uwarg_uwaterloo_ca/Documents/Subteam Folders/Autonomy/Jetson Pytorch
Build resources:
Patch based on: pytorch-1.10-jetpack-4.5.1.patch
Jetson
Install Python 3.8:
sudo apt-get update
sudo apt-get install python3.8
sudo apt-get install python3.8-dev # For the psutil package build (required for ultralytics)
sudo apt-get install python3.8-venv
sudo apt-get autoremove
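As a quick sanity check (optional, not part of the original steps), confirm the interpreter and the dev headers are present:
python3.8 --version # Should print Python 3.8.x
ls /usr/include/python3.8/Python.h # Installed by python3.8-dev; needed to build psutil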
Clone PyTorch 1.13.1 and create a virtual environment:
git clone --depth 1 --recursive --branch v1.13.1 https://github.com/pytorch/pytorch
cd pytorch
python3.8 -m venv venv/
source venv/bin/activate
pip install --upgrade pip
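Optionally, confirm the shell is now using the virtual environment rather than the system Python:
which python # Should point into pytorch/venv/bin/
pip --version # Should report Python 3.8 and the venv's site-packages path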
Install required modules:
pip install -r requirements.txt
pip install ninja scikit-build
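A small optional check that the build tools landed in the venv:
ninja --version # The pip-installed ninja should now be on PATH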
Install compilers:
pip install cmake
sudo apt-get install clang-8
sudo ln -s /usr/bin/clang-8 /usr/bin/clang
sudo ln -s /usr/bin/clang++-8 /usr/bin/clang++
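Optionally, verify the toolchain before moving on to the patch and build steps:
clang --version # Should report clang version 8 via the symlink
clang++ --version
cmake --version # pip-installed CMake inside the venv
/usr/local/cuda/bin/nvcc --version # CUDA compiler from JetPack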
Download this patch: https://uofwaterloo-my.sharepoint.com/:u:/r/personal/uwarg_uwaterloo_ca/Documents/Subteam%20Folders/Autonomy/Jetson%20Pytorch/pytorch-1.13.1-jetpack-4.6.3.patch
Alternatively, copy the contents below into a .patch file:
diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
index cbd3490..514fa1a 100644
--- a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
+++ b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
@@ -32,6 +32,9 @@ inline namespace CPU_CAPABILITY {
// Most likely we will do aarch32 support with inline asm.
#if defined(__aarch64__)
+// See https://github.com/pytorch/pytorch/issues/47098
+#if defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3))
+
#ifdef __BIG_ENDIAN__
#error "Big endian is not supported."
#endif
@@ -827,6 +830,7 @@ Vectorized<float> inline fmadd(const Vectorized<float>& a, const Vectorized<floa
return Vectorized<float>(r0, r1);
}
+#endif /* defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3)) */
#endif /* defined(aarch64) */
}}}
diff --git a/aten/src/ATen/cuda/CUDAContext.cpp b/aten/src/ATen/cuda/CUDAContext.cpp
index 98fa9a5..54948c8 100644
--- a/aten/src/ATen/cuda/CUDAContext.cpp
+++ b/aten/src/ATen/cuda/CUDAContext.cpp
@@ -25,6 +25,8 @@ void initCUDAContextVectors() {
void initDeviceProperty(DeviceIndex device_index) {
cudaDeviceProp device_prop;
AT_CUDA_CHECK(cudaGetDeviceProperties(&device_prop, device_index));
+ // patch for "too many resources requested for launch"
+ device_prop.maxThreadsPerBlock = device_prop.maxThreadsPerBlock / 2;
device_properties[device_index] = device_prop;
}
diff --git a/aten/src/ATen/cuda/detail/KernelUtils.h b/aten/src/ATen/cuda/detail/KernelUtils.h
index b36e78c..c980df3 100644
--- a/aten/src/ATen/cuda/detail/KernelUtils.h
+++ b/aten/src/ATen/cuda/detail/KernelUtils.h
@@ -19,7 +19,9 @@ namespace at { namespace cuda { namespace detail {
// Use 1024 threads per block, which requires cuda sm_2x or above
-constexpr int CUDA_NUM_THREADS = 1024;
+//constexpr int CUDA_NUM_THREADS = 1024;
+// patch for "too many resources requested for launch"
+constexpr int CUDA_NUM_THREADS = 512;
// CUDA: number of blocks for threads.
inline int GET_BLOCKS(const int64_t N, const int64_t max_threads_per_block=CUDA_NUM_THREADS) {
Apply the patch:
git apply pytorch-1.13.1-jetpack-4.6.3.patch
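To confirm the patch applied cleanly (optional):
git diff --stat # Should list the three files modified by the patch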
Set environment variables:
# https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
export USE_NCCL=0
export USE_DISTRIBUTED=0 # skip setting this if you want to enable OpenMPI backend
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2" # or "7.2;8.7" for JetPack 5 wheels for Xavier/Orin
export PYTORCH_BUILD_VERSION=1.13.1 # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
export PYTORCH_BUILD_NUMBER=1
# https://qengineering.eu/install-pytorch-on-jetson-nano.html
export CC=clang-8
export CXX=clang++-8
export CUDACXX=/usr/local/cuda/bin/nvcc
export BUILD_TEST=0
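Before starting the long build, it can be worth confirming the variables are set in the current shell (optional):
env | grep -E 'USE_|TORCH_CUDA_ARCH_LIST|PYTORCH_BUILD|BUILD_TEST|CUDACXX'
echo $CC $CXX # Should print clang-8 clang++-8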
The build uses more memory than the available RAM. Add swap space on a USB drive to make up the difference:
Plug in a USB drive
Run the following commands (this will destroy anything on the USB drive!):
sudo umount /dev/sda1 # If needed
sudo mkswap /dev/sda -f
sudo swapon /dev/sda # DO NOT UNPLUG USB
cat /proc/swaps
The output of cat /proc/swaps should show the USB drive listed as active swap.
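free -h gives another view of the same thing (optional):
free -h # The Swap total should have grown by roughly the USB drive's capacity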
Compile:
python setup.py clean # Optional: start fresh rather than resuming
python setup.py bdist_wheel > jetson_build_report.txt # Redirecting output to a report file is optional
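Since build output is redirected to the report file, progress can be followed from a second terminal (optional):
tail -f jetson_build_report.txt # Ctrl+C stops tailing, not the build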
The compilation takes several hours. If it is interrupted, set the environment variables again and rerun only the bdist_wheel command. Once the compilation is complete, the wheel is saved in pytorch/dist/.
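A quick smoke test of the finished wheel (the filename below is an example; use the one actually produced in pytorch/dist/):
cd .. # Run the import test outside the pytorch source tree so Python picks up the wheel, not the local source
pip install pytorch/dist/torch-1.13.1-cp38-cp38-linux_aarch64.whl # Example filename
python -c "import torch; print(torch.__version__, torch.cuda.is_available())" # Expect 1.13.1 and True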
Remove the USB swapfile:
sudo swapoff /dev/sda # Now safe to unplug USB
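cat /proc/swaps can be rerun to confirm the drive is no longer in use before unplugging it (optional):
cat /proc/swaps # The USB drive should no longer be listed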
Done!