Building PyTorch from Source


Overview

There is no PyTorch package for the NVIDIA Jetson with Python 3.8. These are the instructions to build the wheel from source for WARG use.

PyTorch 1.13.1 wheel built by WARG: https://uofwaterloo-my.sharepoint.com/:f:/r/personal/uwarg_uwaterloo_ca/Documents/Subteam Folders/Autonomy/Jetson Pytorch
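If you only need the prebuilt wheel, it can be installed directly with pip once downloaded from the folder above. This is a minimal sketch; the wheel filename below is illustrative, so substitute the name of the file you actually downloaded.

# Filename is illustrative; use the wheel downloaded from the SharePoint folder above
python3.8 -m pip install torch-1.13.1-cp38-cp38-linux_aarch64.whl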

Build resources:

Jetson

Install Python 3.8:

sudo apt-get update
sudo apt-get install python3.8
sudo apt-get install python3.8-dev  # For the psutil package build (required for ultralytics)
sudo apt-get install python3.8-venv
sudo apt-get autoremove
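An optional check, not part of the original instructions, to confirm the interpreter installed correctly:

# Should print Python 3.8.x
python3.8 --version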

Clone PyTorch 1.13.1 and create a virtual environment:

git clone --depth 1 --recursive --branch v1.13.1 https://github.com/pytorch/pytorch
cd pytorch
python3.8 -m venv venv/
source venv/bin/activate
pip install --upgrade pip

Install required modules:

pip install -r requirements.txt
pip install ninja scikit-build

Install compilers:

pip install cmake
sudo apt-get install clang-8
sudo ln -s /usr/bin/clang-8 /usr/bin/clang
sudo ln -s /usr/bin/clang++-8 /usr/bin/clang++
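An optional check, not part of the original instructions, to confirm the symlinks point at clang 8:

# Both should report clang version 8.x
clang --version
clang++ --version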

Download this patch: https://uofwaterloo-my.sharepoint.com/:u:/r/personal/uwarg_uwaterloo_ca/Documents/Subteam%20Folders/Autonomy/Jetson%20Pytorch/pytorch-1.13.1-jetpack-4.6.3.patch

  • Alternatively, copy the contents below into a .patch file

diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
index cbd3490..514fa1a 100644
--- a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
+++ b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h
@@ -32,6 +32,9 @@ inline namespace CPU_CAPABILITY {
 // Most likely we will do aarch32 support with inline asm.
 #if defined(__aarch64__)
 
+// See https://github.com/pytorch/pytorch/issues/47098
+#if defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3))
+
 #ifdef __BIG_ENDIAN__
 #error "Big endian is not supported."
 #endif
@@ -827,6 +830,7 @@ Vectorized<float> inline fmadd(const Vectorized<float>& a, const Vectorized<floa
   return Vectorized<float>(r0, r1);
 }
 
+#endif /* defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3)) */
 #endif /* defined(aarch64) */
 
 }}}
diff --git a/aten/src/ATen/cuda/CUDAContext.cpp b/aten/src/ATen/cuda/CUDAContext.cpp
index 98fa9a5..54948c8 100644
--- a/aten/src/ATen/cuda/CUDAContext.cpp
+++ b/aten/src/ATen/cuda/CUDAContext.cpp
@@ -25,6 +25,8 @@ void initCUDAContextVectors() {
 void initDeviceProperty(DeviceIndex device_index) {
   cudaDeviceProp device_prop;
   AT_CUDA_CHECK(cudaGetDeviceProperties(&device_prop, device_index));
+  // patch for "too many resources requested for launch"
+  device_prop.maxThreadsPerBlock = device_prop.maxThreadsPerBlock / 2;
 
   device_properties[device_index] = device_prop;
 }
diff --git a/aten/src/ATen/cuda/detail/KernelUtils.h b/aten/src/ATen/cuda/detail/KernelUtils.h
index b36e78c..c980df3 100644
--- a/aten/src/ATen/cuda/detail/KernelUtils.h
+++ b/aten/src/ATen/cuda/detail/KernelUtils.h
@@ -19,7 +19,9 @@ namespace at { namespace cuda { namespace detail {
 // Use 1024 threads per block, which requires cuda sm_2x or above
-constexpr int CUDA_NUM_THREADS = 1024;
+//constexpr int CUDA_NUM_THREADS = 1024;
+// patch for "too many resources requested for launch"
+constexpr int CUDA_NUM_THREADS = 512;
 
 // CUDA: number of blocks for threads.
 inline int GET_BLOCKS(const int64_t N, const int64_t max_threads_per_block=CUDA_NUM_THREADS) {

Apply the patch:

git apply pytorch-1.13.1-jetpack-4.6.3.patch
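If you want to confirm the patch applies cleanly before modifying the tree, git can do a dry run first (optional step, not in the original instructions):

# Dry run: reports errors without changing any files
git apply --check pytorch-1.13.1-jetpack-4.6.3.patch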

Set environment variables:

# https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
export USE_NCCL=0
export USE_DISTRIBUTED=0  # skip setting this if you want to enable the OpenMPI backend
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"  # or "7.2;8.7" for JetPack 5 wheels for Xavier/Orin
export PYTORCH_BUILD_VERSION=1.13.1  # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
export PYTORCH_BUILD_NUMBER=1
# https://qengineering.eu/install-pytorch-on-jetson-nano.html
export CC=clang-8
export CXX=clang++-8
export CUDACXX=/usr/local/cuda/bin/nvcc
export BUILD_TEST=0
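Because these variables must be set again in any new shell (for example after an interrupted build, see below), one option is to keep the exports in a small script and source it before building. The filename build_env.sh is only an illustration, not part of the original instructions.

# Save the exports above into e.g. build_env.sh, then load them with:
source build_env.sh
# Spot-check that a couple of the values took effect
echo "$TORCH_CUDA_ARCH_LIST" "$CC" "$CXX"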

The build uses a lot of memory, more than the available RAM, so add a USB drive as swap space:

  1. Plug in a USB drive

  2. Run the following commands (this will destroy anything on the USB drive!):

    sudo umount /dev/sda1  # If needed
    sudo mkswap /dev/sda -f
    sudo swapon /dev/sda  # DO NOT UNPLUG USB
    cat /proc/swaps
  3. The output of cat /proc/swaps should list the USB drive as an active swap device
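Another optional way to confirm the swap is active, not part of the original instructions, is the memory summary:

# The Swap row should now show roughly the capacity of the USB drive
free -h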

Compile:

python setup.py clean  # Optional: start fresh rather than resuming
python setup.py bdist_wheel > jetson_build_report.txt  # Redirecting output to a report file is optional

The compilation takes several hours. If it is interrupted, set the environment variables again and rerun only the bdist_wheel command (skip clean so the build resumes). Once the compilation is complete, the wheel is saved in pytorch/dist/.
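An optional sanity check of the finished wheel (not part of the original instructions; the wheel filename is illustrative, use the file actually produced in pytorch/dist/):

# Install the freshly built wheel into the active virtual environment
pip install dist/torch-1.13.1-cp38-cp38-linux_aarch64.whl
# Confirm the version and that CUDA is visible to PyTorch
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"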

Disable the USB swap:

sudo swapoff /dev/sda # Now safe to unplug USB

Done!