Translation based on https://github.com/KwanWaiPang/Photo-SLAM_comment/
Photo-SLAM
├── cfg # This folder includes the config files for different settings
├── cuda_rasterizer # This folder includes the rasterization module from the original 3DGS
├── examples # Demo entry points of the SLAM system: reading the data, creating the system objects and running mapping
├── include # Header files
├── ORB-SLAM3
├── scripts # Scripts for test runs and evaluation
├── src # 3D Gaussian related source files
├── third_party # Dependencies: colmap, simple-knn (from 3D Gaussian Splatting) and tinyply
└── viewer # Visualizer thread
The examples folder includes the functions main, LoadImages (to read the images), saveTrackingTime (to save the tracking time statistics) and saveGpuPeakMemoryUsage (to save the peak VRAM usage).
What does main do? It checks the input parameters and directories and loads the images. Most importantly, it creates the SLAM system pSLAM, the 3D Gaussian mapping thread pGausMapper and the Gaussian viewer thread pViewer. pSLAM is passed as an input parameter to pGausMapper, connecting ORB-SLAM3 and 3D Gaussian mapping.
Specifically, the code does the following:
Check the parameters and set the output directory.
Check the input image directory and call LoadImages to read the images.
Ensure the numbers of depth images and RGB images are equal. Check whether to use CPU or GPU.
After everything is ready, we can start creating the systems. The following operations are:
1. create the SLAM system,
2. create the 3D Gaussian mapping system, and
3. create the 3D Gaussian viewer.
// Create SLAM system. It initializes all system threads and gets ready to process frames.
// Create the pointer pSLAM to point to the SLAM system
std::shared_ptr<ORB_SLAM3::System> pSLAM =
std::make_shared<ORB_SLAM3::System>(
argv[1], argv[2], ORB_SLAM3::System::RGBD);
float imageScale = pSLAM->GetImageScale();
// Create GaussianMapper
// Create the pointer pGausMapper to point to the 3D Gaussian Mapper,
// the input parameter is the SLAM system, configs for 3D Gaussian
// and output directory
std::filesystem::path gaussian_cfg_path(argv[3]);
std::shared_ptr<GaussianMapper> pGausMapper =
std::make_shared<GaussianMapper>(
pSLAM, gaussian_cfg_path, output_dir, 0, device_type);
std::thread training_thd(&GaussianMapper::run, pGausMapper.get());
// Create Gaussian Viewer
std::thread viewer_thd;
std::shared_ptr<ImGuiViewer> pViewer;
// If the GUI config is true, we start the viewer thread and
// pass SLAM system pointer and 3D Gaussian Mapper pointer as input
if (use_viewer)
{
pViewer = std::make_shared<ImGuiViewer>(pSLAM, pGausMapper);
viewer_thd = std::thread(&ImGuiViewer::run, pViewer.get());
}
Then we output the system information.
Load the RGB and depth information and check if they are valid.
Scale RGB and depth image according to imageScale.
Start a timer (std::chrono, the timing facility introduced in C++11).
Feed the RGB image, depth image and frame timestamp into the pSLAM system.
After the main loop ends, close the SLAM system and visualization system, output the GPU peak usage and tracking information. End the main thread.
TrackRGBD performs the ORB tracking. ORB tracking/mapping and 3D Gaussian mapping are integrated by passing pSLAM into the GaussianMapper, which reads keyframes and map points from it in its own thread.
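A minimal sketch of the per-frame loop that feeds data into pSLAM is shown below. It is a fragment of main relying on variables prepared earlier (LoadImages, imageScale); names such as imRGB, imD and vTimesTrack are illustrative, and the optional IMU/filename arguments of TrackRGBD are omitted.
// Hedged sketch of the main per-frame loop; not a verbatim copy of the example file.
for (int ni = 0; ni < nImages; ++ni)
{
    // Load the RGB and depth images and check validity
    cv::Mat imRGB = cv::imread(vstrImageFilenamesRGB[ni], cv::IMREAD_UNCHANGED);
    cv::Mat imD = cv::imread(vstrImageFilenamesD[ni], cv::IMREAD_UNCHANGED);
    double tframe = vTimestamps[ni];
    // Scale the images according to imageScale
    if (imageScale != 1.f)
    {
        int width = imRGB.cols * imageScale;
        int height = imRGB.rows * imageScale;
        cv::resize(imRGB, imRGB, cv::Size(width, height));
        cv::resize(imD, imD, cv::Size(width, height));
    }
    auto t1 = std::chrono::steady_clock::now();
    // Feed the RGB-D pair into ORB-SLAM3; the GaussianMapper thread then consumes
    // the resulting keyframes and map points through the shared pSLAM pointer
    pSLAM->TrackRGBD(imRGB, imD, tframe);
    auto t2 = std::chrono::steady_clock::now();
    vTimesTrack[ni] = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count();
}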
The GaussianMapper class includes many functions, but the main parts are the constructor, the run thread, the Gaussian training step and the keyframe handler.
The constructor not only handles the input parameters but also initializes the private member variables: it sets up the running device and camera model, initializes the 3D Gaussian scene and configures the sensor type.
The constructor of GaussianMapper
void run()
void run() is the main process. It reads the camera poses and the point cloud, prepares multi-resolution images and calls the training function trainForOneIteration() .
void trainForOneIteration()
void trainForOneIteration() is the main training code; it covers iteration management, the Gaussian pyramid, rendering and loss computation, saving and logging.
void combineMappingOperations()
void combineMappingOperations() merges the mapping operations produced by ORB-SLAM3 (local BA, loop closure and scale refinement) into the Gaussian map.
void GaussianMapper::handleNewKeyframe
void GaussianMapper::handleNewKeyframe sets up the pose, camera, image and auxiliary image of the new keyframe and inserts it into the scene. It assigns the keyframe's times of use and adds it to the training sliding window.
Other functions
bool GaussianMapper::isStopped()
bool GaussianMapper::isStopped() returns the private variable stopped_ of the GaussianMapper object, using a mutex lock to make the access to this variable thread-safe in a multi-threaded environment.
void trainColmap()
void trainColmap() is for the COLMAP training example only: it reads the point cloud, trains the 3D Gaussians and saves the results, similar to the run function.
Renderings
There are mainly three rendering functions to render RGB, depth and loss images and to output the evaluation metrics: void recordKeyframeRendered(/*param*/), void renderAndRecordKeyframe(/*param*/) and void renderAndRecordAllKeyframes(/*param*/).
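For reference, PSNR on images normalized to [0, 1] can be computed with libtorch as below (a minimal sketch; the helper name is hypothetical and not necessarily the exact metric code used by these functions):
#include <torch/torch.h>

// Hypothetical helper: PSNR for float images in [0, 1] (e.g. shape C x H x W)
torch::Tensor psnr(const torch::Tensor& rendered, const torch::Tensor& gt)
{
    // Mean squared error over all channels and pixels
    torch::Tensor mse = torch::mean(torch::pow(rendered - gt, 2));
    // Peak value is 1.0, so PSNR = 10 * log10(1 / mse) = -10 * log10(mse)
    return torch::log10(mse) * (-10.0);
}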
Loader
void loadPly(/*param*/) not only loads the point cloud, but also loads the camera intrinsics to undistort the images and resets the image size.
Densification based on inactive points
void increasePcdByKeyframeInactiveGeoDensify(/*param*/) densifies the point cloud based on inactive points, with different strategies depending on the sensor type.
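For the RGBD case, the idea can be sketched as below: back-project a subsampled depth image through the pinhole model and pass the resulting coloured points to GaussianModel::increasePcd. This is only an illustration under assumed conventions (camera-frame output, float images, hypothetical helper name); the actual implementation additionally transforms the points into the world frame with the keyframe pose and handles the other sensor types.
#include <opencv2/core.hpp>
#include <vector>

// Hypothetical sketch: back-project valid depth pixels into camera-frame 3D points.
// rgb is CV_32FC3 in [0, 1], depth is CV_32FC1 metric depth; fx/fy/cx/cy are the
// undistorted pinhole intrinsics; stride is an illustrative sampling step.
void backprojectDepthSketch(
    const cv::Mat& rgb, const cv::Mat& depth,
    float fx, float fy, float cx, float cy, int stride,
    std::vector<float>& points, std::vector<float>& colors)
{
    for (int v = 0; v < depth.rows; v += stride)
        for (int u = 0; u < depth.cols; u += stride) {
            float z = depth.at<float>(v, u);
            if (z <= 0.f) continue; // skip invalid depth
            points.push_back((u - cx) / fx * z); // X
            points.push_back((v - cy) / fy * z); // Y
            points.push_back(z);                 // Z
            const cv::Vec3f& c = rgb.at<cv::Vec3f>(v, u);
            colors.push_back(c[0]);
            colors.push_back(c[1]);
            colors.push_back(c[2]);
        }
}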
GaussianModel is a main class. According to its header file, the main member variables include:
torch::DeviceType device_type_: type of device
The current and maximum degree of the spherical harmonics: int active_sh_degree_ and int max_sh_degree_ .
The parameters of 3D Gaussians:
torch::Tensor xyz_: position
torch::Tensor features_dc_: direct component, aka, the inherent color of the Gaussian when SH=0
torch::Tensor features_rest_: remaining components, the remaining spherical harmonics coefficients for SH≥1.
torch:: Tensor opacity_: opacity alpha
torch::Tensor scaling_ : scaling factors
torch::Tensor rotation_ : rotation factors
torch::Tensor xyz_gradient_accum_ : accumulated view-space position gradient of each Gaussian, used for densification
torch::Tensor denom_ : denominator tensor used for computing the average gradient accumulation for each Gaussian point during densification
torch::Tensor exist_since_iter_ : The iteration when this Gaussian is added to the map
Position and color of the sparse point cloud: torch::Tensor sparse_points_xyz_ and torch::Tensor sparse_points_color_ .
Two macros:
#define GAUSSIAN_MODEL_TENSORS_TO_VEC : stores the group of tensors into a vector (data type: std::vector<torch::Tensor>).
#define GAUSSIAN_MODEL_INIT_TENSORS(device_type) : initializes multiple tensors and places them on the specified device. This macro accepts one parameter, device_type, to select the device.
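Roughly, the two macros expand to something like the following (an illustrative sketch rather than the exact bodies from the header; the Tensor_vec_* member names other than Tensor_vec_xyz_, Tensor_vec_scaling_ and Tensor_vec_rotation_, which appear later in this page, are assumptions):
// Sketch only: initialize the model tensors as empty tensors on the given device
#define GAUSSIAN_MODEL_INIT_TENSORS(device_type)                                             \
    this->xyz_           = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->features_dc_   = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->features_rest_ = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->scaling_       = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->rotation_      = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->opacity_       = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->max_radii2D_   = torch::empty(0, torch::TensorOptions().device(device_type));      \
    this->xyz_gradient_accum_ = torch::empty(0, torch::TensorOptions().device(device_type)); \
    this->denom_         = torch::empty(0, torch::TensorOptions().device(device_type));

// Sketch only: wrap each parameter tensor in a std::vector<torch::Tensor>
#define GAUSSIAN_MODEL_TENSORS_TO_VEC                         \
    this->Tensor_vec_xyz_          = {this->xyz_};            \
    this->Tensor_vec_feature_dc_   = {this->features_dc_};    \
    this->Tensor_vec_feature_rest_ = {this->features_rest_};  \
    this->Tensor_vec_opacity_      = {this->opacity_};        \
    this->Tensor_vec_scaling_      = {this->scaling_};        \
    this->Tensor_vec_rotation_     = {this->rotation_};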
The constructor GaussianModel::GaussianModel(). The input is either sh_degree or the model parameter struct.
torch::Tensor getCovarianceActivation(int scaling_modifier = 1) : computes the covariance matrix from the scaling and rotation, and returns it as a packed symmetric "uncertainty" tensor (NICE!)
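In symbols, this follows the standard 3DGS covariance construction (matching the getCovarianceActivation code shown later in this page):
\Sigma = R\,S\,S^{\top}R^{\top}, \qquad S = \operatorname{diag}(s_x, s_y, s_z), \qquad R = R(q)
Since \Sigma is symmetric, only its six unique entries [\Sigma_{xx}, \Sigma_{xy}, \Sigma_{xz}, \Sigma_{yy}, \Sigma_{yz}, \Sigma_{zz}] are stored.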
void createFromPcd: Initializes the Gaussians from the point cloud. Creates and fills the fused point-cloud tensor and color tensor, converts the colors to SH coefficients according to the SH degree, and computes the other Gaussian properties.
void increasePcd: Adds new point cloud data (with color) into the existing Gaussian model.
void trainingSetup(/*param*/): sets up the Adam optimizer parameters and the learning rates.
float updateLearningRate(int step): updates the learning rate based on the step, calling the predefined exponLrFunc to obtain a continuously decaying learning rate.
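As a sketch of the decay that exponLrFunc is described to implement (assuming Photo-SLAM keeps the standard 3DGS schedule; the function and parameter names here are illustrative):
#include <algorithm>
#include <cmath>

// Log-linear interpolation from lr_init to lr_final over max_steps, with an
// optional sine "delay" warm-up over the first lr_delay_steps iterations.
float exponLrFuncSketch(int step, float lr_init, float lr_final,
                        int lr_delay_steps, float lr_delay_mult, int max_steps)
{
    if (step < 0 || (lr_init == 0.0f && lr_final == 0.0f))
        return 0.0f;
    constexpr float kPi = 3.14159265358979f;
    float delay_rate = 1.0f;
    if (lr_delay_steps > 0) {
        float p = std::clamp(static_cast<float>(step) / lr_delay_steps, 0.0f, 1.0f);
        delay_rate = lr_delay_mult + (1.0f - lr_delay_mult) * std::sin(0.5f * kPi * p);
    }
    float t = std::clamp(static_cast<float>(step) / max_steps, 0.0f, 1.0f);
    float log_lerp = std::exp(std::log(lr_init) * (1.0f - t) + std::log(lr_final) * t);
    return delay_rate * log_lerp;
}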
The following functions handle the densification, clone, pruning and split of Gaussians:
void densifyAndSplit, void densifyAndClone, void densifyAndPrune .
void densificationPostfix() appends the newly created (densified) Gaussians to the existing Gaussian model and registers them with the optimizer.
void scaledTransformVisiblePointsOfKeyframe() prepares and marks the model's Gaussian 3D points that are visible from a given keyframe, applying a uniform scale and view/projection transforms, then registers the transformed point and rotation tensors for optimization.
The Constructor(s)
There are two constructors in gaussian_model.cpp. The first takes a single int parameter (sh_degree), allowing simple initialization with just the spherical harmonics degree. The second takes a const GaussianModelParams& (a configuration struct) and provides more flexibility by accepting a full parameter object.
Get functions
Creating Gaussians and adding new Gaussians from point cloud
There are two methods related to the point cloud: void GaussianModel::createFromPcd creates Gaussians from the point cloud, and void GaussianModel::increasePcd adds new Gaussians from new points into the existing Gaussian model.
There are two overloaded versions of void GaussianModel::increasePcd: one, void GaussianModel::increasePcd(std::vector<float> points, std::vector<float> colors, const int iteration), takes C++ vectors as input; the other, void GaussianModel::increasePcd(torch::Tensor& new_point_cloud, torch::Tensor& new_colors, const int iteration), takes torch tensors. We use the former as an example.
GaussianScene::GaussianScene(/*param*/) is the constructor. It can also load an already-trained Gaussian scene.
void GaussianScene::addCamera(/*param*/) adds a camera to the scene; alongside it there are a number of get/set methods for the cameras, keyframes and 3D points.
void GaussianScene::applyScaledTransformation(/*param*/) applies a uniform scale to every keyframe's translation and then applies a rigid transform to each keyframe pose. It updates each GaussianKeyframe's stored pose and recomputes its transform tensors.
The net effect is a similarity transform (scale + rigid transform) applied to all keyframe positions and poses. Rotation comes only from the rigid transformation and the original pose; the scale is applied only to translations.
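In symbols (matching the keyframe branch of combineMappingOperations shown later in this page), for each keyframe with camera-to-world pose T_{wc} = (R_{wc}, t_{wc}):
t_{wc} \leftarrow s\,t_{wc}, \qquad T_{yc} = T\,T_{wc}, \qquad T_{cy} = T_{yc}^{-1}
where s is the uniform scale, T the rigid transform, and T_{cy} the updated world-to-camera pose stored back into the keyframe.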
The constructor of GaussianScene
getNerfppNorm
This is part of the NeRF++ normalization scheme, which normalizes the scene by:
translate: Shifting the scene so the average camera center is at the origin
radius: Defining the normalized scale of the scene based on the camera distribution
This normalization helps standardize the scene representation for the Gaussian Splatting model used in Photo-SLAM.
In essence, the radius represents a bounding sphere around all the keyframe cameras. The scaling factor of 1.1 provides a 10% margin/padding around the actual maximum distance, which is a common practice in computer graphics and 3D reconstruction to ensure all cameras are safely within the computed bounds.
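Concretely, with keyframe camera centers c_i and their mean \bar{c} (matching the getNerfppNorm code near the end of this page):
\text{translate} = -\bar{c}, \qquad \text{radius} = 1.1 \cdot \max_i \lVert c_i - \bar{c} \rVert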
We definitely shall not forget our favorite keyframing part of a SLAM system :)
The GaussianKeyframe header file mainly defines the camera id, camera parameters, the sizes of the images and of the Gaussian pyramid, as well as the original image. The main functions in the cpp file include:
GaussianKeyframe::getProjectionMatrix(): to get the projection matrix
GaussianKeyframe::getWorld2View2(): computes the transformation matrix from the world frame to the camera frame
void computeTransformTensors(): uses the two functions above to compute the frame transformations: world frame -> camera frame -> pixel frame.
int getCurrentGausPyramidLevel(): gets the current level of the Gaussian pyramid
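A minimal Eigen sketch of what getWorld2View2() computes, assuming the same convention as the utility of the same name in the original 3DGS (a world-to-camera matrix with an optional shift and scale applied to the camera center; the signature here is illustrative):
#include <Eigen/Dense>

Eigen::Matrix4f getWorld2View2Sketch(
    const Eigen::Matrix3f& R,   // world-to-camera rotation
    const Eigen::Vector3f& t,   // world-to-camera translation
    const Eigen::Vector3f& translate = Eigen::Vector3f::Zero(),
    float scale = 1.0f)
{
    // Raw world-to-camera matrix [R | t; 0 0 0 1]
    Eigen::Matrix4f Rt = Eigen::Matrix4f::Identity();
    Rt.block<3, 3>(0, 0) = R;
    Rt.block<3, 1>(0, 3) = t;
    // Go to camera-to-world, shift and scale the camera center, then invert back
    Eigen::Matrix4f C2W = Rt.inverse();
    C2W.block<3, 1>(0, 3) = (C2W.block<3, 1>(0, 3) + translate) * scale;
    return C2W.inverse();
}
getNerfppNorm(), shown near the end of this page, inverts this matrix again to recover each camera center.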
The GaussianRenderer has only one function, GaussianRenderer::render, which wraps the rasterizer's forward function and handles the pre-computation of the covariance and the spherical harmonics degree. It is similar to gaussian_renderer/__init__.py in the original 3DGS.
// Constructor
GaussianMapper::GaussianMapper(
std::shared_ptr<ORB_SLAM3::System> pSLAM,
std::filesystem::path gaussian_config_file_path,
std::filesystem::path result_dir,
int seed,//random seed is 0
torch::DeviceType device_type)
: pSLAM_(pSLAM),// pointer to the ORB-SLAM3 system, through which the SLAM data is accessed
initial_mapped_(false),
interrupt_training_(false),
stopped_(false),
iteration_(0),
ema_loss_for_log_(0.0f),
SLAM_ended_(false),
loop_closure_iteration_(false),
min_num_initial_map_kfs_(15UL),
large_rot_th_(1e-1f),
large_trans_th_(1e-2f),
training_report_interval_(0)
{
// Set the random seed for the C++ standard library
std::srand(seed);
// Set the random seed for libtorch
torch::manual_seed(seed);
// Device(set up according to the input device)
if (device_type == torch::kCUDA && torch::cuda::is_available()) {
std::cout << "[Gaussian Mapper]CUDA available! Training on GPU." << std::endl;
device_type_ = torch::kCUDA;
model_params_.data_device_ = "cuda";
}
else {
std::cout << "[Gaussian Mapper]Training on CPU." << std::endl;
device_type_ = torch::kCPU;
model_params_.data_device_ = "cpu";
}
// Create the output directory for results
result_dir_ = result_dir;
CHECK_DIRECTORY_AND_CREATE_IF_NOT_EXISTS(result_dir)
// Read the config file
config_file_path_ = gaussian_config_file_path;
readConfigFromFile(gaussian_config_file_path);
// Set up the background color
std::vector<float> bg_color;
if (model_params_.white_background_) //If it is white
bg_color = {1.0f, 1.0f, 1.0f};
else //Otherwise it is black
bg_color = {0.0f, 0.0f, 0.0f};
// Convert bg_color to a tensor and save it in the variable named background_
// Here we use torch::TensorOptions() to specify the options for the tensor
// including the data type to torch::kFloat32 and the device type
background_ = torch::tensor(bg_color,
torch::TensorOptions().dtype(torch::kFloat32).device(device_type_));
override_color_ = torch::empty(0, torch::TensorOptions().device(device_type_));
// Initialize the scene and the model, including the SH parameter
// Initialize the 3DGS tensors
gaussians_ = std::make_shared<GaussianModel>(model_params_);
scene_ = std::make_shared<GaussianScene>(model_params_);
// Mode (if there is no SLAM system, return)
if (!pSLAM) {
// NO SLAM
return;
}
// Specify the sensor type
switch (pSLAM->getSensorType())
{
case ORB_SLAM3::System::MONOCULAR:
case ORB_SLAM3::System::IMU_MONOCULAR:
{
this->sensor_type_ = MONOCULAR;
}
break;
case ORB_SLAM3::System::STEREO:
case ORB_SLAM3::System::IMU_STEREO:
{
this->sensor_type_ = STEREO;
this->stereo_baseline_length_ = pSLAM->getSettings()->b();
this->stereo_cv_sgm_ = cv::cuda::createStereoSGM(
this->stereo_min_disparity_,
this->stereo_num_disparity_);
this->stereo_Q_ = pSLAM->getSettings()->Q().clone();
stereo_Q_.convertTo(stereo_Q_, CV_32FC3, 1.0);
}
break;
case ORB_SLAM3::System::RGBD:
case ORB_SLAM3::System::IMU_RGBD:
{
this->sensor_type_ = RGBD;
}
break;
default:
{
throw std::runtime_error("[Gaussian Mapper]Unsupported sensor type!");
}
break;
}
// Cameras
// TODO: not only monocular
auto settings = pSLAM->getSettings();
cv::Size SLAM_im_size = settings->newImSize();
//Get the undistortion parameters of the camera
UndistortParams undistort_params(
SLAM_im_size,
settings->camera1DistortionCoef()
);
//Get all cameras and the intrinsics accordingly in a loop
auto vpCameras = pSLAM->getAtlas()->GetAllCameras();
for (auto& SLAM_camera : vpCameras) {
Camera camera;//Create a camera class
//Get the id of the camera
camera.camera_id_ = SLAM_camera->GetId();
//Get the type of the camera to fetch the intrinsic matrix
if (SLAM_camera->GetType() == ORB_SLAM3::GeometricCamera::CAM_PINHOLE) {
camera.setModelId(Camera::CameraModelType::PINHOLE);
float SLAM_fx = SLAM_camera->getParameter(0);
float SLAM_fy = SLAM_camera->getParameter(1);
float SLAM_cx = SLAM_camera->getParameter(2);
float SLAM_cy = SLAM_camera->getParameter(3);
// Old K, i.e. K in SLAM
cv::Mat K = (
cv::Mat_<float>(3, 3)
<< SLAM_fx, 0.f, SLAM_cx,
0.f, SLAM_fy, SLAM_cy,
0.f, 0.f, 1.f
);
// camera.width_ = this->sensor_type_ == STEREO ? undistort_params.old_size_.width
// : graphics_utils::roundToIntegerMultipleOf16(
// undistort_params.old_size_.width);
camera.width_ = undistort_params.old_size_.width;
float x_ratio = static_cast<float>(camera.width_) / undistort_params.old_size_.width;
// camera.height_ = this->sensor_type_ == STEREO ? undistort_params.old_size_.height
// : graphics_utils::roundToIntegerMultipleOf16(
// undistort_params.old_size_.height);
camera.height_ = undistort_params.old_size_.height;
float y_ratio = static_cast<float>(camera.height_) / undistort_params.old_size_.height;
camera.num_gaus_pyramid_sub_levels_ = num_gaus_pyramid_sub_levels_;
camera.gaus_pyramid_width_.resize(num_gaus_pyramid_sub_levels_);
camera.gaus_pyramid_height_.resize(num_gaus_pyramid_sub_levels_);
for (int l = 0; l < num_gaus_pyramid_sub_levels_; ++l) {
camera.gaus_pyramid_width_[l] = camera.width_ * this->kf_gaus_pyramid_factors_[l];
camera.gaus_pyramid_height_[l] = camera.height_ * this->kf_gaus_pyramid_factors_[l];
}
camera.params_[0]/*new fx*/= SLAM_fx * x_ratio;
camera.params_[1]/*new fy*/= SLAM_fy * y_ratio;
camera.params_[2]/*new cx*/= SLAM_cx * x_ratio;
camera.params_[3]/*new cy*/= SLAM_cy * y_ratio;
cv::Mat K_new = (
cv::Mat_<float>(3, 3)
<< camera.params_[0], 0.f, camera.params_[2],
0.f, camera.params_[1], camera.params_[3],
0.f, 0.f, 1.f
);
// Undistortion
if (this->sensor_type_ == MONOCULAR || this->sensor_type_ == RGBD)
undistort_params.dist_coeff_.copyTo(camera.dist_coeff_);
camera.initUndistortRectifyMapAndMask(K, SLAM_im_size, K_new, true);
undistort_mask_[camera.camera_id_] =
tensor_utils::cvMat2TorchTensor_Float32(
camera.undistort_mask, device_type_);
cv::Mat viewer_sub_undistort_mask;
int viewer_image_height_ = camera.height_ * rendered_image_viewer_scale_;
int viewer_image_width_ = camera.width_ * rendered_image_viewer_scale_;
cv::resize(camera.undistort_mask, viewer_sub_undistort_mask,
cv::Size(viewer_image_width_, viewer_image_height_));
viewer_sub_undistort_mask_[camera.camera_id_] =
tensor_utils::cvMat2TorchTensor_Float32(
viewer_sub_undistort_mask, device_type_);
cv::Mat viewer_main_undistort_mask;
int viewer_image_height_main_ = camera.height_ * rendered_image_viewer_scale_main_;
int viewer_image_width_main_ = camera.width_ * rendered_image_viewer_scale_main_;
cv::resize(camera.undistort_mask, viewer_main_undistort_mask,
cv::Size(viewer_image_width_main_, viewer_image_height_main_));
viewer_main_undistort_mask_[camera.camera_id_] =
tensor_utils::cvMat2TorchTensor_Float32(
viewer_main_undistort_mask, device_type_);
if (this->sensor_type_ == STEREO) {
camera.stereo_bf_ = stereo_baseline_length_ * camera.params_[0];
if (this->stereo_Q_.cols != 4) {
this->stereo_Q_ = cv::Mat(4, 4, CV_32FC1);
this->stereo_Q_.setTo(0.0f);
this->stereo_Q_.at<float>(0, 0) = 1.0f;
this->stereo_Q_.at<float>(0, 3) = -camera.params_[2];
this->stereo_Q_.at<float>(1, 1) = 1.0f;
this->stereo_Q_.at<float>(1, 3) = -camera.params_[3];
this->stereo_Q_.at<float>(2, 3) = camera.params_[0];
this->stereo_Q_.at<float>(3, 2) = 1.0f / stereo_baseline_length_;
}
}
}
else if (SLAM_camera->GetType() == ORB_SLAM3::GeometricCamera::CAM_FISHEYE) {
camera.setModelId(Camera::CameraModelType::FISHEYE);
}
else {
camera.setModelId(Camera::CameraModelType::INVALID);
}
if (!viewer_camera_id_set_) {
viewer_camera_id_ = camera.camera_id_;
viewer_camera_id_set_ = true;
}
this->scene_->addCamera(camera);//In the end, we place the camera into the scene
}
}
void GaussianMapper::run()
{
// First loop: Initial gaussian mapping
// Initially, we use a loop to initialize the 3D Gaussian mapping process and single iteration training
// The loop will end once the initialization is executed and the first training iteration is finished.
while (!isStopped()) {
// Check conditions for initial mapping
// ORB-SLAM3 must still be running and the number of keyframes must exceed a threshold
if (hasMetInitialMappingConditions()) {
//Clear the pending mapping operations of ORB-SLAM3
pSLAM_->getAtlas()->clearMappingOperation();
// Get initial sparse map from orbSLAM's current map
auto pMap = pSLAM_->getAtlas()->GetCurrentMap();
std::vector<ORB_SLAM3::KeyFrame*> vpKFs;
std::vector<ORB_SLAM3::MapPoint*> vpMPs;
{
// Lock the map of ORB-SLAM3, essentially pausing tracking updates during mapping
std::unique_lock<std::mutex> lock_map(pMap->mMutexMapUpdate);
//Get keyframes
vpKFs = pMap->GetAllKeyFrames();
//Get map points
vpMPs = pMap->GetAllMapPoints();
//Iterate all map points to get their 3D coordinates and colors. Then place them into scene_
for (const auto& pMP : vpMPs){
Point3D point3D;
auto pos = pMP->GetWorldPos();
point3D.xyz_(0) = pos.x();
point3D.xyz_(1) = pos.y();
point3D.xyz_(2) = pos.z();
auto color = pMP->GetColorRGB();
point3D.color_(0) = color(0);
point3D.color_(1) = color(1);
point3D.color_(2) = color(2);
//Place the points and colors into cached_point_cloud_
scene_->cachePoint3D(pMP->mnId, point3D);
}
//Then iterate all keyframes
for (const auto& pKF : vpKFs){
//Create a new Gaussian keyframe based on the keyframe id and current iteration number
std::shared_ptr<GaussianKeyframe> new_kf = std::make_shared<GaussianKeyframe>(pKF->mnId, getIteration());
//The nearest and farthest depth (near/far clipping planes)
new_kf->zfar_ = z_far_;
new_kf->znear_ = z_near_;
//Get pose of the according keyframe
auto pose = pKF->GetPose();
//Set pose
new_kf->setPose(
pose.unit_quaternion().cast<double>(),
pose.translation().cast<double>());
// Get image information
cv::Mat imgRGB_undistorted, imgAux_undistorted;
try {
// Camera
Camera& camera = scene_->cameras_.at(pKF->mpCamera->GetId());
new_kf->setCameraParams(camera);
// Image (left if STEREO)
cv::Mat imgRGB = pKF->imgLeftRGB;
if (this->sensor_type_ == STEREO)
imgRGB_undistorted = imgRGB;
else
camera.undistortImage(imgRGB, imgRGB_undistorted);
// Depth image, stored as an auxiliary image since monocular input is also supported
cv::Mat imgAux = pKF->imgAuxiliary;
if (this->sensor_type_ == RGBD)
camera.undistortImage(imgAux, imgAux_undistorted);
else
imgAux_undistorted = imgAux;
new_kf->original_image_ =
tensor_utils::cvMat2TorchTensor_Float32(imgRGB_undistorted, device_type_);
new_kf->img_filename_ = pKF->mNameFile;
new_kf->gaus_pyramid_height_ = camera.gaus_pyramid_height_;
new_kf->gaus_pyramid_width_ = camera.gaus_pyramid_width_;
new_kf->gaus_pyramid_times_of_use_ = kf_gaus_pyramid_times_of_use_;
}
catch (std::out_of_range) {
throw std::runtime_error("[GaussianMapper::run]KeyFrame Camera not found!");
}
// Calculate the transformation matrix (in tensor format)
new_kf->computeTransformTensors();
// Then add the keyframe into the scene
scene_->addKeyframe(new_kf, &kfid_shuffled_);
//Give the keyframe its times of use so it enters the training sliding window
increaseKeyframeTimesOfUse(new_kf, newKeyframeTimesOfUse());
// Features positions from ORB-SLAM3::KeyFrame::GetKeypointInfo.
// pixels are 2D features in images, pointsLocals are 3D feature points in the space.
std::vector<float> pixels;
std::vector<float> pointsLocal;
pKF->GetKeypointInfo(pixels, pointsLocal);
// Pass the vector including the pixel and local coordinates to the member variables
// kps_pixel_ and kps_point_local_ of the new keyframe new_kf
// This is to avoid copying and improve efficiency
new_kf->kps_pixel_ = std::move(pixels);
new_kf->kps_point_local_ = std::move(pointsLocal);
new_kf->img_undist_ = imgRGB_undistorted;
new_kf->img_auxiliary_undist_ = imgAux_undistorted;
}
}
// Prepare multi-resolution images for training (for the Gaussian pyramid)
for (auto& kfit : scene_->keyframes()) {
auto pkf = kfit.second;
if (device_type_ == torch::kCUDA) {
cv::cuda::GpuMat img_gpu;
img_gpu.upload(pkf->img_undist_);
pkf->gaus_pyramid_original_image_.resize(num_gaus_pyramid_sub_levels_);
for (int l = 0; l < num_gaus_pyramid_sub_levels_; ++l) {
cv::cuda::GpuMat img_resized;
cv::cuda::resize(img_gpu, img_resized,
cv::Size(pkf->gaus_pyramid_width_[l], pkf->gaus_pyramid_height_[l]));
pkf->gaus_pyramid_original_image_[l] =
tensor_utils::cvGpuMat2TorchTensor_Float32(img_resized);
}
}
else {
pkf->gaus_pyramid_original_image_.resize(num_gaus_pyramid_sub_levels_);
for (int l = 0; l < num_gaus_pyramid_sub_levels_; ++l) {
cv::Mat img_resized;
cv::resize(pkf->img_undist_, img_resized,
cv::Size(pkf->gaus_pyramid_width_[l], pkf->gaus_pyramid_height_[l]));
pkf->gaus_pyramid_original_image_[l] =
tensor_utils::cvMat2TorchTensor_Float32(img_resized, device_type_);
}
}
}
// Prepare for training (basic setup for training)
{
std::unique_lock<std::mutex> lock_render(mutex_render_);
// Get the edge of scene from the point cloud
scene_->cameras_extent_ = std::get<1>(scene_->getNerfppNorm());
// Create the Gaussian model based on the cached_point_cloud_ and the scene edge
gaussians_->createFromPcd(scene_->cached_point_cloud_, scene_->cameras_extent_);
std::unique_lock<std::mutex> lock(mutex_settings_);
// Set up the training parameters (optimizer and learning rates)
gaussians_->trainingSetup(opt_params_);
}
// Invoke training once(1st iteration)
trainForOneIteration();
// Finish initial mapping loop only after the map initialization and one iteration of training
initial_mapped_ = true;
break;
}
else if (pSLAM_->isShutDown()) {
break;
}
else {
// Initial conditions not satisfied
std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
}
// Second loop: Incremental gaussian mapping
int SLAM_stop_iter = 0;
while (!isStopped()) {
// Check conditions for incremental mapping
if (hasMetIncrementalMappingConditions()) {
// If the conditions for incremental mapping are fulfilled, then do mapping
// At the same time, add new Gaussians in the Gaussian map
combineMappingOperations();
if (cull_keyframes_)
// Clear unnecessary keyframes in the scene.
cullKeyframes();
}
// Invoke training once
trainForOneIteration();
// Here we continue the training until the maximum training iteration is reached
if (pSLAM_->isShutDown()) {
SLAM_stop_iter = getIteration();
SLAM_ended_ = true;
}
if (SLAM_ended_ || getIteration() >= opt_params_.iterations_)
break;
}
// Third loop: Tail gaussian optimization
int densify_interval = densifyInterval();
int n_delay_iters = densify_interval * 0.8;
while (getIteration() - SLAM_stop_iter <= n_delay_iters || getIteration() % densify_interval <= n_delay_iters || isKeepingTraining()) {
trainForOneIteration();
densify_interval = densifyInterval();
n_delay_iters = densify_interval * 0.8;
}
// Save and clear
renderAndRecordAllKeyframes("_shutdown");
savePly(result_dir_ / (std::to_string(getIteration()) + "_shutdown") / "ply");
writeKeyframeUsedTimes(result_dir_ / "used_times", "final");
signalStop();
}
void GaussianMapper::trainForOneIteration()
{
//Add 1 to the training iteration
increaseIteration(1);
auto iter_start_timing = std::chrono::steady_clock::now();
// Pick a random keyframe (viewpoint camera) from the sliding window
std::shared_ptr<GaussianKeyframe> viewpoint_cam = useOneRandomSlidingWindowKeyframe();
if (!viewpoint_cam) {
increaseIteration(-1);
return;
}
writeKeyframeUsedTimes(result_dir_ / "used_times");
// if (isdoingInactiveGeoDensify() && !viewpoint_cam->done_inactive_geo_densify_)
// increasePcdByKeyframeInactiveGeoDensify(viewpoint_cam);
int training_level = num_gaus_pyramid_sub_levels_;
int image_height, image_width;
torch::Tensor gt_image, mask;
if (isdoingGausPyramidTraining())
training_level = viewpoint_cam->getCurrentGausPyramidLevel();
if (training_level == num_gaus_pyramid_sub_levels_) {
image_height = viewpoint_cam->image_height_;
image_width = viewpoint_cam->image_width_;
gt_image = viewpoint_cam->original_image_.cuda();
mask = undistort_mask_[viewpoint_cam->camera_id_];
}
else {
//Get the image width/height, the original image and the undistortion mask at this pyramid level
image_height = viewpoint_cam->gaus_pyramid_height_[training_level];
image_width = viewpoint_cam->gaus_pyramid_width_[training_level];
gt_image = viewpoint_cam->gaus_pyramid_original_image_[training_level].cuda();
mask = scene_->cameras_.at(viewpoint_cam->camera_id_).gaus_pyramid_undistort_mask_[training_level];
}
// Mutex lock for usage of the gaussian model
std::unique_lock<std::mutex> lock_render(mutex_render_);
// Every 1000 iterations, we increase the levels of SH up to a maximum degree
if (getIteration() % 1000 == 0 && default_sh_ < model_params_.sh_degree_)
default_sh_ += 1;
// if (isdoingGausPyramidTraining())
// gaussians_->setShDegree(training_level);
// else
gaussians_->setShDegree(default_sh_);
// Update learning rate
if (pSLAM_) {
int used_times = kfs_used_times_[viewpoint_cam->fid_];
int step = (used_times <= opt_params_.position_lr_max_steps_ ? used_times : opt_params_.position_lr_max_steps_);
float position_lr = gaussians_->updateLearningRate(step);
setPositionLearningRateInit(position_lr);
}
else {
gaussians_->updateLearningRate(getIteration());
}
gaussians_->setFeatureLearningRate(featureLearningRate());
gaussians_->setOpacityLearningRate(opacityLearningRate());
gaussians_->setScalingLearningRate(scalingLearningRate());
gaussians_->setRotationLearningRate(rotationLearningRate());
// Render. The returned render_pkg is a tuple, including
// rendered images, points in the view space, visibility filter and radius.
auto render_pkg = GaussianRenderer::render(
viewpoint_cam,
image_height,
image_width,
gaussians_,
pipe_params_,
background_,
override_color_
);
auto rendered_image = std::get<0>(render_pkg);//Rendered image
auto viewspace_point_tensor = std::get<1>(render_pkg);
auto visibility_filter = std::get<2>(render_pkg);
auto radii = std::get<3>(render_pkg);
// Get rid of black edges caused by undistortion
torch::Tensor masked_image = rendered_image * mask;
// Loss: Calculate loss and do backpropagation
auto Ll1 = loss_utils::l1_loss(masked_image, gt_image);
float lambda_dssim = lambdaDssim();
auto loss = (1.0 - lambda_dssim) * Ll1
+ lambda_dssim * (1.0 - loss_utils::ssim(masked_image, gt_image, device_type_));
loss.backward();
torch::cuda::synchronize();
{
torch::NoGradGuard no_grad;//Stop gradient calculation
ema_loss_for_log_ = 0.4f * loss.item().toFloat() + 0.6 * ema_loss_for_log_;
if (keyframe_record_interval_ &&
getIteration() % keyframe_record_interval_ == 0)
// Record the rendered image, ground truth image, keyframe ID and camera to the save path
recordKeyframeRendered(masked_image, gt_image, viewpoint_cam->fid_, result_dir_, result_dir_, result_dir_);
// Densification
if (getIteration() < opt_params_.densify_until_iter_) {
// Keep track of max radii in image-space for pruning
gaussians_->max_radii2D_.index_put_(
{visibility_filter},
torch::max(gaussians_->max_radii2D_.index({visibility_filter}),
radii.index({visibility_filter})));
// if (!isdoingGausPyramidTraining() || training_level < num_gaus_pyramid_sub_levels_)
gaussians_->addDensificationStats(viewspace_point_tensor, visibility_filter);
if ((getIteration() > opt_params_.densify_from_iter_) &&
(getIteration() % densifyInterval()== 0)) {
int size_threshold = (getIteration() > prune_big_point_after_iter_) ? 20 : 0;
// Densify and prune
gaussians_->densifyAndPrune(
densifyGradThreshold(),
densify_min_opacity_,//0.005,//
scene_->cameras_extent_,
size_threshold
);
}
if (opacityResetInterval()
&& (getIteration() % opacityResetInterval() == 0
||(model_params_.white_background_ && getIteration() == opt_params_.densify_from_iter_)))
gaussians_->resetOpacity();
}
auto iter_end_timing = std::chrono::steady_clock::now();
auto iter_time = std::chrono::duration_cast<std::chrono::milliseconds>(
iter_end_timing - iter_start_timing).count();
// Log and save
if (training_report_interval_ && (getIteration() % training_report_interval_ == 0))
GaussianTrainer::trainingReport(
getIteration(),
opt_params_.iterations_,
Ll1,
loss,
ema_loss_for_log_,
loss_utils::l1_loss,
iter_time,
*gaussians_,
*scene_,
pipe_params_,
background_
);
if ((all_keyframes_record_interval_ && getIteration() % all_keyframes_record_interval_ == 0)
// || loop_closure_iteration_
)
{
renderAndRecordAllKeyframes();
savePly(result_dir_ / std::to_string(getIteration()) / "ply");
}
if (loop_closure_iteration_)
loop_closure_iteration_ = false;
// Optimizer step
// If the iteration is smaller than the maximum iteration amount, continue to optimize
if (getIteration() < opt_params_.iterations_) {
gaussians_->optimizer_->step();
gaussians_->optimizer_->zero_grad(true);
}
}
}
void GaussianMapper::combineMappingOperations()
{
// Get Mapping Operations
while (pSLAM_->getAtlas()->hasMappingOperation()) {
ORB_SLAM3::MappingOperation opr =
pSLAM_->getAtlas()->getAndPopMappingOperation();
switch (opr.meOperationType)
{
// Handle local bundle adjustment
case ORB_SLAM3::MappingOperation::OprType::LocalMappingBA:
{
// std::cout << "[Gaussian Mapper]Local BA Detected."
// << std::endl;
// Get new keyframes
auto& associated_kfs = opr.associatedKeyFrames();
// Add keyframes to the scene
for (auto& kf : associated_kfs) {
// Keyframe Id
auto kfid = std::get<0>(kf);
std::shared_ptr<GaussianKeyframe> pkf = scene_->getKeyframe(kfid);
// If the keyframe is already in the scene, only update the pose.
// Otherwise create a new one
if (pkf) {
auto& pose = std::get<2>(kf);
pkf->setPose(
pose.unit_quaternion().cast<double>(),
pose.translation().cast<double>());
pkf->computeTransformTensors();
// Give local BA keyframes times of use
increaseKeyframeTimesOfUse(pkf, local_BA_increased_times_of_use_);
}
else {
handleNewKeyframe(kf);
}
}
// Get new points
// Theoretically we could get only the new points by removing the corresponding existing points ???
auto& associated_points = opr.associatedMapPoints();
auto& points = std::get<0>(associated_points);
auto& colors = std::get<1>(associated_points);
// Add new points to the model
if (initial_mapped_ && points.size() >= 30) {
// No gradient calculation below
torch::NoGradGuard no_grad;
std::unique_lock<std::mutex> lock_render(mutex_render_);
gaussians_->increasePcd(points, colors, getIteration());
}
}
break;
// Loop closure bundle adjustment
case ORB_SLAM3::MappingOperation::OprType::LoopClosingBA:
{
std::cout << "[Gaussian Mapper]Loop Closure Detected."
<< std::endl;
// Get the loop keyframe scale modification factor
float loop_kf_scale = opr.mfScale;
// Get new keyframes (scaled transformation applied in ORB-SLAM3)
auto& associated_kfs = opr.associatedKeyFrames();
// Mark the transformed points to avoid transforming more than once
torch::Tensor point_not_transformed_flags =
torch::full(
{gaussians_->xyz_.size(0)},
true,
torch::TensorOptions().device(device_type_).dtype(torch::kBool));
if (record_loop_ply_)
savePly(result_dir_ / (std::to_string(getIteration()) + "_0_before_loop_correction"));
int num_transformed = 0;
// Add keyframes to the scene
for (auto& kf : associated_kfs) {
// Keyframe Id
auto kfid = std::get<0>(kf);
std::shared_ptr<GaussianKeyframe> pkf = scene_->getKeyframe(kfid);
// In case new points are added in handleNewKeyframe()
int64_t num_new_points = gaussians_->xyz_.size(0) - point_not_transformed_flags.size(0);
if (num_new_points > 0)
point_not_transformed_flags = torch::cat({
point_not_transformed_flags,
torch::full({num_new_points}, true, point_not_transformed_flags.options())},
/*dim=*/0);
// If kf is already in the scene, evaluate the change in pose,
// if too large we perform loop correction on its visible model points.
// If not in the scene, create a new one.
if (pkf) {
auto& pose = std::get<2>(kf);
// If is loop closure kf
// if (std::get<4>(kf)) {
// renderAndRecordKeyframe(pkf, result_dir_, "_0_before_loop_correction");
Sophus::SE3f original_pose = pkf->getPosef(); // original_pose = old, inv_pose = new
Sophus::SE3f inv_pose = pose.inverse();
Sophus::SE3f diff_pose = inv_pose * original_pose;
bool large_rot = !diff_pose.rotationMatrix().isApprox(
Eigen::Matrix3f::Identity(), large_rot_th_);
bool large_trans = !diff_pose.translation().isMuchSmallerThan(
1.0, large_trans_th_);
if (large_rot || large_trans) {
std::cout << "[Gaussian Mapper]Large loop correction detected, transforming visible points of kf "
<< kfid << std::endl;
diff_pose.translation() -= inv_pose.translation(); // t = (R_new * t_old + t_new) - t_new
diff_pose.translation() *= loop_kf_scale; // t = s * (R_new * t_old)
diff_pose.translation() += inv_pose.translation(); // t = (s * R_new * t_old) + t_new
torch::Tensor diff_pose_tensor =
tensor_utils::EigenMatrix2TorchTensor(
diff_pose.matrix(), device_type_).transpose(0, 1);
{
std::unique_lock<std::mutex> lock_render(mutex_render_);
gaussians_->scaledTransformVisiblePointsOfKeyframe(
point_not_transformed_flags,
diff_pose_tensor,
pkf->world_view_transform_,
pkf->full_proj_transform_,
pkf->creation_iter_,
stableNumIterExistence(),
num_transformed,
loop_kf_scale); // selected xyz *= s
}
// Give loop keyframes times of use
increaseKeyframeTimesOfUse(pkf, loop_closure_increased_times_of_use_);
// renderAndRecordKeyframe(pkf, result_dir_, "_1_after_loop_transforming_points");
// std::cout<<num_transformed<<std::endl;
}
// }
pkf->setPose(
pose.unit_quaternion().cast<double>(),
pose.translation().cast<double>());
pkf->computeTransformTensors();
// if (std::get<4>(kf)) renderAndRecordKeyframe(pkf, result_dir_, "_2_after_pose_correction");
}
else {
handleNewKeyframe(kf);
}
}
if (record_loop_ply_)
savePly(result_dir_ / (std::to_string(getIteration()) + "_1_after_loop_correction"));
// keyframesToJson(result_dir_ / (std::to_string(getIteration()) + "_0_before_loop_correction"));
// Get new points (scaled transformation applied in ORB-SLAM3, so this step is performed at last to avoid scaling twice)
auto& associated_points = opr.associatedMapPoints();
auto& points = std::get<0>(associated_points);
auto& colors = std::get<1>(associated_points);
// Add new points to the model
if (initial_mapped_ && points.size() >= 30) {
torch::NoGradGuard no_grad;
std::unique_lock<std::mutex> lock_render(mutex_render_);
gaussians_->increasePcd(points, colors, getIteration());
}
// Mark this iteration
loop_closure_iteration_ = true;
}
break;
case ORB_SLAM3::MappingOperation::OprType::ScaleRefinement:
{
std::cout << "[Gaussian Mapper]Scale refinement Detected. Transforming all kfs and points..."
<< std::endl;
float s = opr.mfScale;
Sophus::SE3f& T = opr.mT;
if (initial_mapped_) {
// Apply the scaled transformation on gaussian model points
{
std::unique_lock<std::mutex> lock_render(mutex_render_);
gaussians_->applyScaledTransformation(s, T);
}
// Apply the scaled transformation to the scene
scene_->applyScaledTransformation(s, T);
}
else { // TODO: the workflow should not come here, delete this branch
// Apply the scaled transformation to the cached points
for (auto& pt : scene_->cached_point_cloud_) {
// pt <- (s * Ryw * pt + tyw)
auto& pt_xyz = pt.second.xyz_;
pt_xyz *= s;
pt_xyz = T.cast<double>() * pt_xyz;
}
// Apply the scaled transformation on gaussian keyframes
for (auto& kfit : scene_->keyframes()) {
std::shared_ptr<GaussianKeyframe> pkf = kfit.second;
Sophus::SE3f Twc = pkf->getPosef().inverse();
Twc.translation() *= s;
Sophus::SE3f Tyc = T * Twc;
Sophus::SE3f Tcy = Tyc.inverse();
pkf->setPose(Tcy.unit_quaternion().cast<double>(), Tcy.translation().cast<double>());
pkf->computeTransformTensors();
}
}
}
break;
default:
{
throw std::runtime_error("MappingOperation type not supported!");
}
break;
}
}
}
void GaussianMapper::handleNewKeyframe(
std::tuple< unsigned long/*Id*/,
unsigned long/*CameraId*/,
Sophus::SE3f/*pose*/,
cv::Mat/*image*/,
bool/*isLoopClosure*/,
cv::Mat/*auxiliaryImage*/,
std::vector<float>,
std::vector<float>,
std::string> &kf)
{
std::shared_ptr<GaussianKeyframe> pkf =
std::make_shared<GaussianKeyframe>(std::get<0>(kf), getIteration());
pkf->zfar_ = z_far_;
pkf->znear_ = z_near_;
// Pose
auto& pose = std::get<2>(kf);
pkf->setPose(
pose.unit_quaternion().cast<double>(),
pose.translation().cast<double>());
cv::Mat imgRGB_undistorted, imgAux_undistorted;
try {
// Camera
Camera& camera = scene_->cameras_.at(std::get<1>(kf));
pkf->setCameraParams(camera);
// Image (left if STEREO)
cv::Mat imgRGB = std::get<3>(kf);
if (this->sensor_type_ == STEREO)
imgRGB_undistorted = imgRGB;
else
camera.undistortImage(imgRGB, imgRGB_undistorted);
// Auxiliary Image
cv::Mat imgAux = std::get<5>(kf);
if (this->sensor_type_ == RGBD)
camera.undistortImage(imgAux, imgAux_undistorted);
else
imgAux_undistorted = imgAux;
pkf->original_image_ =
tensor_utils::cvMat2TorchTensor_Float32(imgRGB_undistorted, device_type_);
pkf->img_filename_ = std::get<8>(kf);
pkf->gaus_pyramid_height_ = camera.gaus_pyramid_height_;
pkf->gaus_pyramid_width_ = camera.gaus_pyramid_width_;
pkf->gaus_pyramid_times_of_use_ = kf_gaus_pyramid_times_of_use_;
}
catch (std::out_of_range) {
throw std::runtime_error("[GaussianMapper::combineMappingOperations]KeyFrame Camera not found!");
}
// Add the new keyframe to the scene
pkf->computeTransformTensors();
scene_->addKeyframe(pkf, &kfid_shuffled_);
// Give new keyframes times of use and add it to the training sliding window
increaseKeyframeTimesOfUse(pkf, newKeyframeTimesOfUse());
// Get dense point cloud from the new keyframe to accelerate training
pkf->img_undist_ = imgRGB_undistorted;
pkf->img_auxiliary_undist_ = imgAux_undistorted;
pkf->kps_pixel_ = std::move(std::get<6>(kf));
pkf->kps_point_local_ = std::move(std::get<7>(kf));
if (isdoingInactiveGeoDensify())
increasePcdByKeyframeInactiveGeoDensify(pkf);
// Prepare multi resolution images for pyramid training
if (device_type_ == torch::kCUDA) {
cv::cuda::GpuMat img_gpu;
img_gpu.upload(pkf->img_undist_);
pkf->gaus_pyramid_original_image_.resize(num_gaus_pyramid_sub_levels_);
for (int l = 0; l < num_gaus_pyramid_sub_levels_; ++l) {
cv::cuda::GpuMat img_resized;
cv::cuda::resize(img_gpu, img_resized,
cv::Size(pkf->gaus_pyramid_width_[l], pkf->gaus_pyramid_height_[l]));
pkf->gaus_pyramid_original_image_[l] =
tensor_utils::cvGpuMat2TorchTensor_Float32(img_resized);
}
}
else {
pkf->gaus_pyramid_original_image_.resize(num_gaus_pyramid_sub_levels_);
for (int l = 0; l < num_gaus_pyramid_sub_levels_; ++l) {
cv::Mat img_resized;
cv::resize(pkf->img_undist_, img_resized,
cv::Size(pkf->gaus_pyramid_width_[l], pkf->gaus_pyramid_height_[l]));
pkf->gaus_pyramid_original_image_[l] =
tensor_utils::cvMat2TorchTensor_Float32(img_resized, device_type_);
}
}
}
bool GaussianMapper::isStopped()
{
// Create a unique lock lock_status, which also locks the mutex mutex_status_
// The purpose of the lock is to ensure that while this function executes,
// other threads cannot access the protected data
std::unique_lock<std::mutex> lock_status(this->mutex_status_);
// The following line returns the private variable stopped_ of the class object GaussianMapper
// Because we hold the lock on mutex_status_, we can safely read the value of this member variable.
return this->stopped_;
// When the function finishes and leaves the scope, the lock lock_status
// is released automatically, unlocking mutex_status_
}
// The default parameterized constructor
GaussianModel::GaussianModel(const int sh_degree)
: active_sh_degree_(0), spatial_lr_scale_(0.0),
lr_delay_steps_(0), lr_delay_mult_(1.0), max_steps_(1000000)
{
this->max_sh_degree_ = sh_degree;
// Device
if (torch::cuda::is_available())
this->device_type_ = torch::kCUDA;
else
this->device_type_ = torch::kCPU;
//The corresponding tensors for the 3D Gaussians are all empty, including:
// xyz_, features_dc_, features_rest_, scaling_, rotation_, opacity_,
// max_radii2D_, xyz_gradient_accum_ and denom_
GAUSSIAN_MODEL_INIT_TENSORS(this->device_type_)
}
// The parameter-based constructor, similar to the previous one,
// but initializes the GaussianModel from the parameter struct
GaussianModel::GaussianModel(const GaussianModelParams &model_params)
: active_sh_degree_(0), spatial_lr_scale_(0.0),
lr_delay_steps_(0), lr_delay_mult_(1.0), max_steps_(1000000)
{
this->max_sh_degree_ = model_params.sh_degree_;
// Device
if (model_params.data_device_ == "cuda")
this->device_type_ = torch::kCUDA;
else
this->device_type_ = torch::kCPU;
//The corresponding tensors for the 3D Gaussians are all empty, including:
// xyz_, features_dc_, features_rest_, scaling_, rotation_, opacity_,
// max_radii2D_, xyz_gradient_accum_ and denom_
GAUSSIAN_MODEL_INIT_TENSORS(this->device_type_)
}
Get the scaling
torch::Tensor GaussianModel::getScalingActivation()
{
// restore the scaling from log space into linear space
return torch::exp(this->scaling_);
}
torch::Tensor GaussianModel::getRotationActivation()
{
// torch::nn::functional::normalize normalizes the rotation tensor so that
// each quaternion (row) has unit norm, giving the rotation activation value.
// The normalized tensor is returned as the result.
return torch::nn::functional::normalize(this->rotation_);
}
torch::Tensor GaussianModel::getXYZ()
{
return this->xyz_;
}
torch::Tensor GaussianModel::getFeatures()
{
return torch::cat({this->features_dc_.clone(), this->features_rest_.clone()}, /*dim=*/1);
}
torch::Tensor GaussianModel::getOpacityActivation()
{
// Mapping an unconstrained opacity parameter into (0,1).
return torch::sigmoid(this->opacity_);
}
torch::Tensor GaussianModel::getCovarianceActivation(int scaling_modifier)
{
// build_rotation
auto r = this->rotation_;
auto R = general_utils::build_rotation(r);
// build_scaling_rotation(scaling_modifier * getScalingActivation(), rotation_)
auto s = scaling_modifier * this->getScalingActivation();
auto L = torch::zeros({s.size(0), 3, 3}, torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
L.select(1, 0).select(1, 0).copy_(s.index({torch::indexing::Slice(), 0}));
L.select(1, 1).select(1, 1).copy_(s.index({torch::indexing::Slice(), 1}));
L.select(1, 2).select(1, 2).copy_(s.index({torch::indexing::Slice(), 2}));
L = R.matmul(L); // L = R @ L
// build_covariance_from_scaling_rotation
auto actual_covariance = L.matmul(L.transpose(1, 2));
// strip_symmetric: the 3×3 matrix actual_covariance is symmetric,
// so we only need to keep its unique entries (not all 9 values).
// strip_lowerdiag: drop the strictly lower-triangular entries (the duplicates)
// and keep the diagonal and upper-triangular entries.
auto symm_uncertainty = torch::zeros({actual_covariance.size(0), 6}, torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
// symm_uncertainty is a compact (packed) representation of each Gaussian's 3×3 covariance
// matrix — it stores the six independent entries of the symmetric covariance:
// [cov_xx, cov_xy, cov_xz, cov_yy, cov_yz, cov_zz].
symm_uncertainty.select(1, 0).copy_(actual_covariance.index({torch::indexing::Slice(), 0, 0}));
symm_uncertainty.select(1, 1).copy_(actual_covariance.index({torch::indexing::Slice(), 0, 1}));
symm_uncertainty.select(1, 2).copy_(actual_covariance.index({torch::indexing::Slice(), 0, 2}));
symm_uncertainty.select(1, 3).copy_(actual_covariance.index({torch::indexing::Slice(), 1, 1}));
symm_uncertainty.select(1, 4).copy_(actual_covariance.index({torch::indexing::Slice(), 1, 2}));
symm_uncertainty.select(1, 5).copy_(actual_covariance.index({torch::indexing::Slice(), 2, 2}));
return symm_uncertainty;
}
void GaussianModel::createFromPcd(
// The cached point cloud (with color) fetched from the scene
std::map<point3D_id_t, Point3D> pcd,
// Multiplier adapting the position learning rate to the spatial scale
// of the point cloud/scene.
const float spatial_lr_scale)
{
this->spatial_lr_scale_ = spatial_lr_scale;
// Get the points in the cache point cloud
int num_points = static_cast<int>(pcd.size());
// Created the fused point cloud based on the number of points
torch::Tensor fused_point_cloud = torch::zeros(
{num_points, 3},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
// Create a tensor whose shape is (num_points, 3)
// to store the color of the point cloud, using float data type on the device
torch::Tensor color = torch::zeros(
{num_points, 3},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
// Iterator to fetch the data from the colored point cloud
auto pcd_it = pcd.begin();
// Iterate every point in the colored point cloud
for (int point_idx = 0; point_idx < num_points; ++point_idx) {
// Fetch the point cloud data which pointed by the current iterator
auto& point = (*pcd_it).second;
// Store the point's position and color into the two tensors
// fused_point_cloud and color;
// ++pcd_it (below) advances the iterator to the next point
fused_point_cloud.index({point_idx, 0}) = point.xyz_(0);
fused_point_cloud.index({point_idx, 1}) = point.xyz_(1);
fused_point_cloud.index({point_idx, 2}) = point.xyz_(2);
color.index({point_idx, 0}) = point.color_(0);
color.index({point_idx, 1}) = point.color_(1);
color.index({point_idx, 2}) = point.color_(2);
++pcd_it;
}
// Convert it to spherical harmonics
torch::Tensor fused_color = sh_utils::RGB2SH(color);
auto temp = this->max_sh_degree_ + 1;
torch::Tensor features = torch::zeros(
{fused_color.size(0), 3, temp * temp},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
features.index(
{torch::indexing::Slice(),
torch::indexing::Slice(0, 3),
0}) = fused_color;
features.index(
{torch::indexing::Slice(),
torch::indexing::Slice(3, features.size(1)),
torch::indexing::Slice(1, features.size(2))}) = 0.0f;
// std::cout << "[Gaussian Model]Number of points at initialization : " << fused_point_cloud.size(0) << std::endl;
// Standard 3DGS scale initialization: distCUDA2 (from simple-knn) computes, for each point,
// roughly the mean squared distance to its nearest neighbours; the initial scale is set to
// log(sqrt(dist2)), replicated over the three axes
torch::Tensor point_cloud_copy = fused_point_cloud.clone();
torch::Tensor dist2 = torch::clamp_min(distCUDA2(point_cloud_copy), 0.0000001);
torch::Tensor scales = torch::log(torch::sqrt(dist2));
auto scales_ndimension = scales.ndimension();
scales = scales.unsqueeze(scales_ndimension).repeat({1, 3});
torch::Tensor rots = torch::zeros({fused_point_cloud.size(0), 4}, torch::TensorOptions().device(device_type_));
rots.index({torch::indexing::Slice(), 0}) = 1;
torch::Tensor opacities = general_utils::inverse_sigmoid(
0.1f * torch::ones(
{fused_point_cloud.size(0), 1},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_)));
this->exist_since_iter_ = torch::zeros(
{fused_point_cloud.size(0)},
torch::TensorOptions().dtype(torch::kInt32).device(device_type_));
this->xyz_ = fused_point_cloud.requires_grad_();
this->features_dc_ = features.index({torch::indexing::Slice(),
torch::indexing::Slice(),
torch::indexing::Slice(0, 1)})
.transpose(1, 2)
.contiguous()
.requires_grad_();
this->features_rest_ = features.index({torch::indexing::Slice(),
torch::indexing::Slice(),
torch::indexing::Slice(1, features.size(2))})
.transpose(1, 2)
.contiguous()
.requires_grad_();
this->scaling_ = scales.requires_grad_();
this->rotation_ = rots.requires_grad_();
this->opacity_ = opacities.requires_grad_();
// Store all the initialized tensors into a vector
GAUSSIAN_MODEL_TENSORS_TO_VEC
this->max_radii2D_ = torch::zeros({this->getXYZ().size(0)}, torch::TensorOptions().device(device_type_));
}
void GaussianModel::increasePcd(std::vector<float> points, std::vector<float> colors, const int iteration)
{
// auto time1 = std::chrono::steady_clock::now();
// Check the primitive variables match with each other and handle error case
assert(points.size() == colors.size());
assert(points.size() % 3 == 0);
auto num_new_points = static_cast<int>(points.size() / 3);
if (num_new_points == 0)
return;
// Initialize new point cloud and color variables to store data
torch::Tensor new_point_cloud = torch::from_blob(
points.data(), {num_new_points, 3},
torch::TensorOptions().dtype(torch::kFloat32)).to(device_type_);
// torch::zeros({num_new_points, 3}, xyz_.options());
torch::Tensor new_colors = torch::from_blob(
colors.data(), {num_new_points, 3},
torch::TensorOptions().dtype(torch::kFloat32)).to(device_type_);
// torch::zeros({num_new_points, 3}, xyz_.options());
// Initialize or update sparse points
if (sparse_points_xyz_.size(0) == 0) {
sparse_points_xyz_ = new_point_cloud;
sparse_points_color_ = new_colors;
}
else {
sparse_points_xyz_ = torch::cat({sparse_points_xyz_, new_point_cloud}, /*dim=*/0);
sparse_points_color_ = torch::cat({sparse_points_color_, new_colors}, /*dim=*/0);
}
// Convert RGB to SH colors
torch::Tensor new_fused_colors = sh_utils::RGB2SH(new_colors);
auto temp = this->max_sh_degree_ + 1;
torch::Tensor features = torch::zeros(
{new_fused_colors.size(0), 3, temp * temp},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_));
features.index(
{torch::indexing::Slice(),
torch::indexing::Slice(0, 3),
0}) = new_fused_colors;
features.index(
{torch::indexing::Slice(),
torch::indexing::Slice(3, features.size(1)),
torch::indexing::Slice(1, features.size(2))}) = 0.0f;
// std::cout << "[Gaussian Model]Number of points increase : "
// << num_new_points << std::endl;
// Set up scaling and opacity
torch::Tensor dist2 = torch::clamp_min(
distCUDA2(new_point_cloud.clone()), 0.0000001);
torch::Tensor scales = torch::log(torch::sqrt(dist2));
auto scales_ndimension = scales.ndimension();
scales = scales.unsqueeze(scales_ndimension).repeat({1, 3});
torch::Tensor rots = torch::zeros(
{new_point_cloud.size(0), 4},
torch::TensorOptions().device(device_type_));
rots.index({torch::indexing::Slice(), 0}) = 1;
torch::Tensor opacities = general_utils::inverse_sigmoid(
0.1f * torch::ones(
{new_point_cloud.size(0), 1},
torch::TensorOptions().dtype(torch::kFloat).device(device_type_)));
// Set up existing iteration
torch::Tensor new_exist_since_iter = torch::full(
{new_point_cloud.size(0)},
iteration,
torch::TensorOptions().dtype(torch::kInt32).device(device_type_));
auto new_xyz = new_point_cloud;
auto new_features_dc = features.index({torch::indexing::Slice(),
torch::indexing::Slice(),
torch::indexing::Slice(0, 1)})
.transpose(1, 2)
.contiguous();
auto new_features_rest = features.index({torch::indexing::Slice(),
torch::indexing::Slice(),
torch::indexing::Slice(1, features.size(2))})
.transpose(1, 2)
.contiguous();
auto new_opacities = opacities;
auto new_scaling = scales;
auto new_rotation = rots;
// auto time2 = std::chrono::steady_clock::now();
// auto time = std::chrono::duration_cast<std::chrono::milliseconds>(time2-time1).count();
// std::cout << "increasePcd(umap) preparation time: " << time << " ms" <<std::endl;
//For newly added points, adds new Gaussians into the model
// and wires them into the optimizer
densificationPostfix(
new_xyz,
new_features_dc,
new_features_rest,
new_opacities,
new_scaling,
new_rotation,
new_exist_since_iter
);
c10::cuda::CUDACachingAllocator::emptyCache();
// auto time3 = std::chrono::steady_clock::now();
// time = std::chrono::duration_cast<std::chrono::milliseconds>(time3-time2).count();
// std::cout << "increasePcd(umap) postfix time: " << time << " ms" <<std::endl;
}
// This method mainly updates the class member variables xyz_ and scaling_
// It stores the updated tensor into the corresponding parameter groups of the optimizer
void GaussianModel::scaledTransformationPostfix(
torch::Tensor& new_xyz,
torch::Tensor& new_scaling)
{
// param_groups[0] = xyz_, replace the first parameter of the parameter group
// with the new input new_xyz, and save the results into optimizable_xyz
torch::Tensor optimizable_xyz = this->replaceTensorToOptimizer(new_xyz, 0);
// param_groups[4] = scaling_, replace the fifth parameter of the parameter group
// with the new input tensor new_scaling, and save the results into optimizable_scaling
torch::Tensor optimizable_scaling = this->replaceTensorToOptimizer(new_scaling, 4);
// Assign the replaced, optimizable tensors to the member variables xyz_ and scaling_.
this->xyz_ = optimizable_xyz;
this->scaling_ = optimizable_scaling;
// Add these tensors into the class member Tensor_vec_xyz_ and Tensor_vec_scaling_
this->Tensor_vec_xyz_ = {this->xyz_};
this->Tensor_vec_scaling_ = {this->scaling_};
}
void GaussianModel::scaledTransformVisiblePointsOfKeyframe(
torch::Tensor& point_not_transformed_flags,// flags marking points that have not been transformed yet
torch::Tensor& diff_pose,// the pose correction (difference between the new and old keyframe pose)
torch::Tensor& kf_world_view_transform,
torch::Tensor& kf_full_proj_transform,//the projection of the keyframe
const int kf_creation_iter,//the iteration when the keyframe is created
const int stable_num_iter_existence,
int& num_transformed,
const float scale)
{
torch::NoGradGuard no_grad;//During the creation, no gradient calculation allowed
// Get all current Gaussians
torch::Tensor points = this->getXYZ();
torch::Tensor rots = this->getRotationActivation();
// torch::Tensor scales = this->scaling_;// * scale;
// Check if the difference between exist_since_iter_ and kf_creation_iter
// is less than stable_num_iter_existence. If yes, return True. Otherwise return False
// torch::where(condition, x, y) selects element-wise between x and y:
// where the condition is true it takes x, otherwise y.
torch::Tensor point_unstable_flags = torch::where(
torch::abs(this->exist_since_iter_ - kf_creation_iter) < stable_num_iter_existence,
true,
false);
//Scale the point cloud and mark visible points
scaleAndTransformThenMarkVisiblePoints(
points,//All Gaussian points at the moment
rots,
point_not_transformed_flags,
point_unstable_flags,// marks points that are still unstable, i.e. created around
// the same iteration as the keyframe
diff_pose,
kf_world_view_transform,
kf_full_proj_transform,
num_transformed,
scale
);
// torch::Tensor point_cloud_copy = points.clone();
// torch::Tensor dist2 = torch::clamp_min(distCUDA2(point_cloud_copy), 0.0000001);
// torch::Tensor scales = torch::log(torch::sqrt(dist2));
// auto scales_ndimension = scales.ndimension();
// scales = scales.unsqueeze(scales_ndimension).repeat({1, 3});
// Postfix
// ==================================
// param_groups[0] = xyz_
// param_groups[1] = feature_dc_
// param_groups[2] = feature_rest_
// param_groups[3] = opacity_
// param_groups[4] = scaling_
// param_groups[5] = rotation_
// ==================================
// After we mark the visible points, we need to replace them
torch::Tensor optimizable_xyz = this->replaceTensorToOptimizer(points, 0);
// torch::Tensor optimizable_scaling = this->replaceTensorToOptimizer(scales, 4);
torch::Tensor optimizable_rots = this->replaceTensorToOptimizer(rots, 5);
this->xyz_ = optimizable_xyz;
// this->scaling_ = optimizable_scaling;
this->rotation_ = optimizable_rots;
this->Tensor_vec_xyz_ = {this->xyz_};
// this->Tensor_vec_scaling_ = {this->scaling_};
this->Tensor_vec_rotation_ = {this->rotation_};
}
// Replace certain indexed parameter in the optimizer and update
// the corresponding state
torch::Tensor GaussianModel::replaceTensorToOptimizer(torch::Tensor& tensor, int tensor_idx)
{
// Get the first parameter of the tensor_idx-th parameter group and store it in param
auto& param = this->optimizer_->param_groups()[tensor_idx].params()[0];
// Get the state dictionary of the optimizer
// Store it in state
auto& state = optimizer_->state();
// Get the tensor's key (its implementation pointer converted to a string),
// used as the key identifying this parameter in the state dictionary
auto key = c10::guts::to_string(param.unsafeGetTensorImpl());
// Get the state object corresponding to the parameter from the state dictionary
// Assume the object's type is AdamParamState, and store it in stored_state
auto& stored_state = static_cast<torch::optim::AdamParamState&>(*state[key]);
// Create a new parameter state object
auto new_state = std::make_unique<torch::optim::AdamParamState>();
// Set the new state object's step to the stored state's step
new_state->step(stored_state.step());
// Initialize the exponential moving average of the new state object
// as a zero tensor with the same size as the given tensor
new_state->exp_avg(torch::zeros_like(tensor));
// Initialize the exponential moving average of the squared gradients
// as a zero tensor with the same size as the given tensor
new_state->exp_avg_sq(torch::zeros_like(tensor));
// new_state->max_exp_avg_sq(stored_state.max_exp_avg_sq().clone());
// needed only when options.amsgrad(true), which is false by default
// Remove the previous corresponding state objects from the state dictionary
state.erase(key);
// Replace the parameter with the new tensor, and make sure it needs gradients
param = tensor.requires_grad_();
// Get the new key from the new tensor and convert it into a string
key = c10::guts::to_string(param.unsafeGetTensorImpl());
// Link the new param state to the new param, and add into the state dictionary
state[key] = std::move(new_state);
auto optimizable_tensors = param;
return optimizable_tensors;
}
GaussianScene::GaussianScene(
GaussianModelParams& args,
int load_iteration, // this is 0 by default
bool shuffle,//this is true by default
std::vector<float> resolution_scales)//This is {1.0f} by default
{
// load_iteration is 0 by default, so the lines below are normally not executed;
// the constructor then simply declares the scene's initialization
if (load_iteration)
{
this->loaded_iter_ = load_iteration;
std::cout << "Loading trained model at iteration " << load_iteration << std::endl;
}
}
// Compute the NeRF++-style normalization (translate, radius) of the scene
std::tuple<Eigen::Vector3f, float> GaussianScene::getNerfppNorm()
{
std::vector<Eigen::Matrix<float, 3, 1>> cam_centers;
auto kfs = this->getAllKeyframes();
std::size_t n_cams = kfs.size();
cam_centers.reserve(n_cams);
for (auto& kfit : kfs) {
auto pkf = kfit.second;
auto W2C = pkf->getWorld2View2();
auto C2W = W2C.inverse();
auto cam_center = C2W.block<3, 1>(0, 3);
cam_centers.emplace_back(cam_center);
}
// get_center_and_diag(cam_centers)
Eigen::Vector3f avg_cam_center;
avg_cam_center.setZero();
for (const auto& cam_center : cam_centers) {
avg_cam_center.x() += cam_center.x();
avg_cam_center.y() += cam_center.y();
avg_cam_center.z() += cam_center.z();
}
avg_cam_center.x() /= n_cams;
avg_cam_center.y() /= n_cams;
avg_cam_center.z() /= n_cams;
float max_dist = 0.0f; // diagonal
for (std::size_t cam_idx = 0; cam_idx < n_cams; ++cam_idx) {
float dist = (cam_centers[cam_idx] - avg_cam_center).norm();
if (dist > max_dist)
max_dist = dist;
}
float radius = max_dist * 1.1;
Eigen::Vector3f translate = -avg_cam_center;
// It returns a tuple to contain the "boundary" of the scene.
// the radius represents the maximum distance from the average camera center
// to any keyframe camera center, scaled by 1.1.
return std::make_tuple(translate, radius);
}
std::tuple<torch::Tensor, torch::Tensor>
monocularPinholeInactiveGeoDensifyBySearchingNeighborhoodKeypoints(
torch::Tensor& kps_pixel,
torch::Tensor& kps_has3D,
torch::Tensor& kps_point_local,
torch::Tensor& colors,
float max_pixel_dist,
std::vector<float>& intr,
int width)
// Validation: Checks input tensor dimensions
{
if (kps_pixel.ndimension() != 2 || kps_pixel.size(1) != 2)
AT_ERROR("kps_pixel must have dimensions (num_points, 2)");
if (kps_has3D.ndimension() != 1)
AT_ERROR("kps_has3D must have dimensions (num_points)");
if (kps_point_local.ndimension() != 2 || kps_point_local.size(1) != 3)
AT_ERROR("kps_point_local must have dimensions (num_points, 3)");
int N = kps_pixel.size(0);
torch::Tensor result_pt, result_color;
if (N != 0) {
result_pt = torch::zeros_like(kps_point_local);
result_color = torch::zeros_like(kps_point_local);
float fx = intr[0];
float fy = intr[1];
float cx = intr[2];
float cy = intr[3];
// For keypoints with 3D data: Directly copies their 3D positions and colors
// For keypoints without 3D data:
// Searches all other keypoints to find the nearest neighbor (in pixel space)
// that has 3D information
// Uses that neighbor's depth to estimate depth for the current keypoint
// Reprojects the 2D pixel + estimated depth to 3D space using pinhole camera model
search_neighborhood_to_estimate_depth_and_reproject_pinhole<<<(N + 255) / 256, 256>>>(
N, width, fx, fy, cx, cy, max_pixel_dist,
kps_pixel.contiguous().data_ptr<float>(),
kps_has3D.contiguous().data_ptr<bool>(),
kps_point_local.contiguous().data_ptr<float>(),
colors.contiguous().data_ptr<float>(),
result_pt.contiguous().data_ptr<float>(),
result_color.contiguous().data_ptr<float>());
// Filtering: Removes keypoints where depth estimation failed (depth ≤ 0)
torch::Tensor depth_valid_flags = torch::where(result_pt.index({torch::indexing::Slice(), 2}) > 0.0f, true, false);
result_pt = result_pt.index({depth_valid_flags});
result_color = result_color.index({depth_valid_flags});
}
return std::make_tuple(result_pt, result_color);
}