Photo-SLAM code brief
Translation based on https://github.com/KwanWaiPang/Photo-SLAM_comment/
```
Photo-SLAM
├── cfg             # Config files for different settings
├── cuda_rasterizer # The rasterization module from the original 3DGS
├── examples        # The demo entry points of the SLAM system: reading data, creating pointers and mapping
├── include         # Header files
├── ORB-SLAM3
├── scripts         # Scripts for test runs and evaluation
├── src             # 3D Gaussian related sources
├── third_party     # Dependencies: colmap, simple-knn (from 3D Gaussian) and tinyply
└── viewer          # The visualizer thread
```
Main Loop
WLOG, let us start with examples/tum_rgbd.cpp:
It includes the functions main, LoadImages (to read the images), saveTrackingTime (to save the per-frame tracking times) and saveGpuPeakMemoryUsage (to save the peak VRAM usage).
What does main do? It checks the input parameters and input directories and loads the images. Most importantly, it establishes the SLAM system pSLAM, the 3D Gaussian mapping thread pGausMapper and the Gaussian viewer thread pViewer. pSLAM is passed as an input parameter to pGausMapper, connecting ORB-SLAM3 and the 3D Gaussian mapping.
Specifically, the code does the following:
Check the parameters and set the output directory.
Check the input image directory and call LoadImages to read the images: LoadImages(strAssociationFilename, vstrImageFilenamesRGB, vstrImageFilenamesD, vTimestamps);
Assure that the numbers of depth images and RGB images are the same.
Check whether to use the CPU or the GPU.
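The association-file parsing that LoadImages performs can be sketched as follows. This is a minimal stdlib-only sketch, not the actual Photo-SLAM code: the helper name parseAssociationLines and the assumed TUM line format "rgb_timestamp rgb_path depth_timestamp depth_path" are illustrative.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse a TUM-style association file: each non-comment line pairs an RGB
// image with a depth image ("t_rgb rgb_path t_depth depth_path").
inline void parseAssociationLines(std::istream &fin,
                                  std::vector<std::string> &rgb_files,
                                  std::vector<std::string> &depth_files,
                                  std::vector<double> &timestamps)
{
    std::string line;
    while (std::getline(fin, line)) {
        if (line.empty() || line[0] == '#')
            continue;  // skip comments and blank lines
        std::stringstream ss(line);
        double t_rgb, t_depth;
        std::string rgb, depth;
        ss >> t_rgb >> rgb >> t_depth >> depth;
        timestamps.push_back(t_rgb);
        rgb_files.push_back(rgb);
        depth_files.push_back(depth);
    }
}
```

After parsing, the caller can check `rgb_files.size() == depth_files.size()`, which is exactly the consistency check main performs.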
After everything is ready, we can start to create the SLAM system. The following operations are: 1. create the SLAM system, 2. create the 3D Gaussian mapping system, and 3. create the 3D Gaussian viewer.
```cpp
// Create SLAM system. It initializes all system threads and gets ready to process frames.
// Create the pointer pSLAM to point to the SLAM system
std::shared_ptr<ORB_SLAM3::System> pSLAM =
    std::make_shared<ORB_SLAM3::System>(
        argv[1], argv[2], ORB_SLAM3::System::RGBD);
float imageScale = pSLAM->GetImageScale();

// Create GaussianMapper
// Create the pointer pGausMapper to point to the 3D Gaussian Mapper;
// the input parameters are the SLAM system, the configs for the 3D Gaussians
// and the output directory
std::filesystem::path gaussian_cfg_path(argv[3]);
std::shared_ptr<GaussianMapper> pGausMapper =
    std::make_shared<GaussianMapper>(
        pSLAM, gaussian_cfg_path, output_dir, 0, device_type);
std::thread training_thd(&GaussianMapper::run, pGausMapper.get());

// Create Gaussian Viewer
std::thread viewer_thd;
std::shared_ptr<ImGuiViewer> pViewer;
// If the GUI config is true, we start the viewer thread and
// pass the SLAM system pointer and the 3D Gaussian Mapper pointer as input
if (use_viewer)
{
    pViewer = std::make_shared<ImGuiViewer>(pSLAM, pGausMapper);
    viewer_thd = std::thread(&ImGuiViewer::run, pViewer.get());
}
```
Then we output the system information, load the RGB and depth images and check that they are valid, and scale the RGB and depth images according to imageScale. A timer is set up (chrono is the timing facility introduced in C++11), and the RGB image, depth image and frame timestamp are fed into the pSLAM system.
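The per-frame timing pattern can be sketched like this. This is illustrative only: timeTrackingCall is a made-up helper, and in the real loop the two time points directly bracket the call to pSLAM->TrackRGBD.

```cpp
#include <chrono>

// Time a single tracking call with std::chrono (C++11). The tracker is
// passed as a callable here; the real loop calls pSLAM->TrackRGBD(imRGB,
// imD, tframe) between the two now() samples.
template <typename TrackFn>
double timeTrackingCall(TrackFn &&track)
{
    auto t1 = std::chrono::steady_clock::now();
    track();
    auto t2 = std::chrono::steady_clock::now();
    // Elapsed seconds as a double, as needed for the tracking-time statistics.
    return std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count();
}
```

The collected durations are what saveTrackingTime later writes out.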
After the main loop ends, close the SLAM system and the visualization system, output the GPU peak memory usage and the tracking information, and end the main thread.
TrackRGBD is the ORB tracking entry point; ORB tracking & mapping and the 3D Gaussian mapping are integrated through the shared pSLAM pointer that the GaussianMapper holds.
Then what does GaussianMapper do?
src/gaussian_mapper.cpp
This class includes many functions, but the main content is the constructor, the run thread, the Gaussian training and the keyframe handler.
The constructor not only handles the input parameters but also initializes the private variables: it sets up the running device and camera model, initializes the 3D Gaussian scene and sets up the sensor types.
The constructor of GaussianMapper
void run()
void run() is the main process. It reads the camera poses and point cloud, prepares the images at multiple resolutions and calls the training function trainForOneIteration().
void trainForOneIteration()
void trainForOneIteration() is the main training code; it covers iteration management, the Gaussian pyramid, rendering and loss calculation, saving and logging.
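The loss part of such an iteration can be illustrated with a stdlib-only sketch. The real code renders with the CUDA rasterizer and combines an L1 term with a D-SSIM term on GPU tensors; l1Loss below is a hypothetical flat-buffer version of the L1 part only.

```cpp
#include <cmath>
#include <vector>

// Mean absolute (L1) photometric error between a rendered image and its
// ground truth, both flattened to per-channel pixel buffers.
inline float l1Loss(const std::vector<float> &rendered,
                    const std::vector<float> &ground_truth)
{
    float sum = 0.0f;
    for (size_t i = 0; i < rendered.size(); ++i)
        sum += std::fabs(rendered[i] - ground_truth[i]);
    return sum / static_cast<float>(rendered.size());
}
```

In the actual iteration this scalar would be backpropagated through the rasterizer, followed by the optimizer step and the densification bookkeeping.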
void combineMappingOperations()
void combineMappingOperations() combines the mapping operations, using the functions of ORB-SLAM3 to do BA and loop closure.
void GaussianMapper::handleNewKeyframe
void GaussianMapper::handleNewKeyframe sets up the pose, camera, image and auxiliary image of the new keyframe and inserts the new keyframe into the scene. It records the time spent on the keyframe and puts the keyframe into the training sliding window.
Other functions
bool GaussianMapper::isStopped()
bool GaussianMapper::isStopped() returns the private variable stopped_ of the GaussianMapper object, using a mutex lock to ensure that access to this variable is thread-safe in the multi-threaded environment.
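The pattern is the standard mutex-guarded accessor; a minimal sketch follows (the class and member names here are illustrative stand-ins, not the actual Photo-SLAM declarations).

```cpp
#include <mutex>

// Thread-safe flag accessor: a std::lock_guard serializes all reads and
// writes of stopped_ across the SLAM, mapping and viewer threads.
class StoppableSketch {
public:
    bool isStopped() {
        std::lock_guard<std::mutex> lock(mutex_stop_);
        return stopped_;
    }
    void signalStop() {
        std::lock_guard<std::mutex> lock(mutex_stop_);
        stopped_ = true;
    }

private:
    std::mutex mutex_stop_;
    bool stopped_ = false;
};
```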
void trainColmap()
void trainColmap() is for the COLMAP training example only. It reads the point cloud, trains the 3D Gaussians and saves them; it is similar to the run function.
Renderings
There are mainly three rendering functions, which render RGB, depth and loss and output the evaluation metrics: void recordKeyframeRendered(/*param*/), void renderAndRecordKeyframe(/*param*/) and void renderAndRecordAllKeyframes(/*param*/).
Loader
void loadPly(/*param*/) not only loads the point cloud but also loads the camera intrinsics to undistort the images and resizes the images.
Densification based on inactive points
void increasePcdByKeyframeInactiveGeoDensify(/*param*/) densifies the point cloud based on inactive points according to different types of sensors.
Header File
This is one of the main classes. According to the header file, the main member variables include:
torch::DeviceType device_type_: type of device
The current degree and maximum degree of the spherical harmonics: int active_sh_degree_ and int max_sh_degree_.
The parameters of 3D Gaussians:
torch::Tensor xyz_: position
torch::Tensor features_dc_: direct component, aka, the inherent color of the Gaussian when SH=0
torch::Tensor features_rest_: remaining components, the remaining spherical harmonics coefficients for SH≥1.
torch::Tensor opacity_: opacity alpha
torch::Tensor scaling_ : scaling factors
torch::Tensor rotation_ : rotation factors
torch::Tensor xyz_gradient_accum_ : accumulated gradient in the Gaussians
torch::Tensor denom_ : denominator tensor used for computing the average gradient accumulation for each Gaussian point during densification
torch::Tensor exist_since_iter_ : The iteration when this Gaussian is added to the map
Optimizer: std::shared_ptr<torch::optim::Adam> optimizer_
Position and color of the sparse point cloud: torch::Tensor sparse_points_xyz and torch::Tensor sparse_points_color.
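Assuming the standard 3DGS layout, the sizes of features_dc_ and features_rest_ follow directly from the SH degree: a degree-d expansion has (d+1)^2 coefficients per color channel, one of which is the DC term. A small sketch:

```cpp
// Spherical-harmonics coefficient counts per color channel (standard 3DGS
// layout, assumed here): features_dc_ holds the single degree-0 term,
// features_rest_ holds the remaining (d+1)^2 - 1 coefficients.
inline int numShCoeffs(int degree) { return (degree + 1) * (degree + 1); }
inline int numRestCoeffs(int degree) { return numShCoeffs(degree) - 1; }
```

With the usual max_sh_degree_ of 3, that gives 16 coefficients per channel: 1 in features_dc_ and 15 in features_rest_.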
Two macros:
#define GAUSSIAN_MODEL_TENSORS_TO_VEC: stores the group of tensors into a vector (data type: std::vector<torch::Tensor>)
#define GAUSSIAN_MODEL_INIT_TENSORS(device_type): initializes multiple tensors and places them on the specified device. This macro accepts one parameter, device_type, for the specific device.
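The first macro can be illustrated with a toy analogue: the macro expands to a brace-initialized vector listing the model's member tensors. Everything below is a simplified stand-in (ints replace torch::Tensor, and only three of the members are shown).

```cpp
#include <vector>

// Toy analogue of GAUSSIAN_MODEL_TENSORS_TO_VEC: expand to a vector that
// collects the member "tensors" so they can be handled as one group.
#define MODEL_TENSORS_TO_VEC(model) \
    std::vector<int>{ (model).xyz_, (model).features_dc_, (model).features_rest_ }

struct ModelSketch {
    int xyz_ = 1;           // stands in for torch::Tensor xyz_
    int features_dc_ = 2;   // stands in for torch::Tensor features_dc_
    int features_rest_ = 3; // stands in for torch::Tensor features_rest_
};
```

Grouping the members this way lets code iterate over all parameter tensors uniformly, e.g. when building optimizer parameter groups.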
The constructor GaussianModel::GaussianModel(). The input is sh_degree and model parameters.
torch::Tensor getCovarianceActivation(int scaling_modifier = 1): Calculates the covariance matrix from the scaling and rotation, and at the same time outputs the symmetric uncertainty (NICE!)
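Assuming the standard 3DGS formulation, the covariance is Sigma = R S S^T R^T, with R built from the rotation quaternion and S the diagonal matrix of scales; the "symmetric uncertainty" is then just the six upper-triangular entries of this symmetric matrix. A stdlib sketch:

```cpp
#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Build Sigma = R S S^T R^T from per-axis scales s and a unit quaternion
// q = (w, x, y, z). Since S S^T = diag(s^2), Sigma_ij = sum_k R_ik s_k^2 R_jk.
inline Mat3 covarianceFromScalingRotation(const std::array<double, 3> &s,
                                          const std::array<double, 4> &q)
{
    const double w = q[0], x = q[1], y = q[2], z = q[3];
    // Rotation matrix of the unit quaternion.
    const Mat3 R = {{
        {1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)},
        {2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)},
        {2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)},
    }};
    Mat3 cov{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                cov[i][j] += R[i][k] * s[k] * s[k] * R[j][k];
    return cov;
}
```

The result is symmetric by construction, which is why storing only the upper triangle loses nothing.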
void createFromPcd: Initializes the Gaussians from the point cloud. Creates, fills and fuses the point-cloud tensor and color tensor, converting the colors according to the SH degree, and computes the other properties of the point cloud.
void increasePcd: Add the new point cloud data (with color) into the existing Gaussian model.
void trainingSetup(/*param*/): sets up the parameters of the Adam optimizer and the learning rate.
float updateLearningRate(int step): updates the learning rate based on the step, calling the pre-defined exponLrFunc function to obtain a continuous learning-rate decay schedule.
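Assuming exponLrFunc follows the scheduler of the original 3DGS code, the learning rate interpolates log-linearly between an initial and a final value over the training steps; a sketch:

```cpp
#include <algorithm>
#include <cmath>

// Log-linear learning-rate decay (assumed to match exponLrFunc): interpolate
// log(lr) linearly from lr_init to lr_final as step goes from 0 to max_steps,
// clamping outside that range. Requires C++17 for std::clamp.
inline float exponLr(int step, float lr_init, float lr_final, int max_steps)
{
    float t = std::clamp(static_cast<float>(step) / max_steps, 0.0f, 1.0f);
    return std::exp((1.0f - t) * std::log(lr_init) + t * std::log(lr_final));
}
```

Interpolating in log space keeps the relative decay rate constant, so e.g. with lr_init = 1e-2 and lr_final = 1e-4 the halfway point lands at 1e-3 rather than at the arithmetic mean.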
The following functions handle the densification, clone, pruning and split of Gaussians:
void densifyAndSplit, void densifyAndClone, void densifyAndPrune .
void densificationPostfix() extends the newly created (already densified) Gaussians into the existing Gaussian model.
void scaledTransformVisiblePointsOfKeyframe() prepares and marks the model's Gaussian 3D points that are visible from a given keyframe, applying a uniform scale and view/projection transforms, then registers the transformed point and rotation tensors for optimization.
The Constructor(s)
There are two constructors in gaussian_model.cpp. One takes a single int parameter (sh_degree), allowing simple initialization with just the spherical harmonics degree. The other takes a const GaussianModelParams& (a configuration struct), providing more flexibility by accepting a full parameter object.
get Functions
Creating Gaussians and adding new Gaussians from the point cloud
There are two methods related to point cloud: void GaussianModel::createFromPcd to create Gaussians from the point cloud. void GaussianModel::increasePcd to add new Gaussians into existing Gaussian model from new points from point cloud.
There are two overloaded versions of void GaussianModel::increasePcd: one, void GaussianModel::increasePcd(std::vector<float> points, std::vector<float> colors, const int iteration), deals with C++ vectors as input; the other, void GaussianModel::increasePcd(torch::Tensor& new_point_cloud, torch::Tensor& new_colors, const int iteration), deals with torch tensors as input. We use the former as an example.
Selected transformation methods
Training related methods
Header File
GaussianScene::GaussianScene(/*param*/) is the constructor. It can also read an already-trained Gaussian scene.
void GaussianScene::addCamera(/*param*/) adds a camera; alongside it are a bunch of get/set methods to access the cameras, keyframes and 3D points.
void GaussianScene::applyScaledTransformation(/*param*/) applies a uniform scale to every keyframe's translation and then applies a rigid transform to each keyframe pose. It updates each GaussianKeyframe's stored pose and recomputes its transform tensors.
The net effect is a similarity transform (scale + rigid transform) applied to all keyframe positions and poses. Rotation comes only from the rigid transformation and the original pose; the scale is applied only to translations.
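A minimal sketch of that similarity transform acting on a single translation (the helper applySimilarity is illustrative, not the actual applyScaledTransformation code):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3x3 = std::array<Vec3, 3>;

// Similarity transform on a keyframe translation: first scale by s, then
// apply the rigid transform (R, t), i.e. p' = R * (s * p) + t. Rotation
// parts of the poses would be composed with R only; s never touches them.
inline Vec3 applySimilarity(const Vec3 &p, double s, const Mat3x3 &R, const Vec3 &t)
{
    Vec3 out{};
    for (int i = 0; i < 3; ++i) {
        out[i] = t[i];
        for (int j = 0; j < 3; ++j)
            out[i] += R[i][j] * (s * p[j]);
    }
    return out;
}
```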
Utility functions for Gaussians
void GaussianTrainer::trainingOnce(): Trains the Gaussians once; initializes the iterations, reads the training options and sets up the background color.
void GaussianTrainer::trainingReport(): Input and Output
We definitely shall not forget our favorite keyframing part of a SLAM system :)
The header file mainly defines the camera id, the parameters, the sizes of the images and of the Gaussian pyramid, as well as the original image. The main functions in the cpp file include:
GaussianKeyframe::getProjectionMatrix(): to get the projection matrix
GaussianKeyframe::getWorld2View2(): the transformation matrix from world to camera frame
void computeTransformTensors(): uses the two functions above to calculate the frame transformations: world frame -> camera frame -> pixel frame.
int getCurretGausPyramidLevel(): gets the current level of the Gaussian pyramid
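The world -> camera -> pixel chain above reduces to composing the world-to-view matrix with the projection matrix into one full projection transform; a sketch of that composition (composeFullProjection is a hypothetical name for what computeTransformTensors effectively precomputes):

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Compose the projection matrix (from getProjectionMatrix) with the
// world-to-view matrix (from getWorld2View2): full = proj * world2view.
// Applying `full` to a homogeneous world point takes it to clip space,
// from which pixel coordinates follow after the perspective divide.
inline Mat4 composeFullProjection(const Mat4 &proj, const Mat4 &world2view)
{
    Mat4 full{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                full[i][j] += proj[i][k] * world2view[k][j];
    return full;
}
```

Precomputing this product per keyframe means each Gaussian only needs one 4x4 multiply during rasterization.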
gaussian_rasterizer mainly includes:
GaussianRasterizer::markVisibleGaussians: the selection of visible Gaussians
GaussianRasterizerFunction::forward: the forward pass
GaussianRasterizerFunction::backward: backward propagation
This file only has one function, GaussianRenderer::render, a wrapper of the forward function above that also handles the calculation of the covariance and the spherical harmonics degree. It is similar to gaussian_renderer/__init__.py in the original 3DGS.