Journal: When I re-implemented the D-Map Paper

Aug 22, 2025

This is not the polished write-up; you can check that out from this link. Here I just want to post my thought process and how I logged the whole situation and journey while I worked on the project!

  1. I set up the Webots world, arena, etc.
  2. I imported the Pioneer 3-DX bot and mounted a 180-degree SickLMS291 LiDAR on it.
  3. Webots sends the data as a list of size 180, where each index is a beam angle and the value is the distance to whatever that beam hit.
  4. Index 0 -> left, index 89 -> angle ~90 degrees, index 179 -> right.
  5. I put the angle equation as rad = pi * (1 - theta / 180); I base this on my view of the unit circle: left is pi and right is 0.
  6. From the angle and distance I calculate the (x, y) position of the hit.
  7. I define a grid of $n \times n$ cells and normalise x and y into the range 0 to n.
  8. The scale of each cell is cell_size = mapping_distance / n, so if I take my max range as 20 meters and select n = 10, then each cell is 20 / 10 = 2 units wide.
  9. I put the bot at the bottom centre, so the range of x is -n/2 to n/2 cells and the robot sits at y = 0.
  10. Python nested lists (and therefore matrices) work like image grids, where the top-left corner is the origin; we want ours to follow a Cartesian grid.
  11. Also remember: y -> row, x -> column. (figure: concurrency_grid.gif) I sketch this whole conversion in code right after the grid-sweep snippet further down.
  12. Now we implement the Depth-Map.
  13. Ray casting: imagine sweeping a flashlight around; if the light lands on something, there is an object there, and if it does not, the beam runs off into space.
  14. Depth-image: it is like looking through a slit in a piece of paper. If light comes through (white), there is nothing; if there is shadow (blackish), there must be an object. BUT IN REVERSE: you throw the light through the slit, and you are the bot. (figures: image 1, image 2)
  15. So in each snapshot of the LiDAR loop we standardise the array so the gradient tells us whether an object is far or close. This is exactly the depth-image.
  16. Now we need to map the depth-image data onto the occupancy grid we made earlier, to track which angles have an object and which do not.
  17. I render the depth-image generation alongside the real-time occupancy grid, side by side.
  18. I created a function which, given a cell's indexes, gives me the angle of that cell as seen from the robot.
  19. What I technically do: assume I am the robot. After waking up, I first take an image of distances (my depth image) from right to left. Then I take my chessboard-like map and study it: this cell of the map, where exactly is it, how far is it from me, and which angle do I turn my head to face it?
  20. So I also define a reverse function which, given a cell, tells me which LiDAR angle it corresponds to.
  21. Then I compare that distance with my depth sensor: could I see that place directly (free space), or was there an object in the way before I could see it, i.e. the LiDAR was obstructed (unknown space)? (figure: state_check)
  22. I learned how to create a 1D segment tree for range-minimum queries, totally from scratch, without any prior knowledge of binary trees.
  23. Then I wrote the range-query function for the tree (sketched after Algorithm 1 below).
  24. The need is: with normal indexing, fetching one angle index is $O(1)$, but the moment we search over a range, say depth_map[45:50], the complexity is $O(n)$ where $n = 5$.
  25. But with a segment tree the query complexity comes down to $O(\log n)$ where $n = 180$, and stays that way however large the query range is.
  26. Quick note: for fetching single angle values, the old method is still better because of its $O(1)$ complexity.
  27. Now we need to create efficient storage for our occupancy map!
  28. Our current map is $n \times n$, so any measurement that sweeps the whole grid costs $O(n^2)$, which is far too high:
```python
import numpy as np

n = 10                               # grid resolution
occupancy_grid = np.zeros((n, n))

# even a no-op sweep has to touch every one of the n * n cells
for row in range(n):
    for col in range(n):
        pass
```
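Backing up to steps 5 to 11: the LiDAR-to-grid conversion looks roughly like this. It is a minimal sketch with my toy numbers (max range 20 m, n = 10); the function names are illustrative, not the exact controller code.

```python
import math

MAX_DIST = 20.0           # LiDAR mapping range I chose (meters)
N = 10                    # occupancy grid is N x N cells
CELL_SIZE = MAX_DIST / N  # 20 / 10 = 2 meters per cell

def beam_to_xy(index, distance):
    """LiDAR beam index (0..179) plus range -> (x, y) in the robot frame.
    Index 0 is the left-most beam (pi rad), 179 the right-most (~0 rad)."""
    rad = math.pi * (1 - index / 180.0)
    return distance * math.cos(rad), distance * math.sin(rad)

def xy_to_cell(x, y):
    """(x, y) in meters -> (row, col) of the occupancy grid.
    The robot sits at the bottom-centre, and rows are flipped so the
    nested-list grid reads like a Cartesian plane instead of an image."""
    col = math.floor(x / CELL_SIZE) + N // 2   # x = 0 lands in the centre column
    row = (N - 1) - math.floor(y / CELL_SIZE)  # flip: top-left origin -> Cartesian
    if 0 <= row < N and 0 <= col < N:
        return row, col
    return None                                # hit falls outside the mapped area
```

In the controller loop this just gets fed the 180 ranges from the LiDAR (getRangeImage() in Webots), and the returned cells are marked in the grid.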
  1. We store which cells are covered by an object with an [[Skill Stacking/DSA/OCTREE|OCTREE]].
  2. But in our octree we do not need to place boxes; rather we need to check whether a space is UNDETERMINED or not. If it is, we subdivide it to check it more closely.
  3. If our robot position is (X, Y), then for our octree: center = (X, max_dist), half = (max_dist / 2, max_dist / 2).
  4. Whenever we get a new LiDAR range in each time step, we query through the tree to see which cells are NEAR us, and if such a cell is still UNDETERMINED we break it up further!
  5. To run that query we define a new procedure (a node sketch follows below).
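A rough sketch of the node I ended up with. The class and field names are mine (and in 2D it is really a quadtree), but the center/half layout follows point 3 above.

```python
class QuadNode:
    """One square region of the map (a 2D 'octree' node).
    state is one of 'KNOWN', 'UNKNOWN', 'UNDETERMINED' (Algorithm 1 below)."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy = cx, cy   # centre of the square
        self.half = half            # half of the side length
        self.state = 'UNDETERMINED'
        self.children = []          # empty until we subdivide

    def subdivide(self):
        """Split an UNDETERMINED square into four equal quadrants."""
        h = self.half / 2
        for dx in (-h, h):
            for dy in (-h, h):
                self.children.append(QuadNode(self.cx + dx, self.cy + dy, h))

# root square in front of a robot at (X, Y), as in point 3:
# root = QuadNode(X, MAX_DIST, MAX_DIST / 2)
```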

[!defn] Algorithm 1: Occupancy State Determination

  • Input: Cell center, size, LiDAR pose, segment tree
  • Compare the cell's depth range with the depth image's min-max distances
  • Output: free (KNOWN), blocked (UNKNOWN), or UNDETERMINED space
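The segment tree that Algorithm 1 queries is the one from steps 22-25. A minimal from-scratch sketch (array-based, padded to a power of two; the class and method names are mine):

```python
import math

class DepthSegmentTree:
    """Range min/max over the 180-entry depth image.
    Build is O(n); any angular range query is O(log n) instead of
    scanning the slice, which is what steps 22-25 were about."""

    def __init__(self, depths):
        self.n = len(depths)
        self.size = 1 << math.ceil(math.log2(self.n))
        INF = float('inf')
        # each node stores (min, max) of its span
        self.tree = [(INF, -INF)] * (2 * self.size)
        for i, d in enumerate(depths):
            self.tree[self.size + i] = (d, d)
        for i in range(self.size - 1, 0, -1):
            l, r = self.tree[2 * i], self.tree[2 * i + 1]
            self.tree[i] = (min(l[0], r[0]), max(l[1], r[1]))

    def query(self, lo, hi):
        """Min and max depth over beam indexes [lo, hi)."""
        mn, mx = float('inf'), -float('inf')
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                mn, mx = min(mn, self.tree[lo][0]), max(mx, self.tree[lo][1])
                lo += 1
            if hi & 1:
                hi -= 1
                mn, mx = min(mn, self.tree[hi][0]), max(mx, self.tree[hi][1])
            lo >>= 1
            hi >>= 1
        return mn, mx
```

So `DepthSegmentTree(depth_image).query(45, 50)` replaces the old `depth_map[45:50]` scan.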
  1. After introducing the OctreeMap for the occupancy grid and the Segment_Tree for the depth image, our computational cost drops sharply for large LiDAR data. (figure: octree)
  2. In my classifier I found that the minimum distance was always 0 (no hit), so every node stayed 'UNKNOWN' and never moved to 'KNOWN' or 'UNDETERMINED', and the node split never triggered. So I implemented a non-zero minimum (sketch below).
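The fix, roughly: treat 0 as "no reading" when taking minimums. In the segment-tree sketch above, this combiner would replace the plain `min` in the build and query steps.

```python
def nonzero_min(a, b):
    """Minimum that ignores 0 ('no hit') readings, so empty beams do not
    drag the range-minimum to 0 and lock every cell in 'UNKNOWN'."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)
```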

  • if max distance is 0 -> UNKNOWN (nothing there)
  • if max < cell's near side -> UNKNOWN (blocked)
  • if min > cell's far side -> KNOWN (free)
  • else -> UNDETERMINED

where max and min are the maximum and minimum depth-image distances over the cell's angular range, and the near/far sides are the cell's closest and farthest edges from the robot.

  35. In human analogy: I was always treating darkness as "something is present around me"!
  36. But there was one issue: it never stopped sub-dividing, because no node was ever marked 'KNOWN'.
  37. If all the LiDAR hits are farther than the cell's far edge -> the cell is empty -> I mark it KNOWN.
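Putting the rules and that KNOWN fix together, this is my reading of Algorithm 1's core check (parameter names are mine; depth_min/depth_max come from the segment-tree query over the cell's angular range):

```python
def classify_cell(cell_near, cell_far, depth_min, depth_max):
    """Algorithm 1: decide a cell's occupancy state.
    cell_near / cell_far: distance from the LiDAR to the cell's nearest
    and farthest edge; depth_min / depth_max: min and max depth-image
    values over the cell's angular range."""
    if depth_max == 0:
        return 'UNKNOWN'       # no return at all in that direction
    if depth_max < cell_near:
        return 'UNKNOWN'       # every beam stopped before the cell: occluded
    if depth_min > cell_far:
        return 'KNOWN'         # every beam passed beyond the cell: free space
    return 'UNDETERMINED'      # mixed evidence, subdivide and look closer
```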

[!defn] Algorithm 2: On-Tree Update

  • Traverse octree from root to leaves
  • For each node: apply Algorithm 1
  • Action: Remove KNOWN cells, keep UNKNOWN cells, split UNDETERMINED cells
  • Recursive process
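Using the QuadNode sketch from earlier, the recursion looks something like this. It is a sketch: `classify` stands for Algorithm 1 wired up to the segment tree, and the depth cap mirrors the d = 6 I use later.

```python
def update_tree(node, classify, depth=0, max_depth=6):
    """Algorithm 2: classify a node, prune KNOWN subtrees, keep UNKNOWN
    ones, and split UNDETERMINED ones (up to max_depth).
    Returns the (possibly pruned) node, or None if it is KNOWN free space."""
    state = classify(node)
    if state == 'KNOWN':
        return None                  # fully observed free space: drop it
    node.state = state
    if state == 'UNKNOWN' or depth == max_depth:
        return node                  # keep as-is, nothing more to split
    # UNDETERMINED: subdivide (if we have not yet) and recurse into children
    if not node.children:
        node.subdivide()
    kept = []
    for child in node.children:
        child = update_tree(child, classify, depth + 1, max_depth)
        if child is not None:
            kept.append(child)
    node.children = kept
    return node
```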
  1. Right now I am only storing the occupied cells in the segment tree for the current frame.
  2. Need to figure out what happens to LiDAR hits from previous frames.
  3. I need the octree to track only UNKNOWN spaces.
  4. I need a hash-grid to store only occupied spaces.
  5. Because in every new frame we forget the previous hits and where the objects were. When I move my head I do not forget that a wall existed to my right or left.
  6. Right now I have an $n \times n$ occupancy grid, which is large. Most cells are empty, but we store all of them in memory -> waste of space!
  7. When we loop through the grid, we check every cell, even empty ones -> waste of computation!
  8. So we will use the Hash-Pixel data structure (Hash: sparse storage; Pixel: 2D grid elements). It basically means "using hash tables to efficiently store 2D grid data by only keeping the non-empty cells".
  9. So we implement a simple string hash key, "col,row" (sketched after this list).
  10. Whenever a new object is seen for the first time, its hit is stored in the Hash-Pixel.
  11. Then we iterate over the Hash-Pixel so that in new time-steps the seen objects are not forgotten. (figure: hashpixel)
  12. It is clearly seen that even if I move the object, the marker stays in the occupancy grid: the hash grid stores a permanent record of all LiDAR hits. (We assume a static environment.)
  13. Now since the known cells are mapped, we can DELETE them from the octree.
  14. I introduce a recursive update method to remove all the 'KNOWN' nodes, because otherwise the octree keeps holding fully known nodes. (figures: without_decremental, with_decremental)
  15. You can see the octree node count has fallen from 553 -> 480, i.e. 73 nodes have been marked as 'KNOWN' and removed.
  16. Since the octree does not immediately expand into all $\frac{n^{d+1}-1}{n-1}$ nodes (where d is the depth, 6 in my code, and n = 4 in 2D or n = 8 in 3D), the octree/quad-tree only marks a node as KNOWN or UNKNOWN where it actually got a hit.
  17. The 73 nodes removed in my example make the tree much better space-optimised, and this can be extended to larger depths and max_dist for production. (figure: vs_decremental)
  18. In the above image the grey areas are all free space (air), which the right image removes from the tree because it is KNOWN.
  19. And the objects are placed by either marking them 'UNKNOWN' or through the HashPixel.
  20. For quick prototyping I did LiDAR2World, but for the benchmark I compared my DMap algorithm with traditional ray casting using Bresenham lines. (figures: raycasting bench, dmap bench, output 1)
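The Hash-Pixel from points 8 to 11 above looks roughly like this (a sketch; the class and method names are mine):

```python
class HashPixel:
    """Sparse occupancy storage: only cells that ever received a LiDAR
    hit are kept, keyed by the same "col,row" string used for the grid."""

    def __init__(self):
        self.cells = {}                       # "col,row" -> hit count

    def mark(self, col, row):
        """Record a hit; the first hit on a cell creates its entry."""
        key = f"{col},{row}"
        self.cells[key] = self.cells.get(key, 0) + 1

    def occupied(self):
        """Iterate the remembered cells so hits from previous frames are
        not forgotten in new time steps (static-environment assumption)."""
        for key in self.cells:
            col, row = map(int, key.split(","))
            yield col, row
```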
  1. The memory usage is also what we expected it to be. (figures: memory, mem)
  2. The accuracy is also very good, even though in simulation there was a lot of free space. (figures: Accuracy, acc)