Project 1 Webpage
Text giving a brief overview of the project:
At a high level, the project is about reproducing color photographs from the Prokudin-Gorskii collection. We take the digitized Prokudin-Gorskii glass plate images and produce a color image with as few visual artifacts as possible, by aligning and stacking the three separate channels of light (R, G, and B) into one cohesive image.
Text describing your approach:
My approach was relatively simple:
The starter code already did a lot of the legwork in terms of loading and displaying the images, so the details below focus on how I aligned the three channels.
After splitting the plate into the three channels (r, g, b), I needed to align them. A function named align was already called in the starter code without being defined, so I wrote an align function that takes in two color channels and shifts one until their SSD is minimized.
The basic approach is to overlay two channels of an image and shift the top one a little at a time to find where the two match up. We score how well they "match up" with the SSD (sum of squared differences). We iterate through many candidate shifts and keep the shift that gives us the best possible SSD; since we want to minimize it, think of it as a loss function. However, we cannot afford to check too many shifts.
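A minimal sketch of that single-scale search (the helper names are mine, and the ±15 window is an assumption matching the ~30 shifts per axis mentioned below):

```python
import numpy as np

def ssd(a, b):
    # Sum of squared differences between two equally sized channels.
    return np.sum((a - b) ** 2)

def align_single_scale(channel, reference, window=15):
    # Try every (dy, dx) shift in a small window and keep the one with
    # the lowest SSD against the reference channel.
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```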
To get around this problem, I made use of skimage.transform.rescale. If the image is too large, I downsample it, calculate the best shift in the lower-resolution space, and then scale that shift back up.
This is useful because a search window covers a much larger fraction of a small image than of a large one. Say I have two images, one 10x10 and one 1000x1000 pixels: a 1-pixel shift covers 10% of an axis in the small image, while the big image needs a 100-pixel shift for the same relative displacement, and most of those hundred intermediate shifts differ only negligibly from each other.
What this looks like in practice is that I would first check shifts in multiples of 16, then multiples of 8, then 4, then 2, then 1 (similar to lowering the learning rate as you get closer and closer to the optimum in gradient descent).
I chose a cutoff size of 500 pixels, meaning any image with an axis of more than 500 pixels gets downsized, the shift is recalculated, and the result is scaled back up. If the image was less than 500x500 pixels, I checked 30 shifts in x and 30 shifts in y. (This was pretty arbitrary.)
Otherwise, I only checked 8 shifts per axis. (8 was chosen as it's ~30 / 4, since I am dividing the number of pixels by 2 in each direction; arbitrary, but it seemed to work pretty fast.)
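A hedged sketch of that coarse-to-fine recursion (again with names of my own; the exact window sizes are my reading of the numbers above):

```python
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, reference, cutoff=500):
    # Recurse on half-resolution copies while either axis exceeds the
    # cutoff, then refine the doubled coarse shift at this resolution.
    if max(channel.shape) > cutoff:
        dy, dx = align_pyramid(rescale(channel, 0.5, anti_aliasing=True),
                               rescale(reference, 0.5, anti_aliasing=True),
                               cutoff)
        base, window = (2 * dy, 2 * dx), 4   # ~8 candidate shifts per axis
    else:
        base, window = (0, 0), 15            # ~30 candidate shifts per axis
    best_shift, best_score = base, np.inf
    for dy in range(base[0] - window, base[0] + window + 1):
        for dx in range(base[1] - window, base[1] + window + 1):
            score = np.sum((np.roll(channel, (dy, dx), axis=(0, 1))
                            - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```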
For the actual SSD calculation, I naively computed np.sum((a - b) ** 2) with vectorized NumPy operations.
If you ran into problems on images, describe how you tried to solve them:
I ran into a problem that was really apparent in emir.tif: I was not cropping my images, which let the plate borders dominate the SSD score and resulted in pretty terrible overlays such as the one below.
To get around this, I simply chopped off a little bit of each side of the image (about 5%) before scoring.
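A minimal sketch of that fixed crop (the helper name is mine), applied to each channel before computing the SSD:

```python
def crop_borders(channel, frac=0.05):
    # Trim about 5% of the pixels from every side so the plate borders
    # don't dominate the SSD score.
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]
```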
The result of your algorithm on all of our example images.
cathedral.jpg
church.tif
emir.tif
harvesters.tif
icon.tif
lady.tif
melons.tif
monastery.jpg
onion_church.tif
sculpture.tif
self_portrait.tif
three_generations.tif
tobolsk.jpg
train.tif
The result of your algorithm on a few examples of your own choosing
Zakat [Sunset]
[At the 5th water supply control, irrigation canal (aryk) in the Murgab Estate]
[Locomotive and coal car at a railroad yard]
If your algorithm failed to align any image, provide a brief explanation of why
N/A
Describe any bells and whistles you implemented. For maximum credit, show before and after images.
Automatic cropping. Remove white, black or other color borders. Don't just crop a predefined margin off of each side -- actually try to detect the borders or the edge between the border and the image.
The way I implemented this was to track how much each channel wrapped around to the other side under np.roll and then chop off that excess. The amount removed is therefore not predetermined but depends on the actual shifts, and it made for much cleaner pictures.
The code is below:
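(The names below are mine, and this is a sketch of the idea rather than the exact listing: it trims the union of the margins that np.roll wrapped around, given the final (dy, dx) shift applied to each channel.)

```python
def crop_wrapped(image, shifts):
    # shifts: the (dy, dx) applied to each aligned channel. A positive dy
    # wraps garbage into the top rows, a negative dy into the bottom rows,
    # and similarly for dx; trim the worst case over all channels.
    top = max(max(dy, 0) for dy, _ in shifts)
    bottom = min(min(dy, 0) for dy, _ in shifts)   # <= 0
    left = max(max(dx, 0) for _, dx in shifts)
    right = min(min(dx, 0) for _, dx in shifts)    # <= 0
    h, w = image.shape[:2]
    return image[top:h + bottom, left:w + right]
```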
Before:
After:
Automatic contrasting. It is usually safe to rescale image intensities such that the darkest pixel is zero (on its darkest color channel) and the brightest pixel is 1 (on its brightest color channel). More drastic or non-linear mappings may improve perceived image quality.
To implement this, I messed around on Desmos to make a function that essentially exaggerates the RGB values, with the center point fixed at 0.5 and the output kept within [0, 1] bounds. This was the function I came up with.
After implementing this in Python:
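(Since the exact Desmos function isn't reproduced here, the curve below is a stand-in of my own with the same qualitative behavior: an S-curve that fixes 0.5 and exposes a tunable contrast factor.)

```python
import numpy as np

def boost_contrast(image, c=2.0):
    # Hypothetical S-curve: a logistic centered at 0.5, rescaled so that
    # 0 -> 0, 0.5 -> 0.5, and 1 -> 1. Larger c pushes dark values darker
    # and bright values brighter; c plays the role of the contrast factor.
    s = 1.0 / (1.0 + np.exp(-c * (image - 0.5)))
    lo = 1.0 / (1.0 + np.exp(0.5 * c))
    hi = 1.0 / (1.0 + np.exp(-0.5 * c))
    return (s - lo) / (hi - lo)
```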
The contrast factor could be anything, but I chose 2 as it was the first value that made the added contrast apparent.
Before:
After: