Preamble
Lately I've been contemplating a project, which I'm dubbing Project Clearly, to act as a focal point for a collection of efforts to sharpen my skills. When I sat down to think about what to build, I realized that sometimes the point isn't pure novelty but an ensemble of efforts. Consider this project a collage of sorts: a set of pieces, each with value of its own and largely open-sourced, that may or may not ultimately culminate in something greater than the sum of its parts. Maybe you'll even find a part that's valuable for something you build in the future!
The Idea
What technologist hasn’t helped a family member, often an elderly one, with an IT project of some sort? A common example is helping a relative work with large collections of data accumulated over the years. Often that data has simply been left to stagnate, and sometimes that is fine. In other cases, though, something is lost when past data can’t be properly categorized.
An enduring family tradition is the photo album. In the physical world, the cost of developing film acted as a natural selective pressure on how many photos were taken, and the arrival of prints in the mail formed a natural point to reflect, discard poor-quality shots, and place the keepers in a photo album. The digital workflow, on the other hand, has fewer natural pressures or points of reflection. The SD card is typically offloaded at irregular intervals, and it is often discovered to be full only at an inopportune time, leading to hasty offloading or deletion, or perhaps simply the purchase of a new card. And yet, despite the often poor administration of photo collections, many family members would place real value on a well-curated one.
Advances in AI have made it practical to handle large numbers of photos through automatic tagging. Additionally, thanks to progress in AI at the edge and in the general computing power of consumer hardware, even advanced tagging can now run on low-spec home computers.
Who?
A potential demographic:
- Tech-savvy family members helping less tech-savvy relatives tame their photo mess, often with limited time and looking for easy-to-use, out-of-the-box solutions
- Somewhat tech-savvy middle-aged to senior computer users seeking to tame their own photo mess
Given this audience, an emphasis on data privacy is paramount. Many may be suspicious of “AI” and “cloud” technologies.
Windows 10 and 11 are the most likely platforms to target for this demographic, typically on low- to mid-range hardware.
Common Challenges
The following should sound familiar to most encountering a “photo mess”:
- Large number of files – collections are often enormous
- Scattered locations or inconsistent folder structure – partial organization may exist in some folders, but the structure may have been started late or abandoned
- Searching – finding by date is often the easiest approach, but even built-in support for searching by date across subfolders is often lacking
- No tags – if the folder structure has been neglected, tags are rarely used either
- Duplicates and near-duplicates – often the result of inconsistent copying strategies
Machine Learning and Algorithmic Advances
Machine learning has seen an explosion of advances, particularly since the arrival of AlexNet in 2012, and many of the resulting options are becoming commoditized at different rates. Below are several possibilities that could make sense when grappling with large photo collections:
- Object detection – while image classification may be sufficient for many use cases, object detection offers more for photo organization because it lets us think of each photo as a group of objects
  - YOLO series, in particular YOLO-World – YOLO provides object detection, and the recent YOLO-World (Feb 2024) adds a new advantage: an “open vocabulary”. Unlike earlier YOLO models, which are limited to the fixed set of classes they were trained on, YOLO-World can tag nearly arbitrary objects.
- Face identification – several options are available, with different computing costs. The general strategy is to produce an “embedding” (a vector of numbers) that describes a face, then measure how close two embeddings are to determine how similar the two faces are.
  - ArcFace – a modern, high-accuracy approach, available through DeepFace: https://github.com/serengil/deepface
  - VGGFace2 – a highly mature option
- Automatic photo captioning – the goal here differs from object detection, but could lead to similar outcomes. The most prevalent options are based on CLIP.
  - OpenCLIP – https://github.com/mlfoundations/open_clip
- Geolocation – many images are not tagged with a location, yet location is useful for things like determining whether photos were part of a trip. Options here are more limited than in other areas, but some are available:
  - GeoClip – currently ranked #1 on “Papers with Code”: https://github.com/VicenteVivan/geo-clip
- Perceptual hashing – detecting visually similar photos, not just byte-identical ones
  - pHash – one such approach: https://www.phash.org/
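To make the face identification strategy above concrete, here is a minimal Python sketch of the comparison step only. It assumes the embeddings have already been produced by a model such as ArcFace (for example through DeepFace) and arrive as plain lists of floats. The function names and the `SAME_PERSON_THRESHOLD` value are hypothetical; a real threshold has to be tuned for whichever model produces the embeddings.

```python
import math
from typing import List, Sequence

# Hypothetical cut-off: the right value depends on the embedding model and must be tuned.
SAME_PERSON_THRESHOLD = 0.4


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def same_person(a: Sequence[float], b: Sequence[float],
                threshold: float = SAME_PERSON_THRESHOLD) -> bool:
    """Treat two faces as the same person when their embeddings are close enough."""
    return cosine_distance(a, b) <= threshold


def group_faces(embeddings: Sequence[Sequence[float]],
                threshold: float = SAME_PERSON_THRESHOLD) -> List[List[int]]:
    """Greedily group face indices: each face joins the first group whose first member it matches."""
    groups: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if same_person(embeddings[group[0]], emb, threshold):
                group.append(i)
                break
        else:  # no existing group matched, so this face starts a new one
            groups.append([i])
    return groups
```

Cosine distance compares only the direction of the vectors, not their length; for unit-normalized embeddings, Euclidean distance ranks pairs identically. The greedy grouping is the simplest possible choice and is order-dependent; a proper clustering algorithm would be more robust on real collections.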
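pHash itself works on a discrete cosine transform of the image. As a lighter illustration of the same perceptual-hashing idea, the sketch below swaps in a difference hash (dHash), a simpler relative of pHash: each bit records whether a pixel is brighter than its right-hand neighbour, so a uniform brightness change leaves the hash unchanged. It assumes the image has already been converted to grayscale and shrunk to 8 rows of 9 pixels, for example with Pillow's `Image.open(path).convert("L").resize((9, 8))`; the function names are hypothetical.

```python
from typing import Sequence


def dhash(pixels: Sequence[Sequence[int]]) -> int:
    """Difference hash of a small grayscale grid (8 rows x 9 columns gives 64 bits).

    Each bit is 1 when a pixel is brighter than its right-hand neighbour.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two photos whose hashes differ in only a few of the 64 bits are likely near-duplicates; where exactly to draw that line is a tuning decision.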
Organizational Feature Possibilities
At the intersection of the helpful and the technically possible sit several candidate features:
- Automatic tagging of objects in photos
  - Using YOLO-World and a list of photo-oriented keywords
- Identification of similar faces
  - Either run YOLO-World with a “face” prompt and feed the crops into a face identification model, or, in some cases, let the face identification model detect and extract the faces directly
- Categorization of potentially sensitive document photos
  - If a “document” or “receipt” object is detected with high confidence and takes up most of the photo, the photo can be placed in a special category
- Trip detection
  - Combining temporal information with estimated (or tagged) geolocation allows trips to be detected approximately
- Similar photo grouping
  - Combining temporal proximity with object detection similarity makes it possible to group similar photos
- Identical photo removal
  - Identical photo files can be found and removed with a simple file-hash-based approach
- Sensible folder structure
  - Using the above features, a sensible folder structure can be created that keeps photos organized even if the application is no longer used
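The document-photo rule above reduces to a small predicate. This sketch assumes detections arrive as a label, a confidence score, and an (x1, y1, x2, y2) pixel box; the `Detection` type, the function name, and both default thresholds (0.6 confidence, at least half the image area) are hypothetical placeholders, not values taken from any particular detector.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Labels treated as "document-like"; these would be among the YOLO-World prompts.
DOCUMENT_LABELS = {"document", "receipt"}


@dataclass
class Detection:
    """Hypothetical detector output: label, confidence and an (x1, y1, x2, y2) pixel box."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]


def is_document_photo(
    detections: Iterable[Detection],
    image_width: int,
    image_height: int,
    min_confidence: float = 0.6,     # hypothetical threshold
    min_area_fraction: float = 0.5,  # "most of the photo", hypothetically half or more
) -> bool:
    """Return True if a confident document/receipt detection covers most of the image."""
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")
    for det in detections:
        if det.label not in DOCUMENT_LABELS or det.confidence < min_confidence:
            continue
        x1, y1, x2, y2 = det.box
        box_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if box_area / image_area >= min_area_fraction:
            return True
    return False
```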
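Trip detection, as described above, could be approximated along these lines. This is a sketch under several assumptions: each photo has a timestamp and a latitude/longitude (from EXIF GPS data or estimated by a model such as GeoClip), the user's home coordinates are known, and a trip is a run of photos taken at least `min_distance_km` from home with no gap longer than `max_gap` between consecutive shots. The names and both thresholds are hypothetical; distances use the haversine great-circle formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Photo:
    name: str
    taken: datetime
    lat: Optional[float] = None  # from EXIF GPS data or a model estimate
    lon: Optional[float] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def detect_trips(
    photos: Iterable[Photo],
    home: Tuple[float, float],
    max_gap: timedelta = timedelta(hours=36),  # hypothetical
    min_distance_km: float = 100.0,            # hypothetical
) -> List[List[Photo]]:
    """Split located photos, in time order, into runs taken far from home."""
    located = sorted(
        (p for p in photos if p.lat is not None and p.lon is not None),
        key=lambda p: p.taken,
    )
    trips: List[List[Photo]] = []
    current: List[Photo] = []
    for p in located:
        far = haversine_km(home[0], home[1], p.lat, p.lon) >= min_distance_km
        gap_too_long = bool(current) and p.taken - current[-1].taken > max_gap
        # A photo taken near home, or a long silence, ends the trip in progress.
        if current and (not far or gap_too_long):
            trips.append(current)
            current = []
        if far:
            current.append(p)
    if current:
        trips.append(current)
    return trips
```

Photos without coordinates are simply skipped in this sketch; a fuller version might place them using time alone.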
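Similar photo grouping can be sketched in the same spirit. Assuming each photo carries a timestamp and the set of object labels the detector produced, the hypothetical `group_similar` below walks the photos in time order and starts a new group whenever the next photo is too far away in time from the previous one or its labels overlap too little, measured by Jaccard similarity (size of the intersection over size of the union). The two-minute window and the 0.6 threshold are illustrative guesses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List


@dataclass
class TaggedPhoto:
    name: str
    taken: datetime
    tags: FrozenSet[str]  # object labels reported by the detector


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Intersection over union of two label sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(
    photos: Iterable[TaggedPhoto],
    window: timedelta = timedelta(minutes=2),  # hypothetical
    min_jaccard: float = 0.6,                  # hypothetical
) -> List[List[TaggedPhoto]]:
    """Group time-ordered photos; each photo joins the current group if it resembles the previous one."""
    groups: List[List[TaggedPhoto]] = []
    for p in sorted(photos, key=lambda p: p.taken):
        if groups:
            prev = groups[-1][-1]
            if p.taken - prev.taken <= window and jaccard(p.tags, prev.tags) >= min_jaccard:
                groups[-1].append(p)
                continue
        groups.append([p])
    return groups
```

Comparing each photo only with its predecessor keeps the pass linear in the number of photos, which suits bursts of shots of the same scene; it will not reunite similar photos separated by an unrelated one.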
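The file-hash approach to identical photo removal needs only the Python standard library. In this minimal sketch, files are first bucketed by size (files of different sizes cannot be identical), and only same-size files are hashed with SHA-256. It reports groups of duplicates rather than deleting anything, leaving the final choice to the user; `find_duplicates` and `file_digest` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of byte-identical files under root, searching subfolders recursively."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an identical copy
        for p in same_size:
            by_hash[file_digest(p)].append(p)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```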
What Next?
There's a lot to play around with in this space. Next up: taking a closer look at some of these machine learning models and working out viable paths to integrating them.