May 4, 2026 • project • completed

Crowd Predictor

Take pictures or videos of a crowd to get an accurate reading of how many people are in it!

crowd, p2pnet, ransac, point cloud

Sprint Challenge

This project was built during Sprint 1 of 8 (May 1, 2026 — May 21, 2026) as part of my challenge to launch 8 products in 6 months.


TLDR

I've made a tool that reconstructs crowds in 3D space from a single picture or video, so I can make much better and faster crowd size estimates than traditional methods (Mapchecking, reconstruction with sampling).
I've combined a depth model with a model that counts people's heads (P2PNet), so a 3D point cloud of "people" can be reconstructed. Based on this, occluded people who aren't actually visible in the picture can be guessed, and the user can correct these guesses manually in an editor.

Try for yourself here: https://crowdcounting.net

Motivation

The "art" of politics is about subtly (or not so subtly) bending facts to make your side look good while bringing the other side down.

In Hungary's election campaign this spring, this mostly took the form of politicians from the two leading parties comparing attendance at their rallies through misleading Facebook posts like this:

Since you can make a weak argument that comparisons like this are factual ("They were taken in the same city, after all"), they're enough to reassure hardcore supporters. Being shown posts like this constantly really polarizes people, contributing to the annoyance and hostility in the air.

But this made me think -- there is no easy way to give a ballpark estimate of a crowd's size based on a single video or picture. So I created one.

Execution

My north star idea was to model the crowd in 3D, so I can use real crowd characteristics and behavior to refine a basic computer estimate. This includes stuff like predicting occluded people and detecting probable outliers that aren't part of the main crowd.

I knew about a CNN model called P2PNet, made to count people's heads in an image. The minimal implementation combines this with a depth model that returns real-world coordinates (like Apple Depth Pro) to get a 3D point cloud of people. After this, RANSAC determines the ground plane people are standing on, so we know "which way is up".
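To make the ground-plane step concrete, here is a minimal RANSAC plane fit in NumPy. It is a sketch under assumptions, not the code behind the tool: the function name `fit_ground_plane`, the iteration count, and the 5 cm inlier tolerance are all made up for illustration, and the post doesn't say which points are fed into the fit.

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, inlier_tol=0.05, seed=0):
    """RANSAC plane fit on an (N, 3) array.

    Returns (normal, d) with a unit normal such that normal . p + d = 0.
    n_iters and inlier_tol are illustrative values, not tuned ones.
    """
    rng = np.random.default_rng(seed)
    best_count, best_plane = 0, None
    for _ in range(n_iters):
        # Three random points define one candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-9:  # (nearly) collinear sample, no plane
            continue
        normal = normal / length
        d = -normal @ sample[0]
        # Score the candidate by how many points lie close to it.
        count = int((np.abs(points @ normal + d) < inlier_tol).sum())
        if count > best_count:
            best_count, best_plane = count, (normal, d)
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane

# Usage: noisy flat ground at z = 0 plus clutter floating above it.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(0, 10, (200, 2)), rng.normal(0, 0.01, 200)])
clutter = np.column_stack([rng.uniform(0, 10, (40, 2)), rng.uniform(0.5, 3, 40)])
normal, d = fit_ground_plane(np.vstack([ground, clutter]))
print(normal, d)
```

The sign of the returned normal is arbitrary, so a real pipeline would still need to flip it to point from the ground towards the heads before calling it "up".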

There are two problems with this.

1. Depth models are usually trained on close-range scenes (e.g. a room, a desk), and in this case all tested models were off by about an order of magnitude (scenes came out 10-15x smaller, and thus the crowds denser).

Fix:
The real-world maximum crowd density is about 6-7 people/m². We need to rescale so the points respect this. For this I've created a custom DBSCAN-like algorithm that calculates the distance to the 7th closest neighbor for each point and scales based on that.
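One way to read this rescaling, as a rough sketch: at a density of ρ people/m², a disc that holds k neighbors has an area of about k/ρ, so its radius is sqrt(k / (πρ)). If the tightest 7th-neighbor distances in the point cloud are smaller than that radius, the whole cloud gets scaled up. The helper names, the 5th percentile, and the "only ever scale up" rule are assumptions for illustration; the actual algorithm isn't described in more detail.

```python
import numpy as np

MAX_DENSITY = 7.0  # people per square metre, upper end of the range quoted above

def kth_neighbor_distances(xy, k):
    """Distance from every 2D point to its k-th nearest neighbour (brute force)."""
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting a row, index 0 is the point itself and index k its k-th neighbour.
    return np.sort(dist, axis=1)[:, k]

def density_scale_factor(xy, k=7, max_density=MAX_DENSITY, percentile=5.0):
    """Factor to multiply ground coordinates by so the densest spots respect max_density.

    Needs more than k points. The percentile is a hypothetical knob that keeps
    a few stray detections from setting the scale.
    """
    tightest = np.percentile(kth_neighbor_distances(xy, k), percentile)
    # Radius of a disc containing k neighbours at the maximum density.
    min_radius = np.sqrt(k / (np.pi * max_density))
    return max(1.0, min_radius / tightest)

# Usage: a grid with 5 cm spacing, far denser than any real crowd.
gx, gy = np.meshgrid(np.arange(20) * 0.05, np.arange(20) * 0.05)
xy = np.column_stack([gx.ravel(), gy.ravel()])
scale = density_scale_factor(xy)
rescaled = xy * scale
print(scale)
```

The brute-force distance matrix is fine for a few thousand heads; a KD-tree would be the obvious swap for larger crowds.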

2. If the camera angle is low, people occlude each other. This often leads to the back of the crowd being detected as sparser than it visually appears in the image.

Fix:
First I cluster the crowd to determine outliers and calculate a polygon describing the crowd's outline and holes.
After that, for each person I count the neighboring people in a pill-shaped area determined by the length of the shadow the person casts on the ground plane. This gives a heatmap of densities. Missing areas are then populated using Poisson sampling and masked with the calculated polyline!
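The two geometric pieces of this step can be sketched as follows. This is an illustration under several assumptions: the pill is modeled as a capsule (all points within a fixed radius of the segment from a person's ground position to the end of their shadow), "Poisson sampling" is taken to mean Poisson-disk sampling and is done with naive dart throwing, and a single uniform spacing `min_dist` stands in for the density heatmap. All function names are hypothetical.

```python
import numpy as np

def count_in_capsule(points, start, end, radius):
    """Count 2D points within `radius` of the segment start->end (a pill shape).

    The person at `start` is counted too if they are part of `points`.
    """
    seg = end - start
    seg_len2 = float(seg @ seg)
    if seg_len2 == 0.0:
        t = np.zeros(len(points))
    else:
        # Projection of each point onto the segment, clamped to its ends.
        t = np.clip((points - start) @ seg / seg_len2, 0.0, 1.0)
    closest = start + t[:, None] * seg
    return int((np.linalg.norm(points - closest, axis=1) <= radius).sum())

def inside_polygon(p, polygon):
    """Even-odd ray casting test for one 2D point against a closed polygon."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, np.roll(polygon, -1, axis=0)):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def fill_sparse_areas(existing, polygon, min_dist, n_darts=2000, seed=0):
    """Dart throwing: keep random points inside the polygon that stay at least
    `min_dist` away from every detected or already accepted point."""
    rng = np.random.default_rng(seed)
    lo, hi = polygon.min(axis=0), polygon.max(axis=0)
    accepted = [np.asarray(p, dtype=float) for p in existing]
    added = []
    for _ in range(n_darts):
        cand = rng.uniform(lo, hi)
        if not inside_polygon(cand, polygon):
            continue
        if accepted and np.min(np.linalg.norm(np.array(accepted) - cand, axis=1)) < min_dist:
            continue
        accepted.append(cand)
        added.append(cand)
    return np.array(added).reshape(-1, 2)

# Usage: one detected person in a 4 m x 4 m outline, fill the rest.
outline = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
detected = np.array([[1.0, 1.0]])
guessed = fill_sparse_areas(detected, outline, min_dist=0.8)
print(len(guessed), "occluded people guessed")
```

Dart throwing is the simplest way to get Poisson-disk-like spacing; it gets slow as the area fills up, which is consistent with sampling cost being something worth optimizing.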

This is the core algorithm I've created. Since I want to prepare for processing video, I've optimized the execution time of each frame from 10-12 seconds down to about 4-6, for example by not saving the point cloud and by downsampling the cluster polylines so that Poisson sampling takes less time.

Since I'm planning to publish this product for free and host it via an external provider, this also meant reducing the cost of each request from about $0.30 to $0.02. This theoretically means the product is scalable, but the cost is still really high and may warrant training a custom AI model, since P2PNet inference takes up the majority of the compute.

If turned into a product, this would also work as a standalone API layer.

Presentation

My best guess for a target audience is journalists, political scientists, and small-time event planners. For all of these user groups I wanted to create an editor they can use to visualize and edit the backend response, the core prediction.

Currently the user can add and delete zones (obstacles and crowd clusters) the backend guessed, and export a defensible prediction as a .pdf file.

I can take this further with more complex features, like polyline editing similar to Mapchecking and more export options.

Takeaways during cycle 1

Here are some important takeaways from development:

You are going to run into 1000 problems and that's OK

If you start something similar, prepare to throw away DAYS to stupid stuff. Here is my cherry pick for this cycle:

- A memory-leaking Docker container killing my computer and wiping out hours of progress
- Working for days on a feature I ended up needing to cut
- Modal SUCKING up all my credits randomly one night due to a wrong file
- My blog not being indexed, leaving no option but to rewrite my ENTIRE WEBSITE in Next.js

Or more personal stuff like:

- A stomach flu destroying the gym schedule I had kept up since January
- A crowded AF supermarket where I just couldn't find an oddly specific spice, draining me for a day
- A night of drinking derailing my morning routine
- Being constantly annoyed at myself for wasting time on the above

DON'T expect smooth sailing. BUT you are sailing! You are going to burn out, and you will have TERRIBLE days as well. You don't see X or LinkedIn posts like this, do you?

Answer "Who is this for?", "Who does this help?" RIGHT AFTER getting the IDEA.

PLAN FIRST DAMN IT. I got sidetracked - I had a cool idea, and I was focusing on refining it and making it even cooler instead of answering questions like: "How would people actually use this?" and "How would this make money down the line?"

Focus on fast-burning social media channels with tailored posts to market

If you haven't posted before, like me: social media algorithms are built extremely differently from each other. Market primarily on Reddit and X, since their algorithms favor the "hero story", and ignore slow-burning ones like YouTube for promoting builds directly.

Dare reach out to people

You are surrounded by unique people you can get feedback from: a coworker, a family member, a talented friend you recall from high school. You don't need to be in a builderspace, and a builderspace is just another office if you don't interact with anybody!

Make it public that you are creating something and reach out to anyone you think may help you. I know that everyone says this, but what's the worst that can happen?

Here is me demoing this project in mesh. for other members to keep me accountable:

This is the story of Sprint 1 out of 8. Sprint 2 first drops May 22. Subscribe here to follow along — I send 2–3 emails per sprint, no more.