Using public GitHub code to find local talent (I'm coining it *repo hunting*)
My ten-minute impression of LinkedIn recruitment tools was that they are expensive and limited by the nature of the platform: they rely on users to self-declare their skills and experience.
It is easy to list SQL, Python, dashboards and analytics experience. It is much harder to see what that actually looks like in practice.
For early-career folks in particular, profiles can end up looking very similar. This is not a criticism. Most are doing their best to present themselves clearly and competitively, and that takes effort.
The challenge is that the format naturally compresses everyone into the same set of signals.
This makes it difficult to distinguish between those who are actively building and applying their skills and those who simply describe them well.
For technical roles, there is a more direct option.
You can look at the code.
Instead of relying on profiles, you can assess people based on their actual projects and repositories.
Rather than waiting for applications, you start from public work and trace it back to the person behind it.
This shows people in a more natural working state. Not a polished profile, but how they actually write queries, structure scripts, and organise their work.
It also surfaces who is spending time building outside of work or study: small projects, half-finished ideas, incremental commits.
This is a small proof-of-concept pipeline that pulls public GitHub activity, extracts real code, and uses an LLM to assess and classify a person’s technical and analytical strengths.
The goal is not to automate hiring. It is a way to build a better shortlist using actual work rather than self-description.
Step 1: Find Melbourne / Victoria GitHub profiles
The first script just brute-forces GitHub user search with location queries like
- Melbourne
- Melbourne Australia
- Melbourne VIC
- and so on

Results are filtered to accounts created after 1 Jan 2023, so the pool leans towards newer, early-career profiles.
For each user it pulls:
- profile
- repos
- recent activity (last ~6 months)
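As a sketch of this step, assuming GitHub's documented `/search/users` endpoint and the `requests` library (the actual script may differ, and auth, pagination, and rate-limit handling are omitted):

```python
# Build GitHub user-search queries for each location variant, using the
# documented search qualifiers location:"..." and created:>YYYY-MM-DD.
LOCATIONS = ["Melbourne", "Melbourne Australia", "Melbourne VIC"]
CREATED_AFTER = "2023-01-01"

def build_user_query(location: str, created_after: str = CREATED_AFTER) -> str:
    return f'location:"{location}" created:>{created_after}'

def search_users(query: str, page: int = 1):
    # Hypothetical network call; in practice you would add a token header
    # and loop over pages until the result set is exhausted.
    import requests
    resp = requests.get(
        "https://api.github.com/search/users",
        params={"q": query, "per_page": 100, "page": page},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("items", [])

queries = [build_user_query(loc) for loc in LOCATIONS]
```

Each matched user then gets follow-up calls for their profile, repositories, and recent events.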
Step 2: Use an LLM to assess code in users’ repositories
Once you have a small packet of recent SQL and Python code, you can actually evaluate what someone can do.
Each person gets a bundled “code packet” (recent .sql and .py files pulled from their commits), which is sent to an LLM with a constrained prompt.
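A minimal sketch of how such a packet might be assembled, assuming the file contents have already been downloaded; the function name, path headers, and size cap are illustrative, not the exact implementation:

```python
# Given (path, source) pairs pulled from a user's recent commits, keep only
# .sql/.py files, cap the total size so the packet fits in a prompt, and
# join them with path headers so the LLM can cite specific files.
def build_code_packet(files, max_chars=20000):
    parts = []
    total = 0
    for path, source in files:
        if not path.endswith((".sql", ".py")):
            continue
        chunk = f"--- {path} ---\n{source.strip()}\n"
        if total + len(chunk) > max_chars:
            break
        parts.append(chunk)
        total += len(chunk)
    return "\n".join(parts)

packet = build_code_packet([
    ("analysis/churn.sql", "SELECT user_id, COUNT(*) FROM events GROUP BY 1"),
    ("etl/load.py", "import csv"),
    ("README.md", "not code, so it is skipped"),
])
```

The path headers matter: they are what lets the model cite file-level evidence later.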
The goal is not “is this person good”.
It is much narrower:
- can they write SQL
- can they write Python
- do they structure queries and logic properly
- is there any evidence of analytical thinking
The prompt is intentionally strict:
```
You are reviewing public GitHub code for evidence of early-career analytics skill.

Assess:
- SQL skill
- Python skill
- SQL structure and readability
- SQL/Python interoperability
- Segmentation / analytical thinking

Rules:
- Use only the provided code
- Be conservative
- Cite actual file/path evidence
- Return strict JSON only
```
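On the consumer side, the "strict JSON only" rule can be enforced before anything is trusted. A sketch, assuming a response shaped roughly like the rubric above; the field names here are illustrative, not the exact schema the scripts use:

```python
import json

# Fields we expect in every assessment (illustrative schema).
REQUIRED = {"sql_skill", "python_skill", "overall_score", "evidence"}

def parse_assessment(raw: str) -> dict:
    """Parse a model reply, rejecting anything that is not strict JSON
    with the expected fields."""
    data = json.loads(raw)  # raises ValueError on non-JSON replies
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"assessment missing fields: {sorted(missing)}")
    return data

ok = parse_assessment(
    '{"sql_skill": 3, "python_skill": 2, "overall_score": 2.5,'
    ' "evidence": "CTEs and window functions in analysis/churn.sql"}'
)
```

Rejecting malformed replies outright keeps the downstream table clean instead of silently filling it with partial rows.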
Step 3: Review the results
The LLM returns a structured response for each person, which is flattened into a table.
Each row represents a GitHub profile, with fields such as SQL skill, Python skill, overall score, and a short evidence summary tied back to specific files.
At that point it becomes a filtering exercise. You can scan the table, shortlist profiles that show consistent signals, and click through to the underlying repositories if needed.
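The flattening itself is simple. A sketch using the standard library, with illustrative field names and a sort by overall score so the table reads top-down as a shortlist:

```python
import csv
import io

COLUMNS = ["login", "sql_skill", "python_skill", "overall_score", "evidence"]

def to_table(assessments):
    # One row per GitHub profile, best overall score first.
    rows = sorted(assessments, key=lambda a: a["overall_score"], reverse=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

table = to_table([
    {"login": "user_a", "sql_skill": 3, "python_skill": 2,
     "overall_score": 2.5, "evidence": "CTEs in analysis/churn.sql"},
    {"login": "user_b", "sql_skill": 4, "python_skill": 4,
     "overall_score": 4.0, "evidence": "tested ETL in etl/load.py"},
])
```

From there the CSV drops straight into a spreadsheet or notebook for the filtering pass.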
Most profiles include a link to LinkedIn or another contact method.
From there, it is just deciding who to contact.
You can view a small interactive sample of the output here. The usernames have been replaced for anonymity:
The scripts used for this are available here: