Scott Horvath's Weblog

Using public GitHub code to find local talent (I'm coining it *repo hunting*)

My ten-minute impression of LinkedIn recruitment tools was that they are expensive and limited by the nature of the platform. They rely on users to self-declare their skills and experience.

It is easy to list SQL, Python, dashboards and analytics experience. It is much harder to see what that actually looks like in practice.

For early-career folks in particular, profiles can end up looking very similar. This is not a criticism. Most are doing their best to present themselves clearly and competitively, and that takes effort.

The challenge is that the format naturally compresses everyone into the same set of signals.

This makes it difficult to distinguish between those who are actively building and applying their skills and those who simply describe them well.

For technical roles, there is a more direct option.

You can look at the code.

Instead of relying on profiles, you can assess people based on their actual projects and repositories.

Rather than waiting for applications, you start from public work and trace it back to the person behind it.

This shows people in a more natural working state. Not a polished profile, but how they actually write queries, structure scripts, and organise their work.

It also surfaces who is spending time building outside of work or study: small projects, half-finished ideas, incremental commits.

What follows is a small proof-of-concept pipeline that pulls public GitHub activity, extracts real code, and uses an LLM to assess and classify a person’s technical and analytical strengths.

The goal is not to automate hiring. It is a way to build a better shortlist using actual work rather than self-description.

Step 1: Find Melbourne / Victoria GitHub profiles

The first script just brute-forces GitHub user search with location queries.

Results are filtered to accounts created after 1 Jan 2023, so the list leans towards newer, early-career profiles.
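As a rough illustration of what a location search with a creation-date filter could look like, here is a minimal sketch against GitHub's user search endpoint. It is not the original script: the place names in `LOCATIONS`, the helper names, and the paging limits are all assumptions made for the example. It relies on the `location:`, `created:` and `type:` search qualifiers.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Hypothetical place names; the original script's exact query list is not shown.
LOCATIONS = ["Melbourne", "Victoria, Australia", "Geelong", "Ballarat", "Bendigo"]
CREATED_AFTER = "2023-01-01"  # accounts created after 1 Jan 2023


def build_query(location: str, created_after: str = CREATED_AFTER) -> str:
    """Combine the location and created qualifiers into one search string."""
    # Quoting the location keeps multi-word places together as a phrase.
    return f'location:"{location}" created:>{created_after} type:user'


def build_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Build the REST search URL for one page of results."""
    params = {"q": query, "per_page": per_page, "page": page}
    return "https://api.github.com/search/users?" + urllib.parse.urlencode(params)


def search_users(location: str, token: Optional[str] = None,
                 max_pages: int = 10) -> List[str]:
    """Return account logins for one location (search results cap at 1,000)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    logins: List[str] = []
    for page in range(1, max_pages + 1):
        url = build_url(build_query(location), page)
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=30) as response:
            items = json.load(response).get("items", [])
        if not items:
            break  # no more results for this query
        logins.extend(item["login"] for item in items)
    return logins
```

Calling `search_users(place)` for each entry in `LOCATIONS` and merging the results into a set would remove accounts matched by more than one query. The search endpoint is rate limited, so passing a token and pausing between requests is advisable.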

Initial GitHub search for Melbourne and regional Victoria users.

For each user it pulls:

Step 2: Use an LLM to assess code in users’ repositories

Once you have a small packet of recent SQL and Python code, you can actually evaluate what someone can do.

Each person gets a bundled “code packet” (recent .sql and .py files pulled from their commits), which is sent to an LLM with a constrained prompt.
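A code packet of this kind could be assembled roughly as follows. This is a sketch only: the function name, the `### FILE:` header format, and both size caps are assumptions, and it takes the files as already-fetched `(path, source)` pairs rather than showing how they are pulled from commits.

```python
from typing import Iterable, List, Tuple

KEEP_EXTENSIONS = (".sql", ".py")
MAX_FILE_CHARS = 4000      # assumed per-file cap so one large file cannot dominate
MAX_PACKET_CHARS = 20000   # assumed overall cap to keep the prompt a manageable size


def build_code_packet(files: Iterable[Tuple[str, str]]) -> str:
    """Bundle (path, source) pairs into one text packet, in the order given."""
    parts: List[str] = []
    used = 0
    for path, source in files:
        if not path.lower().endswith(KEEP_EXTENSIONS):
            continue  # skip anything that is not SQL or Python
        snippet = source[:MAX_FILE_CHARS]
        # The path header lets the model cite specific files as evidence.
        block = f"### FILE: {path}\n{snippet}\n"
        if used + len(block) > MAX_PACKET_CHARS:
            break  # stop once the packet budget is spent
        parts.append(block)
        used += len(block)
    return "\n".join(parts)
```

Passing the files most-recent-first means the budget is spent on the newest work, which matches the focus on recent commits.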

The goal is not “is this person good”.

It is much narrower:

The prompt is intentionally strict:

You are reviewing public GitHub code for evidence of early-career analytics skill.

Assess:
- SQL skill
- Python skill
- SQL structure and readability
- SQL/Python interoperability
- Segmentation / analytical thinking

Rules:
- Use only the provided code
- Be conservative
- Cite actual file/path evidence
- Return strict JSON only

LLM assessing bundled GitHub code and returning structured scores.
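The "strict JSON only" rule is easier to rely on if each reply is also checked on the receiving side before it is stored. Below is a minimal validation sketch. The field names, the 0 to 5 scale, and the shape of the `evidence` list are assumptions for illustration; the prompt above does not spell out a schema.

```python
import json

# Hypothetical field names, one per item in the prompt's "Assess" list.
SCORE_FIELDS = [
    "sql_skill",
    "python_skill",
    "sql_structure",
    "sql_python_interop",
    "analytical_thinking",
]


def parse_assessment(raw: str) -> dict:
    """Parse a model reply and reject anything that is not the expected JSON."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on chatty replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in SCORE_FIELDS:
        score = data.get(field)
        # bool is a subclass of int, so rule it out explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 5:
            raise ValueError(f"{field} must be an integer from 0 to 5")
    evidence = data.get("evidence")
    if not isinstance(evidence, list) or not all(isinstance(e, str) for e in evidence):
        raise ValueError("evidence must be a list of file-path strings")
    return data
```

A reply that fails validation can be retried or dropped, so that it never reaches the results table half-parsed.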

Step 3: Review the results

The LLM returns a structured response for each person, which is flattened into a table.

Each row represents a GitHub profile, with fields such as SQL skill, Python skill, overall score, and a short evidence summary tied back to specific files.
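Flattening can be as simple as picking a fixed set of columns out of each response and writing them as CSV. The sketch below assumes each response is already a Python dict; the column names are hypothetical stand-ins for the fields described above.

```python
import csv
import io

# Hypothetical column names mirroring the fields mentioned in the text.
COLUMNS = ["username", "sql_skill", "python_skill", "overall_score", "evidence_summary"]


def flatten(username: str, assessment: dict) -> dict:
    """Turn one structured response into a flat row, ignoring unknown keys."""
    row = {"username": username}
    for column in COLUMNS[1:]:
        row[column] = assessment.get(column, "")  # blank if the model omitted it
    return row


def to_csv(rows: list) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the column list fixed means every profile lands in the same shape, even when a response has extra or missing keys.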

At that point it becomes a filtering exercise. You can scan the table, shortlist profiles that show consistent signals, and click through to the underlying repositories if needed.
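One possible reading of "consistent signals" is that every score clears a floor, rather than one high score masking a weak one. The sketch below applies that rule; the threshold, the field names, and the function name are assumptions for illustration.

```python
def shortlist(rows, min_each=3, score_fields=("sql_skill", "python_skill")):
    """Keep profiles whose scores all meet the floor, best overall score first."""
    keep = [
        row for row in rows
        if all(row[field] >= min_each for field in score_fields)
    ]
    return sorted(keep, key=lambda row: row["overall_score"], reverse=True)


# Example with made-up rows and placeholder usernames.
sample = [
    {"username": "user_01", "sql_skill": 4, "python_skill": 3, "overall_score": 3.5},
    {"username": "user_02", "sql_skill": 5, "python_skill": 1, "overall_score": 3.0},
    {"username": "user_03", "sql_skill": 4, "python_skill": 4, "overall_score": 4.0},
]
picked = [row["username"] for row in shortlist(sample)]
```

Here `user_02` is dropped despite a top SQL score, because the Python score falls below the floor. The output is only a first pass, and the click-through to the repositories remains the actual check.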

Most profiles include a link to LinkedIn or another contact method.

From there, it is just deciding who to contact.

You can view a small interactive sample of the output here. The usernames have been replaced for anonymity:

Open the results

The scripts used for this are available here:
