Add new practice exercise `baffling-birthdays` by jiegillet · Pull Request #1575 · exercism/elixir

jiegillet · 2025-06-07T07:35:47Z

New exercise.

This one is a bit of a special one, and I might have gotten carried away with the math :D

It involves randomness, and therefore the tests have the potential to be flaky. I did the math to ensure that the flakiness is kept to a minimum (like one fail per 100k runs), but the safer I make the tests, the less strict they become.

github-actions · 2025-06-07T07:35:55Z

Thank you for contributing to exercism/elixir 💜 🎉. This is an automated PR comment 🤖 for the maintainers of this repository that helps with the PR review process. You can safely ignore it and wait for a maintainer to review your changes.

Based on the files changed in this PR, it would be good to pay attention to the following details when reviewing the PR:

General steps
- 🏆 Does this PR need to receive a label with a reputation modifier (x:size/{tiny,small,medium,large,massive})? (A medium reputation amount is awarded by default, see docs)
Any exercise changed
- 👤 Does the author of the PR need to be added as an author or contributor in <exercise>/.meta/config.json (see docs)?
- 🔬 Do the analyzer and the analyzer comments exist for this exercise? Do they need to be changed?
- 📜 Does the design file (<exercise>/.meta/design.md) need to be updated to document new implementation decisions?
Practice exercise changed
- 🌲 Do prerequisites, practices, and difficulty in config.json need to be updated?
- 🧑‍🏫 Are the changes in accordance with the community-wide problem specifications?
Practice exercise tests changed
- ⚪️ Are all tests except the first one skipped?
- 📜 Does <exercise>/.meta/tests.toml need updating?

Automated comment created by PR Commenter 🤖.

angelikatyborska

I think I agree with the testing approach. It's good that you left comments for students that tell them how different sample sizes behave in terms of flaky tests.

The tests with your implementation are very fast, but I think some students might go too far with sample sizes and have problems with timeouts 🤔

angelikatyborska · 2025-06-09T06:58:49Z

+        ~D[2019-02-12]
+      ]
+
+      output = BafflingBirthdays.shared_birthday?(birthdates) == true


Here and on line 50 there are unused variables.

@jiegillet sorry, I should have been clearer - please make those expressions into assertions 😅 now it's:

    warning: use of operator == has no effect
    │
 63 │       BafflingBirthdays.shared_birthday?(birthdates) == true
    │                                                      ~
    │
    └─ test/baffling_birthdays_test.exs:63:54

Oh, don't apologize for me being an idiot 😆
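To illustrate the difference between a bare comparison and an assertion, here is a minimal, self-contained sketch. The `BafflingBirthdays` module below is a hypothetical stand-in so the script runs on its own, and all birthdates except `~D[2019-02-12]` are made up for the example; neither is taken from the PR.

```elixir
ExUnit.start()

# Hypothetical stand-in implementation, only here to make the example runnable.
defmodule BafflingBirthdays do
  def shared_birthday?(birthdates) do
    days = Enum.map(birthdates, fn date -> {date.month, date.day} end)
    length(Enum.uniq(days)) != length(days)
  end
end

defmodule BafflingBirthdaysTest do
  use ExUnit.Case

  test "two birthdates share a month and day" do
    # Illustrative dates; only the last one appears in the diff above.
    birthdates = [~D[1966-07-29], ~D[1977-02-12], ~D[2001-12-25], ~D[2019-02-12]]

    # A bare `... == true` expression is evaluated and discarded, which is what
    # the compiler warning points out. Wrapping the call in `assert` makes the
    # test fail when the result is falsy.
    assert BafflingBirthdays.shared_birthday?(birthdates)
  end
end
```

Binding the comparison to an unused variable silences nothing useful either: the test still passes regardless of the result, which is why the review asked for assertions.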

angelikatyborska · 2025-06-09T07:04:12Z

+
+      # for a sample size of 100 and this delta, the assertion is expected to fail once in 100 runs
+      # for a sample size of 600 and this delta, the assertion is expected to fail once in a billion runs
+      delta = 0.83


How did you choose the deltas in the last few tests?

I used this Wolfram Alpha app:

- enter the sample proportion (expected)
- enter the confidence level (0.99, which means one failure in 100)
- enter the sample size (100)

That gives you the 99% interval, meaning a random pick will fall in there 99% of the time. This interval goes from 0.03417 to 0.1997, and its width is 2 times the delta; that's how I got the value.

After that, I entered a confidence level of 0.999999999 (one failure in a billion) and tweaked the sample size until I found an interval fairly close to the one for the confidence level of 0.99; that came out to around 600 every time. It's not an exact calculation, but it should be close enough.
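The procedure can be approximated without the web app. The sketch below assumes the app uses the usual normal-approximation interval for a sample proportion, `p ± z * sqrt(p * (1 - p) / n)`; that is an assumption, although it does reproduce the quoted interval. The module name, the hard-coded z-scores, and taking `p` as the midpoint of the quoted interval are all choices made for this illustration.

```elixir
defmodule DeltaSketch do
  # Half-width of the normal-approximation interval for a sample proportion.
  def half_width(p, n, z), do: z * :math.sqrt(p * (1 - p) / n)
end

# Midpoint of the interval quoted above (0.03417 to 0.1997).
p = (0.03417 + 0.1997) / 2

# Approximate two-sided z-scores, hard-coded as assumptions:
# 99% confidence and "one failure in a billion" confidence.
z_99 = 2.5758
z_one_in_a_billion = 6.1094

delta_100 = DeltaSketch.half_width(p, 100, z_99)
delta_600 = DeltaSketch.half_width(p, 600, z_one_in_a_billion)

IO.puts("sample 100, confidence 0.99: delta is about #{Float.round(delta_100, 4)}")
IO.puts("sample 600, confidence 0.999999999: delta is about #{Float.round(delta_600, 4)}")
```

Under these assumptions the two half-widths come out close to each other, which is consistent with a sample of about 600 at the stricter confidence level matching a sample of 100 at 99%.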

I used mix test --repeat-until-failure 100 with a sample of 100 to check if the solution would actually fail, and it does about half the time, which is expected, so it seems to work. I did not try a billion times with a sample of 600, but I ran it for a bit and never managed to make it fail 😆
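As a rough check on that observation: if each run is treated as a single, independent event that fails with probability 0.01 (a simplification, since the suite contains several randomized assertions), the chance of seeing at least one failure within 100 runs is

```latex
P(\text{at least one failure in 100 runs}) = 1 - 0.99^{100} \approx 0.63
```

so failing somewhere around half of the time is in the expected range.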

I have one dark secret for this particular test though. If you try with a sample of 100, you will almost always fail the test, because with a sample of 100 you will most of the time estimate 100 or 99, which is outside of the theoretical range (99.08 to 1). But I chose to ignore it: it's complicated to explain, it's unlikely that people will pick a sample size of only 100, and if they do, they'll see the comment and pick something bigger.

angelikatyborska · 2025-06-09T07:04:49Z

+      expected_count_standard_deviation = fn
+        day when day <= 28 -> :math.sqrt(group_size * 12 / 365 * (1 - 12 / 365))
+        day when day <= 30 -> :math.sqrt(group_size * 11 / 365 * (1 - 11 / 365))
+        day when day == 31 -> :math.sqrt(group_size * 7 / 365 * (1 - 7 / 365))
+      end
+
+      counts_outside_95_percent_confidence_interval =
+        day_frequencies
+        |> Enum.filter(fn {day, count} ->
+          abs(count - expected_count.(day)) > 1.96 * expected_count_standard_deviation.(day)
+        end)
+        |> length()


Is there maybe a website that you could link in a comment in this code that could explain what's going on? My statistics knowledge is very rusty

There's a lot going on, that's what I meant when I said that maybe I got carried away lol

There are many concepts involved:

- calculating the probabilities of events happening (how many times each day is found in a year), although that's basically counting
- calculating the mean and variance for a discrete uniform distribution, that's what those functions above are doing
- calculating the 95% confidence interval, using a normal distribution for simplicity
- calculating the probabilities that x out of y events will be outside that range, using binomial distributions

I was wondering if I should explain each step in more detail, but it seems like too much when solving the exercise is actually pretty easy.
What do you think?
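For readers who want the steps spelled out, here is a minimal sketch of the ideas listed above. It is not the PR's test code: the module and function names are hypothetical, and treating the 31 days as independent 5% events is a simplifying assumption. The `sqrt(n * p * (1 - p))` shape seen in the diff is the standard deviation of a binomial count with mean `n * p`.

```elixir
defmodule DayFrequencySketch do
  # How many months of a non-leap year contain the given day of the month.
  def months_with_day(day) when day in 1..28, do: 12
  def months_with_day(day) when day in 29..30, do: 11
  def months_with_day(31), do: 7

  # Probability that a uniformly random day of the year has this day of the month.
  def probability(day), do: months_with_day(day) / 365

  # Mean of the count for this day in a group of the given size: n * p.
  def expected_count(group_size, day), do: group_size * probability(day)

  # Standard deviation of that count: sqrt(n * p * (1 - p)).
  def standard_deviation(group_size, day) do
    p = probability(day)
    :math.sqrt(group_size * p * (1 - p))
  end

  # True when a count lies outside the approximate 95% interval
  # (more than 1.96 standard deviations from the mean).
  def outside_95_percent?(group_size, day, count) do
    abs(count - expected_count(group_size, day)) > 1.96 * standard_deviation(group_size, day)
  end

  # Probability that more than `limit` of the 31 days fall outside their
  # interval, assuming each does so independently with probability 0.05.
  def probability_more_than(limit) do
    (limit + 1)..31//1
    |> Enum.map(fn k -> choose(31, k) * :math.pow(0.05, k) * :math.pow(0.95, 31 - k) end)
    |> Enum.sum()
  end

  # Binomial coefficient, computed exactly with integer arithmetic.
  defp choose(_n, 0), do: 1
  defp choose(n, k), do: div(choose(n - 1, k - 1) * n, k)
end
```

A test built on this idea would count how many days are flagged by `outside_95_percent?/3` and allow a threshold for which `probability_more_than/1` is acceptably small.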

Hmm... yes, that is way too much. I won't even pretend I understand 🙈

I see a few potential problems with this testing code:

- students might get confused by it (but in reality they should just read the test description and know what to do)
- confused students might ask mentors about this test code, and mentors won't know how to explain it (but so what? it's important to know how to say "I don't know" 😬)
- if this test code ever needs to be changed and there's nobody with a physics or maths phd around... I won't know how to fix the tests 😅 (but then I would just throw it all away and do what Erik did: https://github.com/exercism/csharp/pull/2402/files#diff-a31c6c904e805077bb6ab5a333d6e0f154d63488697388f75ce77e4ea40a4eba)

Still, I'm pretty impressed with the effort you went through and if I understood it, I'm sure I would agree it's the only reasonable way to write a reliable test for random values 🤓 I'm fine just accepting that I don't get it and let it be merged.

I would not have done this if it wasn't fun :)

I actually agree with your list of problems, I share your concerns. I looked at Erik's test, my tests include his (assert map_size(month_frequencies) == number_of_months), but yes, I am fine with future maintainers simplifying the tests if I get hit by a bus. Or if this causes more confusion than it's worth.

angelikatyborska · 2025-06-09T07:11:23Z

+  @spec random_birthdates(group_size :: integer()) :: [Date.t()]
+  def random_birthdates(group_size) do
+    for _ <- 1..group_size do
+      year = generate_non_leap_year_january_first(0, 3000)


This is an interesting year range 😬 just so that we're clear: it's completely unnecessary to make the year random, right? To pass the tests, you might just as well choose the same non-leap year for all birthdates.

Yeah, as far as the tests are concerned you could pick year = 2025 and be done with it, that's true.
In terms of generating random Date.t(), this felt more natural to me. I guess it doesn't really matter?

Right, it does not matter! I just wanted to make sure 🙂
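For comparison, a fixed-year variant could look like the sketch below. It is only an illustration of the simplification discussed above, not the PR's example solution; the module name is hypothetical and the guard on `group_size` is an added assumption to keep the range well-formed.

```elixir
defmodule RandomBirthdatesSketch do
  @spec random_birthdates(group_size :: integer()) :: [Date.t()]
  def random_birthdates(group_size) when group_size > 0 do
    # 2025 is not a leap year, so the offsets 0..364 cover each of its
    # 365 days exactly once and every day is equally likely.
    for _ <- 1..group_size do
      Date.add(~D[2025-01-01], Enum.random(0..364))
    end
  end
end
```

Picking a random year as well changes nothing for the tests, as long as only non-leap years are used.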

angelikatyborska · 2025-06-09T07:15:51Z

+          "randomness",
+          "dates-and-time",
+          "lists",
+          "enum"


You have also used: list-comprehensions, ranges, tuples. I think list-comprehensions might be optional (you're not using cartesian products, right?), but ranges might be necessary for generating a list of a given length.

Additionally a MapSet, but we don't have a concept for that 🤷

Thank you, I added them all, it could help to know the concepts.

> Additionally a MapSet, but we don't have a concept for that 🤷

I made a set concept for Elm a while ago. Want me to fork it? :)

Sure thing, if you have the time, it would be very nice to have that concept in Elixir too

jiegillet · 2025-06-13T05:29:18Z

> The tests with your implementation are very fast, but I think some students might go too far with sample sizes and have problems with timeouts 🤔

I did some tests by changing my sample size:

1M: 40s
100k: 4s
10k: 0.4s

The cutoff is 10 seconds right? That's a limit of 250k samples. That doesn't seem too bad, especially considering that the tests are kind of hinting that a sample of 600 would be fine.
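The 250k figure follows if run time grows roughly linearly with the sample size, which the three measurements above suggest (about 40 s per million samples). Under that assumption, and taking the 10 s cutoff as given:

```latex
n_{\max} \approx \frac{10\ \text{s}}{40\ \text{s}} \times 10^{6} = 250{,}000
```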

angelikatyborska · 2025-06-14T16:29:49Z

> The cutoff is 10 seconds right?

I somehow remember it's 2x or 3x the average test runtime set in our config file ("average_run_time": 4), but maybe it's simply 10s. Either way, the values are similar (8s, 10s, or 12s).

> especially considering that the tests are kind of hinting that a sample of 600 would be fine.

Agreed, I'm no longer worried about test speed. Thanks!

Add new practice exercise baffling-birthdays

130e0a6

angelikatyborska reviewed Jun 9, 2025

jiegillet added 4 commits June 13, 2025 10:49

remove unused variables

f1a67b6

use map_size

208ff67

fix failure rate for days

7c12532

add prerequisites

e78a151

actually assert result is true

42fde39

angelikatyborska approved these changes Jun 15, 2025

angelikatyborska merged commit df0b279 into main Jun 15, 2025
9 checks passed

angelikatyborska deleted the jie-baffling-birthdays branch June 15, 2025 07:59
