Problem B
Haydn Seek
Many people have used the music recognition app "Shazam"
(which tells you the name of a song by listening to it). Most
people assume that some smart algorithm, such as the Fast
Fourier Transform, lets it recognize which song is being
played. But as it turns out, Shazam has, like all smart
companies, outsourced this work to Iceland instead. So every
time you want the name of a song, the request goes through a
number of experts in a certain field, who listen and then tell
you the song name.
Hordigordur has recently been employed by Shazam. He wrote on
his CV that he can tell classical composers apart by
listening to only a small bit of a piece. However, this is a
lie: Hordigordur doesn’t know anything about classical
composers. He has, however, gotten some data from his company
that he can "polish his skills with". Instead of actually
learning all the material, Hordigordur does the unthinkable:
he outsources it back to you. He asks you to create an
algorithm that will listen to a bit of music and determine
which composer wrote it.
Input
Download the zip file with training data and test data; it can be found at the bottom of the page under "attachments". Each sample in the data contains an interval of 100 notes from a piece of music. Every note is described by its starting time (column "start"), its duration (column "duration"), its pitch (column "pitch") and its velocity (column "velocity"). The training data also includes the composer who wrote the piece, in the column "composer".
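As a rough illustration of how the four note columns could be turned into something a classifier can use, here is a minimal sketch that summarises one excerpt as a handful of statistics. The function name `excerpt_features`, the dict-based note representation and the choice of statistics are all assumptions made for this example; the statement does not prescribe a file layout or any particular features.

```python
from statistics import mean, pstdev

def excerpt_features(notes):
    """Summarise one excerpt (a list of note dicts) as a small feature vector.

    Each note is assumed to be a dict with the keys "start", "duration",
    "pitch" and "velocity", mirroring the column names in the statement.
    """
    notes = sorted(notes, key=lambda n: n["start"])
    pitches = [n["pitch"] for n in notes]
    durations = [n["duration"] for n in notes]
    velocities = [n["velocity"] for n in notes]
    # Gaps between consecutive note onsets; a gap of 0 means simultaneous notes.
    gaps = [b["start"] - a["start"] for a, b in zip(notes, notes[1:])]
    return [
        mean(pitches),                 # average register
        pstdev(pitches),               # how widely the pitches spread
        max(pitches) - min(pitches),   # pitch range
        mean(durations),               # average note length
        mean(velocities),              # average loudness
        mean(gaps) if gaps else 0.0,   # average time between onsets
    ]

# Tiny made-up excerpt (three notes instead of 100) to show the call shape.
example = [
    {"start": 0.0, "duration": 0.5, "pitch": 60, "velocity": 80},
    {"start": 0.5, "duration": 0.5, "pitch": 64, "velocity": 70},
    {"start": 1.0, "duration": 1.0, "pitch": 67, "velocity": 90},
]
features = excerpt_features(example)
```

Summary statistics like these discard the order of the notes, so they are only a starting point; interval or rhythm patterns may separate composers better.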
Output
For each test case, print one row containing a string: the
composer’s name.
Note that some names may be spelled in a way you might not
expect, so make sure to copy the names from the training
data.
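One simple way to guarantee that every printed name matches the training data is to have the classifier return the training labels themselves. The sketch below uses a nearest-centroid baseline for this; it is only an illustrative choice, not a recommended solution. The feature vectors and the labels `composer_a` and `composer_b` are made-up placeholders, not names from the real data.

```python
from collections import defaultdict
from math import dist

def fit_centroids(vectors, labels):
    """Average the feature vectors belonging to each composer label."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(col) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))

# Made-up two-dimensional features and placeholder labels, for illustration only.
train_vectors = [[60.0, 0.5], [62.0, 0.4], [72.0, 1.0], [74.0, 1.2]]
train_labels = ["composer_a", "composer_a", "composer_b", "composer_b"]
centroids = fit_centroids(train_vectors, train_labels)

test_vectors = [[61.0, 0.5], [73.0, 1.1]]
predictions = [predict(centroids, v) for v in test_vectors]
for name in predictions:
    print(name)  # one composer name per row, spelled exactly as in training
```

Because `predict` can only return keys of `centroids`, which come straight from the training labels, an unexpected spelling in the data is reproduced unchanged in the output.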
Scoring
If $x$ is the percentage of composers you guess
correctly, your final score
is:
At the end of the competition, all solutions will be retested on the remaining 70% of the data. Your final score at the end of the competition will only be based on the remaining 70% of the data; the 30% tested during the competition will have no effect. It is guaranteed that the 30% tested during the competition were chosen uniformly at random and are entirely disjoint from the 70% tested at the end. Therefore, the results on the 30% tested during the competition should be seen as a strong indicator of how well your solution performs. At the same time, it is detrimental to overfit your solution to the test data.
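One way to reduce the temptation to tune against the judged 30% is to estimate accuracy locally on a held-out part of the training data. The following is a minimal sketch of that idea; the helper names `holdout_split` and `accuracy_percent`, the 30% hold-out fraction and the fixed seed are assumptions for this example and are not part of the problem.

```python
import random

def holdout_split(items, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of items and split it into (train, holdout)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def accuracy_percent(predicted, actual):
    """Percentage of positions where the prediction equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Stand-in for labelled training rows; real rows would carry notes and a composer.
rows = list(range(10))
train, holdout = holdout_split(rows)
```

A model would be fitted on `train` only, and `accuracy_percent` applied to its predictions on `holdout` gives a local estimate of $x$ that does not depend on the judge's feedback.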