Research Proposal: Listening Data
Introduction
A long-term study on musical taste and listening habits.
Objectives
- Gather long-term (at least 6 months) data about users' listening habits: what
tracks they listen to, how often, and how this changes over time.
- Use the data for evaluation and development of music recommendation
systems.
Users
How many users do we need? Ideally 50-100; at minimum, 10.
We want a representative sample of taste groups. Rock only, or classical/jazz as well?
Motivating Participation
Why would users participate? What kind of effort is required
of them? iTunes fanatics may participate out of sheer love, if minimal
effort is required. Alternatively, we can compensate subjects financially,
or pay them in kind with free music, e.g. gift certificates to the iTunes
store (not available yet).
Data
Automatic temporal listening data
This data is gathered passively, that is, with no special action required from
the user.
- track name
- each play: date/time, length
- number of plays (can infer from play time stats)
- play order (can infer from play time stats)
- rating (? depends on whether the player supports it; not really passive)
Preference or ratings data would be wonderful if possible, but
probably would greatly limit the participation and may not be worth
the cost.
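The two inferences noted in the list above (number of plays and play order from the per-play timestamps) can be sketched as follows. This is only an illustration: the event layout, a (track name, start time, seconds played) tuple per play, and the function names are assumptions, not a format any particular player is known to export.

```python
from collections import Counter
from datetime import datetime

# Assumed (hypothetical) event layout: (track name, start time, seconds played).

def play_counts(events):
    """Number of plays per track, inferred from the individual play events."""
    return Counter(track for track, _, _ in events)

def play_order(events):
    """Track names in the order they were played, sorted by start time."""
    return [track for track, _, _ in sorted(events, key=lambda e: e[1])]

# Made-up sample data, for illustration only.
plays = [
    ("track A", datetime(2003, 10, 9, 9, 5), 290),
    ("track B", datetime(2003, 10, 9, 8, 55), 240),
    ("track A", datetime(2003, 10, 10, 22, 30), 120),
]
counts = play_counts(plays)
order = play_order(plays)
```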
User-labeled Metadata
listening time: bedtime, work, relaxing, commute, etc. Can we
simply infer this from date/time?
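One way to probe the question above is a crude rule-based guess from the timestamp alone. The hour boundaries and the treatment of weekends below are arbitrary assumptions chosen for illustration; real users' schedules will differ, which is exactly what the study data could reveal.

```python
from datetime import datetime

def guess_context(when: datetime) -> str:
    """Guess a listening context from date/time only (assumed, arbitrary cutoffs)."""
    hour = when.hour
    if hour >= 22 or hour < 6:
        return "bedtime"
    if when.weekday() >= 5:  # Saturday or Sunday
        return "relaxing"
    if 7 <= hour < 9 or 17 <= hour < 19:
        return "commute"
    if 9 <= hour < 17:
        return "work"
    return "relaxing"
```

Comparing such guesses against a small amount of user-labeled data would show whether date/time is a good enough proxy.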
current mood
I expect that users will never report their mood both accurately and
consistently if they are required to remember to do so. We could prompt
them, if we can write code for their listening platform. This may
be difficult for a portable device like the iPod, but may be simple
enough on a PC. We could periodically pop up a window asking "How are
you feeling now?" with several choices from a pull-down menu. However,
even if technically possible, this may prove too invasive and annoying,
especially if the study extends over several weeks or months. Perhaps a
small set of subjects could be asked to do this for a shorter period
of time (a week, say), in return for more compensation. An alternative
is to investigate other mood indicators that can be gathered passively,
for instance email style or facial expression (consult with the affective
computing experts).
Analysis
What will we do with the data?
- Evaluation of recommendation agents
- Analysis of taste trajectories, mood and time-of-day effects
- Model listening as an optimization problem
- Temporal proximity as similarity measure; mood-sensitive recommendation
Evaluation
One straightforward way to use the data is to run a predictive leave-one-out
experiment. In other words, based on a user's listening habits until time T,
predict future listening events, the addition of new artists to the collection,
and so on.
Is there any difference between temporal data and simply a
collection? Is a predictive, time-based, leave-one-out experiment all that
different from a standard leave-one-out experiment? Perhaps, but it's subtle.
For example, I may not like Mr. Bungle if I hear it cold, but after exposure
to Mike Patton's style in the more mainstream Faith No More, I become more
receptive to the outlandishness of Mr. Bungle.
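A minimal sketch of the time-based experiment described above: split each user's events at time T, predict from the history, and score against what was actually played afterwards. The event layout (user, track, played_at), the "most played so far" baseline, and the hit-rate score are all assumptions made for illustration; a real recommendation agent would replace the baseline.

```python
from collections import Counter
from datetime import datetime

def split_at(events, cutoff):
    """Split (user, track, played_at) events into history (before T) and future."""
    history = [e for e in events if e[2] < cutoff]
    future = [e for e in events if e[2] >= cutoff]
    return history, future

def most_played(history, k):
    """Trivial baseline predictor: the k most-played tracks in the history."""
    counts = Counter(track for _, track, _ in history)
    return [track for track, _ in counts.most_common(k)]

def hit_rate(predicted, future):
    """Fraction of predicted tracks that were actually played after T."""
    if not predicted:
        return 0.0
    played = {track for _, track, _ in future}
    return sum(1 for track in predicted if track in played) / len(predicted)
```

Note that this baseline can only predict repeat listening; predicting the addition of new artists would need a different predictor and score.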
Taste Trajectory Classes
- Hot burnout
- Slow grow
- Flat
Can we classify trajectories into types? How long are the regimes?
Does the type, shape, and length of a trajectory change much across
users for a given song? Across songs for a given user? Based on the
current collection, can we predict the type of a new song? If so, this
information can be applied in a recommender system, for example by only
previewing "hot" items, and introducing slow burners in personalized radio
streams.
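As a first, deliberately naive cut at classifying trajectories, one could compare average play counts in the first and second halves of a song's history for a given user. The weekly binning, the half-and-half comparison, and the ratio threshold are all assumptions for illustration; whether the three classes actually appear in the data is an open question for the study.

```python
def classify_trajectory(weekly_plays, ratio=2.0):
    """Label a list of weekly play counts as 'hot burnout', 'slow grow', or 'flat'.

    Heuristic (assumed): compare the mean of the first half with the mean of
    the second half; a difference of more than `ratio` times decides the label.
    """
    n = len(weekly_plays)
    if n < 2:
        return "flat"
    half = n // 2
    early = sum(weekly_plays[:half]) / half
    late = sum(weekly_plays[half:]) / (n - half)
    if early > ratio * late:
        return "hot burnout"
    if late > ratio * early:
        return "slow grow"
    return "flat"
```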
Music Value Optimization
Another way to look at the data is the economist's view: optimization of
listening time. Suppose the user has a budget of listening time, and seeks to
maximize her listening "gain" (what is this? pleasure? a more subtle
sympathy with the music?). How does she allocate her listening time? A model
would need to account for the inefficiency of the listening system, i.e.
search cost. An interesting direction would be to model how overall value is
increased with better search efficiency, how this changes listening habits,
and then look for evidence in the user data that supports the model.
Daily/Mood Cycles
Temporal listening data allows us to examine such questions as:
- Which songs are likely to be played at certain hours? (bedtime, work
hours, weekends, morning commute)
- Which songs co-occur in time? Temporal proximity could be an indicator of
similarity.
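The co-occurrence idea in the list above could be sketched by counting pairs of distinct tracks whose plays start within some time window of each other. The (track, start time) event layout and the 30-minute default window are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def cooccurrence(events, window=timedelta(minutes=30)):
    """Count unordered pairs of distinct tracks played within `window` of each other.

    `events` is a list of (track, start_time) tuples (assumed layout).
    """
    ordered = sorted(events, key=lambda e: e[1])
    counts = Counter()
    for i, (track_a, time_a) in enumerate(ordered):
        for track_b, time_b in ordered[i + 1:]:
            if time_b - time_a > window:
                break  # later events are even further away
            if track_a != track_b:
                counts[tuple(sorted((track_a, track_b)))] += 1
    return counts
```

Higher counts would then be read as higher similarity, although raw counts favor frequently played tracks and would probably need normalizing.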
One experiment could be to model mood or time of day as a hidden variable
in a probabilistic state model such as an HMM or LSA (latent semantic
analysis) model. Trained on the listening data, states may take on clear
meanings that correspond to moods or listening situations. The model could
then be used generatively, to extend playlists based on a few seed songs.
The result would be mood-sensitive playlist extension.
Implementation
Data Collection
Collecting from the client: We write a simple client or script to run on users'
machines that will periodically upload their data to us automatically.
Alternatively, we ask users to bring in their machines periodically or at the
end of the study, and we upload data manually. Obviously, an automated approach
is preferable: it scales better, is more timely, and carries less risk of data loss.
Audio
Do we need the audio for every track in all users' collections? Can we get it
from them, legally? Could we perform feature calculation on their clients instead?
Name regularization
Users will likely have tracks named with different conventions; we'll need to
regularize them.
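A rough sketch of what regularization might involve: strip accents, case, file extensions, leading track numbers, and punctuation so that differently formatted names compare equal. The specific rules and the extension list are assumptions for illustration, and real collections will need more (artist/title ordering, "feat." variants, misspellings).

```python
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Reduce a track name to a lowercase, punctuation-free comparison key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Drop a trailing audio file extension (assumed list).
    text = re.sub(r"\.(mp3|m4a|wma|ogg|wav)$", "", text)
    # Drop a leading track number only when followed by '.', '_' or '-',
    # so titles that simply begin with a number are left alone.
    text = re.sub(r"^\d{1,2}\s*[._-]\s*", "", text)
    # Collapse every run of non-alphanumeric characters to one space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()
```

For example, "03 - Mr. Bungle - Track Title.mp3" and "Mr Bungle_Track Title" both reduce to the same key under these rules.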
iTunes
- software player and portable device integrated
- built in statistics collection:
last played
play count
rating (manual)
- synced between portable and PC? Separate stats, I think
- Where is the data stored? Is it easy to get out?
- potentially we can reach back and get useful data on past usage,
since the stats collection is native.
Windows Media Player plugin
- write plugin ourselves, should have lots of control
- won't have any retroactive stats: only from now forward.
- Easier to implement pop-up survey questions (current mood, etc)
Muse.net client
Muse.net is a new project that allows access to a
user's home media collection from any web browser. There is a published API and
they encourage people to build new clients.
Posted by madadam at October 9, 2003 11:32 AM