The MP3.com Rescue Barge Barge

Original link: https://blog.somnolescent.net/2025/09/mp3-com-rescue-barge-barge/

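Driven by concerns over legal challenges to the Internet Archive, a project was launched to comprehensively archive the music from MP3.com. Using the Internet Archive's "mp3.com Rescue Barge" (960.6 GB) and links previously collected from the Wayback Machine, a total of 1.78 TB of audio data was gathered. The process involved overcoming storage limits, with a drive borrowed from a friend helping along the way, and the data was eventually consolidated onto a 3 TB drive. A major obstacle was organizing this huge music library, initially a chaotic folder structure, which was solved by using Winamp 5 (WACUP) to index the files and export a detailed CSV file containing metadata (artist, track name, URL, date) for 533,046 songs. The resulting dataset, now available in CSV and Excel formats, lets users easily search and browse the archived MP3.com music. While imperfect, the project aims to preserve this piece of digital music history, and it may be integrated with archived artist pages in the future. The creator acknowledges potential improvements, such as encoder identification and a more machine-readable format for the CSV.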

Original text

So back in November, I grabbed 1.78 TB of media from both the Internet Archive’s mp3.com Rescue Barge and their Wayback Machine, in pursuit of creating the most thoroughly complete archive I can muster of MP3.com’s music.

The Rescue Barge is a set of collections on Archive.org’s own site, up for grabs at a cool 960.6 GB. But I had a massive text file of download.mp3.com links from Reddit sitting around from my previous Random MP3.com Playlist project. The Wayback Machine hadn’t archived all of them, and lots of the URLs 404ed, but the list still pointed to even more audio than the Archive’s own barge held! Realizing this drove me to grab all I could of both.

With the Internet Archive’s legal status in question in the news at the time, following a series of lawsuits from book publishers and music labels (this is kind of old news, but this project was cooking a while), I felt a sense of responsibility to make sure all the MP3.com downloads I could find on their servers were safe somewhere, from the barge and beyond. Thus formed what I like to call the MP3.com Rescue Barge.. barge, and subsequently the metadata set I set out to create from it, with artist names, track names and other information that you’ll be able to sift through at home in a nifty CSV and spreadsheet.

(The resultant set all uses Archive.org URLs, so it’s not exactly resilient if the Internet Archive goes down. I do now have a copy of my own, though, so there’s a bit of redundancy should the need arise, woot!)

Grabbing the audio

I wasn’t sure how much storage I needed at first. I started with just the terabyte data drive in my previous desktop, but I soon ran into a wall while downloading a chunk of the Wayback Machine links. Fortunately, one of my friends in college let me use a 1.5 TB drive to finish the job, and that helped.

That wasn’t big enough for the whole thing, though, so for a while, the Barge barge was split across an internal drive and an external one. Fortune struck again when I caught wind of a 3 TB drive over at Free Geek, with a light amount of uptime hours to boot! I swapped the new drive into the tower I was working on, and it was just big enough for all of the data already on my terabyte drive plus both grabs.

With it all in one place, the obstacle for a while was, well, now what? The scripts I wrote to grab everything, admittedly, were a bit of a bodge. In hopes of preserving the source URLs of all the mp3s, I let wget make a folder for each one it grabbed, making searching for anything a somewhat… inefficient affair. This was fortunately remedied by voidtools’ Everything, a tool I use every day for quickly indexing and finding files on disk, but I hoped to make a database that included all the mp3 tags, and maybe even mp3guessenc results in the future, for important research… How could I do that easily?

Compiling the data

Enter WACUP! Well, just Winamp 5, technically. It’s a music player that, turns out, handles extremely large music libraries quite well. The first time I tried this, under the x86 version, after I wanna say an hour, but maybe less, it did crash at 250k songs and corrupt the database while I tried to look up a song. Remember that this is while loading in and indexing all the tags!

I had to clear it under the same version before trying it again at 64-bit, but once all was said and done, I really envy the UI prowess the Nullsoft guys had, in terms of actually browsing the library. With this many songs, it still felt smooth as silk! I had just tried the same thing with a MAGIX MP3 Deluxe trial, and I felt bad offering the poor thing such a Sisyphean task. (I’ll admit that I’m assuming that the underlying Media Library plugin in WACUP is the same as originally from Nullsoft. I could be wrong, but I am very fond of what the project is doing either way to support the old player.)

Once that was all in WACUP, I realized you could pick and choose what fields you wanted it to display, including paths; and furthermore, that there was not only an m3u8 export option with just file paths, but a CSV one that included all the tags and information I wanted for my database! The resulting file was 137 MB, though NTFS compresses it down to 60.1 MB.

Look at the CSV format, and the player itself below:

A big motivator for this project was to index the tags of all of these loose mp3.com download URLs I had, and possibly combine them with already existing mirrors of the website as it was, for an easily browsable set. That’s a possible future project: integrating this with archives of the artist pages themselves.

And now I had all of them for 533,046 songs! (In files I was able to download, anyway; a few of them were corrupted or non-audio, but because they returned a 200 and weren’t 404ed entirely, I left them in.)

Cleanup

To make this somewhat useful to people other than me, the next step after exporting the library in WACUP was to change the local paths to their appropriate source URLs. There are more complex ways I could have done this with scripting, but I just whipped out Excel after exporting the CSV.

In the process, I noticed that one of the MP3s slyly stuck a newline in the comment field, messing up the flow of the file a little (many programs that support CSV, including Excel, count a newline as the end of a record). Usually, that field is just reserved for the username or the URL to the author’s artist page, so I’m not too sure how that happened. Curious.
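For anyone hitting the same snag in a script rather than in Excel, here is a minimal Python sketch of one way around it. It assumes the multi-line field is wrapped in double quotes, which may or may not be how the WACUP export writes it; an unquoted newline can't be recovered this way. The three-column sample is made up for illustration, and the real export has more columns.

```python
import csv
import io

def flatten_newlines(rows):
    """Yield each row with newlines inside any field replaced by a single space."""
    for row in rows:
        yield [" ".join(field.splitlines()) for field in row]

# Hypothetical sample: the comment field of the second record spans two lines.
sample = (
    'Artist,Title,Comment\r\n'
    '"Some Artist","Some Song","line one\nline two"\r\n'
)

# newline="" leaves line endings untouched so the csv module can
# tell a record break from a newline inside a quoted field.
reader = csv.reader(io.StringIO(sample, newline=""))
cleaned = list(flatten_newlines(reader))

for row in cleaned:
    print(row)
```

The same `newline=""` argument applies when opening a real file with `open()`.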

In any case, accounting for that and running a few find-and-replaces on the file path column, to convert each local path back to its source URL, completed the data set. (I also realized I had to be careful that Excel didn’t try to evaluate some of the artist names and song titles as math expressions.)
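The find-and-replace step could also be scripted. The sketch below is not what I actually ran, just a rough Python equivalent: the folder root `D:\barge\wayback` and the URL prefix are hypothetical stand-ins, since the real layout depends on how wget was invoked.

```python
from pathlib import PureWindowsPath

# Hypothetical values for illustration only.
LOCAL_ROOT = PureWindowsPath(r"D:\barge\wayback")
URL_PREFIX = "https://web.archive.org/web/"

def local_path_to_url(path: str) -> str:
    """Swap the local folder root for the URL prefix and flip the slashes.

    Raises ValueError if the path is not under LOCAL_ROOT.
    """
    relative = PureWindowsPath(path).relative_to(LOCAL_ROOT)
    return URL_PREFIX + relative.as_posix()

example = local_path_to_url(r"D:\barge\wayback\2003\download.mp3.com\song.mp3")
print(example)
```

`PureWindowsPath` only manipulates the path text, so this runs on any OS without touching the disk. Characters that need percent-encoding in a URL are left as-is here.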

That all taken care of, this spreadsheet (and the CSV I re-exported) contains a lot of useful data: track names, artist names, and for a great deal of them, even the year and the mp3.com URL from which the mp3 was submitted. With some exceptions, mostly stemming from the Barge downloads (a few others are my error), most carry the modified date as it was on the server. There are duplicate rows as well, to cover both the Barge upload and the web.archive.org download.mp3.com mirror.

Try it!!

It is not the most perfectly machine readable CSV (I think I could go further with UNIX times or something), and I think in the future I’d like to write a script that’ll make a new column for mp3 encoder identification with mp3guessenc, but for now I’m pleased with having it done and out there. For purposes of looking up an artist or title and seeing what’s out there it should work pretty well.
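As a rough idea of what the UNIX time conversion might look like, here is a small Python sketch. The `YYYY-MM-DD HH:MM:SS` layout and the choice of UTC are assumptions for illustration; the date column in the actual export may be formatted differently, in which case `fmt` would need to change.

```python
from datetime import datetime, timezone

def to_unix_time(stamp: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Parse a date string (assumed to be UTC) and return seconds since the epoch."""
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

print(to_unix_time("2001-09-09 01:46:40"))  # 1000000000
```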

By the time this post is published, I’ll have updated my site with it in CSV and Excel format on the MP3.com Preservation page, so download and enjoy. There’s an abbreviated history there, along with notes on how to use it.