Mp3 file magic number identification

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by ‘magic number’?
I never trust file extensions to be correct. It’s too easy for users
to accidentally munge file names in a GUI or even for malicious users
to try bad things by simply changing file names.

Any library or code is welcome!
Daniel B. said he’d even add it to Ptools or a similar library if
it gets posted on Ruby-Talk.

I did find this online as a purported mp3 magic number (in hex of
course),
49 44 33
but I’m not even going to bother using it since I don’t know
definitively that all mp3’s will have it, and I don’t know where to
expect it in the file.

Thanks,
John J.

John J. wrote:

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by ‘magic number’?

I did find this online as a purported mp3 magic number (in hex of course),
49 44 33
but I’m not even going to bother using it since I don’t know
definitively that all mp3’s will have it, and I don’t know where to
expect it in the file.

does this help: http://raa.ruby-lang.org/project/filemagic/

Stefan

On 8/15/07, John J. wrote:

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by ‘magic number’?

I know for sure that:
= wave files start with the characters ‘RIFF’ followed by 4 bytes
(filesize-8) followed by ‘WAVE’.
= ogg vorbis files start with ‘OggS’ followed by 24 bytes then 0x01
and the string ‘vorbis’
= MIDI files start with ‘MThd’

and according to wikipedia (and verified with one file on my system)
= MP3 files should start with 0xFF FB or 0xFF FA.

-Adam
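
The signatures Adam lists can be checked against the first few bytes of a file. The following is only a minimal sketch: the method name `audio_signature` and the returned symbols are made up for illustration, and the Ogg check simply takes the "24 bytes then 0x01" offset from the post above at face value.

```ruby
# Hypothetical helper (not from any library): guess an audio container
# from the leading bytes of a file read in binary mode.
# `header` should hold at least the first 35 bytes of the file.
def audio_signature(header)
  header = header.to_s.b # treat the data as raw bytes; nil becomes ""
  if header[0, 4] == 'RIFF' && header[8, 4] == 'WAVE'
    :wav                 # 'RIFF', 4 size bytes, then 'WAVE'
  elsif header[0, 4] == 'OggS' && header[28, 7] == "\x01vorbis"
    :ogg_vorbis          # 'OggS', 24 more bytes, then 0x01 'vorbis'
  elsif header[0, 4] == 'MThd'
    :midi
  end                    # falls through to nil when nothing matches
end

# Usage with a real file (the path is a placeholder):
#   File.open('some_file', 'rb') { |f| audio_signature(f.read(35)) }
```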

On Aug 15, 2007, at 5:45 PM, Stefan M. wrote:

does this help: http://raa.ruby-lang.org/project/filemagic/

Stefan

Thanks Stefan, but I should have said, I was hoping for the pure Ruby
implementation. More portable that way.

On Thu, Aug 16, 2007, John J. wrote:

Thanks Adam, that’s the kind of thing I’m looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

If you’re on a *nix system, you should have a “magic” file someplace
that describes the magic of every filetype that the “file” command can
understand.

If you’re not, find someone who is that can send you the file :) You
might also look at the libmagic source or the filemagic source.

Ben

Thanks Adam, Ben, and others…
found the magic number file in
/usr/share/file/magic
(on OS X, but likely in the same place on any *nix, I’m guessing it’s
one of those files that is often used by people more sophisticated
than myself who write C for a living)
There is a LOT of stuff in there!!
Wish I had looked in there before!
So I’ve written a minimal bit of Ruby like a lazy person. Copying
D.Berger’s Ptools style basically, by simply adding to the File class
my mini methods.

Though I’m going to need some testing… the magic file (not always
easy to read)
says this :

MPEG 1.0 Layer 3

0 beshort&0xfffe =0xfffa \bMP3

I’m not 100% sure, but Adam said 0xFFFB or 0xFFFA, and the magic file
lists only FFFA or does it mean FFFE and/or FFFA ?

On Aug 15, 2007, at 7:23 PM, Adam S. wrote:

and according to wikipedia (and verified with one file on my system)
= MP3 files should start with 0xFF FB or 0xFF FA.

-Adam

Thanks Adam, that’s the kind of thing I’m looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

I guess video/AV files should be next as well, primarily things
like .mov, .wmv, etc…
Oh, and I think we should find out if smaf is the same as midi.

These kind of validation tools can be useful to us all these days.

On Aug 16, 2007, at 1:30 AM, Ben B. wrote:

Do “man magic” (or possibly man 5 magic, or man -s 5 magic), and it
should describe the format of the file. Basically, it’s offset, type,
magic, message. Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic. I’m basically just quoting from the
manpage, though, so give it a gander.

Unix is cool :)

Ben

Yeah, I read that already. Seemed simple. Many file descriptions are
readable, but the MP3 one is one of many that don’t make sense to me.
"Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic."
Makes no sense to me at all. I’m not a C person really.

MPEG 1.0 Layer 3

0 beshort&0xfffe =0xfffa \bMP3
So what does the above mean??
I see hex numbers. but what is that ‘=’ doing ?
That’s cryptic.
The other lines after that make sense. They all describe the second
byte and that it determines the bitrate.
so do I care about 0xfffe? or 0xfffa?
or both?
I’m hoping I’m doing this right.

On Thu, Aug 16, 2007, John J. wrote:

Though I’m going to need some testing… the magic file (not always
easy to read)
says this :

MPEG 1.0 Layer 3

0 beshort&0xfffe =0xfffa \bMP3

I’m not 100% sure, but Adam said 0xFFFB or 0xFFFA, and the magic file
lists only FFFA or does it mean FFFE and/or FFFA ?

Do “man magic” (or possibly man 5 magic, or man -s 5 magic), and it
should describe the format of the file. Basically, it’s offset, type,
magic, message. Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic. I’m basically just quoting from the
manpage, though, so give it a gander.

Unix is cool :)

Ben

On Thu, Aug 16, 2007, John J. wrote:

Yeah, I read that already. Seemed simple. Many file descriptions are
readable, but the MP3 one is one of many that don’t make sense to me.
"Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic."
Makes no sense to me at all. I’m not a C person really.

That’s not a C thing, that’s just general math.

MPEG 1.0 Layer 3

0 beshort&0xfffe =0xfffa \bMP3
So what does the above mean??
I see hex numbers. but what is that ‘=’ doing ?

Okay, from left to right:

0: that’s the offset. It means the magic starts at byte 0

beshort&0xfffe: the magic is a big-endian short (2bytes), and you should
take the value you get from the file and AND it with 0xfffe

=0xfffa: this is what you’re looking for

\bMP3: this is what file will print if it matches this magic.
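
Read field by field as above, the magic entry translates fairly directly into Ruby. This is a sketch rather than a full MP3 validator: the method name `mpeg1_layer3_sync?` is invented here, and it only covers this one magic entry (a file that begins with a tag instead of an audio frame will not match).

```ruby
# Sketch of the rule "0 beshort&0xfffe =0xfffa":
#   offset 0, big-endian 16-bit value, AND with 0xFFFE, compare to 0xFFFA.
# The method name is hypothetical.
def mpeg1_layer3_sync?(data)
  first_two = data.to_s.b[0, 2]         # bytes at offset 0 and 1
  return false if first_two.length < 2  # too short to hold the magic
  value = first_two.unpack('n').first   # 'n' = 16-bit unsigned, big-endian
  (value & 0xFFFE) == 0xFFFA
end

# Usage with a real file (the path is a placeholder):
#   File.open('song.mp3', 'rb') { |f| mpeg1_layer3_sync?(f.read(2)) }
```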

That’s cryptic.

Sure, but it’s all explained in the man page.

The other lines after that make sense. They all describe the second
byte and that it determines the bitrate.
so do I care about 0xfffe? or 0xfffa?
or both?

Yes, that’s the magic.

I’m hoping I’m doing this right.

If your script is correctly identifying MP3 files you’re using as a
control, then you’re probably doing it just fine :)

One thing to be careful of is that there are multiple definitions of
what an MP3 looks like (at least, there are in my magic file). For
instance, MP3s with an ID3v2 tag will start with “ID3” instead of the
magic described above.

Make sure you search through your whole magic file for any given type
before you commit to writing code for it. You might find exceptions or
easier cases.
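
Putting the two cases from this message together, a check on the start of a file might look like the sketch below. The method name `mp3_start_kind` and its return symbols are hypothetical. Note that the bytes 49 44 33 from the original question are just the ASCII characters "ID3", and that finding a tag at the start does not by itself prove that MP3 audio follows it.

```ruby
# Hypothetical helper: classify the first bytes of a candidate MP3 file.
# Returns :id3v2_tag, :mpeg1_layer3_frame, or nil if neither case matches.
def mp3_start_kind(head)
  head = head.to_s.b
  return :id3v2_tag if head[0, 3] == 'ID3' # bytes 49 44 33
  return nil if head.length < 2
  word = head.unpack('n').first            # first two bytes, big-endian
  (word & 0xFFFE) == 0xFFFA ? :mpeg1_layer3_frame : nil
end

# Usage with a real file (the path is a placeholder):
#   File.open('song.mp3', 'rb') { |f| mp3_start_kind(f.read(3)) }
```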

Cheers,
Ben

Hi,

On 8/16/07, John J. wrote:

I did find this online as a purported mp3 magic number (in hex of
course),
49 44 33
but I’m not even going to bother using it since I don’t know
definitively that all mp3’s will have it, and I don’t know where to
expect it in the file.

This explains it all:
http://upload.wikimedia.org/wikipedia/commons/0/01/Mp3filestructure.svg

So first byte should be 0xFF, second byte & 0xFE should equal 0xFA.
that is only for layer-3.

However if the MP3 has ID3v1 tags then it will start with “ID3”.

Best regards.

On Thu, Aug 16, 2007, Felipe C. wrote:

So first byte should be 0xFF, second byte & 0xFE should equal 0xFA.
that is only for layer-3.

However if the MP3 has ID3v1 tags then it will start with “ID3”.

Actually, ID3v1 tags go at the end of the file, ID3v2 tags go at the
beginning (usually; they’re supported in both locations).
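
For the end-of-file case, an ID3v1 tag is a fixed 128-byte block that begins with the characters "TAG", so it can be looked for by seeking back from the end. A minimal sketch, assuming a seekable IO object; the method name `id3v1_tag?` is made up, and StringIO is used only so the demo runs without a real file.

```ruby
require 'stringio' # only needed for the in-memory demo below

# Hypothetical helper: true if the last 128 bytes of `io` begin with "TAG".
def id3v1_tag?(io)
  return false if io.size < 128 # too small to hold an ID3v1 tag
  io.seek(-128, IO::SEEK_END)   # jump to 128 bytes before the end
  io.read(3) == 'TAG'
end

# Demo with fake data: some padding followed by a 128-byte tag block.
fake = StringIO.new(("\x00" * 50) + 'TAG' + (' ' * 125))
puts id3v1_tag?(fake)

# Usage with a real file (the path is a placeholder):
#   File.open('song.mp3', 'rb') { |f| id3v1_tag?(f) }
```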

Ben

On Aug 16, 2007, at 10:18 AM, Ben B. wrote:

That’s one point I was definitely concerned about. Some sites
describe one or the other, but don’t always carefully make the
distinction which ID3 version.

Well, my script seems to work. For my current purposes it should be
enough, but I’m still a little fuzzy on what it means to AND the
bytes FFFE and FFFA ?

What kind of AND?

I’m not only trying to have a working script, I want to know what I’m
doing here so next time I don’t have to ask
(this is the first time I’ve delved into binary file structures, so
bear with me here.)
I am learning a lot with this. thanks

On Fri, Aug 17, 2007, John J. wrote:

That’s one point I was definitely concerned about. Some sites
describe one or the other, but don’t always carefully make the
distinction which ID3 version.

Yeah. MP3s are complex :)

Well, my script seems to work. For my current purposes it should be
enough, but I’m still a little fuzzy on what it means to AND the
bytes FFFE and FFFA ?

So you take the first 2 bytes of the file and AND them with FFFE. If
the result is FFFA, then you’ve got an mp3 file (albeit one with no
tag).

What kind of AND?

Bitwise. Boolean AND doesn’t make any sense in this context.

Say the first two bytes are 0xFFFB:

  magic = 0xFFFB
  check = 0xFFFE

  if (magic & check) == 0xFFFA
    puts "you've got an mp3"
  end
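
Writing the values out in binary may make the bitwise AND easier to see. A result bit is 1 only where both inputs have a 1, and 0xFFFE has every bit set except the lowest, so the AND leaves the upper 15 bits alone and forces the lowest bit to 0. That is why both 0xFFFA and 0xFFFB end up as 0xFFFA. The sketch below is purely illustrative; `MASK` and `masked` are names invented for it.

```ruby
# Illustrative only: show what masking with 0xFFFE does to a 16-bit value.
MASK = 0xFFFE

def masked(word)
  word & MASK # bitwise AND: keeps a bit only where both operands have a 1
end

# Print each value, the mask, and the result as 16 binary digits.
[0xFFFA, 0xFFFB, 0xFFF3].each do |word|
  puts format('%016b & %016b = %016b  match=%s',
              word, MASK, masked(word), masked(word) == 0xFFFA)
end
```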

I’m not only trying to have a working script, I want to know what I’m
doing here so next time I don’t have to ask
(this is the first time I’ve delved into binary file structures, so
bear with me here.)
I am learning a lot with this. thanks

No problem. This stuff is pretty trivial in the grand scheme of things,
but can definitely be confusing if you’ve never worked with binary
before.

Ben

quoth the John J.:

= MIDI files start with ‘MThd’

and according to wikipedia (and verified with one file on my system)
= MP3 files should start with 0xFF FB or 0xFF FA.

-Adam

Thanks Adam, that’s the kind of thing I’m looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

The first 4 bytes of a Flac file must be 0x66, 0x4C, 0x61, and 0x43,
ie: ‘fLaC’.
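
As a quick sanity check of those four bytes, here is a small sketch (the constant and method names are invented for illustration) that builds the signature from the hex values and compares it with the start of some data:

```ruby
# The four bytes 0x66 0x4C 0x61 0x43 packed into a string, i.e. "fLaC".
FLAC_MAGIC = [0x66, 0x4C, 0x61, 0x43].pack('C*')

# Hypothetical helper: true if `header` begins with the FLAC signature.
def flac_signature?(header)
  header.to_s.b[0, 4] == FLAC_MAGIC
end

# Usage with a real file (the path is a placeholder):
#   File.open('track.flac', 'rb') { |f| flac_signature?(f.read(4)) }
```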

I guess video/AV files should be next as well, primarily things
like .mov, .wmv, etc…

.wma/.wmv are a bit trickier. You can do:


# Format the 16-byte GUID at the start of an ASF file as text.
# The first three fields are stored little-endian; the last 8 bytes
# are stored in the order they are displayed.
def byteStringToGUID(byteString)
  return nil if byteString.nil? || byteString.length < 16
  # V = 32-bit little-endian, v = 16-bit little-endian, H16 = 8 bytes as hex
  d1, d2, d3, rest = byteString.unpack('VvvH16')
  sprintf('%08X-%04X-%04X-%s-%s', d1, d2, d3, rest[0, 4], rest[4, 12]).upcase
end

# Open in binary mode; the block form closes the file when done.
id = File.open("example.wma", "rb") { |fh| byteStringToGUID(fh.read(16)) }
if id == '75B22630-668E-11CF-A6D9-00AA0062CE6C'
  puts "Valid wma/wmv file"
else
  puts "Not a wma/wmv"
end

This will work for anything in an ASF wrapper.

Your best bet to find this info for other files is to find and read the
respective specs. These should be easy to track down using Wikipedia’s
audio and video codec categories. Usually there is a direct link to the
spec, or at least the official site for the codec.

HTH

-d