On Sat, Sep 6, 2008 at 4:45 AM, Valery K.
[email protected] wrote:
- multipart/form-data does not bloat the size of the file, since it doesn’t
encode anything. RFC 1867 doesn’t explicitly say that any encoding
should be applied;
That’s weird. I thought someone said it bloats it. I will have to
update that next time I post about this.
- the ideal solution is to have byte ranges instead of Segment ID, since
concatenation of parts is not a scalable operation. With byte ranges the
server will be able to put the part into the proper place in the file, while
leaving other parts empty. I.e. if I have two parts
with byte ranges 0-19999/80000 and 40000-59999/80000, I can lseek to the
second part and start writing two parts simultaneously:
|<-- part1 0-19999/80000 -->|<-- zeroes 000000 -->|<-- part2 40000-59999/80000 -->|
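A minimal sketch of the byte-range idea above, written in Python for illustration. The names parse_range and write_part are hypothetical, and the "first-last/total" string format is simply taken from the example ranges; whether this is safe with concurrent writers on a shared filesystem is exactly the open question and is not addressed here.

```python
import os
import re


def parse_range(header):
    """Parse 'first-last/total' (e.g. '0-19999/80000') into three ints."""
    m = re.fullmatch(r"(\d+)-(\d+)/(\d+)", header.strip())
    if m is None:
        raise ValueError(f"bad byte range: {header!r}")
    first, last, total = (int(g) for g in m.groups())
    if not first <= last < total:
        raise ValueError(f"inconsistent byte range: {header!r}")
    return first, last, total


def write_part(path, header, data):
    """Write one part at its offset; regions not yet written read as zeroes."""
    first, last, total = parse_range(header)
    if len(data) != last - first + 1:
        raise ValueError("part length does not match its byte range")
    # O_CREAT without O_TRUNC: create the file if needed, never wipe other parts.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        f.truncate(total)  # size the file to the full upload length
        f.seek(first)      # jump to where this part belongs
        f.write(data)
```

With the two example parts, write_part(path, "40000-59999/80000", ...) can run before or after write_part(path, "0-19999/80000", ...), and the gap between them reads back as zeroes.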
The reason I decided on segments is that multiple segments can be
transferred in parallel, and I don’t know enough about server-side
code and shared filesystems to know whether seeking and writing into
one file would work properly over NFS or something else. I am thinking
of a solution that has no “it may not work” type of restrictions.
If only one file can be sent at a time, then I was thinking PHP (since
this was my first attempt at it, and I only know PHP) can seek to the
specific byte range; however, being able to split up the file into
segments and send them at will allows for multiple segments to be
uploaded for the same file and does not have any NFS/locking risks.
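Roughly, the segment approach could look like the following Python sketch. The file naming scheme and the functions segment_path, save_segment and assemble are assumptions made up for illustration, not the actual PHP code. Each segment is written under a temporary name and then renamed into place, so another request should never see a half-written segment; how rename behaves on a particular NFS or CIFS setup would still need to be verified.

```python
import os


def segment_path(upload_dir, upload_id, index):
    """Hypothetical naming scheme: <upload_id>.<zero-padded index>.seg"""
    return os.path.join(upload_dir, f"{upload_id}.{index:06d}.seg")


def save_segment(upload_dir, upload_id, index, data):
    """Store one segment as its own file, so no shared file is ever locked."""
    final = segment_path(upload_dir, upload_id, index)
    tmp = f"{final}.{os.getpid()}.tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, final)  # rename into place once the write is complete
    return final


def assemble(upload_dir, upload_id, segment_count, dest):
    """Concatenate segments 0..segment_count-1 in order into dest."""
    with open(dest, "wb") as out:
        for index in range(segment_count):
            # Raises FileNotFoundError if a segment has not arrived yet.
            with open(segment_path(upload_dir, upload_id, index), "rb") as seg:
                out.write(seg.read())
```

Segments can arrive in any order and from parallel requests; only the final assemble step needs all of them.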
If you want to code an extension that does this more cleanly and uses
byte ranges in a way that is safe over a network filesystem like NFS,
that works for me. I only know PHP, and my assumptions are based on
how other things do it.
A 2 gig file at 128k chunks (segments) would wind up being roughly
2000 MB * 8 chunks per MB = 16000 chunks. That’s a lot of files. I
started thinking of making
“superchunks” which would be groupings of 100 chunks or something, so
after 100 chunks (in a row) were successful it would glue those
together, and reduce the number of files…
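The chunk arithmetic and the superchunk grouping can be sketched in Python as below. The function names are hypothetical, and the group size of 100 is just the "100 chunks or something" figure. Note that an exact 2 GiB file (2 * 1024^3 bytes) gives 16384 chunks rather than the rounded 16000.

```python
CHUNK_SIZE = 128 * 1024   # 128k chunks, as in the message
SUPERCHUNK_LEN = 100      # chunks per superchunk (a guess, per the message)


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def complete_superchunks(received, total_chunks, group=SUPERCHUNK_LEN):
    """Return indices of superchunks whose chunks have all been received.

    The last superchunk may be shorter than `group` chunks.
    """
    received = set(received)
    done = []
    for sc in range(-(-total_chunks // group)):
        start = sc * group
        end = min(start + group, total_chunks)
        if all(i in received for i in range(start, end)):
            done.append(sc)  # these chunks could now be glued into one file
    return done
```

Gluing every complete superchunk would cut 16000 chunk files down to 160 superchunk files.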
If this idea at least sounds viable, I think I could scrape together a
decent amount of cash from my side business and my company to fund
this. It would have to support operations safely over NFS, CIFS, or
single servers. A local /tmp file wouldn’t work in the NFS case, since
the requests could be sent to any of the webservers, so the files
would have to be in a shared directory, which should be user
configurable.
I suppose on the client side it isn’t too hard to seek throughout a
file, as it doesn’t have to worry about odd locking issues and
writing. It would be great if the server end could be created in a way
that it could become an “unofficial” standard for uploading large
files, or for uploading over unreliable or slow connections.
It would also take care of progress bar stuff as it could give
feedback when chunks get completed back to the client, and the client
knows how fast it is sending data… so during a chunk it would be
relying on its own internal transfer stats, and it would be able to
confirm up to byte 70000 is completed on the server for example…
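As a small illustration of the "confirmed up to byte N" feedback, here is a Python sketch with a hypothetical function name. It assumes fixed-size chunks numbered from 0 and reports only the contiguous prefix, so a gap stops the count even if later chunks have already arrived.

```python
def confirmed_bytes(completed, chunk_size, total_size):
    """Bytes confirmed on the server from the start of the file, with no gaps."""
    completed = set(completed)
    n = 0
    while n in completed:  # count chunks 0, 1, 2, ... until the first gap
        n += 1
    # The final chunk may be short, so never report more than the file size.
    return min(n * chunk_size, total_size)
```

For example, with 10000-byte chunks and chunks 0 through 6 complete, the server would report 70000 bytes confirmed.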
Also, there’s an issue of garbage collection. A job would have to
clean out the [shared] temp directory after a while - I thought
something like 4 days would be nice [user configurable is best],
because a 2 gig file could take a long time and people might have to
resume it over the course of a couple days, but any longer we’d have
to assume it’s an orphan file that won’t be resumed again.
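A cleanup job along those lines might look like this Python sketch. The function name and the use of file modification time are assumptions; it treats every file in the directory independently, so a real job would probably want to key on the last activity of the whole upload rather than on each chunk file.

```python
import os
import time

MAX_AGE_DAYS = 4  # the suggested default; meant to be user configurable


def remove_orphans(temp_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete regular files in temp_dir not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    for path in stale:
        os.remove(path)
    return sorted(os.path.basename(p) for p in stale)
```

Run from cron or a similar scheduler against the shared temp directory, it returns the names of the files it removed.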
If you’re interested in this, I would love to have someone as
experienced as you - who has already dealt with handling file uploads
and created nginx modules! Let me know, feel free to contact me off
list. We might have to work out some more specifics, and you might
want to know how much $$ - I’d have to ask at work what they would pay
for it, but I’d pledge $500 out of my own pocket. I don’t believe any
other webserver or anything out there has anything like this (besides
maybe some thick client apps with specific servers that only handle
file uploads…)
Ideally, this would be something that other people could implement as
modules for Apache, etc. as well, and it could be adopted by browsers
directly and alleviate the need for thick Java/Flash/etc. apps, if
it’s done in a “standard” enough way to be re-creatable on the client
and the server…
I’d love to hear any thoughts or opinions, get some code going, etc.
I’ve got a PHP version of the server component that I think is
actually functional (with a surprisingly small amount of code), but I
don’t have a client to test it with yet. I was going to create a PHP
client to test it too.