3740: Uploading a 6.8MB diff over the Python API takes 30 seconds (220KB/s), with at least a 100MB/s connection between computers

vlo****@gmai***** (Google Code)
June 1, 2015
What version are you running?
Reviewboard 2.0.12

What's the URL of the page containing the problem?
N/A

What steps will reproduce the problem?
diff = ... # 6.8MB diff in-memory
draft_diffs = review_draft.get_draft_diffs()   
draft_diffs.upload_diff(diff)

What is the expected output? What do you see instead?
I would expect this to take 680ms on a 100Mb connection (or 68ms on the GigE connection I believe exists between my machines). Instead, the upload takes 30 seconds.
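
As a rough cross-check of these figures, the arithmetic can be sketched in Python. This assumes 1MB = 10^6 bytes, 8 bits per byte, and no protocol overhead, so the ideal times come out a little below the estimates above (roughly 0.54s and 0.054s). `transfer_seconds` and `observed_rate` are illustrative helper names, not part of RBTools.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8.0 / link_bits_per_second


def observed_rate(size_bytes, elapsed_seconds):
    """Effective throughput in bytes per second."""
    return size_bytes / float(elapsed_seconds)


DIFF_SIZE = 6.8e6  # 6.8MB diff, assuming 1MB = 10**6 bytes

print(transfer_seconds(DIFF_SIZE, 100e6))  # seconds on a 100Mb/s link
print(transfer_seconds(DIFF_SIZE, 1e9))    # seconds on a GigE link
print(observed_rate(DIFF_SIZE, 30) / 1e3)  # KB/s for the 30-second upload
```

Either way, the observed rate is orders of magnitude below what the link allows, which suggests the time is going somewhere other than raw transfer.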

What operating system are you using? What browser?
Client: OSX, python2.7
Server: Ubuntu 12.04 LTS, python2.7 w/ memcached
#1 chipx86
It may be RBTools, but it may also be the processing of the diff and the file validation. Sounds like you've done measurements. Can you confirm this is all before the server does any processing?
  • +NeedInfo
#2 vlo****@gmai***** (Google Code)
I have not really done any measurements beyond looking at the length of the
diff I'm passing in and how long the Review Board API call to post it takes.

How would I check if it's before the server does processing?

I believe the API waits until the server processes, so it could be that's
where it's spending its time.  Maybe the validation isn't sufficiently
parallelized so there's a lot of overhead per-file?
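
A minimal sketch of one way to start measuring, assuming the call blocks until the server responds: wrap it in a wall-clock timer. The elapsed time then covers both the transfer and the server-side processing, so it would still need to be compared against timestamps on the server side (for example its request log, if available) to separate the two. `timed_call` is a hypothetical helper, not part of RBTools.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start


# Hypothetical usage with the objects from the report:
#   _, elapsed = timed_call(draft_diffs.upload_diff, diff)
#   print("upload_diff took %.1f seconds" % elapsed)
```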

I'm also looking into why I'm trying to post 6MB in the first place: a git
diff is 16kb.

-Vitali
#3 chipx86
How are you generating the diff? It's probably diffing from origin/master..HEAD, and not from your desired starting point.

The server, after getting the diff, will start asking the repository for file existence on all the files in the diff, and validate them. That's likely what's taking the time you're seeing. A 6MB diff likely contains changes to quite a number of files, so this will take a while.
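
To get a feel for how many files such a diff touches, one rough approach is to count the per-file headers in it. This sketch assumes the diff is git-style text in which each file section starts with a `diff --git ` line; `count_files_in_git_diff` is an illustrative name, not an RBTools or Review Board function.

```python
def count_files_in_git_diff(diff_text):
    """Count per-file sections in git-style diff text."""
    return sum(1 for line in diff_text.splitlines()
               if line.startswith("diff --git "))


sample = (
    "diff --git a/foo.py b/foo.py\n"
    "--- a/foo.py\n"
    "+++ b/foo.py\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/bar.py b/bar.py\n"
    "--- a/bar.py\n"
    "+++ b/bar.py\n"
    "@@ -1 +1 @@\n"
    "-x\n"
    "+y\n"
)
print(count_files_in_git_diff(sample))  # 2 file sections in this sample
```

Running the same count over the 6.8MB diff and over the 16kb git diff would show whether the larger one includes many files outside the intended change.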

The validation is not parallelized. It needs to happen in one go, as there are relationships between files in the diff and parent diff that need to be checked, and we don't want to flood the repository with too many concurrent checks (given that there may be other users also trying to post at the same time), or the server can end up causing some timeouts or bad responses.

So as it is, it sounds like this isn't really a bug. The problem is more that you're getting a 6MB diff, which we can help with once I know how it's being generated.
#4 david
  • -NeedInfo
    +Incomplete