4693: Long time loading diff chunks for interdiffs

alexG

What version are you running?

2.5.17

What's the URL of the page containing the problem?

/r/\d+/diff/\d-\d/

What steps will reproduce the problem?

  1. Reviewrequest with 8 or more diff revisions with 30 or more files
    If chunks are not in memcache, it will take about 1 min to load diff chunks fo 20 files (1 page).

  2. Move marker from 'orig' to 4. It will start downloading chunks for interdiff 4-8. Then move other marker to 5 (because you want diff 4-5). The first marker will move to 'orig'(or other position), and after 1 min will start downloading chunks for other interdiff, and it can never finish (evrything messed-up and you may need to type 4-5 in the link and refresh page o get what you want)

  3. When you want to expand changes, it again takes long time to load 20 files

What is the expected output? What do you see instead?

  1. for issue 1, suggest to calculate chunks at the beginng(when saving new diff in database) convert to json and save in the new table DIFF_CHUNKS with fields (requestID, diff1,diff2, filename, chunk). Need row for every interdiff combination. Than use the saved chunks to render them to the page. Downloading time will always be as it taken from memcache

  2. for issue 2, suggest do not send request to server on every move of the marker, but to have refresh button for downloading new chunks

  3. for issue 3, suggest to have expand button on every file-chunk html, and expand/collaps every file separately, using ajax request

What operating system are you using? What browser?

Ubuntu 14, 16, any browser

Please provide any additional information below.

If you agree for these 3 proposals, I can implement them

chipx86
#1 chipx86

Hi,

So first, let's address the slowdown. This is absolutely not normal behavior, and points to a problem communicating with the repository. What type of repository are you using? Can you make sure debug logging is enabled in Admin UI -> Logging, reproduce this behavior, and then look in reviewboard.log and look for the log timings for fetching files from the repository?

Also, what are the specs for the server you're running Review Board on?

As for the proposals:

  1. This would actually make the problem worse. We calculate chunks on-demand so that we don't have to do all of it in one HTTP request (which will definitely time out in your situation). We also don't want to store them permanently, as that will take up a lot of space and will cause problems as we fix or tweak the diff viewer.
  2. We wouldn't want to add an explicit button (as this is not normally an issue on installs, and people are used to the current behavior). However, Review Board 3.0 does make improvements to the interaction with the slider behind the scenes, being smarter about cancelling previous loads and starting new ones.
  3. I'm not opposed to having per-file full-expand buttons (we already allow for expanding individual chunks), but it should not matter in your case. Once the diffs have already been processed and put in memcached, an expand does not have to re-process all the diffs.

Is it possible your memcached isn't actually working?

  • -New
    +NeedInfo
alexG
#2 alexG

In the log, I see WARNINGs like "Generating diff chunks for self.filediff id ... took 8.655728 seconds" or "Generating diff file info for interdiffset ids ... took 24.726037 seconds".
But no warnings for "Fetching file ..". Usually, fetching takes 0.01-0.2 sec.
We will send you the specs for our ReviewBoard server, tomorrow.
Thank you very much for your help

alexG
#3 alexG

Our ReviewBoard server is on VM, and have the following:
CPU: 8, Memory: 24GB,Hard Disk: 469GB
we will increase memcached from 1024MB to 4096MB.

chipx86
#4 chipx86

Wow, 8 seconds and 24 seconds is way too long. That doesn't seem right at all for the specs you're listing. I'd expect < 1 second. How big are these files on the server?

What VM technology are you using? Are those specs the VM's or the underlying server's?

What else is running on that server?

Can you show me a span of the log file containing timings to help me get a complete picture?

alexG
#5 alexG

The server spec is for VM. It is probably some other staff running on this machine

alexG
#6 alexG

The log:

2018-05-01 18:25:49,020 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720833/ - root - Begin: Generating diff chunks for self.filediff id 720833 (File_name.h)
2018-05-01 18:25:49,138 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720833/ - root - End: Generating diff chunks for self.filediff id 720833 (File_name.h)
2018-05-01 18:25:49,139 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720833/ - root - Generating diff chunks for self.filediff id 720833 (File_name.h) took 0.118354 seconds
2018-05-01 18:25:49,259 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Begin: Generating diff file info for diffset id 32493, filediff 720834
2018-05-01 18:25:49,261 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - End: Generating diff file info for diffset id 32493, filediff 720834
2018-05-01 18:25:49,261 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Generating diff file info for diffset id 32493, filediff 720834 took 0.001912 seconds
2018-05-01 18:25:49,262 - DEBUG - - djblets.cache.backend - Cache miss for key diffviewer/diff_file_fragment.html-1-2-720834-collapsed-highlighting-en-1516673975.
2018-05-01 18:25:49,262 - DEBUG - - djblets.cache.backend - Cache miss for key diff-sidebyside-hl-720834-en.
2018-05-01 18:25:49,429 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Begin: Patching file File_name_AR9888v2.h
2018-05-01 18:25:49,485 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - End: Patching file File_name_AR9888v2.h
2018-05-01 18:25:49,485 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Patching file File_name_AR9888v2.h took 0.055174 seconds
2018-05-01 18:25:49,501 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Begin: Generating diff chunks for self.filediff id 720834 (File_name_AR9888v2.h)
2018-05-01 18:25:54,865 - DEBUG - - root - Begin: Generating diff file info for diffset id 28621, filediff 628965
2018-05-01 18:25:54,869 - DEBUG - - root - End: Generating diff file info for diffset id 28621, filediff 628965
2018-05-01 18:25:54,870 - DEBUG - - root - Generating diff file info for diffset id 28621, filediff 628965 took 0.003926 seconds
2018-05-01 18:25:55,082 - DEBUG - - root - Begin: Generating diff file info for diffset id 28621, filediff 628965
2018-05-01 18:25:55,085 - DEBUG - - root - End: Generating diff file info for diffset id 28621, filediff 628965
2018-05-01 18:25:55,085 - DEBUG - - root - Generating diff file info for diffset id 28621, filediff 628965 took 0.003231 seconds
2018-05-01 18:25:56,031 - WARNING - - root - Unable to store all filenames in the X-ReviewBoard-Diff-For headers when sending e-mail for review request 117665: The header size exceeds the limit of 8192. Remaining headers have been omitted.
2018-05-01 18:25:56,139 - INFO - - fortinet.email - reply email sent, time emailed:2018-05-01 18:25:56.139850+00:00, rqst_id:117665, review_id:84698, reply_id:93028, cc:cc_listI, to:zglei
2018-05-01 18:25:56,225 - DEBUG - None - kau - /r/119080/diff/ - root - Generating diff viewer page for filediff id 30058
2018-05-01 18:25:56,239 - DEBUG - None - kau - /r/119080/diff/ - root - Begin: Generating diff file info for diffset id 30058
2018-05-01 18:25:56,241 - DEBUG - None - kau - /r/119080/diff/ - root - End: Generating diff file info for diffset id 30058
2018-05-01 18:25:56,241 - DEBUG - None - kau - /r/119080/diff/ - root - Generating diff file info for diffset id 30058 took 0.001950 seconds
2018-05-01 18:25:56,244 - DEBUG - None - kau - /r/119080/diff/ - root - Done generating diff viewer page for filediff id 30058
2018-05-01 18:26:00,180 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - End: Generating diff chunks for self.filediff id 720834 (File_name_AR9888v2.h)
2018-05-01 18:26:00,180 - WARNING - None - zxqiu - /r/120588/diff/2/fragment/720834/ - root - Generating diff chunks for self.filediff id 720834 (File_name_AR9888v2.h) took 10.678755 seconds
2018-05-01 18:26:02,707 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - Begin: Generating diff file info for diffset id 32493, filediff 720835
2018-05-01 18:26:02,709 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - End: Generating diff file info for diffset id 32493, filediff 720835
2018-05-01 18:26:02,709 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - Generating diff file info for diffset id 32493, filediff 720835 took 0.001829 seconds
2018-05-01 18:26:02,710 - DEBUG - - djblets.cache.backend - Cache miss for key diffviewer/diff_file_fragment.html-2-2-720835-collapsed-highlighting-en-1516673975.
2018-05-01 18:26:02,710 - DEBUG - - djblets.cache.backend - Cache miss for key diff-sidebyside-hl-720835-en.
2018-05-01 18:26:02,714 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - Begin: Patching file File_name.h
2018-05-01 18:26:02,744 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - End: Patching file File_name.h
2018-05-01 18:26:02,744 - DEBUG - None - zxqiu - /r/120588/diff/2/fragment/720835/ - root - Patching file File_name.h took 0.029219 seconds

chipx86
#7 chipx86

Just to check, can you go into the Admin UI -> Diff Viewer and see what value, if any, is set for "Syntax highlighting threshold"?

alexG
#8 alexG

I found that the problematic file 'File_name_AR9888v2.h', has 13649 lines, and has a change almost in every line ( it is an array of 218321 hex numbers)

alexG
#9 alexG

I have another question-proposal: What if we do not load all chunks after page loaded, but load chunks for each one file after clicking file name on a list, and display only one file on a page. It will help user to easy see whole file list and select particular file, if there are 20+ files changed ( I see request with 260 files). Also if there are many big files, it will reduce long mouse scrolling. also it will eliminate the issue that we are discussing now

alexG
#10 alexG

the Syntax highlighting threshold is set to 0

chipx86
#11 chipx86

Try setting a line limit there (say, 8000 lines). That will turn off syntax highlighting if the file is too big, which should cut down on the time taken.

alexG
#12 alexG

Thank you very much. We set this limit, but we still see slow loading for this file, even chunks taken from memcashed. Looks like it caused by network transfer time of this big file

chipx86
#13 chipx86

At the point of the diff chunk generation, it should have already fetched the files, and is now just patching them, diffing into chunks, and generating HTML from them. Something is definitely going very wrong. Larger files than that take a lot less time on our servers with much lower specs, with large numbers of active users. I'm not sure what to point to, but we're probably getting to a point where the kind of diagnostics I'd need to do would require a support contract in place (custom builds and maybe a live session to test things on the server).

chipx86
#14 chipx86

It's worth noting that Review Board 3.0 introduces a number of improvements to the process, and might speed things up to some degree.