3670: scmtools repository unique constraint path:local prevents using MySQL 5.5 with wide charsets
- WontFix
- Review Board
thom.******@gmai***** (Google Code)
Nov. 12, 2014
What version are you running?
2.1 alpha, head of master; also present in the 2.0 branch.

What's the URL of the page containing the problem?
`./manage.py syncdb`

What steps will reproduce the problem?
1. Create a blank MySQL database in MySQL 5.5 with a character set whose maximum width is 4 bytes (utf8mb4 or utf32): `CREATE DATABASE review DEFAULT CHARACTER SET utf8mb4;`
2. Configure a new Review Board site to use this DB: `DATABASES = {'default': {'ENGINE': 'django.db.backends.mysql', 'OPTIONS': {'charset': 'utf8mb4'}, 'NAME': 'review'}}`
3. Run `python reviewboard/manage.py syncdb`.

What is the expected output? What do you see instead?
Expected: a new DB is successfully created.
Actual: `django.db.utils.OperationalError: (1071, 'Specified key was too long; max key length is 767 bytes')` when creating the scmtools_repository table, because of the unique_together ('archived_timestamp', 'path', 'local_site') constraint.

What operating system are you using? What browser?
Debian Wheezy, MySQL 5.5.

Please provide any additional information below.
There is already a TODO in this file saying this constraint causes problems for other things and should be eliminated: "# TODO: the path:local_site unique constraint causes problems when # archiving repositories. We should really remove this constraint from # the tables and enforce it in code whenever visible=True"

Workaround: create the database as utf8, convert all the other tables to utf8mb4 once syncdb completes, and then convert the scmtools_repository table while leaving `path` as a narrower column so it fits into the unique constraint.
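The workaround above can be scripted. The sketch below only builds the ALTER TABLE statements as strings and never connects to MySQL. It assumes that "narrower" means keeping `path` in MySQL's 3-byte utf8 character set (rather than shortening it), that utf8mb4_unicode_ci is an acceptable collation, and that you supply the table list yourself; `workaround_statements`, its `widen_columns` parameter, and the `extra_data` column in the usage are hypothetical.

```python
TARGET = "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"


def workaround_statements(tables, narrow_table, widen_columns=None):
    """Build ALTER TABLE statements for the utf8 -> utf8mb4 workaround.

    tables: every table in the database (hypothetical input, e.g. from SHOW TABLES).
    narrow_table: the table whose indexed `path` column must stay utf8.
    widen_columns: hypothetical {column: (data_type, nullable)} for columns in
        narrow_table that should still move to utf8mb4.
    """
    widen_columns = widen_columns or {}
    statements = []
    for table in tables:
        if table != narrow_table:
            # Re-encodes every character column in the table to utf8mb4.
            statements.append(f"ALTER TABLE `{table}` CONVERT TO {TARGET};")
            continue
        # Changes only the table's default charset; existing columns such as
        # `path` keep their current utf8 definition, so the unique key still fits.
        statements.append(f"ALTER TABLE `{table}` DEFAULT {TARGET};")
        for column, (data_type, nullable) in widen_columns.items():
            # MODIFY replaces the whole column definition, so nullability is
            # restated here; column defaults are NOT preserved by this sketch.
            null_clause = "NULL" if nullable else "NOT NULL"
            statements.append(
                f"ALTER TABLE `{table}` MODIFY `{column}` {data_type} {TARGET} {null_clause};"
            )
    return statements


if __name__ == "__main__":
    for stmt in workaround_statements(
        ["reviews_review", "scmtools_repository"],
        "scmtools_repository",
        {"extra_data": ("LONGTEXT", True)},  # hypothetical column
    ):
        print(stmt)
```

Review the output before running it: any other table with an indexed VARCHAR(255) column could hit the same 767-byte error during CONVERT TO, and the sketch does not check for that.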
Hey Thom,

That TODO comment isn't about the key-length error. It's old, and the specific case it referred to was fixed long ago; we just never removed the comment.

I don't think we can support utf32 or utf8mb4. Is there a reason plain ol' utf8 can't be used?

The reason is that MySQL's design around key lengths is completely busted. It doesn't care about the character length of fields; it deals exclusively with byte length. That means the more bytes required per character, the closer you are to hitting that error. We can't guarantee that our key lengths will work for anything wider than utf8.
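To make the byte-length arithmetic concrete: MySQL's utf8 reserves up to 3 bytes per character, utf8mb4 and utf32 reserve 4, and InnoDB on MySQL 5.5 limits each indexed column to 767 bytes by default. The sketch below assumes `path` is a VARCHAR(255) column (an assumption, not checked against the Review Board schema); the helper names are hypothetical.

```python
# Worst-case bytes per character MySQL reserves for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4, "utf32": 4}

# Default InnoDB per-column index limit on MySQL 5.5 (COMPACT row format).
INNODB_KEY_PREFIX_LIMIT = 767


def key_bytes(char_length, charset):
    """Bytes MySQL reserves in an index for a VARCHAR(char_length) column."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def fits_in_key(char_length, charset, limit=INNODB_KEY_PREFIX_LIMIT):
    """True if the whole column fits within the index limit."""
    return key_bytes(char_length, charset) <= limit


def max_indexable_chars(charset, limit=INNODB_KEY_PREFIX_LIMIT):
    """Longest VARCHAR that can be fully indexed in the given charset."""
    return limit // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for cs in ("utf8", "utf8mb4", "utf32"):
        print(cs, key_bytes(255, cs), fits_in_key(255, cs), max_indexable_chars(cs))
```

Under these assumptions a VARCHAR(255) needs 765 bytes in utf8, which just fits, but 1020 bytes in utf8mb4 or utf32, which does not; a 4-byte charset can only fully index 191 characters.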
> I don't think we can support utf32 or utf8mb4. Is there a reason plain ol' utf8 can't be used?

Well, it prevents using anything outside of the BMP. Things like emoji, many mathematical symbols, and musical symbols live in the SMP. Using utf8, it's impossible to store these characters in review requests, reviews, comments, etc. Since Django doesn't let you be choosy about these things, the only way to enable them anywhere by default is to enable them everywhere by default. I suppose I can be pickier about which tables and columns we convert; it's unlikely that the path will contain these characters. Maybe I should really just get around to looking into converting our DB to Postgres...
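The limitation Thom describes applies to code points above U+FFFF: they need 4 bytes in UTF-8, which MySQL's 3-byte utf8 cannot store. The sketch below shows one way an application might detect such characters, or (as a hypothetical fallback) replace them, before saving to a utf8 column; the function names and the choice of U+FFFD as the replacement are assumptions, not anything Review Board does.

```python
BMP_MAX = 0xFFFF  # Highest code point in the Basic Multilingual Plane.


def non_bmp_chars(text):
    """Return the characters that need 4 bytes in UTF-8 (code points > U+FFFF)."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


def fits_mysql_utf8(text):
    """True if every character fits MySQL's 3-byte utf8 character set."""
    return not non_bmp_chars(text)


def replace_non_bmp(text, replacement="\ufffd"):
    """Hypothetical fallback: swap non-BMP characters for U+FFFD before saving."""
    return "".join(replacement if ord(ch) > BMP_MAX else ch for ch in text)


if __name__ == "__main__":
    sample = "G clef \U0001D11E, smile \U0001F642, caf\u00e9"
    print(non_bmp_chars(sample))
    print(fits_mysql_utf8("caf\u00e9"), fits_mysql_utf8(sample))
    print(len("\U0001F642".encode("utf-8")))  # 4 bytes
```

Replacing characters loses data, which is exactly the trade-off that makes converting to utf8mb4 attractive in the first place.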
Makes sense. I wish this weren't a problem, or that Django provided a mechanism for letting you specify a max key length, but neither of those things is really in our control. The byte-based key length issue has bitten us many times in the past, and has bitten many other developers out there. Unfortunately, the MySQL guys don't seem to care and won't budge on the issue: http://bugs.mysql.com/bug.php?id=6604 I do believe there is a configuration option for bumping up the key length on InnoDB, which may work for you, but the MySQL guys say you'll hit performance issues by doing so.
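The configuration mentioned above is most likely InnoDB's `innodb_large_prefix` option (MySQL 5.5.14 and later). It raises the per-column key limit from 767 to 3072 bytes, but only for tables using the DYNAMIC or COMPRESSED row format, which in turn require `innodb_file_format=Barracuda` and `innodb_file_per_table`. The sketch models only that rule to show how many utf8mb4 characters become indexable; the function names are hypothetical and everything else about server configuration is ignored.

```python
DEFAULT_PREFIX_LIMIT = 767   # InnoDB default per-column key limit (bytes)
LARGE_PREFIX_LIMIT = 3072    # limit with innodb_large_prefix on DYNAMIC/COMPRESSED rows
LARGE_PREFIX_ROW_FORMATS = {"DYNAMIC", "COMPRESSED"}


def key_prefix_limit(large_prefix, row_format):
    """Per-column index limit in bytes under the simplified rule above."""
    if large_prefix and row_format.upper() in LARGE_PREFIX_ROW_FORMATS:
        return LARGE_PREFIX_LIMIT
    return DEFAULT_PREFIX_LIMIT


def indexable_chars(bytes_per_char, large_prefix=False, row_format="COMPACT"):
    """Longest fully indexable VARCHAR for a charset of the given width."""
    return key_prefix_limit(large_prefix, row_format) // bytes_per_char


if __name__ == "__main__":
    # utf8mb4 reserves 4 bytes per character.
    print(indexable_chars(4))                   # 191: default COMPACT table
    print(indexable_chars(4, True, "COMPACT"))  # 191: option on, wrong row format
    print(indexable_chars(4, True, "DYNAMIC"))  # 768: option on, DYNAMIC rows
```

With the option active on a DYNAMIC table, a utf8mb4 VARCHAR(255) (1020 bytes) would fit under the 3072-byte limit; enabling the option alone, without changing the row format, changes nothing.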
OK, well, feel free to close this as WontFix. We can work around it. But I do suggest dropping your stale TODO ;-).