    objstrunicode: Basic implementation of unicode handling. · 64b468d8
    Chris Angelico authored
    Squashed commit of the following:
    
    commit 99dc21b67a895dc10d3c846bc158d27c839cee48
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Thu Jun 12 02:18:54 2014 +1000
    
        Optimize as per TODO (thanks Damien!)
    
    commit 5bf0153ecad8348443058d449d74504fc458fe51
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 08:42:06 2014 +1000
    
        Test a default (= UTF-8) encode and decode
    
    commit c962057ac340832c4fde60896f656a3fe3ad78a9
    Merge: e2c9782 195de324
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 05:23:03 2014 +1000
    
        Merge branch 'master' into unicode, resolving conflict on py/obj.h
    
    commit e2c9782a65eb57f481d441d40161de427e1940ba
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 05:05:57 2014 +1000
    
        More whitespace fixups
    
    commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 05:04:20 2014 +1000
    
        Properly implement string slicing
    
    commit 0d339a143e2b6442366145e7f3d64aada293eaa0
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 02:24:11 2014 +1000
    
        Support slicing in str_index_to_ptr, and fix a bounds error
    
    commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 02:10:22 2014 +1000
    
        Break out index-to-pointer calculation into a function
    
    commit 616c24ac014c3ca56008428c506034dd1bfff7a8
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 02:03:11 2014 +1000
    
        Add tests of string slicing, which currently fail
    
    commit a24d19f676fe8cc21dad512d91b826892e162a5b
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Tue Jun 10 01:56:53 2014 +1000
    
        Change string indexing to not precalculate the charlen, and add test for neg indexing
    
    commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 22:09:17 2014 +1000
    
        Clean up constant qstr declarations now that charlen isn't needed
    
    commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 07:18:42 2014 +1000
    
        Remove the charlen field from strings, calculating it when required
    
    commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 07:11:27 2014 +1000
    
        Get rid of mp_obj_str_get_data_len() which was used in only one place
    
    commit a019ba968b4e8daf7f3674f63c5cc400e304c509
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 06:58:26 2014 +1000
    
        Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
    
    commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 06:32:44 2014 +1000
    
        Use utf8_get/next_char in building up a string's repr
    
    commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 06:10:45 2014 +1000
    
        Make utf8_get_char() and utf8_next_char() actually do what their names say
    
    commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sun Jun 8 02:10:59 2014 +1000
    
        Revert "Add PEP 393-flags to strings and stub usage."
    
        This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.
    
    commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 15:41:48 2014 +1000
    
        Whitespace fixes
    
    commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 15:28:35 2014 +1000
    
        Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
    
    commit f1911f53d56da809c97b07245f5728a419e8fb30
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 11:56:02 2014 +1000
    
        Make chr() Unicode-aware
    
    commit f51ad737b48ac04c161197a4012821d50885c4c7
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 11:44:07 2014 +1000
    
        Make a string's repr Unicode-aware
    
    commit 01bd68684611585d437982dccdf05b33cbedc630
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 11:33:43 2014 +1000
    
        Expand the Unicode tests
    
    commit 7bc91904f899f8012089fc14a06495680a51e590
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 11:27:30 2014 +1000
    
        Record byte lengths for byte strings
    
    commit bb132120717cf176dcfb26f87fa309378f76ab5f
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 11:25:06 2014 +1000
    
        Make ord() Unicode-aware
    
    commit 03f0cbe9051b62192be97b59f84f63f9216668bf
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 10:24:35 2014 +1000
    
        Retain characters as UTF-8 encoded Unicode
    
    commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 08:37:27 2014 +1000
    
        Add support for \u and \U escapes, but not \N (with explanatory comment)
    
    commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Sat Jun 7 05:09:35 2014 +1000
    
        Add character length to qstr
    
    commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 13:48:36 2014 +1000
    
        Add test of UTF-8 encoded source file resulting in properly formed string
    
    commit 16429b81a8483cf25865ed11afd81a7d9c253c26
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 13:44:15 2014 +1000
    
        Make len(s) return character length (even though creation's still buggy)
    
    commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 13:15:36 2014 +1000
    
        HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
    
        All tests pass now, though string creation is still buggy.
    
    commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 13:15:32 2014 +1000
    
        objstr: Record character length separately from byte length
    
        CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
    
    commit b0f41c72af27d3b361027146025877b3d7e8785c
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 05:37:36 2014 +1000
    
        Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
    
    commit 89452be641674601e9bfce86dc71c17c3140a6cf
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Fri Jun 6 05:28:47 2014 +1000
    
        Update comments - now aiming for UTF-8 rather than PEP 393 strings
    
    commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
    Author: Chris Angelico <rosuav@gmail.com>
    Date:   Wed Jun 4 05:28:12 2014 +1000
    
        Add PEP 393-flags to strings and stub usage.
    
        The test suite all passes, but nothing has actually been changed.
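    Several of the commits above (the macros for UTF-8 byte detection, unichar_charlen(), and the index-to-pointer function) depend on the fact that UTF-8 continuation bytes always match the bit pattern 10xxxxxx. The following is a minimal sketch of that idea, not the code from this commit: the names IS_UTF8_CONT, utf8_charlen_sketch and utf8_index_to_ptr_sketch are hypothetical, and the sketch assumes well-formed UTF-8 input and non-negative indices only.

```c
#include <stddef.h>

/* Hypothetical macro: true if b is a UTF-8 continuation byte (10xxxxxx). */
#define IS_UTF8_CONT(b) (((b) & 0xC0) == 0x80)

/* Hypothetical helper: count characters in a UTF-8 buffer of len bytes.
 * Every character contributes exactly one non-continuation byte, so
 * counting those bytes gives the length in characters. */
static size_t utf8_charlen_sketch(const unsigned char *s, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (!IS_UTF8_CONT(s[i])) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: return a pointer to the first byte of the character
 * at the given character index, or NULL if the index is out of range.
 * Negative indices and slices are deliberately left out of this sketch. */
static const unsigned char *utf8_index_to_ptr_sketch(const unsigned char *s,
                                                     size_t len, size_t index) {
    const unsigned char *end = s + len;
    while (index > 0 && s < end) {
        s++; /* step past the lead byte of the current character */
        while (s < end && IS_UTF8_CONT(*s)) {
            s++; /* skip the continuation bytes that belong to it */
        }
        index--;
    }
    return (index == 0 && s < end) ? s : NULL;
}
```

    Both helpers walk the bytes on demand rather than storing a separate character length, which is the trade-off described in "Remove the charlen field from strings, calculating it when required": less memory per string at the cost of a linear scan for len() and indexing.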
objstrunicode.c 68.3 KB