1. 26 Jun, 2014 4 commits
    • objstrunicode: Basic implementation of unicode handling. · 64b468d8
      Chris Angelico authored
      Squashed commit of the following:
      
      commit 99dc21b67a895dc10d3c846bc158d27c839cee48
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Thu Jun 12 02:18:54 2014 +1000
      
          Optimize as per TODO (thanks Damien!)
      
      commit 5bf0153ecad8348443058d449d74504fc458fe51
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 08:42:06 2014 +1000
      
          Test a default (= UTF-8) encode and decode
      
      commit c962057ac340832c4fde60896f656a3fe3ad78a9
      Merge: e2c9782 195de324
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 05:23:03 2014 +1000
      
          Merge branch 'master' into unicode, resolving conflict on py/obj.h
      
      commit e2c9782a65eb57f481d441d40161de427e1940ba
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 05:05:57 2014 +1000
      
          More whitespace fixups
      
      commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 05:04:20 2014 +1000
      
          Properly implement string slicing
      
      commit 0d339a143e2b6442366145e7f3d64aada293eaa0
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 02:24:11 2014 +1000
      
          Support slicing in str_index_to_ptr, and fix a bounds error
      
      commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 02:10:22 2014 +1000
      
          Break out index-to-pointer calculation into a function
      
      commit 616c24ac014c3ca56008428c506034dd1bfff7a8
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 02:03:11 2014 +1000
      
          Add tests of string slicing, which currently fail
      
      commit a24d19f676fe8cc21dad512d91b826892e162a5b
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Tue Jun 10 01:56:53 2014 +1000
      
          Change string indexing to not precalculate the charlen, and add test for neg indexing
      
      commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 22:09:17 2014 +1000
      
          Clean up constant qstr declarations now that charlen isn't needed
      
      commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 07:18:42 2014 +1000
      
          Remove the charlen field from strings, calculating it when required
      
      commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 07:11:27 2014 +1000
      
          Get rid of mp_obj_str_get_data_len() which was used in only one place
      
      commit a019ba968b4e8daf7f3674f63c5cc400e304c509
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 06:58:26 2014 +1000
      
          Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
      
      commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 06:32:44 2014 +1000
      
          Use utf8_get/next_char in building up a string's repr
      
      commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 06:10:45 2014 +1000
      
          Make utf8_get_char() and utf8_next_char() actually do what their names say
      
      commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sun Jun 8 02:10:59 2014 +1000
      
          Revert "Add PEP 393-flags to strings and stub usage."
      
          This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.
      
      commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 15:41:48 2014 +1000
      
          Whitespace fixes
      
      commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 15:28:35 2014 +1000
      
          Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
      
      commit f1911f53d56da809c97b07245f5728a419e8fb30
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 11:56:02 2014 +1000
      
          Make chr() Unicode-aware
      
      commit f51ad737b48ac04c161197a4012821d50885c4c7
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 11:44:07 2014 +1000
      
          Make a string's repr Unicode-aware
      
      commit 01bd68684611585d437982dccdf05b33cbedc630
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 11:33:43 2014 +1000
      
          Expand the Unicode tests
      
      commit 7bc91904f899f8012089fc14a06495680a51e590
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 11:27:30 2014 +1000
      
          Record byte lengths for byte strings
      
      commit bb132120717cf176dcfb26f87fa309378f76ab5f
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 11:25:06 2014 +1000
      
          Make ord() Unicode-aware
      
      commit 03f0cbe9051b62192be97b59f84f63f9216668bf
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 10:24:35 2014 +1000
      
          Retain characters as UTF-8 encoded Unicode
      
      commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 08:37:27 2014 +1000
      
          Add support for \u and \U escapes, but not \N (with explanatory comment)
      
      commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Sat Jun 7 05:09:35 2014 +1000
      
          Add character length to qstr
      
      commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 13:48:36 2014 +1000
      
          Add test of UTF-8 encoded source file resulting in properly formed string
      
      commit 16429b81a8483cf25865ed11afd81a7d9c253c26
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 13:44:15 2014 +1000
      
          Make len(s) return character length (even though creation's still buggy)
      
      commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 13:15:36 2014 +1000
      
          HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
      
          All tests pass now, though string creation is still buggy.
      
      commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 13:15:32 2014 +1000
      
          objstr: Record character length separately from byte length
      
          CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
      
      commit b0f41c72af27d3b361027146025877b3d7e8785c
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 05:37:36 2014 +1000
      
          Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
      
      commit 89452be641674601e9bfce86dc71c17c3140a6cf
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Fri Jun 6 05:28:47 2014 +1000
      
          Update comments - now aiming for UTF-8 rather than PEP 393 strings
      
      commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
      Author: Chris Angelico <rosuav@gmail.com>
      Date:   Wed Jun 4 05:28:12 2014 +1000
      
          Add PEP 393-flags to strings and stub usage.
      
          The test suite all passes, but nothing has actually been changed.
  2. 14 Jun, 2014 1 commit
  3. 13 Jun, 2014 1 commit
    • objstr: Be 8-bit clean even for repr(). · 2ec38a17
      Paul Sokolovsky authored
      This will allow roughly the same behavior as Python3 for non-ASCII strings:
      for example, print("<phrase in non-Latin script>".split()) will print a list
      of words, not a weird hex dump (as Python2 does). (Of course, it will print
      a list of words only if there are "words" in that phrase at all, separated
      by ASCII-compatible whitespace; that surely won't apply to every human
      language in existence.)
  4. 07 Jun, 2014 1 commit
  5. 06 Jun, 2014 1 commit
  6. 05 Jun, 2014 3 commits
  7. 04 Jun, 2014 1 commit
  8. 03 Jun, 2014 1 commit
  9. 01 Jun, 2014 2 commits
  10. 31 May, 2014 3 commits
  11. 30 May, 2014 2 commits
  12. 25 May, 2014 4 commits
  13. 24 May, 2014 4 commits
  14. 21 May, 2014 1 commit
  15. 15 May, 2014 3 commits
  16. 13 May, 2014 2 commits
  17. 11 May, 2014 6 commits
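
The 26 Jun 2014 squashed commit adds a unichar_charlen() function "to calculate length-in-characters from length-in-bytes". A minimal sketch of that calculation, assuming the buffer already holds valid UTF-8: every character has exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting those bytes gives the character count. The names utf8_charlen_sketch and IS_CONT_BYTE are hypothetical and are not taken from the MicroPython source.

```c
#include <stddef.h>

/* Hypothetical macro: a UTF-8 continuation byte has the form 10xxxxxx. */
#define IS_CONT_BYTE(b) (((b) & 0xC0) == 0x80)

/* Sketch, assuming valid UTF-8: count characters by counting every byte
 * that starts a character (ASCII bytes and multi-byte lead bytes). */
size_t utf8_charlen_sketch(const unsigned char *str, size_t len) {
    size_t charlen = 0;
    for (const unsigned char *top = str + len; str < top; ++str) {
        if (!IS_CONT_BYTE(*str)) {
            ++charlen;
        }
    }
    return charlen;
}
```

For example, the 6-byte buffer "h\xc3\xa9llo" ("héllo") counts as 5 characters. The cost is one pass over the bytes each time the length is needed, which is the trade-off accepted by the later commit that removes the stored charlen field.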
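
One commit makes "utf8_get_char() and utf8_next_char() actually do what their names say": decode the character at a position, and step to the next character. The sketch below shows both operations, assuming valid UTF-8 input; utf8_decode_at and utf8_advance are hypothetical names, and the bounds parameter on utf8_advance is an assumption added to avoid reading past the buffer, not necessarily part of the real signatures.

```c
#include <stdint.h>

/* Hypothetical sketch, assuming s points at the first byte of a valid
 * UTF-8 sequence: decode the code point stored there. */
uint32_t utf8_decode_at(const unsigned char *s) {
    uint32_t ch = *s++;
    int extra;
    if (ch < 0x80) {
        return ch;                      /* 0xxxxxxx: plain ASCII */
    } else if (ch >= 0xF0) {
        ch &= 0x07; extra = 3;          /* 11110xxx: 4-byte sequence */
    } else if (ch >= 0xE0) {
        ch &= 0x0F; extra = 2;          /* 1110xxxx: 3-byte sequence */
    } else {
        ch &= 0x1F; extra = 1;          /* 110xxxxx: 2-byte sequence */
    }
    while (extra-- > 0) {
        ch = (ch << 6) | (*s++ & 0x3F); /* append 6 payload bits per 10xxxxxx byte */
    }
    return ch;
}

/* Hypothetical sketch: return a pointer to the first byte of the next
 * character, never moving past top (one past the end of the buffer). */
const unsigned char *utf8_advance(const unsigned char *s, const unsigned char *top) {
    ++s;                                      /* leave the current lead byte */
    while (s < top && (*s & 0xC0) == 0x80) {  /* skip its continuation bytes */
        ++s;
    }
    return s;
}
```

Together, these two operations are enough to walk a string character by character, which is how the commit that follows builds a string's repr.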
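
Making chr() Unicode-aware, when strings are retained as UTF-8, amounts to encoding a code point as 1 to 4 bytes. A sketch of that encoding follows; utf8_encode_sketch is a hypothetical name, and returning 0 for values above U+10FFFF stands in for the ValueError that Python's chr() raises. The sketch does not special-case surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: encode code point c as UTF-8 into buf (at least
 * 4 bytes). Returns the number of bytes written, or 0 if c > 0x10FFFF. */
size_t utf8_encode_sketch(uint32_t c, unsigned char *buf) {
    if (c < 0x80) {                       /* 0xxxxxxx */
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {                  /* 11110xxx + three continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                             /* out of range: chr() must reject it */
}
```

For example, chr(0x20AC) ("€") would produce the three bytes E2 82 AC.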
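
The commit adding "\u and \U escapes, but not \N" needs to turn the escape's hex digits into a code point before encoding it as UTF-8. In Python string literals, \u takes exactly 4 hex digits and \U exactly 8. A sketch of that parsing step, with hypothetical names (parse_unicode_escape, hex_digit_value) that are not from the MicroPython lexer; like the commit, it does not handle \N{...}.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not hex. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch: p points just past the 'u' or 'U' (given in kind).
 * Reads exactly 4 (\u) or 8 (\U) hex digits; stops at the first non-hex
 * character, so it never reads past a NUL terminator. Fails on a short or
 * invalid digit run, or on a value above U+10FFFF. */
bool parse_unicode_escape(const char *p, char kind, uint32_t *out) {
    int ndigits = (kind == 'u') ? 4 : 8;
    uint32_t value = 0;
    for (int i = 0; i < ndigits; ++i) {
        int d = hex_digit_value(p[i]);
        if (d < 0) {
            return false;
        }
        value = (value << 4) | (uint32_t)d;
    }
    if (value > 0x10FFFF) {
        return false;
    }
    *out = value;
    return true;
}
```

So "\u20ac" and "\U000020ac" both yield U+20AC, which would then be stored as its UTF-8 bytes.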
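
Several commits in the squash concern indexing without a precomputed charlen: negative indexing, breaking the index-to-pointer calculation out into str_index_to_ptr, then teaching it about slicing. The sketch below illustrates one way such a function can work, assuming valid UTF-8: non-negative indices walk forward from the start, negative indices walk backward from the end, and out-of-range indices either clamp (slice semantics) or fail. The name index_to_ptr_sketch, its signature, and returning NULL where Python would raise IndexError are all hypothetical, not the actual MicroPython code.

```c
#include <stdbool.h>
#include <stddef.h>

#define IS_CONT_BYTE(b) (((b) & 0xC0) == 0x80)  /* hypothetical: 10xxxxxx */

/* Hypothetical sketch: map a character index to a byte pointer in data. */
const unsigned char *index_to_ptr_sketch(const unsigned char *data, size_t len,
                                         long index, bool is_slice) {
    const unsigned char *top = data + len;
    const unsigned char *s;
    if (index < 0) {
        /* Walk backward one character per step, starting from the end. */
        for (s = top; index < 0; ++index) {
            if (s <= data) {
                return is_slice ? data : NULL;  /* before the first character */
            }
            do {
                --s;
            } while (s > data && IS_CONT_BYTE(*s));
        }
    } else {
        /* Walk forward one character per step, starting from the beginning. */
        for (s = data; index > 0; --index) {
            if (s >= top) {
                return is_slice ? top : NULL;   /* past the end */
            }
            do {
                ++s;
            } while (s < top && IS_CONT_BYTE(*s));
        }
        if (!is_slice && s >= top) {
            return NULL;                        /* index == length: no character there */
        }
    }
    return s;
}
```

With this helper, a slice s[a:b] is the byte range from index_to_ptr_sketch(..., a, true) to index_to_ptr_sketch(..., b, true), or empty if the first pointer is not before the second. Each lookup costs a walk proportional to the index, which is the price of not storing the character length.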
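
The 13 Jun 2014 commit makes repr() "8-bit clean": bytes at or above 0x80 are copied through rather than shown as hex escapes, so UTF-8 text stays readable. A simplified sketch of that rule, under stated assumptions: repr_8bit_clean_sketch is a hypothetical name, it always uses single quotes (Python chooses the quote character based on content), and it escapes only the backslash, the quote, \n, \r, \t, and other ASCII control bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: write the repr of a byte string into out, which
 * must hold at least 4 * len + 3 bytes. Returns the length written,
 * excluding the terminating NUL. */
size_t repr_8bit_clean_sketch(const unsigned char *s, size_t len, char *out) {
    size_t n = 0;
    out[n++] = '\'';
    for (size_t i = 0; i < len; ++i) {
        unsigned char b = s[i];
        if (b == '\'' || b == '\\') {
            out[n++] = '\\';                 /* escape quote and backslash */
            out[n++] = (char)b;
        } else if (b == '\n' || b == '\r' || b == '\t') {
            out[n++] = '\\';
            out[n++] = (b == '\n') ? 'n' : (b == '\r') ? 'r' : 't';
        } else if (b < 0x20 || b == 0x7F) {
            n += (size_t)sprintf(out + n, "\\x%02x", b);  /* other ASCII controls */
        } else {
            out[n++] = (char)b;              /* printable ASCII and every byte >= 0x80 */
        }
    }
    out[n++] = '\'';
    out[n] = '\0';
    return n;
}
```

For example, the bytes of "café\n" come out as 'café\n' with the é left intact, whereas an escaping repr would show 'caf\xc3\xa9\n'.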