Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are
spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the
password reset link.
Due to spam on this forum, all posts now need moderator approval.
Entire forum
➜ MUSHclient
➜ Bug reports
➜ DatebaseStep bug?
It is now over 60 days since the last post. This thread is closed.
Refresh page
Pages: 1 2
3
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Thu 10 Dec 2009 08:31 AM (UTC) |
| Message
| Hi Nick,
I am not sure there is a bug or not .
This is for DatebaseStep.
I have inserted into some data. like some chinese and encode them to utf8.
But when i try to select them from the database , it can't give me correct string.
for example. i insert a string "大家好" (this is a chinese string means "Hello").
The string length is 6. when i change it to 'utf8' the length wiil be 9.
The problem is when i get it from database use DatebaseStep, the string length changed to 8. it looks lose the last.
I am not sure the reason.
Please have a check.
Thanks.
Ron | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #1 on Thu 10 Dec 2009 07:50 PM (UTC) Amended on Thu 10 Dec 2009 08:08 PM (UTC) by Nick Gammon
|
| Message
| When I paste those three characters into the Game -> Test Trigger window using the "Insert Unicode" button (as decimal) I get the following UTF-8 codes:
\E5\A4\A7\E5\AE\B6\E5\A5\BD
I make that to be 9 bytes long (3 Chinese characters, namely 大家好). According to Google Translate that means "Hello everyone".
Now I'll try using that in DatabaseStep...
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #2 on Thu 10 Dec 2009 08:05 PM (UTC) |
| Message
| I can't reproduce your problem with DatabaseStep - perhaps it is something to do with Perl or the way you are converting the UTF-8?
Here is my test code (run in the Immediate window using Lua):
DatabaseOpen ("db", GetInfo (66) .. "utf8_test.db", 6)
DatabaseExec ("db", [[
DROP TABLE IF EXISTS test;
CREATE TABLE test(
name TEXT NOT NULL
);
]])
hello = utils.fromhex ("E5A4A7E5AEB6E5A5BD") -- utf8: 大家好
print ("length before add=", #hello)
-- insert a record
DatabasePrepare ("db", "INSERT INTO test (name) VALUES ('" .. hello .. "')")
DatabaseStep ("db")
DatabaseFinalize ("db")
-- prepare a query
DatabasePrepare ("db", "SELECT * from test")
-- execute to get the first row
rc = DatabaseStep ("db") -- read first row
-- now loop, displaying each row, and getting the next one
while rc == 100 do
values = DatabaseColumnValues ("db")
print ("length after add=", #values [1])
print ("name=", utils.tohex (values [1]))
rc = DatabaseStep ("db") -- read next row
end -- while loop
-- finished with the statement
DatabaseFinalize ("db")
DatabaseClose ("db") -- close it
Output is:
length before add= 9
length after add= 9
name= E5A4A7E5AEB6E5A5BD
As you can see I have sent in the 9 bytes of UTF-8 data, and after adding it, and retrieving it with a SELECT, I get 9 bytes back, and they are the same 9. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #3 on Thu 10 Dec 2009 08:11 PM (UTC) Amended on Thu 10 Dec 2009 08:12 PM (UTC) by Nick Gammon
|
| Message
| Another way of checking my UTF-8 encoding is to do this (in Lua):
/print (utils.tohex (utils.utf8encode ( { 22823, 23478, 22909 } )))
Results:
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #4 on Thu 10 Dec 2009 08:13 PM (UTC) |
| Message
|
Ron said:
The string length is 6. when i change it to 'utf8' the length will be 9.
I don't agree about the 6. Where does that figure come from? This may be part of your problem. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #5 on Fri 11 Dec 2009 09:04 AM (UTC) Amended on Fri 11 Dec 2009 01:02 PM (UTC) by Ron
|
| Message
| Hi Nick,
This is my test code:
use Encode;
DatabaseOpen ("db", GetInfo (66) . "utf8_test.db", 6);
DatabaseExec ("db", qq{
DROP TABLE IF EXISTS test;
CREATE TABLE test(
name TEXT NOT NULL
);
});
my $hello = "\xE5\xA4\xA7\xE5\xAE\xB6\xE5\xA5\xBD";
Note("length before insert " . length $hello);
DatabasePrepare ("db", "INSERT INTO test (name) VALUES ('" . $hello . "')");
Note("1 > " . DatabaseError("db"));
DatabaseStep ("db") ;
Note("2 > " . DatabaseError("db"));
DatabaseFinalize ("db");
Note("3 > " . DatabaseError("db"));
my $statue = DatabasePrepare("db", "SELECT * from test");
if ($statue != 0) {
Note("SQLERROR : " . DatabaseError("db"));
return -1;
}
my $rc = DatabaseStep("db");
my $values;
if ($rc == 100) {
$values = DatabaseColumnValues("db");
my $string;
for my $v(Win32::OLE::in $values){
printDec($v);
$string .= $v;
$string .= ', ';
}
Note("$string");
}
DatabaseFinalize("db");
DatabaseClose ("db");
###########################
sub printDec{
my ($string) = @_;
my $i;
Note("whole word length : " . length $string);
for my $char (split //,$string){
$i++;
Note("$i : ". ord $char);
}
}
On my local computer, it can't run correctly.
The output is
length before insert 9
1 > unrecognized token: "'澶у濂?)"
2 > unrecognized token: "'澶у濂?)"
3 > unrecognized token: "'澶у濂?)"
I have to change the line
DatabasePrepare ("db", "INSERT INTO test (name) VALUES ('$hello')");
to
DatabasePrepare ("db", "INSERT INTO test (name) VALUES (substr('" . $hello . "\\')");
or
DatabasePrepare ("db", "INSERT INTO test (name) VALUES (substr('$hello\\',1,length('$hello\\') - 1))");
For the first the output is
length before insert 9
1 > not an error
2 > unknown error
3 > not an error
whole word length : 10
1 : 229
2 : 164
3 : 167
4 : 229
5 : 174
6 : 182
7 : 229
8 : 165
9 : 189
10 : 92
The second output is
length before insert 9
1 > not an error
2 > unknown error
3 > not an error
whole word length : 8
1 : 229
2 : 164
3 : 167
4 : 229
5 : 174
6 : 182
7 : 229
8 : 165
| | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #6 on Fri 11 Dec 2009 10:53 AM (UTC) |
| Message
|
Nick Gammon said:
Ron said:
The string length is 6. when i change it to 'utf8' the length will be 9.
I don't agree about the 6. Where does that figure come from? This may be part of your problem.
In gbk (an encode for chinese char), one chinese char use 2 byte , so I said three chinese chars length is 6 | | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #7 on Fri 11 Dec 2009 11:56 AM (UTC) Amended on Fri 11 Dec 2009 12:00 PM (UTC) by Ron
|
| Message
|
Ron said:
The second output is
length before insert 9
1 > not an error
2 > unknown error
3 > not an error
whole word length : 8
1 : 229
2 : 164
3 : 167
4 : 229
5 : 174
6 : 182
7 : 229
8 : 165
For this case I tried to view the database file on another way. i used a GUI. In this way It shows correct string.
The GUI which I used is SQLiteSpy | | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #8 on Fri 11 Dec 2009 02:33 PM (UTC) |
| Message
|
Nick Gammon said:
I can't reproduce your problem with DatabaseStep - perhaps it is something to do with Perl or the way you are converting the UTF-8?
Here is my test code (run in the Immediate window using Lua):
Output is:
length before add= 9
length after add= 9
name= E5A4A7E5AEB6E5A5BD
As you can see I have sent in the 9 bytes of UTF-8 data, and after adding it, and retrieving it with a SELECT, I get 9 bytes back, and they are the same 9.
When i used your test code , I got another output :
length before add= 9
length after add= 8
name= E5A4A7E5AEB6E5A5
| | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #9 on Fri 11 Dec 2009 07:09 PM (UTC) |
| Message
| Let's start by understanding your results. In Lua, this code:
hello = utils.fromhex ("E5A4A7E5AEB6E5A5BD")
for i = 1, #hello do
local c = string.byte (hello:sub (i, i))
print (string.format ("%02X = %i", c, c))
end
produces this result:
E5 = 229
A4 = 164
A7 = 167
E5 = 229
AE = 174
B6 = 182
E5 = 229
A5 = 165
BD = 189
Also, 92 is a backslash: \
So, your results which gave those numbers, plus a backslash, seem almost right. The code to produce that was:
DatabasePrepare ("db", "INSERT INTO test (name) VALUES (substr('" . $hello . "\\')");
But, you have added a backslash after $hello, so you expect to see it in the results!
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #10 on Fri 11 Dec 2009 07:26 PM (UTC) |
| Message
| I installed ActivePerl from:
http://downloads.activestate.com/ActivePerl/Windows/5.10/ActivePerl-5.10.1.1006-MSWin32-x86-291086.msi
I then copied and pasted your example from the above post without changes, and ran it. I got:
length before insert 9
1 > not an error
2 > unknown error
3 > not an error
whole word length : 9
1 : 229
2 : 164
3 : 167
4 : 229
5 : 174
6 : 182
7 : 229
8 : 165
9 : 189
大家好,
So you can see it worked perfectly for me. I don't think it is a DatabaseStep bug, as you said yourself you could add the data ok and if you used SQLiteSpy it showed the correct string. DatabaseStep is used for adding as well as retrieving, so that function clearly works.
Since your problem seems to be retrieving the data, perhaps avoiding using a table would help.
If you call that, perhaps the results would be better. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #11 on Sat 12 Dec 2009 07:06 PM (UTC) |
| Message
| Hi Nick,
I got the correct string with the function "DatabaseColumnValue" .
Thanks for your help. :)
PS. Mushclient is the best mud client i used. :)
Ron
| | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #12 on Sat 12 Dec 2009 07:13 PM (UTC) |
| Message
| Unfortunately, for the first test string it work well. But for other string it met the same problem too.
function "DatabaseColumnValue" looks returned the correct number 9, but the last char is '0x00'.
Ron | | Top |
|
| Posted by
| Ron
China (15 posts) Bio
|
| Date
| Reply #13 on Sat 12 Dec 2009 07:18 PM (UTC) |
| Message
| For the first test string it is "230 184 172 232 169 166 228 184 128". This is the only one string which work correct in my test.
I am confused it worked well but for another string the last char is '0'. I guess the problem is the last char only can be from 1 - 128, if it is not in this range will auto-change to 0.
Please have a check.
Ron | | Top |
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #14 on Sat 12 Dec 2009 07:55 PM (UTC) |
| Message
|
Ron said:
Unfortunately, for the first test string it work well. But for other string it met the same problem too.
function "DatabaseColumnValue" looks returned the correct number 9, but the last char is '0x00'.
Can you give an example of a bad string?
The function DatabaseColumnValue, if given a string, turns it into a null-terminated string. Thus it will always have a 0x00 at the end, however that is the terminator. That should not count as part of the length. UTF-8 never has the value 0x00 in its strings (except for the value 0x00 itself, the string terminator).
Other strings, which have a Unicode code point of 128 or above, always have the high-order bit set.
I suspect the problem lies in the way Perl is handling Unicode. However one thing you could try is change the type of the string in SQLite from TEXT to BLOB. The BLOB type says it never changes the data internally (eg. you can store images or sounds in it). This may possibly help.
However I think it is more likely Perl is to blame. If possible, I would use Lua, as my examples worked with Lua.
However if you want to keep using Perl you can Google "perl utf-8".
Some results I got were:
- "The use utf8 pragma tells the Perl parser to allow UTF-8 ..."
- "Unicode-processing issues in Perl and how to cope with it ..."
- "The Perl UTF-8 and utf8 Encoding Mess ..."
- "Lessons Learned with Perl and UTF-8"
And that was just the first few matches!
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.
119,144 views.
This is page 1, subject is 3 pages long: 1 2
3
It is now over 60 days since the last post. This thread is closed.
Refresh page
top