
Gammon Forum


Column functions not utf8 compatible



Posted by Fiendish   USA  (2,533 posts)  Bio   Global Moderator
Date Wed 31 Jul 2024 12:39 PM (UTC)

Amended on Fri 02 Aug 2024 12:28 PM (UTC) by Fiendish

Message
Column selection metrics appear to use bytes instead of columns, so GetSelectionStartColumn, GetSelectionEndColumn, and SetSelection all have incorrect column values for UTF8.

For example, on a row made up entirely of 3-byte characters, selecting the single UTF8 character at zero-based position index reports a start column of (index*3)+1 and an end column of startcol+3, instead of the displayed column positions.
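To make that arithmetic concrete, here is a minimal sketch. It is not MUSHclient code: the helper name ByteColumnToDisplayColumn is hypothetical, and it assumes 1-based columns and a line that is valid UTF-8. It maps a byte-based column to the displayed column by counting only the bytes that start a character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of MUSHclient): map a 1-based byte column
// to the 1-based displayed column, assuming the line is valid UTF-8.
int ByteColumnToDisplayColumn(const std::string& line, int byteColumn) {
    int column = 0;
    for (int i = 0; i < byteColumn && i < static_cast<int>(line.size()); i++) {
        // Continuation bytes look like 10xxxxxx and never start a character.
        if ((static_cast<unsigned char>(line[i]) & 0xC0) != 0x80) {
            column++;
        }
    }
    return column;
}

// Worked example for a row of three 3-byte characters (U+2500), 9 bytes total.
void ExampleThreeByteRow() {
    const std::string line = "\xE2\x94\x80\xE2\x94\x80\xE2\x94\x80";
    // The character at zero-based index 2 starts at byte column (2*3)+1 = 7,
    // but it is displayed in column 3.
    assert(ByteColumnToDisplayColumn(line, 7) == 3);
}
```

With plain ASCII text every byte starts a character, so the two column systems agree, which would explain why the problem only shows up on lines containing multi-byte characters.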

https://github.com/fiendish/aardwolfclientpackage

Posted by Nick Gammon   Australia  (23,046 posts)  Bio   Forum Administrator
Date Reply #1 on Wed 31 Jul 2024 09:40 PM (UTC)
Message
Hmm. They are using:


   pmyView->GetEditCtrl().GetSel(nStartChar, nEndChar);	


https://github.com/nickgammon/mushclient/blob/64e3670eaa08de05cac8decc8a66ca992356d831/scripting/methods/methods_info.cpp#L729

I'm not sure how the edit control behaves when it is dealing with UTF8; wrongly, according to you.

What problems is this causing you?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Fiendish   USA  (2,533 posts)  Bio   Global Moderator
Date Reply #2 on Thu 01 Aug 2024 07:04 PM (UTC)

Amended on Thu 01 Aug 2024 07:21 PM (UTC) by Fiendish

Message
Quote:
What problems is this causing you?

You mean in a more practical sense than just the fact that the results are off?

I have a plugin that fakes split-screen output history scrolling with a miniwindow overlay. That plugin transfers selection back and forth so that the right text stays selected. But it fails when the selected text includes utf8 symbols, because the columns end up being wrong.

Here's a screen recording comparing text (working) and non-text (broken) selection:

https://imgur.com/a/iD75n8A

ChatGPT suggests retrieving the text from the edit control and computing the selection columns manually :-|


Straight out of chatgpt, completely untested:

#include <afxwin.h>
#include <string>

// Function to convert byte offset to character offset in a UTF-8 string
int ByteOffsetToCharOffset(const std::string& utf8Str, int byteOffset) {
    int charOffset = 0;
    int i = 0;

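    // Stop at the end of the string so an out-of-range offset cannot read past it
    while (i < byteOffset && i < static_cast<int>(utf8Str.size())) {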
        if ((utf8Str[i] & 0x80) == 0) {
            i += 1;
        } else if ((utf8Str[i] & 0xE0) == 0xC0) {
            i += 2;
        } else if ((utf8Str[i] & 0xF0) == 0xE0) {
            i += 3;
        } else if ((utf8Str[i] & 0xF8) == 0xF0) {
            i += 4;
        } else {
            i += 1;
        }
        charOffset++;
    }

    return charOffset;
}

// Function to convert character offset to byte offset in a UTF-8 string
int CharOffsetToByteOffset(const std::string& utf8Str, int charOffset) {
    int byteOffset = 0;
    int charCount = 0;

    while (charCount < charOffset && byteOffset < static_cast<int>(utf8Str.size())) {
        if ((utf8Str[byteOffset] & 0x80) == 0) {
            byteOffset += 1;
        } else if ((utf8Str[byteOffset] & 0xE0) == 0xC0) {
            byteOffset += 2;
        } else if ((utf8Str[byteOffset] & 0xF0) == 0xE0) {
            byteOffset += 3;
        } else if ((utf8Str[byteOffset] & 0xF8) == 0xF0) {
            byteOffset += 4;
        } else {
            byteOffset += 1;
        }
        charCount++;
    }

    return byteOffset;
}

// Function to get the corrected selection offsets (character offsets) from a CEdit control
void GetCorrectedSelection(CEdit& editCtrl, int& startCharOffset, int& endCharOffset) {
    CString text;
    editCtrl.GetWindowText(text);
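    // Direct-initialize: CT2A converts to a char pointer, which copy-initialization cannot chain into std::string
    std::string utf8Str(CT2A(text, CP_UTF8));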

    // CEdit::GetSel fills in int references, not DWORDs
    int startByteOffset = 0, endByteOffset = 0;
    editCtrl.GetSel(startByteOffset, endByteOffset);

    startCharOffset = ByteOffsetToCharOffset(utf8Str, startByteOffset);
    endCharOffset = ByteOffsetToCharOffset(utf8Str, endByteOffset);
}

// Function to set the selection in a CEdit control using character offsets
void SetCorrectedSelection(CEdit& editCtrl, int startCharOffset, int endCharOffset) {
    CString text;
    editCtrl.GetWindowText(text);
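    // Direct-initialize: CT2A converts to a char pointer, which copy-initialization cannot chain into std::string
    std::string utf8Str(CT2A(text, CP_UTF8));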

    int startByteOffset = CharOffsetToByteOffset(utf8Str, startCharOffset);
    int endByteOffset = CharOffsetToByteOffset(utf8Str, endCharOffset);

    editCtrl.SetSel(startByteOffset, endByteOffset);
}

// Example usage in your message handler or wherever appropriate
void SomeFunction() {
    CEdit editCtrl; // Assuming this is properly initialized and points to your edit control
    int startCharOffset, endCharOffset;

    // Get the corrected selection
    GetCorrectedSelection(editCtrl, startCharOffset, endCharOffset);

    // Now startCharOffset and endCharOffset are the correct character offsets
    // For demonstration, let's set the selection back to the same offsets
    SetCorrectedSelection(editCtrl, startCharOffset, endCharOffset);
}


I might try it on the Lua side first.

https://github.com/fiendish/aardwolfclientpackage

Posted by Nick Gammon   Australia  (23,046 posts)  Bio   Forum Administrator
Date Reply #3 on Fri 02 Aug 2024 03:16 AM (UTC)
Message
Quote:

You mean in a more practical sense than just the fact that the results are off?


Yeah.


Looking at your video, it appears to me you are selecting text in the output area, not the command area.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Fiendish   USA  (2,533 posts)  Bio   Global Moderator
Date Reply #4 on Fri 02 Aug 2024 12:29 PM (UTC)

Amended on Fri 02 Aug 2024 12:30 PM (UTC) by Fiendish

Message
Nick Gammon said:

Looking at your video, it appears to me you are selecting text in the output area, not the command area.

Right. Sorry. I've edited the original post to remove mention of GetInfo 236/237. I'm pretty sure the command area doesn't support utf8 anyway. I only care about the output area.

https://github.com/fiendish/aardwolfclientpackage

Posted by Nick Gammon   Australia  (23,046 posts)  Bio   Forum Administrator
Date Reply #5 on Fri 02 Aug 2024 09:18 PM (UTC)
Message
There is code here to skip utf8 characters:

https://github.com/nickgammon/mushclient/blob/64e3670eaa08de05cac8decc8a66ca992356d831/mushview.cpp#L1918
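The linked code is not reproduced here. As a rough illustration of what skipping UTF-8 bytes means when walking a line by displayed column, the sketch below finds the byte column at which a given displayed column begins. The name DisplayColumnToByteColumn is hypothetical, and the sketch assumes 1-based columns and valid UTF-8; it is not taken from mushview.cpp.

```cpp
#include <string>

// Hypothetical helper (not from mushview.cpp): return the 1-based byte column
// at which the given 1-based displayed column starts, assuming valid UTF-8.
// A column past the end of the line maps to line.size() + 1.
int DisplayColumnToByteColumn(const std::string& line, int displayColumn) {
    const int size = static_cast<int>(line.size());
    int i = 0;  // zero-based byte index of the current character's lead byte
    for (int col = 1; col < displayColumn && i < size; col++) {
        i++;  // step over the lead byte of the current character
        // Skip its continuation bytes (10xxxxxx), if any.
        while (i < size && (static_cast<unsigned char>(line[i]) & 0xC0) == 0x80) {
            i++;
        }
    }
    return i + 1;
}
```

On a row of 3-byte characters this maps displayed columns 1, 2, 3 to byte columns 1, 4, 7, which is the inverse of the discrepancy reported in the original post.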

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.