Author Topic: About Sibiac  (Read 6388 times)

Offline azslow3

About Sibiac
« on: November 21, 2018, 07:00:45 PM »
Many applications, especially music applications, are not accessible at all, for reasons I have analyzed in another post, "Accessibility in applications, my point of view". They cannot be controlled with the keyboard and they do not expose any text to screen readers.

Several concepts are already used in general and application-specific screen reader add-ons. It is possible to bind a keyboard shortcut to a mouse click at a specific position. It is possible, to some degree, to find the required position by analyzing colors on the screen. It is possible to grab a part of the screen and use OCR to extract text from it.
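As a minimal sketch of the color-scanning idea (the pixel grid, target colors, and function name below are my own illustration, not Sibiac's actual code; a real add-on would scan a screen capture instead of an in-memory grid):

```python
# Sketch: locate a UI element by scanning for its known indicator color.
# The "screen" here is a tiny in-memory pixel grid (rows of RGB tuples).

def find_color(pixels, target, tolerance=10):
    """Return (x, y) of the first pixel close to `target`, or None."""
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            if (abs(r - target[0]) <= tolerance and
                    abs(g - target[1]) <= tolerance and
                    abs(b - target[2]) <= tolerance):
                return (x, y)
    return None

# A 3x4 mock screen: mostly grey, one "highlighted" orange pixel.
GREY, ORANGE = (128, 128, 128), (255, 128, 0)
screen = [[GREY, GREY, GREY, GREY],
          [GREY, GREY, ORANGE, GREY],
          [GREY, GREY, GREY, GREY]]

print(find_color(screen, ORANGE))  # the click target for this element
```

The coordinates found this way can then be used as the click target for a bound keyboard shortcut.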

What I have not found is any attempt to combine all these methods to extract sufficient information from the image and reconstruct a complete interface element.
For example, a button on the screen is some area with text or an image, sometimes with a color indication of its current state. By reading those colors, grabbing the corresponding image area, and feeding it to OCR, it should be possible to rebuild the information an accessible button provides directly. Clicking at those screen coordinates is the equivalent of pressing the button. By scanning colors in a list box it should be possible to find the position of the currently active element, then grab it and feed it to OCR, and click on it or on the next/previous element, imitating an accessible list. And so on.
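To illustrate how such a primitive might be reconstructed (all names and the state-color mapping below are hypothetical, not Sibiac's real internals; the OCR step is injected as a callable so the sketch stays self-contained):

```python
from dataclasses import dataclass

@dataclass
class Button:
    label: str       # text recovered by OCR
    pressed: bool    # state inferred from the indicator color
    click_at: tuple  # screen coordinates to send a mouse click to

# Hypothetical mapping of indicator colors to button state.
STATE_COLORS = {(0, 200, 0): True, (80, 80, 80): False}

def rebuild_button(bbox, indicator_color, ocr):
    """Reconstruct an accessible button from image-level facts.

    bbox            -- (left, top, right, bottom) of the button area
    indicator_color -- sampled RGB of the state indicator
    ocr             -- callable that OCRs the bbox and returns text
    """
    left, top, right, bottom = bbox
    center = ((left + right) // 2, (top + bottom) // 2)
    return Button(label=ocr(bbox),
                  pressed=STATE_COLORS.get(indicator_color, False),
                  click_at=center)

# Stub OCR standing in for Tesseract.
btn = rebuild_button((10, 20, 110, 40), (0, 200, 0), lambda bbox: "Record")
print(btn.label, btn.pressed, btn.click_at)  # Record True (60, 30)
```

A screen reader script can then announce `label` and `pressed`, and click at `click_at` when the user activates the button.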

With all this information about primitives it is possible to build an accessibility layer from the image alone. For any program, in the ideal case even from a real-time webcam video of the monitor. I mean completely independent of whatever accessibility support the application and the OS do or do not provide.

And so Sibiac is not just an OCR wrapper, and it is not just a point clicker. It tries to reconstruct complete accessibility information from the image.

While that sounds like a plan, the approach has two difficult aspects:
1. How to find user interface elements in the image? In other words, how to understand where a button is and where a list is? While humans are smart and interface developers normally do not try to puzzle them, the answer to such questions is sometimes not trivial even for a sighted, experienced person. That is an interesting topic for image processing and pattern recognition students, yet I have not found published work in this direction (maybe I have to search more). I have decided to bypass this for the moment. Fortunately, for the music applications I currently target, the interface consists mostly of fixed dialogs. So, looking at the interface, I extract that information manually and write it into a particular layer definition.
2. How to work with a particular control? Some controls, like the mentioned button, are straightforward to operate. Others are far more tricky. Almost any new type of control is a challenge, but I am slowly moving on.
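Since point 1 mentions manually written layer definitions, here is a rough idea of what such a definition could look like (the format, the dialog name, and all field names are invented for illustration; Sibiac's actual layer files may differ):

```python
# A hand-written "layer" for one fixed dialog: each entry tells the
# accessibility layer what kind of control lives where and which
# colors carry state information.
LAYER = {
    "dialog": "Export Audio",          # hypothetical dialog name
    "controls": [
        {"name": "OK button", "type": "button",
         "bbox": (400, 300, 470, 325)},
        {"name": "Format list", "type": "list",
         "bbox": (20, 60, 200, 260),
         "row_height": 20,
         "highlight_color": (51, 153, 255)},  # active-row color to scan for
    ],
}

def control_by_name(layer, name):
    """Look up a control definition so a hotkey handler can act on it."""
    for ctl in layer["controls"]:
        if ctl["name"] == name:
            return ctl
    return None

print(control_by_name(LAYER, "OK button")["bbox"])
```

With such a table in place, the generic machinery (color scanning, OCR, clicking) does not need to know anything about the specific application beyond its layer file.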

The package includes a binary version of Tesseract OCR, compiled with GCC. The corresponding license files can be found in the tesseract subdirectory.

The binary proxy library libsibiac.dll, except for the CRC32 code, is currently under the Apache License 2.0; the source code can be provided on request.

sibiact.exe is provided "as is" WITHOUT ANY WARRANTY and at the moment without source code. It is not essential for this add-on's operation, but it can be helpful when defining new applications.

All other files are covered by the GNU General Public License (Version 2).