I recently started a diet. One thing I decided this time around was to be
heavily data driven in my weight loss, and therefore I decided to get some
‘smart’ weighing scales. I got the Eufy P1 scales which seemed to be reasonably
well reviewed, and it turns out that Eufy is another name for Anker, another
Chinese brand I’ve had success with in the past. I was a little irritated though
when I figured out there was absolutely no way to actually get your data out of
the app they provide. I wanted to get the data out of the app since the graphs
that the app provides were woefully inadequate, and to me were useless. This set me
on the path of building a pretty fun Goldberg-esque system to extract the data.
What it does
The whole process is started when I weigh myself in the morning. After I
conclude weighing myself, I take a screenshot of the screen which shows my stats
from that weighing session.
Next, I have Nextcloud set up locally on a home server, and my phone is
configured to automatically upload all screenshots to my Nextcloud instance. My home
server then will occasionally (via a cron job) upload the screenshots to my
server that hosts kn100.me.
Once the screenshot has found its way to my server, a Go service I wrote notices
the new file, and kicks into action. It immediately calls out to a Python script
which does some pretty crazy things which I will describe in more detail later,
and then returns a JSON representation of the data in the screenshot. The Go
program then takes that JSON from stdout (sorry), and writes it to a Sqlite DB,
and eventually the frontend will make a call out to an endpoint which returns
more different JSON for the graphs to display.
What about that Python script?
This is fun. My first attempt was completely written in Go. As you can see from
the screenshot above, there are metrics spread evenly throughout the image. What
I decided to do was to attempt to perform Optical Character Recognition on each
metric. I found Tesseract to be a pretty good candidate for OCR, but running the
entire screenshot through Tesseract was not an option, since the results of
that, were…not fantastic. Tesseract seems mostly optimised for parsing blocks
of text like pages from a book or receipts. Not so good for parsing freeform
screenshots with text in various colours.
Instead, I firstly wrote some code that given some measurements from a
screenshot, calculates where each metric is, and then crops out a square around
I then ran these through Tesseract, and got merely OK results. It seemed that
Tesseract wasn’t getting on with the fact the unit text was smaller than the
metric itself, and was occasionally confusing the last digit and the first part
of the unit text to be part of the same character.
I thought about it for a while, and decided I needed some way of trimming off
that unit from the end. All I cared about were the numbers, given I knew what
each number at each position meant. I firstly tried a Tesseract whitelist, which
would tell Tesseract only to look for certain characters. I set the whitelist to
only allow digits and periods. This helped a bit, but occasionally the unit text
would then be picked up as numbers, messing everything up. I really didn’t want
to get into writing my own bounding box logic however, so I thought about cheap
hacks to get rid of the smaller text to the right.
What I was doing in the beginning was silly. What I’d do is enormously gaussian
blur the image, increase the contrast, and then sharpen it. Because the smaller
details in the image (the unit text) would be less dark on the blurred image,
when I re-sharpened it, those details would mostly be gone. By fine tuning how
much I blurred the image along with how much brighter I made it, I was able to
achieve surprisingly acceptable results. This improved OCR accuracy
significantly. This approach had its own problems though, and would either
remove the periods, or the remnants of what was left of the unit text would be
interpreted as characters. I did try fudging it by attempting to parse the
output to add in the dot where it would likely be, and to trim trailing dots,
but still the accuracy wasn’t up to the standard I expected. Below is a visualisation of the blur, increase contrast, sharpen approach I was taking
For illustrative purposes I tried to reproduce the effect in Gimp,
but wasn’t as successful as I was in code. The above above was as good as I got.
You can see all four stages of the process here though. Firstly we blur the crap
out of the metric. Then, we jack the contrast way up, and then we sharpen. I was
actually able to get much better results programatically by fine tuning the
numbers. The results I got from the blur, increase contrast, and sharpen approach were far better than the above visualisation lets on.
I was at my wits end at this point, and in my frustration joined #tesseract on
Libera, an IRC network. I immediately asked my question, and had an absolutely
hilarious conversation with a very kind soul there about my problem. Why
hilarious you ask? Because #tesseract has absolutely nothing to do with the
OCR software, and instead is a community around an open source FPS game.
Amusingly, one of the members there still provided me with an absolutely genius
solution which almost completely solved my problem.
kn100: I’ve got images which look like this
https://nextcloud.kn100.me/s/8enRb4d9KQ2z2oE and all I want to do is detect the
numbers, not the ‘kcal’ portion. With a whitelist set to just numbers, it
occasionally detects the ‘kcal’ portion as numbers - I’m guessing Tesseract is
choking on the fact there is a size difference in the font. Any suggestions on
how I can configure Tesseract to better handle this? It’s not really an option
to crop off the kcal before feeding it to tesseract - without something smart
since it is unknown where exactly the numbers and the text appear in the image
@graphitemaster: has changed the topic to: Tesseract |
http://tesseract.gg | Open-Source FPS Game | First Edition (May 11, 2014)
released! | not tesseract-ocr | Calinou: we do OCR using pulse rifles here
@graphitemaster:kn100, Afraid you got the wrong channel. The topic
wasn’t up to date since the channel moved from Freenode to Liberachat. This is a
video game, not the OCR software. Though to answer your question, you could
preprocess the input with opencv in Python first to remove the kcal before
passing to Tesseract.
@graphitemaster: I’ve done some OCR stuff before to scrape codes from
still frames and had similar problems. I found it was easier to just preprocess
the input first with some CV to normalize and make Tesseract’s job easier.
Something like that, if you use cv2, transform to greyscale, then use
cv2.findContours, you’ll get actual bounding boxes around each character, you
can very quickly tell that some of the characters will have a smaller area
(maybe height is a better criteron) and you can just throw them out.
kn100:graphitemaster colour me surprised that I got help from you at all
then :D Thank you so much, and sorry for misunderstanding :P
@graphitemaster: Stick around, might be some code in a minute XD
kn100: that is rather hilarious though. I might have to give Tesseract a go!
Their suggested approach was to use OpenCV. With OpenCV, I could do something a
little more clever than just blurring and resharpening the image. Instead, I
could use OpenCV to to find contours in the image (the edges of areas of high
contrast), and measure the area of the bounding box around each contour. If a
given contours bounding box wasn’t more than half of the area of the largest
we’ve seen, we can throw away the pixels at that location. What was left were
just the characters of the metric, which could then be stitched together again.
This left me with just the numbers.
This was absolutely wonderful, and I immediately tried to port this approach to
Go. Problem is, (and I mean no disrespect to the OpenCV devs!), OpenCV is a
large beast. The lib that interfaces with it for Go was one called GoCV, and I
just couldn’t get it to work. It was definitely something I was doing wrong but
I was more interested in solving the problem at hand, so I quickly decided
to do all the data processing in a Python script, which I would call out to from
Go. It was a little messy, but whatever, it would work!
Onto Python, it worked great! There’s a package called opencv-python which
brings along some precompiled binaries, which worked for my use case. Upon
testing the approach that was kindly suggested to me in #tesseract, I realised
it only partially solved my problem, as it omitted periods in metrics. This was
easily rectifiable however, by adding a lower bound to the area of characters
that get thrown away. Again, by fine tuning, I was able to reliably extract just
the number portion of the screenshot.
Next up was the date parsing. This would be straight forward, but the EufyLife
app has a fairly insane date format that seems to arbitrarily change. Sometimes
days would have one digit, sometimes they’d have two. Sometimes hours would have
one digit, sometimes they’d have two. This led to me writing some fairly
horrific date parsing logic that I’m not proud of to get the date into a regular
format to be parsed.
Past this point, the OCR results were almost perfect, but still not 100%. Often,
the OCR would confuse the digit ‘5’ for ‘9’. What seemed to help a lot here was
’eroding’ the text to reduce how ‘bold’ it was, as well as introducing more
space between each character. After this, the output from Tesseract has been
entirely accurate for every metric I’ve checked.
Deploying it somewhere
One thing I was not prepared for is what a pain in the proverbials deploying
Python code is. I’ve gotten so used to compiling a Go binary and having it just
work, that I wasn’t prepared for deploying Python. Suffice to say, I ended up
containerising it, but for reasons I am currently not completely aware of, the
Docker container has to have a bunch of X11 related dependencies installed for
OpenCV to work. I know that I’ve got some learning to do here. Python is not a
language I’ve ever really used before, but writing Python was fairly enjoyable.
I can see why Data people seem to really like it. I might end up rewriting the
entire lot in Python, or I might end up trying harder to get OpenCV to work in
Go, but for now this solution means I can weigh myself in the morning, take a
screenshot, and about 20 seconds later it’s in a dashboard with graphs that
aren’t entirely useless. Result!
A message to Eufy
Please, please provide some easy way to access this data. I don’t care if it
gets exported as a CSV or if I have to query some API, just please give your
users their data on request. Your in app graphs are absolutely useless, and even
if they weren’t, why not just give this data up? For people like me who are a
bit insane, we like data. we collect stupid amounts about ourselves. Your
product becomes infinitely more valuable when you allow the user to extend it as
they see fit. I know you’ve got the data, you’re logging it to some form of API
somewhere, and even exporting it to Google Fit/Fitbit/Apple Health. Why not just
allow me to query that API? The data is mine, give it to me!