How do you do, my dear reader? Lately, for reasons unknown, I have somehow become popular among bots that share adult content such as porn pictures.
Like any human, I care about the time I spend on chores like this, so I don’t like having to spend several minutes a week finding and blocking them. I don’t want my friends, or whoever clicks on one of these accounts, to be trapped, fooled or something similar. Besides that, I don’t want Google to index me in any kind of connection with these pictures. I am a married man with an adult-baby cat, so would Miss Bryant and Miss Hart mind fucking off?
I love creating automated solutions, but unlike Simone Giertz, I usually spend my time implementing software instead of robots.
Looking at the screenshot and comparing it to other similar bots, what would you say? Can you see a pattern?
- These bots follow lots of accounts that won’t follow them back
- They have only one tweet
- They tend to have a few followers because yeah, there is a small percentage of people who would follow them back
On a more sophisticated level, they usually share nude photos, which you could analyze with OpenCV and flag as something you don’t want to see. However, keep in mind that such an algorithm can make mistakes.
Besides all that, you will find that they usually want you or your friends to click on a shortened Google link, which leads to their bloody website.
In case you didn’t know, the website you open might exploit a security hole in your browser or get you to download a virus that can later be used to steal private information such as credit card details or passwords. I bet those times are gone, because there are less risky ways to profit from traffic, but you never know. In a few words: do not push your luck.
Keep in mind: at first, my intention was to detect whether a tweet contains nude content and, if so, to block the account. But afterwards I found a simpler way, one where I don’t need to go over hundreds of dick pics to train an OpenCV detector.
So there’s no need to invent anything extraordinary here. We just need to apply all the observations above and conclude whether the account is one you want to block or not.
Fun fact: you may have a long list of users to go through. In that case you will run into Twitter’s rate limits, and you’ll need to hold your next operations for 15 minutes.
So as we said, we are going to do that in an uncomplicated way. How? Well.
Let’s assume that an account older than 30 days is not a bot, because Twitter hopefully would not let it last that long on the platform.
Forget about it. Unfortunately, it turned out they can live for three months or more.
Let’s concentrate instead on the fact that the owners of these accounts are lazy. In practice, they won’t spend time publishing more than once from the same profile. Therefore, they have only one tweet.
What is the tweet about? They don’t care much about what kind of message they share on the platform; they want to get as much traffic as possible. So there is usually a shortened link, some adult picture and maybe dirty words to draw the attention of “non-experts”.
So back to our program, I don’t have anything special or tricky that would surprise you, but it’s still the chore we need to automate.
[gist https://gist.github.com/rudkovskyi/f75d47d203b99ea970ee78109daea2bd /]
To keep things simple, we check how many tweets the user has. As described above, we check whether the count is less than or equal to N = 1. Then, going through those tweets, we search for ones containing a link.
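The check described above can be sketched in a few lines of Python. This is only an illustration, not the code from the gist: the `Account` class, the `looks_like_link_bot` function and the URL pattern are hypothetical names I made up for the example, and the tweets are plain strings rather than objects from a Twitter client.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: anything that starts like a web link counts as a link.
URL_PATTERN = re.compile(r"https?://\S+")

MAX_TWEETS = 1  # the N = 1 from the text


@dataclass
class Account:
    """Illustrative stand-in for a Twitter user; not a real client object."""
    screen_name: str
    tweets: list[str] = field(default_factory=list)


def has_link(text: str) -> bool:
    """Return True if the tweet text contains something that looks like a URL."""
    return URL_PATTERN.search(text) is not None


def looks_like_link_bot(account: Account, max_tweets: int = MAX_TWEETS) -> bool:
    """Flag accounts with at most `max_tweets` tweets where a tweet carries a link."""
    if len(account.tweets) > max_tweets:
        return False
    return any(has_link(tweet) for tweet in account.tweets)


suspect = Account("hypothetical_bot", ["hot pics here https://goo.gl/abc"])
regular = Account("hypothetical_human", ["good morning", "lunch time"])
print(looks_like_link_bot(suspect), looks_like_link_bot(regular))
```

An account with no tweets at all is not flagged here, because there is no link to find.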
I ran the algorithm on a list of my followers, and unexpectedly it decided that one of them was a bot, which wasn’t true. There are always people who fall into the wrong category because they are a little bit different, I thought.
It doesn’t look like the account I am hunting for has a tweet with a link. Moreover, he has only two tweets and two followers, so he is probably not someone who shares something terrible.
I imagine it would be wise to consider how many people follow them back. Besides, let’s express it as a percentage rather than a raw number.
In the example of user @wojbal, we see that he follows 36 people and is followed back by 2. Our formula gives 2.0 / 36.0 * 100 = 5.555555555555555, which is quite high in comparison with the bots that try to fool us.
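Here is the same formula as a tiny Python function, as a sketch rather than the gist’s actual code. The name `follow_back_percent` is hypothetical, and so is the choice to return 100.0 for an account that follows nobody: the formula cannot divide by zero, and such an account does not match the bot pattern anyway.

```python
def follow_back_percent(followers: int, following: int) -> float:
    """Share of followed accounts that follow back, in percent."""
    if following == 0:
        # Assumption: someone who follows nobody is not the kind of bot
        # described here, so treat the ratio as "not suspicious".
        return 100.0
    return followers / following * 100


# The @wojbal example from the text: 2 followers, 36 accounts followed.
print(follow_back_percent(2, 36))
```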
[gist https://gist.github.com/rudkovskyi/f85fab510ce1567b7696c205e6a04cf6 / ]
Wait. But can’t we apply the formula for tweets?
Since some of these accounts exist for only a few months, we calculate how many tweets a user posts per day rather than per year.
It’s entirely possible that there are users who don’t post as much as you do. My own rate is just 0.03321917808219178 tweets a day, which is quite low for 8 years, but still not as low as the bots’.
So, combining these two formulas, we can decide whether or not to block a user, which is exactly what we set out to achieve in this article.
[gist https://gist.github.com/rudkovskyi/641531da299e960951b91439221cf4b1 /]
We have also added a check that the user’s account is not protected, because we don’t care about those. And in some cases users have no tweets at all because they have never used Twitter, so why block them?
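Put together, the decision might look like the Python sketch below. Treat every name and number in it as an assumption: the `Profile` class and `should_block` function are hypothetical, the two thresholds are values I chose for illustration (the article does not state which cut-offs the gist uses), and requiring both conditions at once is my reading of “combining”.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Illustrative stand-in for the account fields the checks need."""
    protected: bool
    tweet_count: int
    followers: int
    following: int
    age_days: int


# Hypothetical thresholds; tune them against your own follower list.
MAX_FOLLOW_BACK_PERCENT = 1.0
MAX_TWEETS_PER_DAY = 0.03


def should_block(profile: Profile) -> bool:
    """Combine the follow-back percentage and the tweets-per-day rate."""
    if profile.protected:
        return False  # protected accounts are skipped
    if profile.tweet_count == 0:
        return False  # never tweeted, so nothing to judge
    if profile.following == 0:
        return False  # follows nobody, so it does not fit the pattern
    follow_back = profile.followers / profile.following * 100
    per_day = profile.tweet_count / max(profile.age_days, 1)
    return follow_back <= MAX_FOLLOW_BACK_PERCENT and per_day <= MAX_TWEETS_PER_DAY


bot_like = Profile(protected=False, tweet_count=1, followers=5, following=2000, age_days=90)
quiet_human = Profile(protected=False, tweet_count=2, followers=2, following=36, age_days=400)
print(should_block(bot_like), should_block(quiet_human))
```

With these illustrative thresholds, an account shaped like the @wojbal example is left alone because its follow-back percentage is well above the cut-off.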
Next, pay attention to the escape_error method. This is how we handle the error caused by rate limiting: we retry the block inside the loop while excluding the Twitter accounts that were already processed. In other words, we continue from the place where the limits interrupted us.
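The retry idea can be sketched in Python like this. It is not the escape_error method from the gist, just the same pattern under stated assumptions: `RateLimited` is a hypothetical exception standing in for whatever your Twitter client raises, and `block` is any function you pass in that blocks a single account.

```python
import time
from collections import deque


class RateLimited(Exception):
    """Hypothetical stand-in for a client's rate-limit error."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after} seconds")
        self.retry_after = retry_after


def block_all(accounts, block, sleep=time.sleep):
    """Block every account, pausing and resuming when the limit is hit."""
    pending = deque(accounts)
    blocked = []
    while pending:
        account = pending[0]
        try:
            block(account)
        except RateLimited as err:
            # Wait out the limit, then retry the same account.
            # Accounts already blocked are never touched again.
            sleep(err.retry_after)
            continue
        blocked.append(pending.popleft())
    return blocked
```

Because an account leaves the queue only after a successful block, an interruption never restarts the list from the beginning. Passing `sleep` as a parameter also makes it easy to try the loop without actually waiting 15 minutes.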
P.S. If you tried this on your account, let me know what the success rate was! And if you liked the article, I would appreciate it if you shared it with your friends. Happy coding!