When I first got into self-hosting, I wanted to join the Fediverse by hosting my own instance. After realizing I wasn’t that committed to the idea, I went in a simpler direction.

Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.

Being uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started using an uncommon port with a reverse proxy. My logs now only ever show activity when I am connecting to my own site.

Which is what led me to this question.

What do bots and scrapers look for when they come to a site? Do they mainly target known ports like 80 or 22 for insecurities? Do they ever scan other ports looking for other common services that may be insecure? Is it even worth their time scanning for open ports?

Seeing as I am tiny and obscure, I most likely won’t need to do much research into protecting myself from such threats but I am still curious about the threats that bots pose to other self-hosters or larger platforms.

  • smiletolerantly@awful.systems
    26 days ago

    I am scratching my head here: why open up ports at all? Is it just to avoid having to pay for a domain? The usual way to go about this is to only proxy 443 traffic to the intended host/VM/port based on the (sub)domain, and just drop everything else, including requests on 443 that do not match your subdomains.

    Granted, there are some services actually requiring open ports, but the majority don’t (and you mention a webserver, where we’re definitely back to: why open anything beyond 443?).

    • Lka1988@lemmy.dbzer0.com
      25 days ago

      I have one for beammp opened, but that machine is also on its own DMZ’d VLAN and only runs when I play BeamNG.Drive. Other than that, it’s just 443 to my reverse proxy.

    • confusedpuppy@lemmy.dbzer0.com (OP)
      26 days ago

      My ISP blocks incoming data to common ports unless you get a business account. That’s why I used Cloudflare’s tunnel service initially. I changed my plans with the domain name I currently own and I don’t feel comfortable giving more power and data to an American Tech company so this is my alternative path.

      I use Caddy as my reverse proxy so I only have one uncommon port open. My plans changed from many people accessing my site to just me and very few select friends of mine which does not need a business account.

      • smiletolerantly@awful.systems
        26 days ago

        My ISP blocks incoming data to common ports unless you get a business account.

        Oof, sorry, that sucks. I think you could still go the route I described though: For your domain example.com and example service myservice, listen on port :12345 and drop everything that isn’t requesting myservice.example.com:12345. Then forward the matching requests to your service’s actual port, e.g. 23456, which is closed to the internet.

        Edit: and just to clarify, for service otherservice, you do not need to open a second port; stick with the one, but in addition to myservice.example.com:12345, also accept requests for otherservice.example.com:12345, but proxy that to the (again, closed-to-the-internet) port :34567.

        The advantage here is that bots cannot guess from your ports what software you are running, and since caddy (or any of the mature reverse proxies) can be expected to be reasonably secure, I would not worry about bots being able to exploit the reverse proxy’s port. Bots also no longer have a direct line of communication to your services. In short, the routine of “let’s scan ports; ah, port x is open indicating use of service y; try automated exploit z” gets prevented.
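
The host-based routing described above can be sketched roughly as follows. This is only an illustration of the decision a reverse proxy makes, not how Caddy is implemented; the hostnames and ports are the example values from the comment, and `pick_upstream` and `ROUTES` are hypothetical names.

```python
# Illustration only: decide where a request goes based on its Host header.
# Hostnames and ports are the example values from the thread.
LISTEN_PORT = 12345

ROUTES = {
    "myservice.example.com": ("127.0.0.1", 23456),
    "otherservice.example.com": ("127.0.0.1", 34567),
}


def pick_upstream(host_header):
    """Return (ip, port) of the internal service, or None to drop the request."""
    if not host_header:
        return None
    name, sep, port = host_header.strip().lower().rpartition(":")
    if not sep:
        # No port in the Host header, so the client was not asking for :12345.
        return None
    if port != str(LISTEN_PORT):
        return None
    # Unknown names (including bare IP addresses) fall through to None.
    return ROUTES.get(name)


print(pick_upstream("myservice.example.com:12345"))
print(pick_upstream("203.0.113.7:12345"))
```

A bot that finds the open port by scanning typically connects by IP address, so its request matches no route and is dropped before it reaches any service.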

        • confusedpuppy@lemmy.dbzer0.com (OP)
          26 days ago

          I think I am already doing that. My Kiwix docker container port is set to 127.0.0.1:8080:8080 and my reverse proxy only listens on port 12345 but will forward kiwi.example.com:12345 to port 8080 on the local machine.

          I’ve learned that Docker likes to manipulate iptables without any notice to other programs like UFW. I have to be specific in making sure Docker containers only publish their ports to the local machine.
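
To make the binding point concrete, here is a small sketch that checks whether a Docker-style port binding string is limited to loopback. It assumes the usual `[host_ip:]host_port:container_port[/proto]` form and IPv4 only (bracketed IPv6 addresses are not handled); `publishes_publicly` is a hypothetical helper, not part of Docker.

```python
import ipaddress


def publishes_publicly(binding):
    """Return True if the binding listens on more than the loopback interface.

    Assumes "[host_ip:]host_port:container_port[/proto]" with an IPv4 host_ip.
    """
    spec = binding.split("/", 1)[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip = parts[0]
    elif len(parts) in (1, 2):
        # No host IP given: Docker publishes on all interfaces by default.
        host_ip = "0.0.0.0"
    else:
        raise ValueError(f"unsupported binding format: {binding!r}")
    return not ipaddress.ip_address(host_ip).is_loopback


print(publishes_publicly("127.0.0.1:8080:8080"))  # False
print(publishes_publicly("8080:8080"))            # True
```

The second form is the one that tends to surprise people, because the published port is reachable from outside even when UFW appears to block it.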

          I’ve also used this guide to harden Caddy and adjusted that to my needs. I took the advice from another user and use wildcard domain certs instead of issuing certs for each subdomain, so only the wildcard domain is visible when I look it up at https://crt.sh/. That way I’m not advertising the subdomains I am using.

          • smiletolerantly@awful.systems
            26 days ago

            TBH, it sounds like you have nothing to worry about then! Open ports aren’t really an issue in and of themselves; they are problematic because the software listening on them might be vulnerable, and the (standard) ports can provide knowledge about the nature of the application, making it easier to target specific software with an exploit.

            Since a bot has no way of finding out what services you are running, they could only attack caddy - which I’d put down as a negligible danger.

            • confusedpuppy@lemmy.dbzer0.com (OP)
              26 days ago

              Yeah, a few weeks ago I achieved my state of “secure” for my server. I just happened to notice a dramatic decrease in activity and that’s what prompted this question that’s been sitting in the back of my mind for weeks now.

              I do think it’s important to talk about it though because there seems to be a lack of talk about security in general for self hosting. So many guides focus on getting services up and running as fast as possible but don’t give security much thought.

              I just so happened to have gained an interest in the security aspect of self hosting over hosting actual services. My risk from self hosting is extremely low, so I’ve reached a point of diminishing returns on security, but the mind is still curious and wants to know more.

              I might write up a guide/walkthrough of my setup in the future but that’s low priority. I have some other not self hosting related things I want to focus on first.

  • CameronDev@programming.dev
    27 days ago

    When I used to have SSH on a nonstandard port, I got login failures from bots. It really depends on the bot and how aggressive they have set it up.

  • CriticalMiss@lemmy.world
    27 days ago

    Moving your port over to a nonstandard one is not a solution (unless the problem you experience is too many logs from sshd, and even then, logrotate exists). It’s security by obscurity, which doesn’t really solve anything at all. The only way your server will be safe is by ensuring the packages on your server are up to date and that you harden it to the point where it isn’t too much of a nuisance.

  • JASN_DE@feddit.org
    27 days ago

    There is no hiding in that sense. Bots will scan all IPs on all ports over time.

    Will it be less on nonstandard ports? Likely. Will it matter? Not really, the attack vectors would be exactly the same.

    Secure your systems and running on default or nonstandard ports won’t be an issue.

  • i_am_not_a_robot@discuss.tchncs.de
    27 days ago

    Some attackers check services that have already cataloged the services you are running, even on uncommon ports. You won’t hear from them unless you are running a potentially vulnerable service.

  • derek@infosec.pub
    27 days ago

    You can meaningfully portscan the entire internet in a trivial amount of time. Security by obscurity doesn’t work. You just get blindsided. Switching to a non-standard port cleans the logs up because most of the background noise targets standard ports.

    It sounds like you’re doing alright so far. Trying not to get got is only part of the puzzle though. You also ought to have a backup and recovery strategy (one tactic is not a strategy). Figuring out how to turn worst-case scenarios into solvable annoyances instead of an apocalypse is another (and almost as important). If you’re trying to increase your resiliency, and if your Disaster Recovery isn’t fully baked yet, then I’d toss effort that way.

    • confusedpuppy@lemmy.dbzer0.com (OP)
      26 days ago

      Early when I was learning self hosting, I lost my work and progress a lot. Through all that I learned how to make a really solid backup/restore system that works consistently.

      Each device I own has its own local backup. I copy those backups to a partition on my computer dedicated to backups, and that partition gets copied again to an external SSD which can be disconnected. Restoring from the external SSD to my computer’s backup partition to each device all works to my liking. I feel quite confident with my setup. It took a lot of failure to gain that confidence.

      I also spent time hardening my system. I went through this Linux hardening guide and applied what I thought would be appropriate for my web facing server. Since the guide seems more for a personal computer (I think), the majority of it didn’t apply to my use case. I also use Alpine Linux so there was even less I could do for my system but it was still helpful in understanding how much effort it is to secure a computer.

      • derek@infosec.pub
        25 days ago

        That sounds pretty good to me for self-hosted services you’re running just for you and yours. The only addition I have on the DR front is implementing an off-site backup as well. I prefer restic for file-level backups, Proxmox Backup Server for image backups (clonezilla works in a pinch), and Backblaze B2 for off-site storage. They’re reliable and reasonably priced. If a third party service isn’t in the cards then get a second SSD and put it in a safety deposit box or bury it on the other side of town or something. Swap the two backup disks once a month.

        The point is to make sure you’re following the 3-2-1 principle. Three copies of your data. Two different storage media. One remote location (at least). If disaster strikes and your home disappears you want something to restore from rather than losing absolutely everything.
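
The 3-2-1 rule is easy to turn into a quick checklist. The sketch below is only an illustration; the `BackupCopy` type, the medium labels, and the example inventory (loosely modelled on the setup described earlier in the thread) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str    # free-form label, e.g. "internal_disk", "external_ssd", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """At least 3 copies, on at least 2 kinds of media, with at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical inventory: device backup, backup partition, external SSD.
local_only = [
    BackupCopy("device local backup", "internal_disk", False),
    BackupCopy("computer backup partition", "internal_disk", False),
    BackupCopy("external SSD", "external_ssd", False),
]
with_offsite = local_only + [BackupCopy("B2 bucket", "cloud", True)]

print(satisfies_3_2_1(local_only))    # False: nothing is offsite
print(satisfies_3_2_1(with_offsite))  # True
```

In this example the only missing piece is the offsite copy, which matches the suggestion above.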

        Extending your current setup to ship the external SSD’s contents out to B2 would likely just be pointing a sync tool such as rclone at your B2 bucket (plain rsync cannot talk to B2 directly) and scheduling a cron job or systemd timer to run it.

        After that if you’re itching for more I’d suggest reading/watching some Red Team content like the stuff at hacker101 dot com and sans dot org. OWASP dot org is also building some neat educational tools. Getting a better understanding of the what and why around internet background noise and threat actor patterns is powerful.

        You could also play around with Wazuh if you want to launch straight into the Blue Team weeds. Education of the attacking side is essential for us to be effective as defenders but deeper learning anywhere across the spectrum is always a good thing. Standing up a full blown SIEM XDR, for free, offers a lot of education.

        P. S. I realize this is all tangential to your OP. I don’t care for the grizzled killjoys who chime in with “that’s dumb don’t do that” or similar, offer little helpful insight, and trot off arrogantly over the horizon on their high horse. I wanted to be sure I offered actionable suggestions for improvement and was tangibly helpful.

    • DarkAri@lemmy.blahaj.zone
      26 days ago

      Also do basic things like running your webserver in a VM. I’m pretty sure you can also write a script to block any IP that is port scanning; I would do that if I were hosting. Also remember to block port scanning in Firefox, since it’s not enabled by default. This helps keep you safe when you land on a webpage that runs port scans.

      • derek@infosec.pub
        25 days ago

        Absolutely. VMs and Containers are the wise sysadmin’s friends. Instead of rolling my own ip blocker I use Fail2Ban on public-facing machines. It’s invaluable.
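
For anyone curious what an IP blocker of this kind does conceptually, here is a minimal sketch: count failures per IP inside a sliding time window and ban once a threshold is reached. This is not Fail2Ban’s code or configuration; the class name and thresholds are arbitrary assumptions, and a real tool would also parse logs, apply firewall rules, and expire bans.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Ban an IP after max_failures failures within window_seconds."""

    def __init__(self, max_failures=5, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, timestamp):
        """Record one failed attempt; return True if the IP is banned afterwards."""
        if ip in self.banned:
            return True
        recent = self._failures[ip]
        recent.append(timestamp)
        # Forget failures that are older than the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned.add(ip)
            return True
        return False


tracker = FailureTracker(max_failures=3, window_seconds=60)
for t in (0, 10, 20):
    banned = tracker.record_failure("198.51.100.23", t)
print(banned)  # True: three failures within 60 seconds
```

Slow attempts spread out beyond the window never trip the threshold, which is one reason this kind of blocking reduces noise rather than replacing key-based authentication.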

  • Cyberflunk@lemmy.world
    27 days ago

    Read up on shodan.io. Bot networks and scrapers can use the database as a seed to find open ports.

    The CLI tool masscan can (under reasonable conditions) scan the entire IPv4 address space for a single port in about 3 minutes. Scaled linearly, covering all 65,536 ports for the whole IPv4 space would take on the order of 137 days, so a full-range sweep is far more expensive than a single-port one.
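
As a rough sanity check on full-range scan times, the sketch below scales a per-port sweep time linearly across all 65,536 TCP ports. Linear scaling and the three-minute default are assumptions taken from the single-port figure quoted above; real scans are limited by bandwidth, packet loss, and rate limiting.

```python
PORTS = 65536          # TCP port numbers 0-65535
MINUTES_PER_PORT = 3   # assumed time to sweep IPv4 for one port


def full_scan_days(minutes_per_port=MINUTES_PER_PORT, ports=PORTS):
    """Days needed to sweep every port, assuming time scales linearly."""
    total_minutes = minutes_per_port * ports
    return total_minutes / (60 * 24)


print(round(full_scan_days(), 1))  # 136.5
```

Either way, the takeaway is the same: sweeping one port is cheap, sweeping all of them is not, which is why seeded lists are attractive to attackers.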

    So a seed like Shodan can complement scanners/scrapers by narrowing down which IP addresses are worth further recon.

    I honestly don’t know if this helps your question, I don’t actually know how services in general deal with nonstandard ports, but I’ve written a lot of scanning agents (not ai, old school agents) to recon for red/blue teams. I never started with raw internet guesses, I always used a seed. Shodan, or other scan results.

  • frongt@lemmy.zip
    27 days ago

    Yes, they do. Most just search the common ports, but some scan all.

    Being tiny and obscure doesn’t mean they won’t find you, it might just take longer.

    • confusedpuppy@lemmy.dbzer0.com (OP)
      26 days ago

      That’s been my main goal throughout securing my personal devices, including my web facing server: to make things as inconvenient as possible for potential outside interference, even if it means simply wasting their time.

      With how complex computers and other electronic devices have become, I never expect anything I own to be 100% secure even if I take steps I think will make me secure.

      I’ve been on the internet long enough to have built a habit of obscuring my online or digital presence. It won’t save me but it makes me less of a target.

      • frongt@lemmy.zip
        26 days ago

        There’s no “wasting their time”. These attacks are all automated, not some guy sitting at a keyboard running stuff interactively.

  • A_norny_mousse@feddit.org
    27 days ago

    There are a few very simple things that don’t improve security per se but help break the onslaught. One of them would be to not use standard ports for ssh etc. Another could be to use non-standard usernames (not “admin”). Or rename URLs from the standard “admin.php” or “/contact” to something else.

    • confusedpuppy@lemmy.dbzer0.com (OP)
      26 days ago

      I use a different port for SSH, and I also use authorized keys. My SSHD is set up to only accept keys with no passwords and no keyboard input. Also, when I run nmap on my server, the SSH port does not show up. I’ve never been too sure how hidden the SSH port is beyond the nmap scan, but I just assumed it would be discovered somehow if someone was determined enough.

      In the past month I did rename my devices and account names to things less obvious. I also took the suggestion from someone in this community and set up my TLS to use wildcard domain certs. That way my subdomains aren’t being advertised on the public list used by Certificate Authorities. I simply don’t use the base domain name anymore.

      • A_norny_mousse@feddit.org
        26 days ago

        SSH keys are absolutely essential, but those are actual security as opposed to what I wrote above. I should’ve made that clearer.

        My SSHD is setup to only accept keys with no passwords and no keyboard input.

        I don’t see how that improves security. Surely an SSH key with an additional passphrase is more secure than one without.

        • confusedpuppy@lemmy.dbzer0.com (OP)
          26 days ago

          I agree with the last point, I only mentioned that because I don’t really know what other setting in my SSHD config is hiding my SSH port from nmap scans. That just happened to be the last change I remember doing before running an nmap scan again and finding my SSH port no longer showed up.
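
One possible explanation, offered as an assumption since the exact nmap command isn’t shown: unless told otherwise, nmap only scans the 1,000 most common ports per protocol, so a service on an unusual port can be missed without any sshd setting hiding it (`nmap -p <port>` or `nmap -p-` covers a specific port or the full range). A direct check of a single port can be sketched as below; `tcp_port_open` is a hypothetical helper, and the host and port in the usage line are placeholders.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your own server and SSH port.
    print(tcp_port_open("server.example.com", 2222))
```

If this reports the port as open from outside your network, a scanner that tries the full port range will find it too.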

          Accessing SSH still works as expected with my keys and for my use case, I don’t believe I need an additional passphrase. Self hosting is just a hobby for me and I am very intentional with what I place on my web facing server.

          I want to be secure enough but I’m also very willing to unplug and walk away if I happen to catch unwanted attention.

          • A_norny_mousse@feddit.org
            26 days ago

            Sounds like a healthy attitude towards online security.

            I’m doing my first ever nmap scan right now, thanks for the inspiration. It’s taking a long time - either my ISP does not like what I’m doing there or I’m being too thorough - but it looks like it does not see my SSH port either.