Re: 403 Forbidden Error

From: Timothe Litt <litt_at_acm.org>
Date: Fri, 17 May 2024 05:40:10 -0400

You might also consider whether circumventing the website's controls is
the right thing to do.  Here are a few considerations (it's not an
exhaustive list):

  * The site obviously doesn't want to be a distribution point for others
      o It may have content under license from 3rd parties
      o It may have bandwidth or capacity constraints
      o It may have to pay for bandwidth or capacity
      o It may depend on income from paying subscribers
      o There may be technical reasons why it password-protects user
        pages but not images.
  * Do the site's terms of service allow what you're trying to do?
      o If allowed only by subscribers, are you one?
      o If a license is required, do you have one?
      o Lack of password protection isn't a license to take content
      o Having a password doesn't necessarily give you unrestricted rights
  * Will your use of the content you download be permitted under the
    site's TOS (and copyright)?
  * Will you (whether or not required by the TOS) credit the source in
    your use of the content?
  * If someone were to scrape your website in the same way, would you
    approve?  Would you be happy?

These questions apply to any downloads, but especially to those where
circumventing access controls is necessary.

Sometimes it's a perfectly reasonable thing to do: the controls may be
intended to block non-subscribers, robots, image theft, or embedding
on another website, but you're a licensed subscriber, the controls
interfere with your licensed uses, and the website operator can't or
won't help.
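For illustration, a control aimed at embedding on another website (the
"referer-check" reported further down this thread) can be as simple as
the decision below.  This is a minimal sketch in POSIX shell, assuming
the site serves images only when the Referer header names one of its own
pages; the function name `referer_gate` and the variable `SITE_HOST`
are hypothetical, and real servers usually express this in web-server
configuration rather than a script.  Note that it answers 403, not 401:
no credentials are involved, so there is nothing to authenticate.

```shell
#!/bin/sh
# Hypothetical sketch of a referer-based hotlink check.
# Assumption: images are served only when the Referer header names a
# page on the site's own host; everything else is refused.
SITE_HOST="gfaarchive.info"   # host taken from the thread, for illustration

# referer_gate REFERER -> prints the HTTP status this check would send
referer_gate() {
  case "$1" in
    "http://$SITE_HOST"|"http://$SITE_HOST/"*|"https://$SITE_HOST"|"https://$SITE_HOST/"*)
      echo 200 ;;   # request came from one of the site's own pages
    *)
      echo 403 ;;   # no Referer at all, or another site embedding the image
  esac
}

referer_gate ""                                      # direct request
referer_gate "http://gfaarchive.info/gfaDisplay.php" # from the site itself
referer_gate "https://example.com/blog/post"         # embedded elsewhere
```

The three calls print 403, 200 and 403.  The point is that such a check
says nothing about who you are; it only encodes where the operator
expects requests to come from, which is exactly the intent you have to
weigh before working around it.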

Other times, it's not.  Maybe your use falls into one of the
intentionally blocked categories, even without malicious intent. Or
maybe the website operator is applying a blanket policy without the
knowledge of the website's owner.

And sometimes it's not obvious, and more thought and/or research is
required to decide.

People put content on the web for many reasons, from the commercial, to
getting recognition, to a desire to give anonymously to the world.  To
serve those ends, most have license terms that express their intent and
that may limit the legality and/or ethics of acquiring the content for
another use.  Copyright may be implicit and not require an explicit
notice.  Licenses may be unrestricted (e.g. the MIT/BSD open source
licenses), come with conditions (e.g. GPL, Creative Commons), or be
highly restricted (many commercial licenses).  Compliance may be easy or
hard, free or expensive.

cURL is a tool.  You are responsible for how you use it.  But whenever
you have to do something to circumvent access controls, you should stop
and think: am I doing the right thing?
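Part of stopping to think is finding out what kind of refusal you are
getting.  As Daniel notes in the quoted thread, missing or wrong
authorization gives a 401; a 403 means the server understood the request
and refuses it.  A minimal sketch in POSIX shell, assuming curl is
installed: it captures only the status code with curl's
`-w '%{http_code}'` write-out option, and the helper name
`explain_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize what an HTTP status code implies
# before deciding whether (and how) to proceed.
explain_status() {
  case "$1" in
    2??) echo "$1: request succeeded" ;;
    401) echo "$1: credentials missing or rejected; the server should send a WWW-Authenticate header saying how to authenticate" ;;
    403) echo "$1: server refuses the request; ask why (or read the site's terms) before working around it" ;;
    404) echo "$1: resource not found" ;;
    *)   echo "$1: see the HTTP specification for this status" ;;
  esac
}

# Usage (needs network access): ./check.sh URL
# -s suppresses the progress meter and error messages,
# -o /dev/null discards the body,
# -w '%{http_code}' prints the numeric status (000 if no response arrived).
url=${1:-}
if [ -n "$url" ]; then
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  explain_status "$code"
fi
```

Run without an argument, the script only defines the helper, so it can
be sourced or tested offline.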

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.

On 17-May-24 03:05, Hans Henrik Bergan via curl-users wrote:
> oops my bad, the httpS website is so broken that it won't allow images
> downloaded over httpS,
> thus it's important that you use http (not the https in the original command):
>
> wget --referer='http://gfaarchive.info' --mirror 'http://gfaarchive.info/gfaDisplay.php'
>
> PS it's over 10GB worth of images.
>
>
> On Fri, 17 May 2024 at 07:24, Hans Henrik Bergan
> <divinity76+curl_at_gmail.com> wrote:
>> That website is using a referer-check to block direct access to images.
>> This works:
>> curl -H 'Referer:http://gfaarchive.info/' 'http://gfaarchive.info/gfa/gfacn32/20240517T0600_cldwx_012.png'
>> If you want to download the whole website, try wget mirror:
>> wget --no-check-certificate --referer='http://gfaarchive.info' --mirror 'https://gfaarchive.info/gfaDisplay.php'
>>
>>
>>
>> On Fri, 17 May 2024 at 01:34, Ralph M via curl-users
>> <curl-users_at_lists.haxx.se> wrote:
>>> On Thu, May 16, 2024 at 6:15 PM Daniel Stenberg via curl-users <curl-users_at_lists.haxx.se> wrote:
>>>> On Thu, 16 May 2024, Geoff Sindel via curl-users wrote:
>>>>
>>>>> You are getting a 403. That is, you are not allowed to access that resource.
>>>>>
>>>>> Are you supplying any value for Authorization in the headers?
>>>> Missing or wrong authorization gives you a 401.
>>>>
>>> Going to that URL with a browser, I get this:
>>>
>>> 403
>>> Forbidden
>>> Access to this resource on the server is denied!
>>>
>>> There's probably a cookie and/or login information missing when just trying to pull the file. I had a similar problem once with a server that wanted me to collect a frames page first, followed by frame contents in a specific order, because each frame section returned a specific cookie. It wouldn't just let me collect the main frame, I had to get headers and sidebar first.
>>>
>>> OP will probably need to start at the main page and craft a script to drill down to the images in the same manner as doing it via a browser.
>>>
>>> Ralph Mitchell
>>>
>>> --
>>> Unsubscribe:https://lists.haxx.se/mailman/listinfo/curl-users
>>> Etiquette:https://curl.se/mail/etiquette.html

Received on 2024-05-17