Question about NPM on the Linux Fundamentals module of Academy

Q. Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.
step 1
curl https://www.inlanefreight.com > test.txt

step 2
cat test.txt | tr " " "\n" | cut -d "'" -f2 | cut -d '"' -f2 | grep "https://www.inlanefreight.com" > data.txt
step 3
Open the data.txt file in Sublime Text and delete the duplicates:

Click “Edit” > “Sort Lines” to sort the lines by value.
Click “Edit” > “Permute Lines” > “Unique” to remove the duplicate values.
Save the file.
step 4
cat data.txt | wc -l

The answer is 34.
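A minimal end-to-end illustration of steps 2–4, run on a made-up HTML snippet instead of the real curl output (so the count here is illustrative, not the challenge answer):

```shell
# Stand-in for the curl output; the real page will contain far more links.
html='<a href="https://www.inlanefreight.com/about">a</a> <img src='\''https://www.inlanefreight.com/logo.png'\''>'

echo "$html" \
  | tr " " "\n" \
  | cut -d "'" -f2 \
  | cut -d '"' -f2 \
  | grep "https://www.inlanefreight.com" \
  | sort -u \
  | wc -l
# → 2  (the two distinct quoted URLs in the snippet)
```

Note that `sort -u` does the same deduplication as the manual Sublime Text work in step 3, so the whole thing can stay in the terminal.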



cat data.txt | sort | uniq | grep "https://www.inlanefreight.com" | wc -l

gives the answer = 34


curl https://www.inlanefreight.com > source_code.txt && cat source_code.txt | tr " " "\n" | cut -d "'" -f2 | cut -d '"' -f2 | sort | uniq | grep "https://www.inlanefreight.com" | wc -l

Hi all. I can see this topic has been covered, but if you fancy an alternative approach, I used a Python script I wrote on the output of curl/wget. The script filters out all lines containing the target website and then cleans them up. Not very terminal-based, and quick and dirty, but it worked nonetheless.

  1. curl https://www.inlanefreight.com > inlanefreight.txt
  2. run this with Python:
# Collect the unique links into a set (a set drops duplicates automatically)
links = set()

file = "./inlanefreight.txt"
with open(file, "r") as inl:
    for line in inl:
        for item in line.split(" "):
            if 'https://www.inlanefreight.com/' in item:
                # Links can be wrapped in single or double quotes,
                # so split on whichever quote this item uses
                if "'https:" in item:
                    parts = item.split("'")
                else:
                    parts = item.split('"')
                if len(parts) > 1:
                    links.add(parts[1])

for link in links:
    print(link)
print(len(links))

The best!

curl https://www.inlanefreight.com | grep -Po "(?<=[\"'])https://www.inlanefreight.com/.*?(?=[\"'])" | sort -u | wc -l

I used this: curl https://www.inlanefreight.com | grep -Po "(?<=[\"\'])https:\/\/www\.inlanefreight\.com\/.*?(?=[\"\'])" | sort | uniq | wc -l
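To see what the lookarounds are doing, here is the same kind of regex run on a toy line (the URLs are made up; `-P` needs GNU grep). The lookbehind `(?<=["'])` and lookahead `(?=["'])` anchor the match to a quote on each side without including the quotes themselves, and the lazy `.*?` stops at the first closing quote:

```shell
printf '%s\n' 'href="https://www.inlanefreight.com/a" src='\''https://www.inlanefreight.com/b'\''' \
  | grep -Po '(?<=["'\''])https://www\.inlanefreight\.com/.*?(?=["'\''])' \
  | sort -u
# → https://www.inlanefreight.com/a
# → https://www.inlanefreight.com/b
```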


curl "https://www.inlanefreight.com" > htb.txt && cat htb.txt | tr " " "\n" | cut -d "'" -f2 | cut -d '"' -f2 | grep www.inlanefreight.com | sort -u | wc -l

  • curl DOMAIN_NAME > htb.txt - saves the HTML source to the htb.txt file
  • | cat htb.txt - reads the file and sends its content down the pipe
  • | tr " " "\n" - replaces every space with a newline
  • | cut -d "'" -f2 - cuts the 2nd field after a single quote
  • | cut -d '"' -f2 - cuts the 2nd field after a double quote → had to do both because links can be written with single or double quotes
  • | grep DOMAIN_NAME - keeps only the lines containing the required domain
  • | sort -u - sorts the links; the -u flag gets rid of all duplicates and returns the unique entries
  • | wc -l - counts all the lines
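On the `sort -u` point above: it gives the same result as `sort | uniq`, because `uniq` only drops *adjacent* duplicates and sorting first makes every duplicate adjacent. A quick check on toy data:

```shell
printf 'b\na\nb\n' | sort -u        # → a, b
printf 'b\na\nb\n' | sort | uniq    # → a, b (same output)
```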

Amazing solution zikyfranky. Thank you. Could you please explain what the thinking process looks like when you’re building pipelines like this? How do you determine the sequence?

Mine looks like this (and please tell me why I am wrong):

  1. curl the domain, then put it in a text file - now I have a bunch of text
  2. grep that domain so I can see where the links are - okay, I saw them with different variations
  3. I tried to find something in common (as many others in this group did)
  4. the rest was just pure suffering

So basically I’m thinking kind of linearly. I type a command and try to find out how it affects my results. But after I type the (tr " " "\n") command (which I would never even have considered, and I have no idea why) I can only see a lot of gaps in the text, which would make me think: that’s not good.

And why is it “-f2” and not “-f1” after the quotes? The link comes right after the quote. Is the quote itself the “-f1”?

I hope you can understand the logic. After that maybe I could use the following cut commands… But without the “tr” command the whole process goes sideways.

If you could give me any advice that would be great. Thank you.
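On the “-f2 vs -f1” question: cut numbers fields starting at 1, and whatever comes before the first delimiter (even if it is empty) is field 1, so for an attribute like href="URL" the href= part is field 1 and the URL is field 2. A quick illustration with a made-up href:

```shell
echo 'href="https://www.inlanefreight.com/page">' | cut -d '"' -f1   # → href=
echo 'href="https://www.inlanefreight.com/page">' | cut -d '"' -f2   # → https://www.inlanefreight.com/page
```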

Yes please.
Can you guide me on how to do it?