Is there no function to get the plain text out of HTML???

Category: Java SE
 
hecuichan52191
2012-09-25 04:26:41

I need to get the text from the content of an HTML file. Is there no function that extracts the plain text from HTML??? It's urgent!!! Thank you,


mengqing12
2012-09-25 04:40:09
You can read the file with Java file streams, then check for HTML tag keywords: whatever is not a tag is text.
For the tag keywords, look them up yourself (e.g. <html> </html> <title> ...).
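The read-and-check approach can be sketched as a single pass over a Reader: characters between '<' and the next '>' are treated as a tag and dropped, everything else is kept as text. This is a minimal sketch that assumes every '<' starts a tag, which later replies in this thread show is not always true; the class name `TagSkipper` is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TagSkipper {
    // Copies characters from the reader, skipping everything from '<' up to the next '>'.
    public static String stripTags(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                inTag = true;             // assumed start of a tag
            } else if (c == '>' && inTag) {
                inTag = false;            // end of the tag; the '>' itself is dropped
            } else if (!inTag) {
                out.append((char) c);     // ordinary text (a stray '>' is kept)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><title>Hi</title><body>Hello <b>world</b></body></html>";
        System.out.println(stripTags(new StringReader(html)));  // prints "HiHello world"
    }
}
```

For a file on disk the reader could be `new BufferedReader(new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))`; the UTF-8 charset is an assumption about how the file was saved.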
woainikak
2012-09-25 04:50:42
I don't recommend handling it yourself; it's best to find an HTML parser.
If you really must do it yourself, use regular expressions to match the tags.
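The JDK already ships a small HTML parser inside Swing, `javax.swing.text.html.parser.ParserDelegator`, which calls back `handleText` for each run of text between tags. A minimal sketch of using it as the "HTML parser" suggested above; it is based on an old HTML 3.2 DTD, so treat it as adequate for simple mail bodies rather than a robust solution. The class name `HtmlToText` and the choice to join text runs with a space are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {
    public static String extractText(Reader html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // called once per run of text content; separate runs with a space
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        // third argument true: ignore any charset declared inside the document
        new ParserDelegator().parse(html, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```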
gl1509
2012-09-25 04:54:25
I have an HTML parser here that should do what you want. Email me and I'll send it to you,
but I haven't tested it.
s996277796
2012-09-25 05:00:12
I suggest that you look at:

import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class worldheart {
    public static void main(String[] args) throws IOException {

        // check command-line arguments
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }

        // get channels
        FileInputStream fis = new FileInputStream(args[0]);
        FileOutputStream fos = new FileOutputStream(args[1]);
        FileChannel fcin = fis.getChannel();
        FileChannel fcout = fos.getChannel();

        // allocate buffer
        ByteBuffer buf = ByteBuffer.allocateDirect(8192);

        // do copy
        long size = fcin.size();
        long n = 0;
        while (n < size) {
            buf.clear();
            if (fcin.read(buf) < 0) {
                break;
            }
            buf.flip();
            // a single write() may not drain the buffer, so loop until it is empty
            while (buf.hasRemaining()) {
                n += fcout.write(buf);
            }
        }

        // finish up
        fcin.close();
        fcout.close();
        fis.close();
        fos.close();
    }
}
w899t
2012-09-25 05:08:52
See if this is what you need!
import java.net.*;
import java.io.*;

public class GetHTML {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("USAGE: java GetHTML httpaddress");
            System.exit(1);
        }
        String sURLAddress = args[0];
        URL url = null;
        try {
            url = new URL(sURLAddress);
        } catch (MalformedURLException e) {
            System.err.println(e.toString());
            System.exit(1);
        }
        try {
            InputStream ins = url.openStream();
            BufferedReader breader = new BufferedReader(new InputStreamReader(ins));
            String info = breader.readLine();
            while (info != null) {
                System.out.println(info);
                info = breader.readLine();
            }
            breader.close();
        } catch (IOException e) {
            System.err.println(e.toString());
            System.exit(1);
        }
    }
}
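`GetHTML` above decodes the page with `new InputStreamReader(ins)`, i.e. the platform default charset, which can garble pages (for example Chinese ones) saved in a different encoding. A minimal sketch that reads a whole stream into one String with an explicit charset; the class name `ReadPage` is hypothetical, and which charset to pass is an assumption the caller must make (a page usually declares it in its Content-Type header or a meta tag).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadPage {
    // Reads the whole stream into one String, decoding with the given charset,
    // and appends '\n' after every line.
    public static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<p>\u4f60\u597d</p>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

With a real address the call would look like `readAll(new URL(args[0]).openStream(), StandardCharsets.UTF_8)`, where UTF-8 is again only an assumed encoding.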
y5536967
2012-09-25 05:10:54
You could use DataInputStream and DataOutputStream
to open a data stream to the page at the specified URL; the idea should certainly work.
But it doesn't run on my machine, so I can't test it.
Good luck!
conan4d
2012-09-25 05:25:36
The methods above look pretty good.
more23rtghtyn
2012-09-25 05:39:30
Is there really no such function? I don't want to write my own function to analyze the HTML content. I'm afraid it would get complicated: there are so many HTML tags, and analyzing them one by one is too much trouble.
wzbise
2012-09-25 05:49:41
From what you've said, you only need the body text with no formatting, so you can simply remove everything inside <>. There's no need to analyze each tag one by one, since you don't need the formatting anyway.
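The "remove everything inside <>" suggestion fits in one regular expression. A minimal sketch; the class name `RegexStrip` is hypothetical, and like the suggestion itself it also deletes body text that merely looks like a tag.

```java
public class RegexStrip {
    // Removes every "<...>" span that contains no '<' or '>' inside it.
    public static String stripTags(String html) {
        return html.replaceAll("<[^<>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<font size=\"5\">Hi there</font>"));  // prints "Hi there"
    }
}
```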
xiaonulla
2012-09-25 06:05:11
tomxutomxu (shprog): Your understanding is right, but the method you describe doesn't work, because the body text itself may also contain <>.
For example:
<html>
<body>
<font size="5">
<hello world>
</font>
</body>
</html>
I want to extract the <hello world> from it.

==============
I take the text from the mail body and then analyze it. If the mail body is plain text, it's easy to handle and can be analyzed directly, but Outlook or web mail users often send mail in HTML format, so I have to extract the useful text from the HTML.
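In HTML produced by a mail client or editor, a literal '<' in body text is normally written as the entity `&lt;` (and '>' as `&gt;`), so the source for the <hello world> example would usually contain `&lt;hello world&gt;` rather than a raw tag-like span. Assuming that, one sketch is to strip real tags first and only decode entities afterwards. The class name `StripThenDecode` is hypothetical, and only a handful of common entities are handled.

```java
public class StripThenDecode {
    public static String toText(String html) {
        // 1. drop tags; escaped text such as &lt;hello world&gt; contains no raw '<'
        String text = html.replaceAll("<[^<>]*>", "");
        // 2. decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<"
        //    (&nbsp; is mapped to an ordinary space here for simplicity)
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&nbsp;", " ")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String html = "<html><body><font size=\"5\">&lt;hello world&gt;</font></body></html>";
        System.out.println(toText(html));  // prints "<hello world>"
    }
}
```

If a sender really did put a raw `<hello world>` into the HTML source, no tag stripper can reliably tell it from markup; this sketch only covers properly escaped text.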
moon614
2012-09-25 06:20:22
Brothers, please help out!!!
zxp_2010
2012-09-25 06:25:28
What a hassle :)
PiscesV1
2012-09-25 06:44:31
Yes, thinking about it, it really is troublesome. Friends, does anyone know a good way?
ckkern
2012-09-25 06:55:12
Actually, the simplest approach is to remove the strings inside <>.
This simple approach will inevitably leave some junk, but as long as you don't need to extend it, it should be fine.
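Part of the "junk" left by plain tag removal is the content of `<script>` and `<style>` blocks, HTML comments, and runs of whitespace and line breaks. A minimal sketch that removes those first and then collapses whitespace; the class name `CleanerStrip` is hypothetical, and replacing each tag with a space is a design choice that keeps "a<br>b" from turning into "ab" at the cost of splitting words that have tags inside them.

```java
import java.util.regex.Pattern;

public class CleanerStrip {
    // (?is): case-insensitive, and '.' also matches line breaks; \1 requires the matching close tag
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String toText(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");  // drop code and CSS blocks
        s = COMMENT.matcher(s).replaceAll(" ");                     // drop <!-- ... --> comments
        s = TAG.matcher(s).replaceAll(" ");                         // drop remaining tags
        return SPACES.matcher(s).replaceAll(" ").trim();            // collapse whitespace
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><p>Hello</p>\n<p>world</p><!-- note --></body></html>";
        System.out.println(toText(html));  // prints "Hello world"
    }
}
```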
ppbslilei
2012-09-25 07:06:52
The brother upstairs has a good point,
but there are many new cases to consider,
for example:
<6> 3>
<I bought this << thinking in java >>!>
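One way to leave text such as <6> 3> or <<thinking in java>> alone is to remove only angle-bracket spans that start with a known HTML tag name. A minimal sketch; the class name `KnownTagStrip` is hypothetical and the tag list is an assumption that is deliberately incomplete, so unlisted tags stay in the output. It only reduces the problem: body text like <I am here> would still be taken for an <i> tag.

```java
import java.util.regex.Pattern;

public class KnownTagStrip {
    // Matches <name ...>, </name> or <name/> only for the listed tag names (case-insensitive).
    // \b makes sure the whole name matched, e.g. "b" does not match the start of "<bought>".
    private static final Pattern KNOWN_TAG = Pattern.compile(
            "(?i)</?(html|head|title|body|p|br|div|span|font|b|i|u|a|img"
            + "|table|tr|td|ul|ol|li|h[1-6])\\b[^<>]*>");

    public static String stripKnownTags(String html) {
        return KNOWN_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><body><font size=\"5\"><hello world></font><p><6> 3></p></body></html>";
        System.out.println(stripKnownTags(html));  // prints "<hello world><6> 3>"
    }
}
```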
y11032032
2012-09-25 07:20:21
Simply removing the contents of <> certainly won't work, because plain text is also allowed to contain <>, and I also need <> to identify the various sub-sections. An HTML parser may solve it; I'll look up some information. Thanks, and I'd welcome advice from anyone who knows this area.
wangjinok
2012-09-25 07:28:54
A few days ago I was looking for this for someone and found a very simple one; it should meet your needs.
roadcases
2012-09-25 07:32:07
The problem still hasn't been fully resolved; please keep the advice coming. Thanks!